OMS Alerting and Remediation

A few weeks ago, Microsoft released the OMS Alerting feature in preview, which includes some really useful capabilities like alert notifications and remediation. We can now set up alerts for any saved search query we create in OMS, which means we can alert on all of our solutions (Security and Audit, Alert Management, custom logs, performance data, etc.). Additionally, we can select Azure Automation runbooks during OMS Alert configuration to remediate our alerts both on-premises and in the cloud. Very cool!

In my last post, SCOM + OMS + Azure Automation here, I discussed a custom solution using custom fields, the OMS Search API and Azure Automation to automate remediation tasks. The concepts in that post certainly still apply, but now we have a built-in feature which uses webhooks and does not require the OMS Search API logic to be included in our alert remediation runbooks. Let's check it out…

Overview:

Tao Yang has an excellent deep dive post here where he explains the OMS Alerting and remediation components in great detail, but I want to highlight another example where we need to create custom fields in order to execute our remediation runbooks.

In this post, I am configuring a solution which 1) utilizes the new OMS Alerting feature to send an email notification when a SCOM service monitor alert is triggered from my on-premises management group and 2) remediates the issue using Azure Automation and the OMS Alerting alert remediation feature.

High level configuration process:

  1. Use custom fields to extract the server name and service name values from OMS search query output to meet remediation runbook parameter requirements.
  2. Create an Azure Automation remediation runbook.
  3. Create an OMS alert and link the runbook to automate alert remediation.

The first step is to configure the alert parameters for the remediation runbook.  In this example, the default OMS search result fields do not provide the values required to execute the remediation runbook, so I need to manually create custom fields that provide these values.

 

OMS Alerting Parameters:

It is important to understand how to identify the search result fields made available by the OMS search query being used to configure our OMS alert.  In most cases, these fields will be used as parameters in the remediation runbook code.  In this example, we need to identify the computer name and service name specified in each instance of the SCOM service monitor alert.  To do this, we need to 1) identify whether the default fields provided by our search result supply these values, and 2) configure custom fields for any required values not provided by default.

  1. Identify fields which can be used as parameters in the runbook code from the OMS search query output:
    1. Navigate to the OMS workspace.
    2. Select the Log Search blade.
    3. Enter your search query in the search field.
    4. Once the search result(s) populate, expand the results to expose all available fields.  In most cases, there will be a field available which will isolate a value we can use to target our runbook tasks.  In this example, none of the default search result fields isolate the computer name or service name, both of which are needed to execute the remediation task to start the failed service.
  2. Configure OMS Custom Fields:  To work around not having the necessary OMS search result fields to execute our remediation tasks, we can use the OMS Custom Fields feature to create custom fields based on extracted values from the existing search result data.  For the purpose of this demo, we need custom fields to identify the service name and the server name where the service has stopped.  I won't dive too deep into the custom field configuration process here, but I have created a step by step guide explaining how to configure OMS Custom Fields in a previous post here.  In this example, I've created two new custom fields named SourceServer_CF and ServiceName_CF which did not exist in the original search result fields.  These new custom fields allow me to isolate the computer and service names needed to execute the remediation runbook for this alert, and they are referenced directly in the runbook code (see the sketch after this list).
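To make the mapping concrete, here is a purely hypothetical example of what a single search result record might expose once the two custom fields exist. The SourceServer_CF and ServiceName_CF property names come from this post; the server and service values are made up:

```powershell
# Hypothetical example of one SCOM alert search result record after the
# custom fields have been added. Only the two *_CF properties are guaranteed
# to carry the values the remediation runbook needs.
$result = [pscustomobject]@{
    AlertName       = "Service Stopped Alert"       # default field (example value)
    AlertSeverity   = "Error"                       # default field
    SourceServer_CF = "SCOMAGENT01.contoso.local"   # custom field: target server (made up)
    ServiceName_CF  = "aspnet_state"                # custom field: stopped service (made up)
}

# These two values map directly onto the remediation runbook's parameters.
$targetServer  = $result.SourceServer_CF
$targetService = $result.ServiceName_CF
```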

 

Remediation Runbook Configuration:

  1. Navigate to your OMS workspace.
  2. Select the Automation blade.
  3. Select the Runbooks blade.  This will redirect us to the Azure Automation portal where we will be configuring the remediation runbook.
  4. On the Runbooks page, select “Add a runbook”.
  5. After selecting Create, you will be redirected to the newly created runbook page.  Select “Edit” to configure the runbook code.
  6. The remediation runbook code handles the webhook input processing and the service restart; a high level sketch of what it might look like follows this list.  I have posted a link to the sample code below if you would like to dig deeper and test the solution yourself.
  7. Save and publish the runbook.  NOTE: There are a few items in the runbook code which require further explanation, in particular the WebHookData parameter and the input processing sections.  I will dig much deeper into these items below.
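As a rough illustration, here is a minimal sketch of what a remediation runbook along these lines could look like, assuming a PowerShell workflow runbook. The workflow name, the SearchResults.value property path, and the Invoke-Command based restart are assumptions; the WebhookData input parameter and the SourceServer_CF / ServiceName_CF custom fields are the pieces described in this post. Verify the property path against your own WEBHOOKDATA output (covered under "Under The Hood" below).

```powershell
# Sketch of a remediation runbook triggered by the OMS Alert webhook.
# Intended to run on the Hybrid Runbook Worker so it can reach the on-premises server.
workflow Restart-StoppedService
{
    param (
        # Populated automatically when the OMS Alert webhook starts the runbook.
        [object]$WebhookData
    )

    # The alert's search results arrive as a JSON string in RequestBody.
    $requestBody   = ConvertFrom-Json -InputObject $WebhookData.RequestBody

    # Assumption: the results sit under SearchResults.value; check this against
    # the actual WEBHOOKDATA payload shown on the job's Input blade.
    $searchResults = $requestBody.SearchResults.value

    foreach ($result in $searchResults)
    {
        $server  = $result.SourceServer_CF   # custom field: server with the stopped service
        $service = $result.ServiceName_CF    # custom field: name of the stopped service

        InlineScript {
            # Assumes WinRM connectivity from the Hybrid Worker to the target server.
            Invoke-Command -ComputerName $Using:server -ScriptBlock {
                param($svcName)
                Start-Service -Name $svcName
            } -ArgumentList $Using:service
        }
    }
}
```

Because the webhook hands the search results straight to the runbook, there is no need to call the OMS Search API from inside the runbook itself.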

 

OMS Alert Configuration:

Now that the remediation runbook has been published, we can move on to the OMS Alert configuration.

  1. Navigate to your OMS workspace
  2. On the Overview page, select Log Search
  3. Enter your search query in the search field for the exact output you would like to alert on.  In this case, I am alerting when the ASP.NET State service stops, but this can be anything.  Sample Query: Type:Alert AlertSeverity:Error AlertState!=Closed AlertName="ENTER ALERT NAME HERE"
  4. If you have not already done so, configure your custom fields for the computer and service names (see OMS Alerting Parameters above).
  5. At the bottom of the page, select the Alert icon.
  6. Fill in the Alert fields and schedule information.
    1. Enter the name of the alert.
    2. Using the Saved Search dropdown, you can either choose the current search query or a previously saved query.
    3. Enter the schedule interval.  During preview, the only option is 15 minutes.
    4. Fill out the “Generate alert when” section to configure the search window.  This is the search time window when OMS checks for the alert.
    5. Enable email notification and enter the email and subject information.
    6. Enable remediation and choose your runbook.  NOTE: While in preview, you cannot edit an alert after it is created.  This will certainly change after general availability, but until that time you will need to create and publish your Azure Automation runbook first so that it populates in the “Select a runbook” dropdown.  NOTE: When you create an OMS alert and enable remediation, a webhook (learn more here) is created.  This webhook is used to start the runbook and pass in the OMS Alert search results using an input parameter called WEBHOOKDATA.  The webhook is created once you create the OMS Alert, and the WEBHOOKDATA input parameter is populated once the runbook has been triggered by the alert.  A sketch of how the webhook delivers this data follows this list.
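To make the webhook behavior a bit more tangible: an Azure Automation webhook is just an HTTPS endpoint, and whatever body is POSTed to it is handed to the runbook as the RequestBody of the WEBHOOKDATA parameter. A quick sketch with a placeholder URL and a made-up body (the real OMS Alert posts its own search results, so this is only useful for testing the runbook end to end):

```powershell
# Placeholder webhook URL - copy the real one when the webhook is created
# (it is only displayed once).
$webhookUri = "https://s1events.azure-automation.net/webhooks?token=<token>"

# Made-up body shaped roughly like the OMS alert payload used in this post;
# the property names and values are assumptions for testing only.
$body = '{"SearchResults":{"value":[{"SourceServer_CF":"SCOMAGENT01.contoso.local","ServiceName_CF":"aspnet_state"}]}}'

# Whatever is POSTed here becomes $WebhookData.RequestBody inside the runbook.
Invoke-RestMethod -Method Post -Uri $webhookUri -Body $body
```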

 

Configure the Remediation Runbook to run on the Hybrid Worker:

Now that the OMS Alert has been configured, we can access the webhook to change the run settings so that the runbook will execute against our on-premises Hybrid Runbook Worker.  This step cannot be completed earlier because the webhook is only created during the OMS alert configuration process, and it is mandatory for executing remediation runbooks against on-premises resources.

  1. Navigate to the Azure portal and select the runbook.
  2. Select the “Webhook” blade.
  3. Select “OMS Alert Remediation…”.
  4. Select “Parameters and Run Settings”.
  5. Change the “Run Settings” from Azure to Hybrid Worker and select the appropriate Hybrid Worker group.
  6. Select OK and save.

 

Under The Hood

In order to use the custom field values covered in the OMS Alerting Parameters section above, we need to process the WEBHOOKDATA input parameter and convert the data from JSON into PowerShell objects.  Once the data has been passed to the runbook and converted, we can utilize the SourceServer_CF and ServiceName_CF values to populate our remediation workflow parameters.  To better understand the code we are using for this process, it may help to take a closer look at the WEBHOOKDATA input parameter:

    1. Select the remediation runbook in the Azure portal.
    2. Select “Jobs”.  NOTE:  If the alert has not been triggered at least once, the “Jobs” data will be empty.
    3. Select the most recent job.
    4. Select the “Input” blade.
    5. Select the “OMS Alert Remediation..” input and copy the WEBHOOKDATA output.
    6. To take a closer look at the data, let’s paste the WEBHOOKDATA output to Notepad++ and format it with JSON Viewer or another JSON formatting plugin.
    7. The section we are interested in is “RequestBody” as it contains all of our field values.  If you look closely at this line of output, you will notice that each field value from our search result, including the custom fields, is present in this data following the “value” entry.
    8. To put this into context, let’s take a look at how the JSON data is processed and converted for use as our remediation workflow parameters; a sketch follows this list.
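If you prefer to explore the copied WEBHOOKDATA blob in a PowerShell console instead of Notepad++, something like the sketch below works. The file name is made up, and the SearchResults property path is an assumption to verify against your own output; note that RequestBody is itself a JSON string, so it takes a second conversion.

```powershell
# Load the WEBHOOKDATA text copied from the job's Input blade (file name is made up).
$webhookData = Get-Content -Raw -Path .\webhookdata-sample.json | ConvertFrom-Json

# RequestBody arrives as a JSON string, so convert it a second time.
$requestBody = ConvertFrom-Json -InputObject $webhookData.RequestBody

# Each search result, including the custom fields, sits under the "value" entry.
$requestBody.SearchResults.value |
    Select-Object SourceServer_CF, ServiceName_CF
```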

 

Now that you have a little more background on where the data is coming from, take a look at the full runbook code (download here) and the configuration process should start to make more sense.  If you have any questions, feel free to ping me and I’ll be happy to help out!

ALL CODE IS FOR TESTING ONLY AND PROVIDED AS IS WITHOUT WARRANTY.