I was recently engaged by a customer to help troubleshoot several errors occurring in the Operations Manager event log across a SCOM Management Group. Because multiple errors were occurring across several agents, the customer was having a difficult time tracking down which errors were occurring and on which servers. It proved very difficult to parse alerts and log into individual servers to check their event logs, much less correlate the issues across the environment. After taking some time to understand the challenge, it seemed to me like a great opportunity to use OMS!
OMS Log Search gives us the ability to 1) identify all error events occurring in the Operations Manager event log (or almost any other log) across all servers in the environment and 2) show all computers where each error is occurring, without ever logging into a server. Very cool!
In this case, OMS was already deployed, so we were set up perfectly. If you do not have OMS deployed, see the “Get Started with Log Analytics” guide first.
The first step was to enable collection of the Operations Manager event log on all servers in the environment.
Enable Event Log Collection:
- On the main OMS overview page, select Settings.
- Select Data on the Settings page ribbon.
- Select Windows Event Logs.
- In the “Collect events from the following event logs” search field, enter Operations Manager and select the + sign.
- Select Save.
Verify Log Collection:
- Navigate to the Log Search page.
- Enter the following query in the log search bar: Type=Event EventLog="Operations Manager"
- If everything is configured correctly, the search returns events from the Operations Manager log.
Utilize Log Search to Troubleshoot a SCOM Management Group (or any other application):
Now that we are collecting the Operations Manager event log, the next step is to utilize the Log Search capability to organize and correlate errors by Event ID and computer to identify which errors are occurring in the environment, and where they are occurring.
On the Log Search page, enter the Operations Manager Log Count by EventID query: Type=Event EventLog="Operations Manager" EventLevelName=Error | measure count(EventID) by EventID
We now have all of the errors occurring in the Operations Manager event log neatly organized by Event ID and number of occurrences with one simple query!
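To make the grouping step concrete, here is a minimal Python sketch of what the filter and the measure count(EventID) by EventID clause compute. It is only an illustration: the records, computer names, and every Event ID other than 4506 are made-up sample data rather than OMS output, and the function name is hypothetical. In practice the OMS search service does this work for us.

```python
from collections import Counter

# Hypothetical sample records standing in for collected events.
# Field names mirror the ones used in the query; values are invented.
events = [
    {"Computer": "SRV01", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 4506},
    {"Computer": "SRV02", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 4506},
    {"Computer": "SRV02", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 2115},
    {"Computer": "SRV03", "EventLog": "Operations Manager", "EventLevelName": "Warning", "EventID": 4506},
    {"Computer": "SRV01", "EventLog": "Application", "EventLevelName": "Error", "EventID": 1000},
    {"Computer": "SRV03", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 4506},
]


def count_errors_by_event_id(records):
    """Keep only Error events from the Operations Manager log, then count per EventID."""
    return Counter(
        r["EventID"]
        for r in records
        if r["EventLog"] == "Operations Manager" and r["EventLevelName"] == "Error"
    )


# List Event IDs from most to least frequent.
for event_id, count in count_errors_by_event_id(events).most_common():
    print(event_id, count)
```

The Warning event and the Application log event are dropped by the filter, which is the same role the EventLog and EventLevelName terms play in the query.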
Now that we have identified the errors occurring across our SCOM environment and organized them by Event ID, let’s identify which computers they are occurring on so we can remediate the issues. For this example, I will use Event ID 4506.
On the Log Search page, enter the Operations Manager Log EventID Count by Computer query: Type=Event EventLog="Operations Manager" EventLevelName=Error EventID=4506 | measure count(EventID) by Computer
We can now see all computers where Event ID 4506 is occurring in our environment and how many times the error has occurred. We were able to use this simple event correlation process to identify all SCOM event errors in the customer’s environment without having to log into a single server or event log, which saved us a ton of time! Of course, this same process can be used for any other application where log or other data is being collected by OMS.
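As a rough illustration of the per-computer breakdown, the Python sketch below filters a handful of records down to one Event ID and counts occurrences per computer. The records, computer names, and the function name are hypothetical and exist only to show the logic; only Event ID 4506 comes from the example above, and none of this is how OMS implements the search.

```python
from collections import Counter

# Hypothetical sample records; values are invented for illustration.
sample_events = [
    {"Computer": "SRV01", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 4506},
    {"Computer": "SRV01", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 4506},
    {"Computer": "SRV02", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 4506},
    {"Computer": "SRV02", "EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 2115},
    {"Computer": "SRV03", "EventLog": "Operations Manager", "EventLevelName": "Warning", "EventID": 4506},
]


def count_event_by_computer(records, event_id):
    """Count Error events with the given EventID in the Operations Manager log, per computer."""
    return Counter(
        r["Computer"]
        for r in records
        if r["EventLog"] == "Operations Manager"
        and r["EventLevelName"] == "Error"
        and r["EventID"] == event_id
    )


# Show which computers are logging Event ID 4506 and how often.
for computer, count in count_event_by_computer(sample_events, 4506).most_common():
    print(computer, count)
```

Swapping the Event ID argument corresponds to changing the EventID term in the query, so the same pattern covers any error surfaced by the count-by-EventID search.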
OMS log search is a powerful tool, and this is only one of many examples demonstrating how we can utilize OMS log search capabilities to query and correlate data from data sources across the environment in one central location. More to come!