Monitor and Recover Stopped Automatic Services with OMS – Part 1

I was working with a customer recently, and one of the asks was to configure OMS to monitor for stopped automatic services on servers throughout the environment. My first thought was that we could easily use the data collected by the Configuration Tracking solution and configure queries to alert when a service is stopped. Unfortunately, although Configuration Tracking is a great solution, it did not meet the requirements here because of its 1-hour data collection interval. We needed to be notified of a critical service stopping as close to real time as possible. Plan B was to use Event ID 7024 and custom fields, as we were already collecting the Application log. However, during my testing on Windows Server 2012 R2, the only event logged to the Application log when a service was stopped manually was Event ID 1. Further, what if a service just doesn’t start after a reboot? Once again, there may be no events logged, as technically there could be no error.

SO…although technically both of the other options could work in certain scenarios, in this particular case we needed something a bit more granular.  Time for some fun with PowerShell, Azure Automation and the Data Collector API!
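As a rough illustration of the idea (the post itself uses PowerShell and Azure Automation; this is a Python sketch, not the author's runbook), the code below filters a list of service records down to stopped automatic services and builds the signed request that the HTTP Data Collector API expects. The `StartMode` and `State` values mirror the Win32_Service properties. The workspace ID, key, log type name `ServiceStatus`, and sample services are all made-up placeholders, and nothing is actually sent.

```python
import base64
import hashlib
import hmac
import json
from email.utils import formatdate


def stopped_automatic_services(services):
    """Return services set to start automatically that are not running."""
    return [
        s for s in services
        if s["StartMode"] == "Auto" and s["State"] != "Running"
    ]


def build_signature(workspace_id, shared_key, date, content_length):
    """Build the SharedKey Authorization value for a POST to /api/logs."""
    string_to_sign = (
        f"POST\n{content_length}\napplication/json\n"
        f"x-ms-date:{date}\n/api/logs"
    )
    digest = hmac.new(
        base64.b64decode(shared_key),      # the workspace key is Base64 text
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"


def build_request(workspace_id, shared_key, log_type, records, date=None):
    """Return (url, headers, body) for a Data Collector API call, unsent."""
    body = json.dumps(records).encode("utf-8")
    date = date or formatdate(usegmt=True)  # RFC 1123 date in GMT
    headers = {
        "Content-Type": "application/json",
        "Log-Type": log_type,               # becomes <log_type>_CL in the workspace
        "x-ms-date": date,
        "Authorization": build_signature(workspace_id, shared_key, date, len(body)),
    }
    url = (
        f"https://{workspace_id}.ods.opinsights.azure.com"
        "/api/logs?api-version=2016-04-01"
    )
    return url, headers, body


# Placeholder inputs: not a real workspace, key, or server inventory.
WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"
SHARED_KEY = base64.b64encode(b"not-a-real-key").decode()
SERVICES = [
    {"Computer": "SRV01", "Name": "Spooler", "StartMode": "Auto", "State": "Stopped"},
    {"Computer": "SRV01", "Name": "W32Time", "StartMode": "Auto", "State": "Running"},
    {"Computer": "SRV02", "Name": "Fax", "StartMode": "Manual", "State": "Stopped"},
]

stopped = stopped_automatic_services(SERVICES)
url, headers, body = build_request(WORKSPACE_ID, SHARED_KEY, "ServiceStatus", stopped)
print(url)
print([s["Name"] for s in stopped])
```

Running a check like this on a short interval and alerting on the resulting custom log is what gets around both the 1-hour collection interval and the missing events.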


Schedule an Azure Automation Runbook Using Minutes

I was working with a customer recently and we realized that when using the “Schedule” functionality, the most granular recurrence interval available is 1 hour. In this particular case, we needed to check service status every 5 minutes and send the data to OMS to alert and trigger a remediation runbook, so 1 hour would not suffice. I had recently spoken with a member of the product group about a custom OMS solution and specifically remembered him saying that his runbook was running every 5 minutes, so I was off to investigate!

After a few minutes of searching, I was able to find some good information here.  Although the post referenced the Classic portal, the same basic process still applies and worked in my scenario.  By using a Scheduler Collection we can schedule at a much more granular interval.  I’ve outlined the process in the new portal below.


New OMS Health State Threshold in View Designer

I was working with a customer today developing a custom solution with View Designer and noticed a new feature. We can now set thresholds for queries in the List section of View Designer blades, and those thresholds map to color indicators, which is very cool! The thresholds and colors are both editable, but the default colors are the standard red, yellow and green. This new feature makes identifying critical and warning values in OMS performance solutions much simpler, in addition to adding aesthetic value.

To access the Thresholds feature, simply click on your solution and select Edit.  Once in edit, click on the blade where you would like to add thresholds, check the “Enable Thresholds” box, and configure your thresholds.

[Screenshots: View Designer; View Designer thresholds]

There is one limitation that I’ve seen so far, and that is the inability to use < in the threshold settings. For counters like % free, we need the ability to use < instead of the hard-coded > value.

Update:  Until < is available in threshold settings, Stanislav Zhelyazkov points out a simple workaround using reverse logic here.
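One way to read “reverse logic” for a counter like % free, sketched in Python: since only > comparisons are available, make the default state the critical one and let higher values earn their way up to warning and healthy. The evaluation rule below (the highest exceeded threshold wins) and the 10/20 limits are assumptions for illustration, not documented View Designer behavior or necessarily the exact workaround from the linked post.

```python
def health_state(value, default_state, thresholds):
    """Return the state of the highest 'greater than' threshold that value exceeds.

    thresholds is a list of (limit, state) pairs; default_state applies when
    the value exceeds none of them.
    """
    state = default_state
    for limit, name in sorted(thresholds):
        if value > limit:
            state = name
    return state


# Reverse logic for % free: start from critical, then relax as free space grows.
FREE_SPACE_THRESHOLDS = [(10, "warning"), (20, "healthy")]

for pct_free in (5, 15, 60):
    print(pct_free, health_state(pct_free, "critical", FREE_SPACE_THRESHOLDS))
```

With these illustrative limits, 5% free stays critical, 15% is a warning, and 60% is healthy, all without needing a < comparison.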

Create OMS Computer Groups Based on Operating System and Versions

I was working with a customer today and I was asked the question: “Why aren’t computers automatically grouped by OS (Linux and Windows) in OMS? Further, if computers are not automatically grouped by OS, is there an easy way to create this grouping?” Like SCOM, OMS is very flexible, so my canned answer to questions like these is always “we can most likely make it happen”. That said, I had not looked at this particular scenario, so I was eager to dig in and discover the answers for myself.

After a bit of investigation, I was happy to see that the new Heartbeat data does include fields for OSType, OSMajorVersion, and OSMinorVersion.  Using these fields we can very easily create computer groups based on OS and OS versions.  Nice!
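To make the grouping idea concrete, here is a small Python sketch that buckets heartbeat records by the `OSType`, `OSMajorVersion`, and `OSMinorVersion` fields mentioned above. In OMS itself this would be done with a saved search used as a computer group; the records and computer names below are invented sample data, and the `Computer` field name is assumed to match what Heartbeat records carry.

```python
from collections import defaultdict


def group_by_os(heartbeats, by_version=False):
    """Group computer names by OS type, optionally split by major.minor version."""
    groups = defaultdict(set)
    for hb in heartbeats:
        key = hb["OSType"]
        if by_version:
            key = f'{hb["OSType"]} {hb["OSMajorVersion"]}.{hb["OSMinorVersion"]}'
        groups[key].add(hb["Computer"])  # a set drops repeated heartbeats
    return {name: sorted(members) for name, members in groups.items()}


# Invented sample records; real Heartbeat data carries many more fields.
HEARTBEATS = [
    {"Computer": "web01", "OSType": "Linux", "OSMajorVersion": "7", "OSMinorVersion": "2"},
    {"Computer": "web01", "OSType": "Linux", "OSMajorVersion": "7", "OSMinorVersion": "2"},
    {"Computer": "sql01", "OSType": "Windows", "OSMajorVersion": "6", "OSMinorVersion": "3"},
    {"Computer": "dc01", "OSType": "Windows", "OSMajorVersion": "10", "OSMinorVersion": "0"},
]

print(group_by_os(HEARTBEATS))
print(group_by_os(HEARTBEATS, by_version=True))
```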


New SCOM Assessment Coming Soon in OMS!

I was doing some development work in OMS this morning and noticed that there is a new SCOM Assessment solution in the gallery marked as “coming soon”.  Nice!  There were already a ton of reasons for companies using SCOM to roll out OMS, and this is certainly another one.  I can’t count the number of SCOM environments that I’ve worked with that simply didn’t have the expertise and/or time to track down every issue with configuration, performance, administration, etc.  This solution appears to be a great step towards resolving those issues.

From the solution description:

“The recommendations are based on the knowledge and experiences gained by Microsoft engineers across thousands of customer engagements.”


Using OMS Log Search to Troubleshoot SCOM (or any other application)

I was recently working with a customer where I was engaged to assist with troubleshooting several errors that were occurring in the Operations Manager event log across a SCOM Management Group.  Because there were multiple errors occurring across several agents, the customer was having a difficult time tracking down which errors were occurring and which server they were occurring on.  It proved very difficult to parse alerts and log into individual servers and event logs to try and identify issues, much less correlate the issues across the environment.  After taking some time to understand the challenge, it seemed to me like a great opportunity to use OMS!

OMS Log Search gives us the ability to 1) identify all error events occurring in the Operations Manager event log (or almost any other log) across all servers in the environment and 2) show all computers where each error is occurring, without ever logging into a server. Very cool!
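The two steps can be pictured with a short Python sketch: keep only error events from one log, then list the computers reporting each event ID, with the most widespread errors first. This only mimics what the log search does over collected data. The field names (`EventLog`, `EventLevelName`, `EventID`, `Computer`) are assumed to match the collected event records, and the events themselves are invented sample data.

```python
from collections import defaultdict


def computers_by_error(events, log_name="Operations Manager"):
    """Map each error EventID in log_name to the computers reporting it.

    Returns (event_id, [computers]) pairs, widest-spread errors first.
    """
    affected = defaultdict(set)
    for event in events:
        is_error = event["EventLevelName"].lower() == "error"
        if event["EventLog"] == log_name and is_error:
            affected[event["EventID"]].add(event["Computer"])
    return sorted(
        ((event_id, sorted(names)) for event_id, names in affected.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Invented sample events for illustration only.
EVENTS = [
    {"EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 2115, "Computer": "ms01"},
    {"EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 2115, "Computer": "ms02"},
    {"EventLog": "Operations Manager", "EventLevelName": "Error", "EventID": 1102, "Computer": "agent07"},
    {"EventLog": "Operations Manager", "EventLevelName": "Warning", "EventID": 1103, "Computer": "agent07"},
    {"EventLog": "Application", "EventLevelName": "Error", "EventID": 1000, "Computer": "ms01"},
]

for event_id, computers in computers_by_error(EVENTS):
    print(event_id, computers)
```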


SCOM – Create a Dashboard To Show Whether an Alert Is Generated By a Monitor or Rule

In a blog I posted last year called “What’s the difference between a rule and a monitor…the simple version”, I ended my post alluding to a follow-up showing some options to identify whether an alert is being generated by a rule or a monitor. I have recently been reminded that I never followed through on that next post, so here it is! This is nothing new and has been blogged about in the past, but I still find quite a few engineers who are not aware of this capability, so hopefully this will help some folks out.
