SCOM – Custom Process % Processor Time Collection Rule

I am currently working on a performance management pack requirement that includes a request to collect Process % Processor Time for all processes on all servers in the environment at a 30-second interval. First, this is a really bad idea from a performance perspective. It would result in an extremely large increase in data insertions into the OpsDB and DW, which could severely impact performance (of course, this depends on I/O capability, etc.). Second, why collect every process at every sample interval when we are only interested in processes that are consuming high CPU?
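To put the volume concern in numbers, here is a small back-of-the-envelope sketch in Python. It assumes one performance row per process, per server, per sample interval. The server and process counts are purely hypothetical examples, not measurements from any environment.

```python
# Rough estimate of performance rows written per day by a per-process
# collection rule. All inputs used below are hypothetical examples.

SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_day(servers: int, processes_per_server: int, interval_seconds: int) -> int:
    """One row per process, per server, per sample interval."""
    samples_per_day = SECONDS_PER_DAY // interval_seconds
    return servers * processes_per_server * samples_per_day


# Hypothetical: 500 servers averaging 80 processes, every 30 seconds, no filter.
unfiltered = rows_per_day(servers=500, processes_per_server=80, interval_seconds=30)
print(f"All processes every 30 s: {unfiltered:,} rows/day")  # 115,200,000 rows/day

# Hypothetical: 10 servers in a troubleshooting group, about 5 processes
# above the CPU threshold per sample, every 2 minutes.
filtered = rows_per_day(servers=10, processes_per_server=5, interval_seconds=120)
print(f"Filtered group every 2 min: {filtered:,} rows/day")  # 36,000 rows/day
```

Even with generous assumptions for the filtered case, the difference is several orders of magnitude. Both figures are written to the OpsDB and again to the DW.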

So…after much debate, we settled on a plan to test a collection rule (in the lab, of course) that would be disabled by default and enabled ONLY for a specific group, with a collection interval of 2 minutes. Servers that require deeper root cause analysis will be added to this group temporarily and removed once troubleshooting has concluded. Additionally, I added filters in the script so that data is collected only when the Process % Processor Time utilization exceeds 20% and the process name is not “Idle” or “_Total”.
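The filter condition can be sketched as a simple predicate. This Python version mirrors the script's `Procs > 20` test and the two excluded names. The sample names and values are hypothetical, and the comparison is strictly greater than, so a process at exactly 20% is not collected.

```python
# Sketch of the collection filter: keep a sample only when the normalized
# CPU value exceeds the threshold and the name is not an excluded instance.

EXCLUDED_NAMES = {"Idle", "_Total"}
THRESHOLD_PERCENT = 20.0


def should_collect(name: str, percent: float, threshold: float = THRESHOLD_PERCENT) -> bool:
    """Strictly greater than the threshold, matching the script's Procs > 20."""
    return percent > threshold and name not in EXCLUDED_NAMES


# Hypothetical samples: (instance name, normalized % processor time).
samples = [("_Total", 100.0), ("Idle", 75.0), ("sqlservr", 35.5), ("w3wp", 20.0), ("lsass", 2.0)]
collected = [(name, pct) for name, pct in samples if should_collect(name, pct)]
print(collected)  # [('sqlservr', 35.5)]
```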

Ok…time for some scripting!

The first step was to figure out the logic. In this particular scenario, the requirement was to collect total Process % Processor Time, not utilization per CPU. Most scripts I’ve seen collect the CPU utilization of each process but do not divide that value by the total number of processors on the server. For example, if I collect Process % Processor Time without this step, the output on a multi-processor server could show a single process running at 145% utilization. This happens because the per-process counter is summed across all processors, so its maximum is 100% multiplied by the number of processors. Although this is technically accurate and useful, it can get confusing from a monitoring and alerting perspective, especially when the requirement is for total utilization. So…back to the script.
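Here is the normalization step as a minimal Python sketch. It divides the raw per-process value by the number of logical processors, which is what the script does with `NUMBER_OF_PROCESSORS`. The 8-processor server is a hypothetical example. `os.cpu_count()` is used only to show the local equivalent and is not part of the original script.

```python
import os


def normalize_process_cpu(raw_percent: float, logical_processors: int) -> float:
    """Convert a per-process % Processor Time value (0 to 100 * N) to a 0-100 scale."""
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return raw_percent / logical_processors


# The 145% reading from the text on a hypothetical 8-processor server.
print(normalize_process_cpu(145, 8))  # 18.125

# Locally, os.cpu_count() (logical CPUs, may be None) stands in for NUMBER_OF_PROCESSORS.
local_processors = os.cpu_count() or 1
print(normalize_process_cpu(145, local_processors))
```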

Originally, I wrote the script to simply grab each process, filter it by the criteria listed above, and then pass the output to the property bag. However, after configuring and testing my initial script, I came across a post by Tao Yang that gave me the idea to pass the actual service name for each process to the property bag, rather than just the process name. Using this method, I can output the individual service(s) running under the generic service host (svchost.exe) instead of listing multiple svchost instances. Nice!
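The service-name approach can be modeled in Python with plain data. The sketch walks the service list, skips services whose process ID is 0 (services that are not running), looks up the normalized CPU for the hosting PID, and applies the 20% filter. The service names are real Windows service names, but the PIDs and CPU values are made up for illustration.

```python
# Hypothetical snapshot: (service name, ProcessID) as Win32_Service would
# report it, plus normalized CPU per PID.
services = [
    ("Dnscache", 1204),
    ("LanmanWorkstation", 1204),  # shares the same svchost.exe process
    ("Spooler", 2310),
    ("W32Time", 0),               # not running: ProcessID 0
]
cpu_by_pid = {1204: 42.0, 2310: 3.5, 4120: 88.0}  # PID 4120 hosts no service


def service_samples(service_list, cpu_map, threshold=20.0):
    """Return one (service name, cpu) pair per running service above the threshold."""
    result = []
    for name, pid in service_list:
        if pid == 0 or pid not in cpu_map:
            continue
        cpu = cpu_map[pid]
        if cpu > threshold:
            result.append((name, cpu))
    return result


print(service_samples(services, cpu_by_pid))
# [('Dnscache', 42.0), ('LanmanWorkstation', 42.0)]
```

The model makes two properties of the service-driven loop visible. First, every service hosted in a shared svchost.exe is reported with that process's full CPU value. Second, a busy process that hosts no service (PID 4120 here) is never reported, because enumeration starts from Win32_Service.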

VBScript: 

**SAMPLE ONLY – DO NOT USE IN PRODUCTION WITHOUT TESTING**

Option Explicit
Dim WshShell, WshSysEnv
Dim objWMIService, colService, objService, ComputerName
Dim ProcessID, ProcessName, SvcName, colProcess, objProcess, colPerfData, objPerfData, Procs, TotalProcs
Dim PercentProcessorTime
Dim oAPI, oBag, oInst

' Logical processor count, used to normalize per-process CPU to a 0-100 scale
Set WshShell = WScript.CreateObject("WScript.Shell")
Set WshSysEnv = WshShell.Environment("System")
TotalProcs = CInt(WshSysEnv("NUMBER_OF_PROCESSORS"))

Set oAPI = CreateObject("MOM.ScriptAPI")
Set objWMIService = GetObject("winmgmts:\\.\root\cimv2")

' Enumerate services so each process is reported by service name (svchost.exe hosts many)
Set colService = objWMIService.ExecQuery("Select * from Win32_Service")
For Each objService in colService
    ' Services that are not running report ProcessID 0
    If objService.ProcessID <> 0 Then
        ProcessID = objService.ProcessID
        SvcName = objService.Name
        Set colProcess = objWMIService.ExecQuery("Select * from Win32_Process Where ProcessID = " & ProcessID)
        For Each objProcess in colProcess
            ProcessName = objProcess.Name
            ' Formatted % Processor Time for this PID (can exceed 100 on multi-processor servers)
            Set colPerfData = objWMIService.ExecQuery("Select * from Win32_PerfFormattedData_PerfProc_Process Where IDProcess = " & ProcessID)
            For Each objPerfData in colPerfData
                ' Normalize to total server CPU
                Procs = objPerfData.PercentProcessorTime / TotalProcs
                ' Report only processes above 20%, excluding Idle and _Total
                If Procs > 20 And ProcessName <> "_Total" And ProcessName <> "Idle" Then
                    PercentProcessorTime = Procs
                    'FOR TESTING:  WScript.Echo "Proc time = " & SvcName & "," & ProcessName & "," & PercentProcessorTime
                    Set oBag = oAPI.CreatePropertyBag()
                    Call oBag.AddValue("ProcessName", SvcName)
                    Call oBag.AddValue("Value", PercentProcessorTime)
                    Call oAPI.AddItem(oBag)
                End If
            Next
        Next
    End If
Next
' Return all collected property bags to the agent in one batch
oAPI.ReturnItems()

Management Pack Code: **SAMPLE ONLY**

I am running my script using Microsoft.Windows.TimedScript.PropertyBagProvider as the data source in the DataSourceModuleType. Notice that I have added System.CommandExecuterSchema to IncludeSchemaTypes. I added it to use the EventPolicy functionality, which allows me to suppress the warning events that are logged in the Operations Manager event log on an agent when the script exits with exit code 0 without returning any property bag data. Because of the filtering in this script, some runs will legitimately return no property bag data. Since this is expected, I am using EventPolicy to suppress these warnings. Additionally, I wrapped the script in a CDATA section so that characters such as < and > in the script (e.g. the <> operator) are not parsed as XML markup.
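To get a feel for what the two EventPolicy patterns in the module type below match, they can be run through Python's `re` module. This sketch assumes the agent applies each pattern as an unanchored search and raises an event when one matches. That assumption has not been checked against Operations Manager's implementation. In both Python and .NET regular expressions, `\a` is the ASCII bell character (0x07), so `\a+` will not match ordinary error text.

```python
import re

# Patterns copied from the EventPolicy element of the module type.
STDERR_PATTERN = re.compile(r"\a+")       # one or more bell characters (0x07)
EXIT_CODE_PATTERN = re.compile(r"[^0]+")  # any character other than '0'


def would_raise_event(exit_code: int, stderr: str) -> bool:
    """Assumption: an event is logged when either pattern is found anywhere."""
    return bool(EXIT_CODE_PATTERN.search(str(exit_code)) or STDERR_PATTERN.search(stderr))


print(would_raise_event(0, ""))                  # False: clean run, even with no bags
print(would_raise_event(1, ""))                  # True: non-zero exit code
print(would_raise_event(0, "some error text"))   # False: ordinary stderr text is ignored
```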

DataSourceModuleType:

<DataSourceModuleType ID="Performance.Processor.DataSource.ModuleType" Accessibility="Internal" Batching="false">
  <Configuration>
    <IncludeSchemaTypes>
      <SchemaType>System!System.CommandExecuterSchema</SchemaType>
    </IncludeSchemaTypes>
    <xsd:element minOccurs="1" name="IntervalSeconds" type="xsd:integer" />
    <xsd:element minOccurs="1" name="TimeoutSeconds" type="xsd:integer" />
    <xsd:element minOccurs="0" maxOccurs="1" name="EventPolicy" type="CommandExecuterEventPolicyType" />
  </Configuration>
  <OverrideableParameters>
    <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
    <OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
  </OverrideableParameters>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <DataSource ID="DSTS" TypeID="Windows!Microsoft.Windows.TimedScript.PropertyBagProvider">
          <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
          <SyncTime />
          <ScriptName>ProcessPercentProcessor.vbs</ScriptName>
          <Arguments />
          <ScriptBody><![CDATA[Option Explicit
Dim WshShell, WshSysEnv
Dim objWMIService, colService, objService, ComputerName
Dim ProcessID, ProcessName, SvcName, colProcess, objProcess, colPerfData, objPerfData, Procs, TotalProcs
Dim PercentProcessorTime
Dim oAPI, oBag, oInst
' Logical processor count, used to normalize per-process CPU to a 0-100 scale
Set WshShell = WScript.CreateObject("WScript.Shell")
Set WshSysEnv = WshShell.Environment("System")
TotalProcs = CInt(WshSysEnv("NUMBER_OF_PROCESSORS"))
Set oAPI = CreateObject("MOM.ScriptAPI")
Set objWMIService = GetObject("winmgmts:\\.\root\cimv2")
' Enumerate services so each process is reported by service name
Set colService = objWMIService.ExecQuery("Select * from Win32_Service")
For Each objService in colService
    ' Services that are not running report ProcessID 0
    If objService.ProcessID <> 0 Then
        ProcessID = objService.ProcessID
        SvcName = objService.Name
        Set colProcess = objWMIService.ExecQuery("Select * from Win32_Process Where ProcessID = " & ProcessID)
        For Each objProcess in colProcess
            ProcessName = objProcess.Name
            Set colPerfData = objWMIService.ExecQuery("Select * from Win32_PerfFormattedData_PerfProc_Process Where IDProcess = " & ProcessID)
            For Each objPerfData in colPerfData
                ' Normalize to total server CPU, then apply the 20% filter
                Procs = objPerfData.PercentProcessorTime / TotalProcs
                If Procs > 20 And ProcessName <> "_Total" And ProcessName <> "Idle" Then
                    PercentProcessorTime = Procs
                    'FOR TESTING SCRIPT
                    'WScript.Echo "Proc time = " & SvcName & "," & ProcessName & "," & PercentProcessorTime
                    Set oBag = oAPI.CreatePropertyBag()
                    Call oBag.AddValue("ProcessName", SvcName)
                    Call oBag.AddValue("Value", PercentProcessorTime)
                    Call oAPI.AddItem(oBag)
                End If
            Next
        Next
    End If
Next
oAPI.ReturnItems()]]></ScriptBody>
          <TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
          <EventPolicy>
            <StdOutMatches />
            <StdErrMatches>\a+</StdErrMatches>
            <ExitCodeMatches>[^0]+</ExitCodeMatches>
          </EventPolicy>
        </DataSource>
      </MemberModules>
      <Composition>
        <Node ID="DSTS">
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>
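The composite module passes its own configuration down to the member data source through `$Config/...$` placeholders. The Python sketch below is a hypothetical model of that substitution, not Operations Manager code. It shows the rule's values (120 and 60 in the rule that follows) flowing into the member module. It also checks a common rule of thumb: the script timeout should be shorter than the interval so that runs do not overlap.

```python
import re

# Hypothetical model of $Config/...$ substitution in a composite module.
MEMBER_TEMPLATE = {
    "IntervalSeconds": "$Config/IntervalSeconds$",
    "TimeoutSeconds": "$Config/TimeoutSeconds$",
}

PLACEHOLDER = re.compile(r"\$Config/(\w+)\$")


def resolve(template: dict, config: dict) -> dict:
    """Replace every $Config/Name$ token with the matching configuration value."""
    def substitute(value: str) -> str:
        return PLACEHOLDER.sub(lambda m: str(config[m.group(1)]), value)
    return {key: substitute(value) for key, value in template.items()}


rule_config = {"IntervalSeconds": 120, "TimeoutSeconds": 60}  # values set by the rule
resolved = resolve(MEMBER_TEMPLATE, rule_config)
print(resolved)  # {'IntervalSeconds': '120', 'TimeoutSeconds': '60'}

# Rule of thumb: the script should time out before the next run is due.
assert int(resolved["TimeoutSeconds"]) < int(resolved["IntervalSeconds"])
```

An override on `IntervalSeconds` or `TimeoutSeconds` (exposed through OverrideableParameters) would change `rule_config` in this model. The same check is worth repeating whenever either value is overridden.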

Rule:

<Rule ID="Performance.Percent.ProcessorTime.Collection" Enabled="false" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>PerformanceCollection</Category>
  <DataSources>
    <DataSource ID="DataSource" TypeID="Performance.Processor.DataSource.ModuleType">
      <IntervalSeconds>120</IntervalSeconds>
      <TimeoutSeconds>60</TimeoutSeconds>
    </DataSource>
  </DataSources>
  <ConditionDetection ID="System.Performance.DataGenericMapper" TypeID="Perf!System.Performance.DataGenericMapper">
    <ObjectName>Process</ObjectName>
    <CounterName>Percent Processor Time</CounterName>
    <InstanceName>$Data/Property[@Name='ProcessName']$</InstanceName>
    <Value>$Data/Property[@Name='Value']$</Value>
  </ConditionDetection>
  <WriteActions>
    <WriteAction ID="Microsoft.SystemCenter.CollectPerformanceData" TypeID="SC!Microsoft.SystemCenter.CollectPerformanceData" />
    <WriteAction ID="Microsoft.SystemCenter.DataWarehouse.PublishPerformanceData" TypeID="SCDW!Microsoft.SystemCenter.DataWarehouse.PublishPerformanceData" />
  </WriteActions>
</Rule>
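Conceptually, the DataGenericMapper turns each property bag into one performance sample, using the fixed object and counter names from the rule. The Python sketch below is an illustrative model only. The dict shapes are hypothetical and are not Operations Manager's internal data format. It makes one point easy to see: the `ProcessName` property, and therefore the performance instance name, actually holds the service name.

```python
# Conceptual model of the rule's System.Performance.DataGenericMapper step.
# Field names mirror the rule XML; the dict representation is hypothetical.


def map_property_bag(bag: dict) -> dict:
    """Map one property bag to one performance sample."""
    return {
        "ObjectName": "Process",
        "CounterName": "Percent Processor Time",
        "InstanceName": bag["ProcessName"],  # $Data/Property[@Name='ProcessName']$ (a service name)
        "Value": float(bag["Value"]),        # $Data/Property[@Name='Value']$
    }


# Hypothetical bags, as the script would emit them for two busy services.
bags = [{"ProcessName": "Dnscache", "Value": "42.0"}, {"ProcessName": "MSSQLSERVER", "Value": "31.25"}]
for sample in map(map_property_bag, bags):
    print(sample)
```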

This posting is provided “AS IS” with no warranties.
