SCOM – CSV Log File Monitor – Consecutive Samples

I recently worked with a customer whose requirement was to monitor a CSV log and raise an alert when a specific log entry repeated 10 times in a row.  This particular application consists of two sites, which register log entries every minute or so on a pretty consistent basis.  When an issue occurs, one of the sites stops registering log entries, so entries from the other site repeat until the issue is resolved.

To me, this seemed like a pretty straightforward problem, and after looking at the OOB log file monitor templates and seeing the “Repeated Event Detection” option under the CSV log file template…done deal!

Well, it turns out that the name is quite deceiving.  The “Repeated Event Detection” monitor alerts when x number of the specified log entries occur in y number of minutes, regardless of what other log entries occur during this interval.  In my case, if the log entry for site 1 (Cali) occurs 12 times in 15 minutes, a monitor using this template would alert even if entries from site 2 (DC) occurred during the same interval.  To me…this is NOT a repeat event detection, and was of no use to me for this requirement.
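
To make the distinction concrete, here is a small sketch that compares the two ways of counting.  The sample data is made up for illustration and is not taken from the customer's log.

```powershell
# Hypothetical sample: site names in the order they were logged (oldest first).
$sites = 'Cali','DC','Cali','Cali','DC','Cali','Cali','Cali'

# What a count-in-window check sees: every Cali entry in the interval,
# no matter what was logged in between.
$totalCali = @($sites | Where-Object { $_ -eq 'Cali' }).Count

# What my requirement needs: the number of Cali entries in a row at the end of the log.
$consecutive = 0
foreach ($s in $sites) {
    if ($s -eq 'Cali') { $consecutive++ } else { $consecutive = 0 }
}

"Total Cali entries: $totalCali"
"Consecutive Cali entries at end: $consecutive"
```

With this sample the total is 6 but the consecutive run is only 3, which is why a threshold on the total alerts even though the DC site is still logging.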

My solution:

To work around this issue, I decided to create my own PowerShell based monitor from scratch.  First, let me break down the script…

The first part of the script uses the Get-Date cmdlet to ensure the log file being parsed is the current month’s CSV file.  It then uses this value to build the file name and import the CSV file.  In this scenario, I am parsing only the last 20 log entries: when an issue occurs the entry repeats consistently, and I only need 10 entries to trigger the alert.

$Date=Get-Date -Format yyyyMM

#Import the csv log file
$File = Import-Csv C:\Path\Delivery-$Date.csv | Select-Object site,'log date' -Last 20
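
One case the import does not cover is when the current month's file does not exist yet, for example right after a month rollover.  Here is a minimal sketch of a guard, assuming an empty result is an acceptable fallback for your monitor; the path is the same placeholder used in the script.

```powershell
# Build this month's file name, e.g. Delivery-202401.csv for January 2024.
$Date = Get-Date -Format yyyyMM
$Path = "C:\Path\Delivery-$Date.csv"   # placeholder path, adjust to your environment

if (Test-Path -LiteralPath $Path) {
    $File = Import-Csv -LiteralPath $Path | Select-Object site, 'log date' -Last 20
}
else {
    # Assumption: with no file there is nothing to count, so use an empty set.
    $File = @()
}
```

With an empty set the loop in the next step never runs and the counter keeps its starting value.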

Once I successfully import the CSV log file, it’s time for the core logic.

At a high level, I use a ForEach loop to parse the last 20 entries of the log file for repeated entries from the “Cali” site.  In this scenario, when the application is broken, the DC site stops logging entries, which results in consecutive entries from the Cali site.  To implement this in the code, I use the $i variable to count occurrences of these log entries.  For each entry, if the site property is “Cali”, $i increments by 1.  If the site property is “DC”, $i resets to 1.  At the end of the loop, the value of $i is returned as the “RepeatCount” property in a property bag, which is then compared against the thresholds defined in the monitor type expression filters in the management pack code (see below).

#Set the starting value for the counter.  This variable increments when the Cali entry repeats consecutively
$i=1

$api = New-Object -ComObject 'MOM.ScriptAPI'

#ForEach loop to either increment the $i variable if the Cali entry repeats or set the variable to 1 if a DC entry occurs
ForEach ($Entry in $File)
 {

 If ($Entry.site -like "Cali*")
  {
  $i++
  $time="Last Log Entry Date: " + $Entry.'Log Date'
  $site=$Entry.site
  }

 #Reset the variable value to 1 if a DC entry occurs
 If ($Entry.site -like "*DC*")
  {
  $i=1
  $site=$Entry.site
  }
 }

#Return the results to SCOM in a property bag
$bag=$api.CreatePropertyBag()
$bag.AddValue('RepeatCount',$i)
$bag.AddValue('Site Name',$site)
$bag.AddValue('Time',$time)

$bag
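
If you want to try the counting logic on a workstation where the MOM.ScriptAPI COM object is not available, it can be pulled into a small function.  This is only a sketch: the function name Get-RepeatCount and the sample data are hypothetical, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical helper: same counting rules as the monitor script, without SCOM.
function Get-RepeatCount {
    param([object[]]$Entries)

    $count = 1                                          # same starting value as $i
    foreach ($e in $Entries) {
        if ($e.site -like 'Cali*') { $count++ }         # Cali entry repeats
        if ($e.site -like '*DC*')  { $count = 1 }       # DC entry resets the count
    }
    return $count
}

# Made-up data: one DC entry followed by ten Cali entries.
$sample = @([pscustomobject]@{ site = 'DC' }) +
          (1..10 | ForEach-Object { [pscustomobject]@{ site = 'Cali' } })

Get-RepeatCount -Entries $sample
```

Ten Cali entries in a row after a DC entry give a count of 11, because the counter starts at 1.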

Expression Filter logic:

I set the value expressions in the Expression Filter to “LessEqual” to 1 in the “UnderErrorFilter” Condition Detection and “Greater” than 10 in the “OverErrorFilter” Condition Detection.  These values determine the healthy and critical health states of the monitor.  Because $i starts at 1 in the script above, ten consecutive “Cali” entries produce a RepeatCount of 11, so the monitor enters a critical state once the “Cali” entry has repeated 10 times in a row.  When an entry from the “DC” site is logged, RepeatCount drops back to 1 and the monitor returns to a healthy state.  For my requirement, this logic works great!  If your requirement is a bit different, this script can easily be tweaked to meet most needs.

<ConditionDetection ID="UnderErrorFilter" TypeID="System!System.ExpressionFilter">
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Integer">Property[@Name='RepeatCount']</XPathQuery>
</ValueExpression>
<Operator>LessEqual</Operator>
<ValueExpression>
<Value Type="Integer">1</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</ConditionDetection>
<ConditionDetection ID="OverErrorFilter" TypeID="System!System.ExpressionFilter">
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Integer">Property[@Name='RepeatCount']</XPathQuery>
</ValueExpression>
<Operator>Greater</Operator>
<ValueExpression>
<Value Type="Integer">10</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</ConditionDetection>
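
To see how a given RepeatCount falls against these two filters, here is a rough PowerShell sketch.  The function name Get-FilterResult is hypothetical and is not part of the management pack; it only mirrors the two comparisons.

```powershell
# Hypothetical helper that mirrors the two expression filters.
function Get-FilterResult {
    param([int]$RepeatCount)

    if ($RepeatCount -le 1)  { return 'UnderError' }   # LessEqual 1 -> healthy
    if ($RepeatCount -gt 10) { return 'OverError' }    # Greater 10  -> critical
    return 'NoMatch'                                   # neither filter matches
}

1, 5, 11 | ForEach-Object { "{0,2} -> {1}" -f $_, (Get-FilterResult -RepeatCount $_) }
```

Values from 2 through 10 match neither filter.  As far as I understand, the monitor simply keeps whatever state it already had in that case, so an unhealthy monitor only recovers once a DC entry is the most recent one in the log.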

Full management pack code:

**SAMPLE ONLY – USE AT YOUR OWN RISK**

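<?xml version="1.0" encoding="utf-8"?>
<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">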
<Manifest>
<Identity>
<ID>Test.Log</ID>
<Version>1.0.0.1</Version>
</Identity>
<Name>Test Log</Name>
<References>
<Reference Alias="Windows">
<ID>Microsoft.Windows.Library</ID>
<Version>6.1.7221.0</Version>
<PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>
<Reference Alias="System">
<ID>System.Library</ID>
<Version>6.1.7221.0</Version>
<PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>
<Reference Alias="Health">
<ID>System.Health.Library</ID>
<Version>6.1.7221.0</Version>
<PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>
</References>
</Manifest>
<TypeDefinitions>
<ModuleTypes>
<DataSourceModuleType ID="Test.Log.DataSourceModuleType" Accessibility="Internal" Batching="false">
<Configuration>
<xsd:element type="xsd:integer" name="TimeoutSeconds" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
<xsd:element type="xsd:integer" name="IntervalSeconds" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
</Configuration>
<OverrideableParameters>
<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
</OverrideableParameters>
<ModuleImplementation Isolation="Any">
<Composite>
<MemberModules>
<DataSource ID="Scheduler" TypeID="System!System.Scheduler">
<Scheduler>
<SimpleReccuringSchedule>
<Interval Unit="Seconds">$Config/IntervalSeconds$</Interval>
</SimpleReccuringSchedule>
<ExcludeDates />
</Scheduler>
</DataSource>
<ProbeAction ID="Probe" TypeID="Test.Log.ProbeActionModuleType">
<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
</ProbeAction>
</MemberModules>
<Composition>
<Node ID="Probe">
<Node ID="Scheduler" />
</Node>
</Composition>
</Composite>
</ModuleImplementation>
<OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>
<ProbeActionModuleType ID="Test.Log.ProbeActionModuleType" Accessibility="Internal" Batching="false" PassThrough="false">
<Configuration>
<xsd:element type="xsd:integer" name="TimeoutSeconds" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
</Configuration>
<OverrideableParameters>
<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
</OverrideableParameters>
<ModuleImplementation Isolation="Any">
<Composite>
<MemberModules>
<ProbeAction ID="Probe" TypeID="Windows!Microsoft.Windows.PowerShellPropertyBagTriggerOnlyProbe">
<ScriptName>Log_File_Script_Monitor_WithSCOM.ps1</ScriptName>
<ScriptBody>
$Date=Get-Date -Format yyyyMM

#Import the csv log file
$File = Import-Csv C:\Path\Delivery-$Date.csv | Select-Object site,'log date' -Last 20

#Set the starting value for the counter.  This variable increments when the Cali entry repeats consecutively
$i=1

$api = New-Object -ComObject 'MOM.ScriptAPI'
#ForEach loop to either increment the $i variable if the Cali entry repeats or set the variable to 1 if a DC entry occurs
ForEach ($Entry in $File)
{

If ($Entry.site -like "Cali*")
{
$i++
$time="Last Log Entry Date: " + $Entry.'Log Date'
$site=$Entry.site
}

#Reset the variable value to 1 if a DC entry occurs
If ($Entry.site -like "*DC*")
{
$i=1
$site=$Entry.site
}
}
#Return the results to SCOM in a property bag
$bag=$api.CreatePropertyBag()
$bag.AddValue('RepeatCount',$i)
$bag.AddValue('Site Name',$site)
$bag.AddValue('Time',$time)
$bag
</ScriptBody>
<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
</ProbeAction>
</MemberModules>
<Composition>
<Node ID="Probe" />
</Composition>
</Composite>
</ModuleImplementation>
<OutputType>System!System.PropertyBagData</OutputType>
<TriggerOnly>true</TriggerOnly>
</ProbeActionModuleType>
</ModuleTypes>
<MonitorTypes>
<UnitMonitorType ID="Test.Log.UnitMonitorType" Accessibility="Internal">
<MonitorTypeStates>
<MonitorTypeState ID="UnderError" NoDetection="false" />
<MonitorTypeState ID="OverError" NoDetection="false" />
</MonitorTypeStates>
<Configuration>
<xsd:element type="xsd:integer" name="TimeoutSeconds" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
<xsd:element type="xsd:integer" name="IntervalSeconds" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
</Configuration>
<OverrideableParameters>
<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
</OverrideableParameters>
<MonitorImplementation>
<MemberModules>
<DataSource ID="DataSource" TypeID="Test.Log.DataSourceModuleType">
<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
</DataSource>
<ProbeAction ID="Probe" TypeID="Test.Log.ProbeActionModuleType">
<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
</ProbeAction>
<ConditionDetection ID="UnderErrorFilter" TypeID="System!System.ExpressionFilter">
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Integer">Property[@Name='RepeatCount']</XPathQuery>
</ValueExpression>
<Operator>LessEqual</Operator>
<ValueExpression>
<Value Type="Integer">1</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</ConditionDetection>
<ConditionDetection ID="OverErrorFilter" TypeID="System!System.ExpressionFilter">
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Integer">Property[@Name='RepeatCount']</XPathQuery>
</ValueExpression>
<Operator>Greater</Operator>
<ValueExpression>
<Value Type="Integer">10</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</ConditionDetection>
</MemberModules>
<RegularDetections>
<RegularDetection MonitorTypeStateID="UnderError">
<Node ID="UnderErrorFilter">
<Node ID="DataSource" />
</Node>
</RegularDetection>
<RegularDetection MonitorTypeStateID="OverError">
<Node ID="OverErrorFilter">
<Node ID="DataSource" />
</Node>
</RegularDetection>
</RegularDetections>
<OnDemandDetections>
<OnDemandDetection MonitorTypeStateID="UnderError">
<Node ID="UnderErrorFilter">
<Node ID="DataSource" />
</Node>
</OnDemandDetection>
<OnDemandDetection MonitorTypeStateID="OverError">
<Node ID="OverErrorFilter">
<Node ID="DataSource" />
</Node>
</OnDemandDetection>
</OnDemandDetections>
</MonitorImplementation>
</UnitMonitorType>
</MonitorTypes>
</TypeDefinitions>
<Monitoring>
<Monitors>
<UnitMonitor ID="Test.Log.Monitor" Accessibility="Public" Enabled="false" Target="Windows!Microsoft.Windows.Computer" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Test.Log.UnitMonitorType" ConfirmDelivery="false">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Log.Monitor.Alert">
<AlertOnState>Error</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Low</AlertPriority>
<AlertSeverity>Information</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/DNSName$</AlertParameter1>
<AlertParameter2>$Data/Context/Property[@Name='RepeatCount']$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='Site Name']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Time']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Success" MonitorTypeStateID="UnderError" HealthState="Success" />
<OperationalState ID="Error" MonitorTypeStateID="OverError" HealthState="Error" />
</OperationalStates>
<Configuration>
<TimeoutSeconds>60</TimeoutSeconds>
<IntervalSeconds>600</IntervalSeconds>
</Configuration>
</UnitMonitor>
</Monitors>
</Monitoring>
<Presentation>
<StringResources>
<StringResource ID="Log.Monitor.Alert" />
</StringResources>
</Presentation>
<LanguagePacks>
<LanguagePack ID="ENU" IsDefault="true">
<DisplayStrings>
<DisplayString ElementID="Test.Log">
<Name>Test Log</Name>
<Description>Management pack for all monitoring for x application</Description>
</DisplayString>
<DisplayString ElementID="Log.Monitor.Alert">
<Name>Test Log Monitor: Failure</Name>
<Description>The log for x is not logging events for Main on {0}

Repeat count: {1}
Log Entry: {2}
Log Entry Time: {3}

Please see the alert context for details</Description>
</DisplayString>
<DisplayString ElementID="Test.Log.ProbeActionModuleType">
<Name>Test Log ProbeActionModuleType</Name>
<Description>Probe Action Module Type for Unit Monitor: Test.Log.Monitor</Description>
</DisplayString>
<DisplayString ElementID="Test.Log.DataSourceModuleType">
<Name>Test Log DataSourceModuleType</Name>
<Description>Data Source Module Type for Unit Monitor: Test.Log.Monitor</Description>
</DisplayString>
<DisplayString ElementID="Test.Log.UnitMonitorType">
<Name>Test Log UnitMonitorType</Name>
<Description>Unit Monitor Type for Unit Monitor: Test.Log.Monitor</Description>
</DisplayString>
<DisplayString ElementID="Test.Log.Monitor">
<Name>Test Log Monitor</Name>
<Description>The log for x is not logging events for DC.  Please see alert context for more details.</Description>
</DisplayString>
<DisplayString ElementID="Test.Log.Monitor" SubElementID="Error">
<Name>Error</Name>
</DisplayString>
<DisplayString ElementID="Test.Log.Monitor" SubElementID="Success">
<Name>Success</Name>
</DisplayString>
</DisplayStrings>
</LanguagePack>
</LanguagePacks>
</ManagementPack>

This posting is provided “AS IS” with no warranties.
