WMSMon DB-Analyzer

The DB analyzer is a daemon that periodically checks the wmsmon database looking for new data.
It keeps track of the status of every monitored instance and notifies Nagios of that status; Nagios must be configured to accept these notifications.
The Nagios instance at CNAF can then send email and SMS alerts when triggered by the notifications sent by the DB analyzer.
This is a cheap way to implement a notification service for WMSMonitor.

It is also possible to define groups of instances, so that dedicated notifications are sent to Nagios about the group as a whole and not only about the single instances.
This is particularly convenient when a site has multiple instances dedicated to one VO.

All the executables needed to start the DB analyzer are already present on the data_collector, under the usual directory /root/wmsmon.
The analyzer uses the same wmsmon_site-info.def file to obtain the DB connection parameters.
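
As a rough illustration of reading connection parameters from a definition file, the sketch below parses shell-style KEY=value lines. This is only an assumption about the layout of wmsmon_site-info.def; the function name parse_site_info and the keys WMSMON_DB_HOST and WMSMON_DB_NAME are hypothetical and are not taken from the WMSMonitor code.

```python
def parse_site_info(text):
    """Parse shell-style KEY=value lines, skipping blank lines and # comments.

    Assumption: the definition file uses one KEY=value pair per line.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        params[key.strip()] = value.strip().strip("\"'")
    return params


# Hypothetical file content, for illustration only.
sample = 'WMSMON_DB_HOST="db.example.org"\n# a comment\nWMSMON_DB_NAME=wmsmon\n'
params = parse_site_info(sample)
```

The real parameter names should be checked in the wmsmon_site-info.def file installed on the data_collector.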

Configuration and start of the DB Analyzer

The DB analyzer is implemented in Python, and many parameters are still hardcoded in the Python executable; they can be changed simply by editing the executable itself.

For every instance (WMS or LB) the analyzer sends Nagios a notification for a service named MON-WMS or MON-LB, which should be configured in Nagios as a service of the corresponding WMS (or LB) host.

For example, typical notifications are the following:

For an LB: echo "lb010;MON-LB;0;lb010.cnaf.infn.it STATUS is OK - " | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg

For a WMS: echo "devel14;MON-WMS;2;devel14.cnaf.infn.it STATUS is CRITICAL - At least daemon LM is dead!" | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg
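
The two commands above can be reproduced from Python roughly as follows. This is a minimal sketch and not the analyzer's actual code: the function names format_nsca_line and send_notification are hypothetical, and deriving the short host name from the part before the first dot is an assumption based on the two examples. The numeric codes are the standard Nagios return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Standard Nagios return codes for check results.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_nsca_line(host, service, status, detail=""):
    """Build one ';'-delimited passive check result line for send_nsca."""
    # Assumption: the Nagios host name is the short name, e.g. "lb010".
    short_name = host.split(".")[0]
    code = NAGIOS_CODES[status]
    return f"{short_name};{service};{code};{host} STATUS is {status} - {detail}\n"


def send_notification(line, nagios_host="gstore.cnaf.infn.it",
                      config="/etc/nagios/send_nsca.cfg"):
    """Pipe one result line to send_nsca, like the echo commands above."""
    subprocess.run(
        ["send_nsca", "-H", nagios_host, "-d", ";", "-c", config],
        input=line, text=True, check=True,
    )


lb_line = format_nsca_line("lb010.cnaf.infn.it", "MON-LB", "OK")
wms_line = format_nsca_line("devel14.cnaf.infn.it", "MON-WMS", "CRITICAL",
                            "At least daemon LM is dead!")
```

Calling send_notification(lb_line) requires send_nsca to be installed and a reachable Nagios host, so it is not invoked here.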

Four kinds of notification can be sent for any single instance: OK, WARNING, CRITICAL and UNKNOWN, defined as follows:

OK: no problem found in the DB for that specific instance

WARNING: problems were found but they are not critical, e.g. queues are forming in internal WMS/LB components but are not yet too long, or a file system occupancy is between 80% and 90%

CRITICAL: something bad was found in the DB about the instance, e.g. queues of internal WMS/LB components are longer than 3000 entries, or a file system occupancy is greater than 90%

NOTE: the analyzer is able to associate an LB with a WMS from the information stored in the DB. The status of the LB affects the status of the WMS, but not vice versa: the worse of the WMS and LB statuses is notified for the WMS. For example, if the LB is in WARNING and the WMS itself is OK, the notification for the WMS will be WARNING.

UNKNOWN: the latest data about an instance is too old to give a reliable status
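
The status rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the analyzer's code: the names classify and wms_status are hypothetical, and so are the WARNING queue threshold (the text only says "not too high") and the maximum data age. The 3000-entry and 80%/90% limits come from the definitions above. How UNKNOWN combines between a WMS and its LB is not specified, so wms_status only handles OK, WARNING and CRITICAL.

```python
# Severity order used to pick the worse of two statuses.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

QUEUE_CRITICAL = 3000    # from the text: queues above 3000 entries
QUEUE_WARNING = 1000     # hypothetical: the text gives no WARNING value
FS_WARNING = 80.0        # file system occupancy, percent
FS_CRITICAL = 90.0
MAX_AGE_SECONDS = 3600   # hypothetical limit for "too old" data


def classify(queue_len, fs_percent, age_seconds):
    """Return the status of a single instance from its latest DB data."""
    if age_seconds > MAX_AGE_SECONDS:
        return "UNKNOWN"
    if queue_len > QUEUE_CRITICAL or fs_percent > FS_CRITICAL:
        return "CRITICAL"
    if queue_len > QUEUE_WARNING or fs_percent >= FS_WARNING:
        return "WARNING"
    return "OK"


def wms_status(wms_own, lb_own):
    """Status notified for a WMS: the worse of its own and its LB's status."""
    return max((wms_own, lb_own), key=SEVERITY.__getitem__)
```

With these definitions, wms_status("OK", "WARNING") gives "WARNING", matching the example in the note above; the LB is still notified with its own status.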

Nagios should be configured to handle all these notifications. For example, the CNAF Nagios is configured to notify via email every status change on any instance.

The DB analyzer also sends notifications about groups of instances.

Groups

Before starting the analyzer you should create a file containing the definition of the groups of instances.

-- DanieleCesini - 13 Feb 2009


Topic revision: r1 - 2009-02-13 - DanieleCesini
 