WMSMonitor DB-Analyzer

The DB analyzer is a daemon that periodically checks the WMSMonitor database for new data, keeps track of the status of every monitored instance, and notifies that status to a NAGIOS server, which must be configured to accept WMSMonitor notifications.

The main purpose of the DB analyzer is to send notifications to NAGIOS, which in turn can send email, interact with the SMS gateway, and keep a database of the alarm history.
This is a cheap way to implement a robust notification service for WMSMonitor. We are, however, working on a standalone notification service for the DB analyzer.

It is also possible to specify groups of instances, so that notifications about the whole group, and not only about the single instances, are sent to NAGIOS.
This is particularly convenient when a site has multiple instances dedicated to one VO.

All the executables needed to start the DB analyzer are already present on the data_collector, under the usual directory /root/wmsmon.
The analyzer uses the same wmsmon_site-info.def file to obtain the DB connection parameters.

Configuration and start of the DB Analyzer

The DB analyzer is implemented in Python, and many parameters are still hardcoded in the Python executable, so they must be modified by editing the executable itself.

We will provide an installation script in the next WMSMonitor releases.

The analyzer sends NAGIOS notifications for a service named MON-WMS or MON-LB, which should be configured in NAGIOS as a service of the WMS (or LB) host.

A typical notification is the following (where gstore.cnaf.infn.it is the NAGIOS server):

In case of an LB:

echo "lb010;MON-LB;0;lb010.cnaf.infn.it STATUS is OK - " | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg

In case of a WMS:

echo "devel14;MON-WMS;2;devel14.cnaf.infn.it STATUS is CRITICAL - At least daemon LM is dead!" | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg
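The two notifications above share one layout: host, service name, numeric return code and message, joined by the delimiter passed to send_nsca with -d. The following is a minimal Python sketch of how such a line and the matching send_nsca command could be assembled. It is not taken from the analyzer source: the function names build_nsca_line and build_send_nsca_command are hypothetical, and the codes 1 (WARNING) and 3 (UNKNOWN) are the standard NAGIOS return codes rather than values shown in the examples.

```python
# Standard NAGIOS return codes; 0 and 2 appear in the examples above,
# 1 and 3 are the usual values for WARNING and UNKNOWN.
NAGIOS_STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def build_nsca_line(host, service, status, message, delimiter=";"):
    """Return one notification line: host;service;return_code;message.

    Hypothetical helper, not part of the analyzer. The message must not
    contain the delimiter itself.
    """
    code = NAGIOS_STATES[status]
    return delimiter.join([host, service, str(code), message])


def build_send_nsca_command(nagios_host, delimiter=";",
                            config="/etc/nagios/send_nsca.cfg"):
    """Return the send_nsca argument list used in the examples above."""
    return ["send_nsca", "-H", nagios_host, "-d", delimiter, "-c", config]


line = build_nsca_line("lb010", "MON-LB", "OK",
                       "lb010.cnaf.infn.it STATUS is OK - ")
command = build_send_nsca_command("gstore.cnaf.infn.it")
print(line)
print(command)

# To actually send the line (requires send_nsca to be installed):
#   import subprocess
#   subprocess.run(command, input=line + "\n", text=True, check=True)
```

Passing the command as an argument list avoids shell quoting problems with the ';' delimiter.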

Four kinds of notification can be sent for any single instance: OK, WARNING, CRITICAL and UNKNOWN, defined as follows:

OK: no problem found in the DB for that specific instance

WARNING: problems were found but they are not critical, e.g. the queues of internal WMS/LB components are growing but are not yet too long, or a file system occupancy is between 80% and 90%

CRITICAL: something bad was found in the DB about the instance, e.g. the queues of internal WMS/LB components hold more than 3000 entries, or a file system occupancy is greater than 90%

UNKNOWN: the latest data about an instance are too old to give a reliable status

NOTE that the analyzer is able to associate an LB with a WMS from the information stored in the DB. The status of the LB affects the status of the WMS, but not vice versa: the worse of the WMS and LB statuses is notified for the WMS. For example, if the LB is in WARNING and the WMS itself is OK, the notification for the WMS will be WARNING.
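The status rules for a single instance, together with the way an LB status propagates to its WMS, can be sketched in Python as follows. This is an illustration under stated assumptions, not the analyzer's actual code. Only the 3000-entry queue limit and the 80% and 90% file system limits come from the text. The queue warning level (QUEUE_WARNING), the maximum data age (MAX_AGE) and the handling of UNKNOWN when combining WMS and LB are hypothetical choices, and a fixed queue threshold stands in for the "queues are increasing" check.

```python
from datetime import datetime, timedelta

SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

QUEUE_CRITICAL = 3000             # from the text: more than 3000 entries
QUEUE_WARNING = 1000              # hypothetical stand-in for "increasing"
FS_WARNING = 80.0                 # from the text: between 80% and 90%
FS_CRITICAL = 90.0                # from the text: greater than 90%
MAX_AGE = timedelta(minutes=30)   # hypothetical staleness limit


def instance_status(queue_len, fs_percent, last_update, now):
    """Classify one instance from its latest DB record."""
    if now - last_update > MAX_AGE:
        return "UNKNOWN"
    if queue_len > QUEUE_CRITICAL or fs_percent > FS_CRITICAL:
        return "CRITICAL"
    if queue_len > QUEUE_WARNING or fs_percent >= FS_WARNING:
        return "WARNING"
    return "OK"


def wms_status_with_lb(wms_status, lb_status):
    """Return the status notified for a WMS given its associated LB.

    The worse of the two is reported. Assumption: UNKNOWN is not ranked,
    so if either side is UNKNOWN the WMS status is left unchanged.
    """
    if wms_status not in SEVERITY or lb_status not in SEVERITY:
        return wms_status
    return max(wms_status, lb_status, key=SEVERITY.get)


now = datetime(2009, 3, 4, 12, 0)
print(instance_status(120, 85.0, now, now))
print(wms_status_with_lb("OK", "WARNING"))
```

The combination is one-directional: the function is only called for the WMS, so the status of the WMS never changes what is notified for the LB.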

NAGIOS should be configured to handle all these notifications. For example, the CNAF NAGIOS is configured to notify every status change of any instance via mail.

The DB analyzer also sends notifications about groups of instances.

Groups are discovered from the WMSMonitor DB; they reflect the group reported in the third column of the wmslist.conf file.

Notifications are sent to NAGIOS for each group following these rules:

OK: no problem found in the DB for that specific group

WARNING: less than 50% of the group instances are in critical status.

CRITICAL: more than 50% of the group instances are in critical status.

UNKNOWN: the latest data about an instance are too old to have a reliable status.
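The group rules can be read as a small decision function over the statuses of the group members. The Python sketch below is one possible reading, not the analyzer's implementation. The text does not say what happens at exactly 50%, for an empty group, or how many stale instances make a group UNKNOWN; here exactly 50% is treated as WARNING, and an empty group or any UNKNOWN member yields UNKNOWN. All three choices are assumptions.

```python
def group_status(instance_statuses):
    """Derive a group status from the list of its member statuses."""
    statuses = list(instance_statuses)
    if not statuses:
        return "UNKNOWN"      # assumption: nothing to evaluate
    if "UNKNOWN" in statuses:
        return "UNKNOWN"      # assumption: one stale member is enough
    if all(status == "OK" for status in statuses):
        return "OK"
    critical = statuses.count("CRITICAL")
    # "More than 50%" compared without division to avoid rounding issues.
    if 2 * critical > len(statuses):
        return "CRITICAL"
    return "WARNING"          # includes the unspecified 50% case


print(group_status(["OK", "CRITICAL", "CRITICAL"]))
print(group_status(["OK", "OK", "CRITICAL"]))
```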

It is possible to configure subgroups for any group by editing the file /root/groupfile. For example, to create the subgroups ANALYSIS and PROD for the CMS VO, the groupfile looks like:

#cat /root/groupfile
wms001.your_domain cms PROD
wms002.your_domain cms ANALYSIS
wms003.your_domain cms ANALYSIS
wms004.your_domain cms PROD
wms005.your_domain cms ANALYSIS

In this way notifications are sent for each subgroup and not for the group itself. By default, notifications are sent for NAGIOS services called GROUP-SUBGROUP-WMS belonging to the WMSMonitor server host.
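To make the groupfile layout concrete, here is a short Python sketch that reads lines of the form "host group subgroup" and derives a service name following the GROUP-SUBGROUP-WMS pattern. The functions parse_groupfile and service_name are hypothetical, and the exact casing of the real service names is an assumption: the sketch simply joins the fields as they appear in the file.

```python
from collections import defaultdict

SAMPLE_GROUPFILE = """\
wms001.your_domain cms PROD
wms002.your_domain cms ANALYSIS
wms003.your_domain cms ANALYSIS
wms004.your_domain cms PROD
wms005.your_domain cms ANALYSIS
"""


def parse_groupfile(text):
    """Map (group, subgroup) to the list of hosts assigned to it."""
    subgroups = defaultdict(list)
    for raw_line in text.splitlines():
        fields = raw_line.split()
        # Skip blank, commented or malformed lines (an assumption; the
        # text does not describe how the analyzer treats them).
        if len(fields) != 3 or fields[0].startswith("#"):
            continue
        host, group, subgroup = fields
        subgroups[(group, subgroup)].append(host)
    return dict(subgroups)


def service_name(group, subgroup):
    """Build a NAGIOS service name following GROUP-SUBGROUP-WMS."""
    return "-".join([group, subgroup, "WMS"])


groups = parse_groupfile(SAMPLE_GROUPFILE)
for (group, subgroup), hosts in sorted(groups.items()):
    print(service_name(group, subgroup), hosts)
```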

As for single instances, NAGIOS should be configured to handle (sub)group notifications.

Before starting the analyzer you should configure the hostname of the NAGIOS server. This must be done by hand, editing the file /root/wmsmon/bin/analyzer-utils.py and replacing the string "gstore.cnaf.infn.it" with your NAGIOS server hostname.
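If you prefer not to edit the file in an editor, the same substitution can be scripted. The Python sketch below replaces every occurrence of the default hostname and keeps a copy of the original file with an .orig suffix. The helper set_nagios_host, the hostname nagios.example.org and the NAGIOS_HOST line used in the demonstration are hypothetical; the demonstration runs on a temporary file, not on the real analyzer-utils.py.

```python
import shutil
import tempfile
from pathlib import Path

DEFAULT_HOST = "gstore.cnaf.infn.it"


def set_nagios_host(path, new_host, old_host=DEFAULT_HOST):
    """Replace old_host with new_host in the file; return the match count.

    A backup named <file>.orig is written before the file is changed.
    """
    path = Path(path)
    source = path.read_text()
    count = source.count(old_host)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".orig"))
        path.write_text(source.replace(old_host, new_host))
    return count


# Demonstration on a temporary stand-in file with made-up content.
with tempfile.TemporaryDirectory() as workdir:
    demo = Path(workdir) / "analyzer-utils.py"
    demo.write_text('NAGIOS_HOST = "gstore.cnaf.infn.it"\n')
    replaced = set_nagios_host(demo, "nagios.example.org")
    edited = demo.read_text()
    backup_exists = demo.with_name("analyzer-utils.py.orig").exists()

print(replaced, edited.strip())

# On the data_collector the call would be:
#   set_nagios_host("/root/wmsmon/bin/analyzer-utils.py", "your.nagios.host")
```

A return value of 0 means the default hostname was not found, for instance because the file has already been edited.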

Now you are ready to start the analyzer as a normal Linux background process:

#/root/wmsmon/bin/wmsmon-db-analyzer.py > /var/log/wmsmon-db-analyzer.log 2>&1 &

NOTE that the analyzer logs to stdout, so the command above redirects its output to /var/log/wmsmon-db-analyzer.log.

In case of problems running the analyzer please contact wmsmon<at>cnaf.infn.it.

Topic revision: r3 - 2009-03-04 - DanieleCesini