Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
WMSMonitor DB-Analyzer The DB analyzer is a daemon that periodically checks the WMSMonitor database for new data, keeps track of the status of every monitored instance, and notifies that status to a NAGIOS server, which should be configured to accept WMSMonitor notifications. | ||||||||
Line: 66 to 66 | ||||||||
NOTE that the analyzer logs to stdout. In case of problems running the analyzer please contact wmsmon<at>cnaf.infn.it. | ||||||||
Deleted: | ||||||||
< < | -- DanieleCesini - 13 Feb 2009 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Changed: | ||||||||
< < | WMSMon DB-Analyzer | |||||||
> > | WMSMonitor DB-Analyzer | |||||||
Changed: | ||||||||
< < | The DB analyzer is a daemon that periodically checks the wmsmon database looking for new data. It keeps track of the status of any monitored instances and notify this status to nagios, that should be configured in order to accept this notifications. The nagios instance at cnaf is then able to send mail and sms if triggered by the notification sent by the db-analyzer. This is a cheap way to implement a notification service for WMSMonitor. | |||||||
> > | The DB analyzer is a daemon that periodically checks the WMSMonitor database for new data, keeps track of the status of every monitored instance, and notifies that status to a NAGIOS server, which should be configured to accept WMSMonitor notifications. | |||||||
Changed: | ||||||||
< < | It is also possible to specify groups of instances so that special notification to nagios are sent regarding the whole group and not only the single instances. This is particularly convenient when a site has multiple instances dedicated to one VO. | |||||||
> > | The main purpose of the DB analyzer is to send notifications to NAGIOS, which can then send email, interact with the SMS gateway, and keep a database of the alarm history. This is a cheap way to implement a robust notification service for WMSMonitor. We are, however, working on a standalone notification service for the DB analyzer. | |||||||
Changed: | ||||||||
< < | All the executable needed to start the DB analyzer are already present on the data_collector, under the usual directory /root/wmsmon It uses the same wmsmon_site-info.def file to obtain the db connection paramenters. Configuration and start of the DB Analyzer | |||||||
> > | It is also possible to specify groups of instances so that special notifications to nagios are sent about the whole group and not only the single instances. This is particularly convenient when a site has multiple instances dedicated to one VO. | |||||||
Changed: | ||||||||
< < | DB analyzer is implemented in python and many parameters are still hardcoded in the python executable, but they can simply modified editing the executable itself. | |||||||
> > | All the executables needed to start the DB analyzer are already present on the data_collector, under the usual directory /root/wmsmon. The analyzer uses the same wmsmon_site-info.def file to obtain the DB connection parameters. Configuration and start of the DB Analyzer | |||||||
Changed: | ||||||||
< < | The analyzer sends to nagios notification for any instance (WMS or LB) for a service MON-WMS or MON-LB, that should be configured in nagios as a service of the WMS(LB) host. | |||||||
> > | The DB analyzer is implemented in Python and many parameters are still hardcoded in the Python executable, so they must be modified by editing the executable itself. | |||||||
Changed: | ||||||||
< < | In example a typical notification is the following: | |||||||
> > | We will provide an installation script in the next WMSMonitor releases. | |||||||
Changed: | ||||||||
< < | For an LB: echo "lb010;MON-LB;0;lb010.cnaf.infn.it STATUS is OK - " | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg | |||||||
> > | The analyzer sends NAGIOS notifications for a service named MON-WMS or MON-LB, which should be configured in NAGIOS as a service of the WMS (or LB) host. | |||||||
Changed: | ||||||||
< < | For a WMS: echo "devel14;MON-WMS;2;devel14.cnaf.infn.it STATUS is CRITICAL - At least daemon LM is dead!" | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg | |||||||
> > | A typical notification is the following (where gstore.cnaf.infn.it is the NAGIOS server): | |||||||
Changed: | ||||||||
< < | There are four kind of notification that can be sent for any single instance: OK, WARNING, CRITICAL, UNKNOWN defined as follow: | |||||||
> > | In case of an LB: echo "lb010;MON-LB;0;lb010.cnaf.infn.it STATUS is OK - " | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg | |||||||
Changed: | ||||||||
< < | OK: no problem found in the DB for that specific instance | |||||||
> > | In case of a WMS: echo "devel14;MON-WMS;2;devel14.cnaf.infn.it STATUS is CRITICAL - At least daemon LM is dead!" | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg | |||||||
Changed: | ||||||||
< < | WARNING: problems are found but they are not critical, i.e. internal wms/lb components queues are forming but not too high or a file system occupancy is between 80% and 90% | |||||||
> > | There are four kinds of notification that can be sent for any single instance: OK, WARNING, CRITICAL and UNKNOWN, defined as follows: | |||||||
Changed: | ||||||||
< < | CRITICAL: something bad was found in the db about the isntance: i.e. internal wms/lb components queues are greater than 3000 entries or a file system occupancy greater than 90% | |||||||
> > | OK: no problem found in the DB for that specific instance | |||||||
Changed: | ||||||||
< < | NOTE that the analyzer is able to associate an LB to a WMS from the information stored into the DB. The status of the LB affects the status of the WMS, but not vice versa. If the LB is in WARNING and the WMS itself is OK the notification for the WMS will be WARNING. The worst status between the WMS and LB are notified for the WMS. | |||||||
> > | WARNING: problems are found but they are not critical, e.g. internal WMS/LB component queues are growing but are not yet too long, or a file system occupancy is between 80% and 90% | |||||||
Changed: | ||||||||
< < | UNKNOWN: the latest data about an instance are too old to have a reliable status | |||||||
> > | CRITICAL: something bad was found in the DB about the instance, e.g. internal WMS/LB component queues hold more than 3000 entries or a file system occupancy is greater than 90% | |||||||
Changed: | ||||||||
< < | Nagios should be configured to handle all this notification. In example the cnaf nagios is configured to notify via mail every status change on any instance. | |||||||
> > | NOTE that the analyzer is able to associate an LB to a WMS from the information stored in the DB. The status of the LB affects the status of the WMS, but not vice versa. If the LB is in WARNING and the WMS itself is OK, the notification for the WMS will be WARNING. The worse of the WMS and LB statuses is notified for the WMS. | |||||||
Changed: | ||||||||
< < | The DB analyzer send notifications also about groups of instances. | |||||||
> > | UNKNOWN: the latest data about an instance are too old to have a reliable status | |||||||
Changed: | ||||||||
< < | Groups | |||||||
> > | NAGIOS should be configured to handle all these notifications. For example, the CNAF NAGIOS is configured to notify via mail every status change on any instance. | |||||||
Changed: | ||||||||
< < | Before starting the analyzer you should create a file containing the definition of the groups of instances. | |||||||
> > | The DB analyzer also sends notifications about groups of instances.
Groups are discovered from the WMSMonitor DB; they reflect the group reported in the third column of the wmslist.conf file.
Notifications are sent to NAGIOS for each group following these rules:
OK: no problem found in the DB for that specific group
WARNING: less than 50% of the group instances are in critical status.
CRITICAL: more than 50% of the group instances are in critical status.
UNKNOWN: the latest data about an instance are too old to have a reliable status.
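The group rules above can be read as a small decision function. The following is only a sketch of that reading, not the analyzer's actual code: the function name is hypothetical, the numeric codes are the standard NAGIOS return codes (matching the 0 and 2 used in the single-instance examples), and the handling of an empty group, of any UNKNOWN member, and of exactly 50% critical instances is an assumption, since the rules do not cover those cases.

```python
# Standard NAGIOS numeric status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def group_status(instance_statuses):
    """Combine a list of per-instance statuses into one group status.

    Assumptions (not stated in the rules): an empty group or any
    UNKNOWN member makes the group UNKNOWN, and exactly 50% of
    instances in CRITICAL counts as WARNING.
    """
    if not instance_statuses or UNKNOWN in instance_statuses:
        return UNKNOWN
    if all(status == OK for status in instance_statuses):
        return OK
    critical_fraction = instance_statuses.count(CRITICAL) / len(instance_statuses)
    # More than half of the group in CRITICAL makes the group CRITICAL.
    return CRITICAL if critical_fraction > 0.5 else WARNING
```

For instance, under these assumptions a group with one CRITICAL instance out of three would be reported as WARNING, and two out of three as CRITICAL.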
It is possible to configure subgroups for any group by editing the file /root/groupfile. For example, to create the subgroups ANALYSIS and PROD for the CMS VO, the groupfile looks like:
#cat /root/groupfile
wms001.your_domain cms PROD
wms002.your_domain cms ANALYSIS
wms003.your_domain cms ANALYSIS
wms004.your_domain cms PROD
wms005.your_domain cms ANALYSIS
In this way notifications are sent for each subgroup and not for the group itself. By default notifications are sent for NAGIOS services called GROUP-SUBGROUP-WMS belonging to the WMSMonitor server host. As for single instances, NAGIOS should be configured to handle (sub)group notifications.
Before starting the analyzer you should configure the hostname of the NAGIOS server. This must be done by hand, editing the file /root/wmsmon/bin/analyzer-utils.py and replacing the string "gstore.cnaf.infn.it" with your NAGIOS server hostname.
Now you are ready to start the analyzer as a normal Linux background process:
#/root/wmsmon/bin/wmsmon-db-analyzer.py > /var/log/wmsmon-db-analyzer.log 2>&1 &
NOTE that the analyzer logs to stdout. In case of problems running the analyzer please contact wmsmon<at>cnaf.infn.it. | |||||||
-- DanieleCesini - 13 Feb 2009 |
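The single-instance notifications quoted in this revision (the send_nsca lines for lb010 and devel14) follow a fixed `host;service;code;message` layout, and the NOTE on LB/WMS association says the WMS is notified with the worse of the two statuses. A minimal sketch of both ideas follows; the function names are hypothetical and this is not the code in analyzer-utils.py. Ranking UNKNOWN above CRITICAL is an assumption that simply falls out of using `max` on the numeric codes.

```python
# Standard NAGIOS numeric status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def wms_effective_status(wms_status, lb_status):
    """Status notified for a WMS: the worse of its own and its LB's status.

    Assumption: severity follows the numeric code, so UNKNOWN (3)
    outranks CRITICAL (2). The LB status is never changed by the WMS.
    """
    return max(wms_status, lb_status)


def nsca_line(short_host, fqdn, service, status, detail=""):
    """Build one ';'-delimited line in the layout of the quoted examples."""
    return "%s;%s;%d;%s STATUS is %s - %s" % (
        short_host, service, status, fqdn, STATUS_NAMES[status], detail)


# Reproduces the WMS example: a dead LM daemon reported as CRITICAL.
line = nsca_line("devel14", "devel14.cnaf.infn.it", "MON-WMS",
                 CRITICAL, "At least daemon LM is dead!")
```

The resulting string would then be piped to `send_nsca -H <nagios_host> -d ';' -c /etc/nagios/send_nsca.cfg`, as in the examples.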
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Added: | ||||||||
> > | WMSMon DB-AnalyzerThe DB analyzer is a daemon that periodically checks the wmsmon database looking for new data.It keeps track of the status of any monitored instances and notify this status to nagios, that should be configured in order to accept this notifications. The nagios instance at cnaf is then able to send mail and sms if triggered by the notification sent by the db-analyzer. This is a cheap way to implement a notification service for WMSMonitor. It is also possible to specify groups of instances so that special notification to nagios are sent regarding the whole group and not only the single instances. This is particularly convenient when a site has multiple instances dedicated to one VO. All the executable needed to start the DB analyzer are already present on the data_collector, under the usual directory /root/wmsmon It uses the same wmsmon_site-info.def file to obtain the db connection paramenters. Configuration and start of the DB Analyzer DB analyzer is implemented in python and many parameters are still hardcoded in the python executable, but they can simply modified editing the executable itself. The analyzer sends to nagios notification for any instance (WMS or LB) for a service MON-WMS or MON-LB, that should be configured in nagios as a service of the WMS(LB) host. In example a typical notification is the following: For an LB: echo "lb010;MON-LB;0;lb010.cnaf.infn.it STATUS is OK - " | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg For a WMS: echo "devel14;MON-WMS;2;devel14.cnaf.infn.it STATUS is CRITICAL - At least daemon LM is dead!" | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg There are four kind of notification that can be sent for any single instance: OK, WARNING, CRITICAL, UNKNOWN defined as follow: OK: no problem found in the DB for that specific instance WARNING: problems are found but they are not critical, i.e. 
internal wms/lb components queues are forming but not too high or a file system occupancy is between 80% and 90% CRITICAL: something bad was found in the db about the isntance: i.e. internal wms/lb components queues are greater than 3000 entries or a file system occupancy greater than 90% NOTE that the analyzer is able to associate an LB to a WMS from the information stored into the DB. The status of the LB affects the status of the WMS, but not vice versa. If the LB is in WARNING and the WMS itself is OK the notification for the WMS will be WARNING. The worst status between the WMS and LB are notified for the WMS. UNKNOWN: the latest data about an instance are too old to have a reliable status Nagios should be configured to handle all this notification. In example the cnaf nagios is configured to notify via mail every status change on any instance. The DB analyzer send notifications also about groups of instances. Groups Before starting the analyzer you should create a file containing the definition of the groups of instances. -- DanieleCesini - 13 Feb 2009 |