Difference: WebDocumentation (14 vs. 15)

Revision 15 - 2009-01-30 - DaniloDongiovanni

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Overview

WMSMonitor is a monitoring tool that collects service-status and job-flow information from a cluster of distributed WMS instances. The collected information is aggregated and presented along two main keys: per-instance WMS information and per-Virtual Organization (VO) job-flow information.
These two aggregation keys lead to two parallel branches of metric presentation in the GUI:
Line: 15 to 15
 
  • WMSMonitor SERVER: WMSMonitor runs on a dedicated server machine that hosts a MySQL database for data storage, together with PHP, Apache and Python.
  • WMSLB INSTANCES: A cron job on the WMSMonitor server collects data from each WMS/LB instance by executing compiled Python functions on it. On each WMS/LB instance, this Python executable implements the sensors and sends data to the WMSMonitor server via SNMP. Job flow rates are calculated with direct queries to the LB database.
  • WEB PUBLISHING: PHP-based functions retrieve data from the local MySQL DB, aggregate them and publish them on a graphical user interface over HTTPS. Charts are implemented with "Open Flash Chart", an open-source set of Flash-based libraries.
Changed:
<
<
  • DB ANALYZER: a python daemon that check the latest status of the cluster and send notification to a nagios server providing mail and/or sms allert notofications.
  • DB RE-FILLER: I can happen for various reasons (i.e. high dg20 file queue on the wms) that data for some jobs are not available on the LB database when the sensors are triggered on the LB servers, resulting in a loss of data in teh wmsmonitor sevrver db. The data collector does its best trying to fix the situation, but if data does not reach the lb server in a reasonable time (about ten hours by default, a configurable timeout) the collector gives up and a manula recostruction is need. This utility perform such a reconstruction.
>
>
  • DB ANALYZER: a Python daemon that checks the latest status of the cluster and sends notifications to a Nagios server, which provides mail and/or SMS alert notifications.
  • DB RE-FILLER: For various reasons (e.g. a high dg20 file queue on the WMS), data for some jobs may not yet be available in the LB database when the sensors are triggered on the LB servers, resulting in data loss in the WMSMonitor server DB. The data collector tries its best to fix the situation, but if the data do not reach the LB server within a reasonable time (about ten hours by default; the timeout is configurable), the collector gives up and a manual reconstruction is needed. This utility performs that reconstruction.
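The collector's give-up rule described under DB RE-FILLER can be sketched as a simple classification of jobs whose data are still missing from the LB database. This is a minimal sketch, not the WMSMonitor code: the function name `classify_missing` and the `(job_id, first_seen)` input shape are hypothetical, and it assumes the timeout is measured from the moment the collector first looked for the job's LB data. Only the configurable ten-hour default comes from this page.

```python
from datetime import datetime, timedelta

# Default give-up timeout quoted on this page ("about ten hours"); configurable.
DEFAULT_TIMEOUT = timedelta(hours=10)


def classify_missing(jobs, now, timeout=DEFAULT_TIMEOUT):
    """Split jobs whose data are still missing from the LB database.

    jobs: iterable of (job_id, first_seen) pairs, where first_seen is the
          datetime at which the collector first looked for the job's LB data
          (hypothetical reference point for the timeout).
    Returns (retry, refill): ids the collector keeps retrying, and ids it
    gives up on, which need a manual reconstruction with the re-filler.
    """
    retry, refill = [], []
    for job_id, first_seen in jobs:
        if now - first_seen < timeout:
            retry.append(job_id)   # data may still reach the LB server
        else:
            refill.append(job_id)  # timeout expired: hand over to DB RE-FILLER
    return retry, refill


if __name__ == "__main__":
    now = datetime(2009, 1, 30, 12, 0)
    pending = [("job-a", now - timedelta(hours=1)),
               ("job-b", now - timedelta(hours=11))]
    print(classify_missing(pending, now))  # (['job-a'], ['job-b'])
```

Raising `timeout` trades a longer wait for fewer jobs left to manual reconstruction.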
 
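The DB ANALYZER's notification step can be sketched as follows. This assumes, although the page does not say so, that the daemon reports to Nagios as passive service checks using the standard `PROCESS_SERVICE_CHECK_RESULT` external command. The metric names, thresholds, input shape and the `wmsmon_` service-name prefix are all hypothetical.

```python
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value, warn, crit):
    """Map a metric value to a Nagios state using (hypothetical) thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_check_line(host, service, state, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(now if now is not None else time.time())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{output}"


def analyze(latest, thresholds, now=None):
    """Compare the latest per-WMS status against thresholds.

    latest:     {wms_host: {metric: value}}  (hypothetical shape of the DB query result)
    thresholds: {metric: (warn, crit)}
    Returns the external command lines, sorted by host and metric name.
    """
    lines = []
    for host, metrics in sorted(latest.items()):
        for metric, (warn, crit) in sorted(thresholds.items()):
            if metric not in metrics:
                continue  # metric not reported for this instance
            value = metrics[metric]
            state = evaluate(value, warn, crit)
            lines.append(passive_check_line(host, f"wmsmon_{metric}", state,
                                            f"{metric}={value}", now))
    return lines


if __name__ == "__main__":
    latest = {"wms01": {"queue": 1200, "load": 3.5}}
    thresholds = {"queue": (500, 1000), "load": (10, 20)}
    for line in analyze(latest, thresholds, now=100):
        print(line)
```

The resulting lines would be appended to the Nagios external command file, whose path depends on the installation. Reporting one service per metric lets Nagios apply its mail/SMS notification rules to each metric separately.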

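As an illustration of how a job flow rate can be derived from a direct database query, as the WMSLB INSTANCES item describes for the LB database, the sketch below counts distinct jobs that logged a given event in a time window and divides by the window length in hours. An in-memory SQLite table `events(job_id, event, ts)` stands in for the LB database; this schema, the event name and the function `job_rate_per_hour` are hypothetical and do not reflect the real LB schema.

```python
import sqlite3


def job_rate_per_hour(conn, event_type, start_ts, end_ts):
    """Jobs per hour that logged `event_type` in the window [start_ts, end_ts).

    Assumes a hypothetical table events(job_id TEXT, event TEXT, ts INTEGER)
    with ts in Unix seconds; the real LB database schema differs.
    """
    if end_ts <= start_ts:
        raise ValueError("end_ts must be greater than start_ts")
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT job_id) FROM events "
        "WHERE event = ? AND ts >= ? AND ts < ?",
        (event_type, start_ts, end_ts),
    ).fetchone()
    return count / ((end_ts - start_ts) / 3600.0)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the LB database
    conn.execute("CREATE TABLE events (job_id TEXT, event TEXT, ts INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [("j1", "accepted", 0), ("j2", "accepted", 1800),
         ("j2", "accepted", 1900), ("j3", "done", 2000)],
    )
    print(job_rate_per_hour(conn, "accepted", 0, 3600))  # 2.0
```

Counting distinct job ids keeps a job that logs the same event twice from inflating the rate.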
Working On

  • Implementation of job latency estimates
  • ICE sensors
Added:
>
>
  • WMS Load Balancing system
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.