

WMSMonitor is a monitoring tool that collects service status and job flow information from a cluster of distributed WMS instances. The collected information is aggregated and presented along two main keys: individual WMS instances and Virtual Organization (VO) job flow.
These two aggregation keys lead to two parallel branches of metrics presentation in the GUI:
  1. WMS view, focusing on WMS cluster status and performance monitoring. This branch starts from a main page, WMSMonitor main, which reports an overview of the most important status variables for all monitored WMS instances. From there we can access a Specific WMSLB Instance Detailed data page for each monitored WMS instance, with detailed data including job flow rates between internal components as well as system status indicator measurements.
  2. VOView, providing job flow statistics using the VOs as the aggregation criterion. See VOView and VO Stats Page.
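
The two aggregation keys can be illustrated with a small Python sketch. It is not taken from the WMSMonitor code base: the record layout, the field names (`wms`, `vo`, `submitted`) and the host names are assumptions made only for illustration. The same job flow samples are summed once per WMS instance and once per VO.

```python
from collections import defaultdict

# Hypothetical samples: one record per (WMS instance, VO) pair
# for a single sampling interval. Field names are illustrative only.
samples = [
    {"wms": "wms01.example.org", "vo": "atlas", "submitted": 120},
    {"wms": "wms01.example.org", "vo": "cms", "submitted": 80},
    {"wms": "wms02.example.org", "vo": "atlas", "submitted": 50},
]


def aggregate(records, key):
    """Sum the submitted-job counts of all records sharing the same key value."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec["submitted"]
    return dict(totals)


per_wms = aggregate(samples, "wms")  # input for a WMS-view style page
per_vo = aggregate(samples, "vo")    # input for a VOView style page
```

Both views are built from the same underlying records, which is why the GUI can offer them as parallel branches.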

User Requirements

  • You need a certificate and Flash installed in your browser
  • Basic knowledge of the WMSLB architecture

Tool Architecture Overview

  • WMSMonitor SERVER: WMSMonitor runs on a dedicated server machine hosting a MySQL database to store data, together with PHP, Apache and Python.
  • WMSLB INSTANCES: A cron job on the WMSMonitor server collects data from each WMS/LB instance by executing compiled Python functions on them. This Python executable on each WMS/LB instance implements the sensors and sends data to the WMSMonitor server using SNMP. Job flow rates are calculated using direct queries to the LB database.
  • WEB PUBLISHING: PHP-based functions retrieve data from the local MySQL DB, aggregate them and publish them on a graphical user interface over HTTPS. Charts are implemented with "Open Flash Chart", an open source Flash-based set of libraries.
  • DB ANALYZER: A Python daemon that checks the latest status of the cluster and sends notifications to a Nagios server, which provides mail and/or SMS alert notifications.
  • DB RE-FILLER: It can happen for various reasons (e.g. a high dg20 file queue on the WMS) that data for some jobs are not available in the LB database when the sensors are triggered on the LB servers, resulting in a loss of data in the WMSMonitor server DB. The data collector does its best to fix the situation, but if the data do not reach the LB server in a reasonable time (about ten hours by default, a configurable timeout) the collector gives up and a manual reconstruction is needed. This utility performs such a reconstruction.
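
The retry-then-give-up behaviour described for the data collector can be sketched as follows. This is a minimal illustration, not the actual collector code: the function name `collector_action`, its arguments and the returned labels are hypothetical, and only the ten-hour default comes from the description above.

```python
from datetime import datetime, timedelta

# Default taken from the text ("about ten hours by default"); configurable.
RETRY_TIMEOUT = timedelta(hours=10)


def collector_action(first_missing_at, now, data_available, timeout=RETRY_TIMEOUT):
    """Decide what to do with a sample whose LB data was missing.

    first_missing_at: when the data was first found to be missing
    now:              current time
    data_available:   True once the data has reached the LB database
    """
    if data_available:
        return "collect"  # the data finally arrived, store it
    if now - first_missing_at < timeout:
        return "retry"  # still within the timeout window, try again later
    return "manual_reconstruction"  # collector gives up, re-filler is needed


t0 = datetime(2009, 1, 29, 0, 0)
status_early = collector_action(t0, t0 + timedelta(hours=1), data_available=False)
status_late = collector_action(t0, t0 + timedelta(hours=11), data_available=False)
```

In this sketch, samples that end in the `manual_reconstruction` state are the ones the DB RE-FILLER utility would have to rebuild.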

Working On

  • Implementation of estimates of job latency
  • DB restructuring
  • Job flow rate measurements using WMS data only in case of high dg20 queues, in order to avoid data loss in LB queries
  • ICE sensors
Topic revision: r13 - 2009-01-29 - DaniloDongiovanni