
Overview

WMSMonitor consists of a main page, which reports an overview of the most important status variables for all monitored WMS instances,
and a page with detailed data for each specific instance.
Details for a WMS instance include job flow rates between internal components as well as system status indicator measurements.
A VOView is also provided to obtain usage statistics per VO.
Follow the links below for further details on what the monitor provides.

User Requirements

  • You need a certificate and Flash installed in your browser
  • Basic knowledge of the WMS/LB architecture

Tool Architecture Overview

  • WMSMonitor SERVER: WMSMonitor runs on a dedicated server machine with a MySQL database to store data, together with PHP, Apache and Python.
  • WMS/LB INSTANCES: A cron job on the WMSMonitor server collects data from each WMS/LB instance by executing compiled Python functions on it. This Python executable on each WMS/LB instance implements the sensors and sends data to the WMSMonitor server using SNMP. Job flow rates are calculated using direct queries to the LB database.
  • WEB PUBLISHING: PHP-based functions retrieve data from a local MySQL DB, aggregate it and publish it on a graphical user interface over HTTPS. Charts are implemented with "Open Flash Chart", an open source, Flash-based set of libraries.
  • DB ANALYZER: a Python daemon that checks the latest status of the cluster and sends notifications to a Nagios server, which provides mail and/or SMS alert notifications.
  • DB RE-FILLER: It can happen for various reasons (e.g. a high dg20 file queue on the WMS) that data for some jobs are not available in the LB database when the sensors are triggered on the LB servers, resulting in a loss of data in the WMSMonitor server DB. The data collector does its best to fix the situation, but if the data do not reach the LB server within a reasonable time (about ten hours by default, a configurable timeout) the collector gives up and a manual reconstruction is needed. This utility performs such a reconstruction.
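
The give-up behaviour described for the data collector can be illustrated with a minimal Python sketch. This is not the actual WMSMonitor code: the names next_action, Action and DEFAULT_TIMEOUT are hypothetical, and the ten-hour value is only the approximate default mentioned above.

```python
from datetime import datetime, timedelta
from enum import Enum

# Assumption: "about ten hours by default"; the real setting is configurable.
DEFAULT_TIMEOUT = timedelta(hours=10)


class Action(Enum):
    STORE = "store"      # LB data arrived, so it can be recorded
    RETRY = "retry"      # data still missing, try again at the next run
    GIVE_UP = "give_up"  # timeout expired, left to the DB re-filler


def next_action(first_attempt, now, data_available, timeout=DEFAULT_TIMEOUT):
    """Decide what a collector could do for one pending set of job data."""
    if data_available:
        return Action.STORE
    if now - first_attempt >= timeout:
        return Action.GIVE_UP
    return Action.RETRY
```

Under these assumptions, a measurement first attempted at midnight would still be retried at 01:00, and would be handed over to manual reconstruction once 10:00 has passed without the data appearing in the LB database.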

Working On

  • Implementation of estimates of job latency
  • Old lcg-rb monitoring (probably will not be done because of the decommissioning of lcg-rb)
  • DB restructuring
  • Job flow rate measurements using WMS data only in case of high dg20 queues, in order to avoid data loss in LB queries
  • ICE sensors
  • JOBdir sensor
  • Geographically distributed WMS/LB cluster monitoring
Topic revision: r11 - 2008-09-16 - DanieleCesini
 