Difference: WebDocumentation (17 vs. 18)

Revision 18 - 2009-02-13 - DanieleCesini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Overview

WMSMonitor is a monitoring tool that collects information about service status and job flow from a cluster of distributed WMS instances. The collected information is aggregated and presented along two main keys: per WMS instance and per Virtual Organization (VO).
These two information aggregation keys lead to two parallel branches of metrics presentation in the GUI:
Line: 10 to 10
 
  • You need the certificate and Flash installed in your browser
  • Basic knowledge of the WMSLB architecture
Changed:
<
<

Tool Architecture Overview

>
>

Tool Architecture

 WMSMonitor is composed of the following main components:
  • WMSMonitor SERVER: WMSMonitor runs on a dedicated server machine that hosts a MySQL database to store data, together with PHP, Apache and Python.
Added:
>
>
 
  • WMSLB INSTANCES: A cron job on the WMSMonitor server collects data from each WMS/LB instance by executing sensors on it. These sensors send data to the WMSMonitor server using SNMP. Job flow rates are calculated by querying the LB database directly.
Added:
>
>
 
  • WEB PUBLISHING: PHP-based functions retrieve data from a local MySQL DB, aggregate them and publish them on a graphical user interface over (secure) HTTP. Charts are implemented with "Open Flash Chart", an open source, Flash-based set of libraries.
Added:
>
>
 
  • DB ANALYZER: a Python daemon that checks the latest status of the cluster and sends notifications to a Nagios server, which provides mail and/or SMS alerts.
Added:
>
>
 
  • DB RE-FILLER: It can happen for various reasons (e.g. a high dg20 file queue on the WMS) that data for some jobs are not yet available in the LB database when the sensors are triggered on the LB servers, resulting in a loss of data in the WMSMonitor server DB. The data collector does its best to fix the situation, but if the data do not reach the LB server within a reasonable time (about ten hours by default, a configurable timeout) the collector gives up and a manual reconstruction is needed. This utility performs such a reconstruction.
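
The give-up rule described for the data collector and the DB RE-FILLER can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the job identifiers and the `missing` mapping are hypothetical and not taken from WMSMonitor; only the ten-hour default comes from the text above.

```python
from datetime import datetime, timedelta

# Default taken from the description ("about ten hours"); the name of the
# real configuration option is not given in the source, so this is assumed.
GIVE_UP_AFTER = timedelta(hours=10)


def classify_missing(first_missed, now, give_up_after=GIVE_UP_AFTER):
    """Return 'retry' while the collector should keep trying,
    'manual' once the timeout has expired."""
    if now - first_missed < give_up_after:
        return "retry"
    return "manual"


def needs_refill(missing, now, give_up_after=GIVE_UP_AFTER):
    """Return the sorted job ids (hypothetical keys) whose data has been
    missing for at least the timeout and so need a manual reconstruction."""
    return sorted(
        job
        for job, since in missing.items()
        if classify_missing(since, now, give_up_after) == "manual"
    )
```

With such a helper, a job whose data has been missing for eleven hours would be handed to the re-filler, while one missing for two hours would still be retried by the collector.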

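As an illustration of the DB ANALYZER step, the sketch below maps a metric value to a Nagios status code and formats it as a Nagios external command line. How WMSMonitor actually talks to Nagios is not stated in the source; submitting passive check results is an assumption here, and the host, service and threshold values are hypothetical. Only the return codes (0, 1, 2) and the `PROCESS_SERVICE_CHECK_RESULT` command format are standard Nagios conventions.

```python
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value, warn, crit):
    """Map a metric value to a Nagios status code.
    The thresholds are hypothetical, not WMSMonitor defaults."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_check_line(host, service, code, message, timestamp=None):
    """Format a passive service check result as a Nagios external command."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{message}"
```

A daemon along these lines would evaluate the latest row for each instance and write the resulting line to the Nagios command file, leaving mail and SMS delivery to Nagios itself.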
 