WMSMonitor Web: WebDocumentation (2009-06-15, DanieleCesini)

Overview

WMSMonitor is a monitoring tool that collects information about service status and job flow from a cluster of distributed WMS instances. The collected information is aggregated and presented along two main keys: the single WMS instance and the Virtual Organization (VO).
These two aggregation keys lead to two parallel branches of metrics presentation in the GUI:
  1. WMS view, focusing on WMS cluster status and performance monitoring. Its main page (WMSMonitor main) gives an overview of the most important status variables for all monitored WMS instances. From there, a Specific WMSLB Instance Detailed data page is available for each monitored WMS instance, with detailed data that include job flow rates between internal components in addition to system status indicator measurements.
  2. VOView, providing job flow statistics with the VO as the aggregation criterion. The pages in this branch are VOView and VO Stats Page.
Starting from WMSMonitor version 2.0, clicking on the "WMS view" or "VO View" tab opens a window menu that lets you navigate through all available WMSMonitor web pages for each WMS instance or VO, respectively.
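
The two aggregation keys can be illustrated with a minimal Python sketch. The record layout (the `wms`, `vo` and `jobs` fields), the host names and the numbers are hypothetical and chosen only for illustration; the actual WMSMonitor data model is not described on this page.

```python
from collections import defaultdict

# Hypothetical job-count samples; field names and values are illustrative only.
SAMPLES = [
    {"wms": "wms01.example.org", "vo": "atlas", "jobs": 120},
    {"wms": "wms01.example.org", "vo": "cms", "jobs": 80},
    {"wms": "wms02.example.org", "vo": "atlas", "jobs": 50},
]


def aggregate(samples, key):
    """Sum the job counts of all samples sharing the same value of `key`."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[key]] += sample["jobs"]
    return dict(totals)


wms_view = aggregate(SAMPLES, "wms")  # per-instance totals (WMS view branch)
vo_view = aggregate(SAMPLES, "vo")    # per-VO totals (VO view branch)
```

The same samples feed both branches; only the grouping key changes.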

User Requirements

  • You need a certificate and Flash installed in your browser
  • Basic knowledge of the WMSLB architecture

Tool Architecture

WMSMonitor is composed of the following main components:
  • WMSMonitor SERVER: WMSMonitor runs on a dedicated server machine that hosts a MySQL database to store data, together with PHP, Apache and Python.

  • WMSLB INSTANCES: A cron job on the WMSMonitor server collects data from each WMS/LB instance by executing sensors on them. These sensors send data to the WMSMonitor server using SNMP. Job flow rates are calculated through direct queries to the LB database.

  • WEB PUBLISHING: PHP-based functions retrieve data from a local MySQL DB, aggregate them and publish them on a graphical user interface over a (secure) HTTP connection. Charts are implemented with "Open Flash Chart", an open source, Flash-based set of libraries.

  • DB ANALYZER: a Python daemon that checks the latest status of the cluster and sends notifications to a Nagios server, which provides mail and/or SMS alerts.

  • DB RE-FILLER: It can happen for various reasons (e.g. a high dg20 file queue on the WMS) that data for some jobs are not available in the LB database when the sensors are triggered on the LB servers, resulting in a loss of data in the WMSMonitor server DB. The data collector does its best to fix the situation, but if the data do not reach the LB server within a reasonable time (a configurable timeout, about ten hours by default), the collector gives up and a manual reconstruction is needed. This utility performs such a reconstruction.
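
The give-up rule of the data collector described in the last item can be sketched in a few lines of Python. This is only an illustration of the timeout logic, not the collector's real code: the function name, the return values and the way the first miss is recorded are all assumptions.

```python
from datetime import datetime, timedelta

# "About ten hours by default", as stated above; the value is configurable.
DEFAULT_TIMEOUT = timedelta(hours=10)


def collector_action(first_missing_at, now, timeout=DEFAULT_TIMEOUT):
    """Return "retry" while the timeout is still running, else "give_up"."""
    if now - first_missing_at < timeout:
        return "retry"
    return "give_up"  # from here on, a manual reconstruction is needed


# Hypothetical timestamps, for illustration only.
start = datetime(2009, 6, 15, 8, 0)
after_3h = collector_action(start, start + timedelta(hours=3))
after_11h = collector_action(start, start + timedelta(hours=11))
```

With the default timeout, a job whose data has been missing for three hours is still retried, while one missing for eleven hours is left to the DB RE-FILLER.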
Figure: wmsmon-arch.png (architecture diagram)
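
The DB ANALYZER step can be pictured as a threshold check whose result is handed to Nagios. The sketch below uses the standard Nagios state codes (0 OK, 1 WARNING, 2 CRITICAL) and the Nagios `PROCESS_SERVICE_CHECK_RESULT` external command format for passive checks. Whether WMSMonitor actually uses passive checks is not stated on this page, so treat that choice, the metric name, the thresholds and the host name as assumptions.

```python
# Standard Nagios plugin state codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warn, crit):
    """Map a metric value to a Nagios state code (a higher value is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_check_line(timestamp, host, service, state, message):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command line."""
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, state, message)


# Hypothetical metric, thresholds and host, for illustration only.
state = classify(value=95, warn=80, crit=90)
line = passive_check_line(
    1245057360, "wms01.example.org", "disk_usage", state, "disk usage 95%")
```

In a real setup such a line would be written to the Nagios external command file; the mail and SMS alerting is then left to Nagios itself.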

Working On

  • Implementation of estimates of job latency
  • ICE sensors 
  • Apache ActiveMQ communication system
  • DB redesign
  • Performance improvement
Topic revision: r19 - 2009-06-15 - DanieleCesini
 