Difference: WebDocumentationz (1 vs. 2)

Revision 2 2008-01-30 - DaniloDongiovanni

Line: 1 to 1
 
META TOPICPARENT name="WebDownloadz"
Changed:
<
<

Main Features of the tool

WMSMonitor monitors two kinds of variables:
- Instant values of WMSLB service and HW status
- Mean job flow rates between WMS components (WMproxy, Workload Manager, Job Controller, Condor) over a given time interval (15 min)


1) data collector and sensors written in Python, with a MySQL database as backend on a dedicated server; web pages written in PHP
2) data are collected by sensors on the WMS and LB and sent to the server using a Python SOAP module
3) flow rates are derived from LB queries and reported as mean values over 15-minute intervals
4) daily cumulative values are also reported (click on the tab "Daily Statistics")
5) short automatic help when the mouse pointer is positioned over the main buttons and variables
6) moving the mouse pointer over a chart pops up a window with the corresponding variable values and date of measurement
7) link to the Lemon monitoring of the specific instance

>
>

Overview

Main Page

Open the CNAF instance at
https://cert-wms-01:8443/wmsmon/main/main.php
Note that you need a certificate installed in your browser to access it.
The main page shows a table with one row for each monitored WMSLB instance, giving a synthetic overview of the instance status across the columns.
A filter for finding VO-dedicated WMSLB instances is provided.
The following columns are reported:
- WMS: hostname of the specific WMS instance. Clicking on it opens the corresponding detailed data page, described below.
- DATE: date/time of the last measurement
- RUNNING JOBS
- IDLE JOBS
- WM QUEUE
- JC QUEUE
- VO VIEWS
- LB EVENTS QUEUE
- CPU LOAD
- SANDBOX PARTITION
- GENERAL STATUS
- DAEMONS STATUS

Specific WMSLB Instance Detailed data page


Short Description of the tool

WMSMonitor monitors two kinds of variables:
- WMSLB service and HW status variables (such as daemon status, Condor job status statistics, and file descriptors opened by the main processes), for which the value at the time of measurement is shown
- Mean job flow rates between WMS components (WMproxy, Workload Manager, Job Controller, Condor) over a given time interval (the past 15 min by default)

*UNDERSTANDING CHARTS*
For the most relevant variables, charts with recent history (over a selectable time interval) are available.
*Components Report History*
Four charts are available under the tab "Components Report History", reporting respectively on:

*1*-Condor Statistics: the number of jobs in the "Running" and "Idle" states.
*2*-Component Queues: the number of jobs enqueued to, and waiting to be processed by, the corresponding WMSLB component.
In particular, the numbers of jobs enqueued to the Workload Manager and to the Job Controller are reported.
The number of events still to be transferred/processed by the LB (and therefore not yet accounted for in LB queries from users or from wmsmon itself) is also reported as "LB events queue".
*3*-Job Flow Rates: the number of jobs processed by the corresponding component, reported as a mean value in Hz.
Jobs->WMProxy: each point reports the mean job submission rate since the previous measurement (e.g. 900 jobs successfully submitted to the WMS between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from WMproxy since the previous measurement (e.g. 900 jobs successfully enqueued to the Workload Manager from WMproxy between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
Jobs Resub->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from the Job Controller, i.e. resubmitted after failure, since the previous measurement (e.g. 900 jobs successfully enqueued to the Workload Manager from the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
*4*-Job Flow Rates: the number of jobs processed by the corresponding component, reported as a mean value in Hz.
Total Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from both WMproxy and the Job Controller (i.e. both submitted and resubmitted jobs) since the previous measurement (e.g. 900 jobs successfully enqueued to the Workload Manager from both WMproxy and the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
Jobs->JC: each point reports the mean rate of jobs enqueued to the Job Controller from the Workload Manager since the previous measurement (e.g. 900 jobs successfully enqueued to the Job Controller from the Workload Manager between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
Jobs->Condor: each point reports the mean rate of jobs enqueued to Condor from the Job Controller since the previous measurement (e.g. 900 jobs successfully enqueued to Condor from the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
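
The rate arithmetic used in the examples above can be sketched in a few lines of Python. This is only an illustration of the "jobs divided by elapsed seconds" calculation; the function name mean_rate_hz and its arguments are assumptions made for this sketch and are not taken from the WMSMonitor code.

```python
from datetime import datetime


def mean_rate_hz(job_count, start, end):
    """Return the mean job rate in Hz (jobs per second) between two measurements.

    Hypothetical helper for illustration only.
    """
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("end must be later than start")
    return job_count / elapsed


# 900 jobs counted over a 15 min (900 sec) interval
start = datetime(2008, 1, 20, 10, 0, 0)
end = datetime(2008, 1, 20, 10, 15, 0)
rate = mean_rate_hz(900, start, end)
print(rate)  # 1.0
```

The resulting value would be plotted at the time of the later measurement, here 10:15.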

*Daily Statistics*
Three charts are available under the tab "Daily Statistics", reporting respectively on:
*1*-Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported.
In particular, the chart reports:
Jobs->WMProxy: each point reports the total number of jobs successfully submitted to the WMS on the corresponding day.
Jobs->WM: each point reports the total number of jobs successfully enqueued to the Workload Manager from WMproxy on the corresponding day.
Jobs Resub->WM: each point reports the total number of jobs enqueued to the Workload Manager from the Job Controller (i.e. the total number of resubmissions) on the corresponding day.

*2*-Jobs Final State: for each day in the selected interval, the total numbers of jobs with final state "Done Successfully" and "Aborted" are reported.

*3*-Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported.
In particular, the chart reports:
Total Jobs->WM: each point reports the total number of jobs enqueued to the Workload Manager from both WMproxy and the Job Controller (i.e. both submitted and resubmitted jobs) on the corresponding day.
Jobs->JC: each point reports the total number of jobs enqueued to the Job Controller from the Workload Manager on the corresponding day.
Jobs->Condor: each point reports the total number of jobs enqueued to Condor from the Job Controller on the corresponding day.
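
As a rough illustration of how the daily values relate to each other, the sketch below sums per-measurement job counts into per-day totals and derives Total Jobs->WM as the sum of Jobs->WM and Jobs Resub->WM. The record layout (timestamp, jobs to WM, jobs resubmitted to WM) and the function name daily_totals are assumptions made for this example; they do not describe the actual WMSMonitor database schema.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(samples):
    """Sum per-measurement job counts into per-day totals.

    samples: iterable of (timestamp, jobs_to_wm, jobs_resub_to_wm) tuples,
    a hypothetical layout used only for this illustration.
    """
    totals = defaultdict(
        lambda: {"jobs_wm": 0, "jobs_resub_wm": 0, "total_jobs_wm": 0}
    )
    for timestamp, to_wm, resub in samples:
        day = totals[timestamp.date()]
        day["jobs_wm"] += to_wm
        day["jobs_resub_wm"] += resub
        # Total Jobs->WM counts both submitted and resubmitted jobs
        day["total_jobs_wm"] += to_wm + resub
    return dict(totals)


samples = [
    (datetime(2008, 1, 20, 10, 0), 900, 30),
    (datetime(2008, 1, 20, 10, 15), 450, 15),
    (datetime(2008, 1, 21, 9, 0), 100, 0),
]
result = daily_totals(samples)
for day, counts in sorted(result.items()):
    print(day, counts)
```

Each key of the result corresponds to one point per day in the charts described above.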

Note that:
- A short automatic help appears when the mouse pointer is positioned over the main buttons and variables.
- Moving the mouse pointer over a chart pops up a window with the corresponding variable values and date of measurement.
- A link to the Lemon monitoring of the specific instance (both WMS and LB, if on separate machines) is available.

 

User Documentation

Added:
>
>
2) data are collected by sensors on the WMS and LB and sent to the server using a Python SOAP module
-->
 

Interaction with Related Tools

Changed:
<
<

Future work:

- Packaging for current release deployment
- Implementation of estimates of job latency
- Cumulative stats plot reporting on all WMS in use by a specific VO
- Old RB monitoring
- Mail notification system
>
>

Future work:

- Packaging for current release deployment
- Implementation of estimates of job latency
- Cumulative stats plot reporting on all WMS in use by a specific VO
- Old RB monitoring
- Mail notification system
  -- DaniloDongiovanni - 25 Jan 2008

Revision 1 2008-01-25 - DaniloDongiovanni

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebDownloadz"

Main Features of the tool

WMSMonitor monitors two kinds of variables:
- Instant values of WMSLB service and HW status
- Mean job flow rates between WMS components (WMproxy, Workload Manager, Job Controller, Condor) over a given time interval (15 min)


1) data collector and sensors written in Python, with a MySQL database as backend on a dedicated server; web pages written in PHP
2) data are collected by sensors on the WMS and LB and sent to the server using a Python SOAP module
3) flow rates are derived from LB queries and reported as mean values over 15-minute intervals
4) daily cumulative values are also reported (click on the tab "Daily Statistics")
5) short automatic help when the mouse pointer is positioned over the main buttons and variables
6) moving the mouse pointer over a chart pops up a window with the corresponding variable values and date of measurement
7) link to the Lemon monitoring of the specific instance

User Documentation

<--

Installation

  • Requirements

User Requirements

  • You need a certificate and Flash installed in your browser

Architecture

-->

Interaction with Related Tools

Future work:

- Packaging for current release deployment
- Implementation of estimates of job latency
- Cumulative stats plot reporting on all WMS in use by a specific VO
- Old RB monitoring
- Mail notification system

-- DaniloDongiovanni - 25 Jan 2008

 