---+ Overview

*Main Page*

Open the CNAF instance at [[https://cert-wms-01:8443/wmsmon/main/main.php][https://cert-wms-01:8443/wmsmon/main/main.php]]. Note that you need a certificate installed in your browser in order to access it.

The main page shows a table with one row for each monitored WMSLB instance; the columns give a synthetic overview of the instance status. A filter is provided to find the WMSLB instances dedicated to a VO.

The following columns are reported:
   * WMS: hostname of the specific WMS instance. Clicking on it opens the corresponding detailed data page, described below.
   * DATE: date and time of the last measurement
   * RUNNING JOBS
   * IDLE JOBS
   * WM QUEUE
   * JC QUEUE
   * VO VIEWS
   * LB EVENTS QUEUE
   * CPU LOAD
   * SANDBOX PARTITION
   * GENERAL STATUS
   * DAEMONS STATUS

*Specific WMSLB Instance Detailed data page*

---+ Short Description of the tool

WMSMonitor monitors two kinds of variables:
   * WMSLB service and hardware status variables (such as daemon status, Condor job status statistics, file descriptors opened by the main processes), for which the value at the time of measurement is shown.
   * Mean job flow rates between WMS components (WMProxy, Workload Manager, Job Controller, Condor), averaged over a given time interval (the past 15 minutes by default).

*UNDERSTANDING CHARTS*

For the most relevant variables, charts of the recent history are available; the time interval shown can be selected.

*Components Report History*

Four charts are available under the tag "Components Report History":

*1* - Condor Statistics: the number of jobs in the status "Running" and "Idle".

*2* - Component Queues: the number of jobs enqueued to, and waiting to be processed by, the corresponding WMSLB component. In particular, the numbers of jobs enqueued to the Workload Manager and to the Job Controller are reported. The number of events still to be transferred to or processed by the LB (and therefore not yet accounted for in LB queries from users or from wmsmon itself) is reported as "LB events queue".

*3* - Job Flow Rates: the number of jobs processed by the corresponding component, reported as a mean value in Hz.
   * Jobs->WMProxy: each point reports the mean job submission rate since the previous measurement (e.g. 900 jobs successfully submitted to the WMS between 2008-01-20 10:00:00 and 2008-01-20 10:15:00, i.e. in 15 min = 900 s, produce a 1 Hz point at 2008-01-20 10:15).
   * Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from WMProxy since the previous measurement (e.g. 900 jobs successfully enqueued to the Workload Manager from WMProxy between 2008-01-20 10:00:00 and 2008-01-20 10:15:00, i.e. in 15 min = 900 s, produce a 1 Hz point at 2008-01-20 10:15).
   * Jobs Resub->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from the Job Controller, i.e. resubmitted after failure, since the previous measurement (e.g. 900 jobs successfully enqueued to the Workload Manager from the Job Controller between 2008-01-20 10:00:00 and 2008-01-20 10:15:00, i.e. in 15 min = 900 s, produce a 1 Hz point at 2008-01-20 10:15).

*4* - Job Flow Rates: the number of jobs processed by the corresponding component, reported as a mean value in Hz.
   * Total Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from both WMProxy and the Job Controller (i.e. both submitted and resubmitted jobs) since the previous measurement (e.g. 900 jobs successfully enqueued to the Workload Manager from both WMProxy and the Job Controller between 2008-01-20 10:00:00 and 2008-01-20 10:15:00, i.e. in 15 min = 900 s, produce a 1 Hz point at 2008-01-20 10:15).
   * Jobs->JC: each point reports the mean rate of jobs enqueued to the Job Controller from the Workload Manager since the previous measurement (e.g. 900 jobs successfully enqueued to the Job Controller from the Workload Manager between 2008-01-20 10:00:00 and 2008-01-20 10:15:00, i.e. in 15 min = 900 s, produce a 1 Hz point at 2008-01-20 10:15).
   * Jobs->Condor: each point reports the mean rate of jobs enqueued to Condor from the Job Controller since the previous measurement (e.g. 900 jobs successfully enqueued to Condor from the Job Controller between 2008-01-20 10:00:00 and 2008-01-20 10:15:00, i.e. in 15 min = 900 s, produce a 1 Hz point at 2008-01-20 10:15).

*Daily Statistics*

Three charts are available under the tag "Daily Statistics":

*1* - Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported. In particular, the chart shows:
   * Jobs->WMProxy: each point reports the total number of jobs successfully submitted to the WMS on the corresponding day.
   * Jobs->WM: each point reports the total number of jobs successfully enqueued to the Workload Manager from WMProxy on the corresponding day.
   * Jobs Resub->WM: each point reports the total number of jobs enqueued to the Workload Manager from the Job Controller (i.e. the total number of resubmissions) on the corresponding day.

*2* - Jobs Final State: for each day in the selected interval, the total numbers of jobs with final state "Done Successfully" and "Aborted" are reported.

*3* - Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported. In particular, the chart shows:
   * Total Jobs->WM: each point reports the total number of jobs enqueued to the Workload Manager from both WMProxy and the Job Controller (i.e. both submitted and resubmitted jobs) on the corresponding day.
   * Jobs->JC: each point reports the total number of jobs enqueued to the Job Controller from the Workload Manager on the corresponding day.
   * Jobs->Condor: each point reports the total number of jobs enqueued to Condor from the Job Controller on the corresponding day.

Notice that:
   * A short automatic help text appears when the mouse pointer is positioned over the main buttons and variables.
   * Passing the mouse pointer over a chart opens a pop-up window with the corresponding variable values and the date of measurement.
   * A link to the Lemon monitoring of the specific instance (both WMS and LB, if they run on separate machines) is available.

---+ User Documentation
   * UserManual

<!--
- Charts usage -
---+ Installation
   * Requirements
---+ User Requirements
   * You need the certificate and flash installed on your browser
-->

Data are collected by sensors on the WMS and LB and sent to the server using a Python SOAP module.

---+ Interaction with Related Tools

---+ Future work
   * Packaging for current release deployment
   * Implementation of estimates of job latency
   * Cumulative statistics plot reporting on all WMS instances in use by a specific VO
   * Old RB monitoring
   * Mail notification system

-- Main.DaniloDongiovanni - 25 Jan 2008
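
*Worked example: mean rate in Hz*

The Job Flow Rates charts described above plot a mean rate, i.e. the number of jobs counted since the previous measurement divided by the elapsed time in seconds. The following is a minimal Python sketch of that arithmetic only. The function name mean_rate_hz and its arguments are hypothetical and chosen for illustration; this is not taken from the WMSMonitor code, whose actual implementation may differ.

```python
from datetime import datetime


def mean_rate_hz(jobs, start, end):
    """Return the mean job rate in Hz between two measurement times.

    Hypothetical helper for illustration; not part of WMSMonitor.
    """
    interval = (end - start).total_seconds()
    if interval <= 0:
        raise ValueError("end must be later than start")
    return jobs / interval


# 900 jobs counted between two measurements taken 15 min (900 s) apart.
start = datetime(2008, 1, 20, 10, 0, 0)
end = datetime(2008, 1, 20, 10, 15, 0)
rate = mean_rate_hz(900, start, end)
print(rate)  # 1.0
```

The same division applies to every rate series in the charts; only the component that supplies the job count changes.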
Topic revision: r2 - 2008-01-30 - DaniloDongiovanni
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.