Line: 1 to 1 | |||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Overview | |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < | WMSMON Main Page
Open the CNAF instance at https://cert-wms-01:8443/wmsmon/main/main.php (note that you need a certificate in your browser in order to access it). | ||||||||||||||||||||||||
> > | WMSMON Main Page
Open the CNAF instance at https://cert-wms-01:8443/wmsmon/main/main.php (note that you need a certificate in your browser in order to access it). | ||||||||||||||||||||||||
Once on the main page you will see a table with one row for each monitored WMS instance, giving a synthetic overview of the instance status across the columns. A filter to find VO-dedicated WMS instances is also provided. A short automatic help appears when the mouse pointer is positioned over the main buttons and variables. | |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < | The following columns are reported: | ||||||||||||||||||||||||
> > | The following columns are reported: | ||||||||||||||||||||||||
| |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < |
Clicking on the WMS hostname or on one of the STATUS flags takes you to the corresponding detailed data page, described below. | ||||||||||||||||||||||||
> > |
| ||||||||||||||||||||||||
Deleted: | |||||||||||||||||||||||||
< < | Specific WMSLB Instance Detailed data page | ||||||||||||||||||||||||
Here, more information is made available on the specific WMS instance. | |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < | Two kinds of data are collected:
- WMSLB service and HW status variables (such as daemon status, Condor job status statistics, file descriptors opened by the main processes), for which the value at the time of measurement is shown.
- Mean job flow rates between WMS components (WMProxy, Workload Manager, Job Controller, Condor) over a given time interval (the past 15 min by default). | ||||||||||||||||||||||||
> > | You can go back to the main page by clicking on the wmsmon logo or on the corresponding part of the navigation bar (ex. WMSMonitor >> main >> details::cert-rb-01.cnaf.infn.it).
Before describing the available info in detail, notice that two kinds of data are collected:
- WMSLB service and HW status variables (such as daemon status, Condor job status statistics, file descriptors opened by the main processes), for which the value at the time of measurement is shown. These data are collected by means of a Python client application running on the WMS and LB instances.
- Mean job flow rates between WMS components (WMProxy, Workload Manager, Job Controller, Condor) over a given time interval, the past 15 min by default. These data are collected by means of a Python client application running a MySQL query on the specific LB instance.
Components Details BOX
In this box, info about each single WMSLB component is presented. Here again, a short automatic help appears when the mouse pointer is positioned over the main buttons and variables. A flag (Green=OK / Red=Error) beside each component's label reports the corresponding daemon status.
Components Details from t1 to t2 (reports the exact time interval used in the LB query to collect data)
WM Proxy [WMProxy daemon status]
- Jobs -> WMProxy: number of jobs submitted within the reported time interval
- Collections submitted: number of collections of jobs submitted within the reported time interval
- Mean nodes per coll.: mean number of nodes per collection within the reported time interval
- Std nodes per coll.: standard deviation of the number of nodes per collection within the reported time interval
Proxy Renewal [PX daemon status]
Workload Manager [WM daemon status]
- WM file descriptors: number of file descriptors opened by the Workload Manager process at time t2
- WM queue: number of entries in input.fl at time t2
- Jobs -> WM: number of jobs enqueued to the Workload Manager from WMProxy within the reported time interval
- Jobs Resub -> WM: number of jobs enqueued to the Workload Manager from the Job Controller, i.e. resubmitted after failure
- VO Views: number of VO views available for the WMS Match Making. This is either parsed from workload_manager.log, looking for the number of VO Views used in the last Match Making by the Workload Manager, or (if the last Match Making is older than 1 hour) taken as the number of entries in ismdump.fl. None is returned if neither of the two measures succeeds.
Log Monitor [LM daemon status] | ||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < | Notice that:
- A short automatic help appears when the mouse pointer is positioned over the main buttons and variables.
- Passing the mouse pointer over the charts pops up a window with the corresponding variable values and date of measurement.
- A link to the Lemon monitoring of the specific instance (both WMS and LB if on separate machines) is available. | ||||||||||||||||||||||||
> > | LM file descriptors: number of file descriptors opened by the Log Monitor process at time t2 | ||||||||||||||||||||||||
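The fallback order described for the VO Views variable (under Workload Manager) can be illustrated with a minimal Python sketch. It assumes the two raw measurements have already been taken elsewhere; the function name `vo_views` and its parameters are hypothetical and are not taken from the WMSMonitor sensor code. Treating a failed log parse like a stale Match Making is also an assumption of this sketch.

```python
def vo_views(log_views, log_age_s, ismdump_entries, max_age_s=3600):
    """Choose the VO Views value following the fallback order in the text.

    log_views       -- count parsed from workload_manager.log, or None
    log_age_s       -- age in seconds of the last Match Making, or None
    ismdump_entries -- number of entries in ismdump.fl, or None
    max_age_s       -- 1 hour, as stated in the text
    """
    # Prefer the log value when the last Match Making is recent enough.
    log_ok = log_views is not None and log_age_s is not None
    if log_ok and log_age_s <= max_age_s:
        return log_views
    # Otherwise fall back to the ismdump.fl entry count.
    # If that measure failed too, this is None, as in the text.
    return ismdump_entries


print(vo_views(12, 600, 30))       # recent Match Making: log value
print(vo_views(12, 7200, 30))      # stale Match Making: ismdump.fl count
print(vo_views(None, None, None))  # both measures failed
```

Keeping the selection logic separate from the file parsing makes the one-hour rule easy to check on its own.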
Added: | |||||||||||||||||||||||||
> > | Job Controller [JC daemon status]
- JC queue: number of entries in queue.fl at time t2
- JC file descriptors: number of file descriptors opened by the Job Controller process at time t2
- Jobs -> JC: number of jobs enqueued to the Job Controller from the Workload Manager within the reported time interval
- Jobs JC -> Condor: number of jobs enqueued to Condor from the Job Controller within the reported time interval
Local Logger [LL daemon status]
- LB events queue: number of dg20logd_* files in the directory /var/tmp at time t2. These are events not yet stored in the LB server DB, hence not available to LB queries, which may be affected by this.
- LL file descriptors: number of file descriptors opened by the Local Logger process at time t2
LB Proxy [LBPX daemon status]
Transfers [FTPD daemon status]
- gftp: number of gftp sessions open at time t2
CHARTS BOX | ||||||||||||||||||||||||
Deleted: | |||||||||||||||||||||||||
< < | *UNDERSTANDING CHARTS* | ||||||||||||||||||||||||
For the most relevant variables, charts of the recent history (over a selectable time interval) are made available. | |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < | *Components Report History* | ||||||||||||||||||||||||
> > | Notice that passing the mouse pointer over the charts pops up a window with the corresponding variable values and date of measurement. Two tabs of charts are available: 'Components Report History' and 'Daily Statistics'.
Components Report History | ||||||||||||||||||||||||
Four charts are available under the tab "Components Report History", reporting info respectively on: | |||||||||||||||||||||||||
Added: | |||||||||||||||||||||||||
> > |
| ||||||||||||||||||||||||
Deleted: | |||||||||||||||||||||||||
< < | 1 - Condor Statistics: i.e. the number of jobs in the status "Running" and "Idle".
2 - Component Queues: i.e. the number of jobs enqueued to, and waiting to be processed by, the corresponding WMSLB component. In particular, the numbers of jobs enqueued to the Workload Manager and to the Job Controller are reported. The number of events still to be transferred/processed by the LB (and therefore not yet accounted for in LB queries from users or from wmsmon itself) is also reported, as "LB events queue".
3 - Job Flow Rates: i.e. the number of jobs processed by the corresponding component, reported as a mean value in Hz.
- Jobs->WMProxy: each point reports the mean job submission rate since the previous measurement (ex. 900 jobs successfully submitted to the WMS between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from WMProxy since the previous measurement (ex. 900 jobs successfully enqueued to the Workload Manager from WMProxy between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs Resub->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from the Job Controller, i.e. resubmitted after failure, since the previous measurement (ex. 900 jobs successfully enqueued to the Workload Manager from the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
4 - Job Flow Rates: i.e. the number of jobs processed by the corresponding component, reported as a mean value in Hz.
- Total Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from both WMProxy and the Job Controller (i.e. both submitted and resubmitted jobs) since the previous measurement (ex. 900 jobs successfully enqueued to the Workload Manager from both WMProxy and the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs->JC: each point reports the mean rate of jobs enqueued to the Job Controller from the Workload Manager since the previous measurement (ex. 900 jobs successfully enqueued to the Job Controller from the Workload Manager between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs->Condor: each point reports the mean rate of jobs enqueued to Condor from the Job Controller since the previous measurement (ex. 900 jobs successfully enqueued to Condor from the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
*Daily Statistics*
Three charts are available under the tab "Daily Statistics", reporting info respectively on:
1 - Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported. In particular, the chart reports:
- Jobs->WMProxy: each point reports the total number of jobs successfully submitted to the WMS on the corresponding day.
- Jobs->WM: each point reports the total number of jobs successfully enqueued to the Workload Manager from WMProxy on the corresponding day.
- Jobs Resub->WM: each point reports the total number of jobs enqueued to the Workload Manager from the Job Controller (i.e. the total number of resubmissions) on the corresponding day.
2 - Jobs Final State: for each day in the selected interval, the total numbers of jobs with final state "Done Successfully" and "Aborted" are reported.
3 - Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported. In particular, the chart reports:
- Total Jobs->WM: each point reports the total number of jobs enqueued to the Workload Manager from both WMProxy and the Job Controller (i.e. both submitted and resubmitted jobs) on the corresponding day.
- Jobs->JC: each point reports the total number of jobs enqueued to the Job Controller from the Workload Manager on the corresponding day.
- Jobs->Condor: each point reports the total number of jobs enqueued to Condor from the Job Controller on the corresponding day.
User Documentation | ||||||||||||||||||||||||
Line: 70 to 115 | |||||||||||||||||||||||||
| |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < | -->
Interaction with Related Tools | ||||||||||||||||||||||||
> > | |||||||||||||||||||||||||
Future work:
-Packaging for current release deployment |
Line: 1 to 1 | |||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Overview | |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < | Main Page
Open the CNAF instance at https://cert-wms-01:8443/wmsmon/main/main.php (note that you need a certificate in your browser in order to access it).
Once on the main page you will see a table with one row for each monitored WMSLB instance, giving a synthetic overview of the instance status across the columns. Notice that a filter to find VO-dedicated WMSLB instances is provided.
The following columns are reported:
- WMS: specific WMS instance hostname. Clicking on it takes you to the corresponding detailed data page, described below.
- DATE: date/time of last measurement
- RUNNING JOBS
- IDLE JOBS
- WM QUEUE
- JC QUEUE
- VO VIEWS
- LB EVENTS QUEUE
- CPU LOAD
- SANDBOX PARTITION
- GENERAL STATUS
- DAEMONS STATUS | ||||||||||||||||||||||||
> > | WMSMON Main Page
Open the CNAF instance at https://cert-wms-01:8443/wmsmon/main/main.php (note that you need a certificate in your browser in order to access it).
Once on the main page you will see a table with one row for each monitored WMS instance, giving a synthetic overview of the instance status across the columns. A filter to find VO-dedicated WMS instances is also provided. A short automatic help appears when the mouse pointer is positioned over the main buttons and variables.
The following columns are reported:
Clicking on the WMS hostname or on one of the STATUS flags takes you to the corresponding detailed data page, described below. | ||||||||||||||||||||||||
Specific WMSLB Instance Detailed data page | |||||||||||||||||||||||||
Changed: | |||||||||||||||||||||||||
< < |
Short Description of the tool
WMSMonitor monitors two kinds of variables: | ||||||||||||||||||||||||
> > | Here, more information is made available on the specific WMS instance. Two kinds of data are collected: | ||||||||||||||||||||||||
- WMSLB service and HW status variables (such as daemon status, Condor job status statistics, file descriptors opened by the main processes), for which the value at the time of measurement is shown.
- Mean job flow rates between WMS components (WMProxy, Workload Manager, Job Controller, Condor) over a given time interval (the past 15 min by default). | |||||||||||||||||||||||||
Added: | |||||||||||||||||||||||||
> > | Notice that:
- A short automatic help appears when the mouse pointer is positioned over the main buttons and variables.
- Passing the mouse pointer over the charts pops up a window with the corresponding variable values and date of measurement.
- A link to the Lemon monitoring of the specific instance (both WMS and LB if on separate machines) is available. | ||||||||||||||||||||||||
*UNDERSTANDING CHARTS*
For the most relevant variables, charts of the recent history (over a selectable time interval) are made available.
*Components Report History* | |||||||||||||||||||||||||
Line: 64 to 58 | |||||||||||||||||||||||||
- Jobs->JC: each point reports the total number of jobs enqueued to the Job Controller from the Workload Manager on the corresponding day.
- Jobs->Condor: each point reports the total number of jobs enqueued to Condor from the Job Controller on the corresponding day. | |||||||||||||||||||||||||
Deleted: | |||||||||||||||||||||||||
< < | Notice that:
- A short automatic help appears when the mouse pointer is positioned over the main buttons and variables.
- Passing the mouse pointer over the charts pops up a window with the corresponding variable values and date of measurement.
- A link to the Lemon monitoring of the specific instance (both WMS and LB if on separate machines) is available. | ||||||||||||||||||||||||
User Documentation |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
Changed: | ||||||||
< < | Main Features of the tool
WMSMonitor monitors two kinds of variables:
- Instant values of WMSLB service and HW status
- Mean job flow rates between WMS components (WMProxy, Workload Manager, Job Controller, Condor) over a given time interval (15 min)
1) data collector and sensor written in Python with a MySQL db as backend on a dedicated server, web pages written in PHP
2) data are collected by sensors on WMS and LB and sent to the server using a Python SOAP module
3) flow rates are based on data derived from LB queries and reported as mean values over 15 min intervals
4) daily cumulative values are also reported (click on the tab "Daily Statistics")
5) a short automatic help appears when the mouse pointer is positioned over the main buttons and variables
6) passing the mouse pointer over the charts pops up a window with the corresponding variable values and date of measurement
7) link to the Lemon monitoring of the specific instance | |||||||
> > | Overview
Main Page
Open the CNAF instance at https://cert-wms-01:8443/wmsmon/main/main.php (note that you need a certificate in your browser in order to access it).
Once on the main page you will see a table with one row for each monitored WMSLB instance, giving a synthetic overview of the instance status across the columns. Notice that a filter to find VO-dedicated WMSLB instances is provided.
The following columns are reported:
- WMS: specific WMS instance hostname. Clicking on it takes you to the corresponding detailed data page, described below.
- DATE: date/time of last measurement
- RUNNING JOBS
- IDLE JOBS
- WM QUEUE
- JC QUEUE
- VO VIEWS
- LB EVENTS QUEUE
- CPU LOAD
- SANDBOX PARTITION
- GENERAL STATUS
- DAEMONS STATUS
Specific WMSLB Instance Detailed data page
Short Description of the tool
WMSMonitor monitors two kinds of variables:
- WMSLB service and HW status variables (such as daemon status, Condor job status statistics, file descriptors opened by the main processes), for which the value at the time of measurement is shown.
- Mean job flow rates between WMS components (WMProxy, Workload Manager, Job Controller, Condor) over a given time interval (the past 15 min by default).
*UNDERSTANDING CHARTS*
For the most relevant variables, charts of the recent history (over a selectable time interval) are made available.
*Components Report History*
Four charts are available under the tab "Components Report History", reporting info respectively on:
1 - Condor Statistics: i.e. the number of jobs in the status "Running" and "Idle".
2 - Component Queues: i.e. the number of jobs enqueued to, and waiting to be processed by, the corresponding WMSLB component. In particular, the numbers of jobs enqueued to the Workload Manager and to the Job Controller are reported. The number of events still to be transferred/processed by the LB (and therefore not yet accounted for in LB queries from users or from wmsmon itself) is also reported, as "LB events queue".
3 - Job Flow Rates: i.e. the number of jobs processed by the corresponding component, reported as a mean value in Hz.
- Jobs->WMProxy: each point reports the mean job submission rate since the previous measurement (ex. 900 jobs successfully submitted to the WMS between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from WMProxy since the previous measurement (ex. 900 jobs successfully enqueued to the Workload Manager from WMProxy between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs Resub->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from the Job Controller, i.e. resubmitted after failure, since the previous measurement (ex. 900 jobs successfully enqueued to the Workload Manager from the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
4 - Job Flow Rates: i.e. the number of jobs processed by the corresponding component, reported as a mean value in Hz.
- Total Jobs->WM: each point reports the mean rate of jobs enqueued to the Workload Manager from both WMProxy and the Job Controller (i.e. both submitted and resubmitted jobs) since the previous measurement (ex. 900 jobs successfully enqueued to the Workload Manager from both WMProxy and the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs->JC: each point reports the mean rate of jobs enqueued to the Job Controller from the Workload Manager since the previous measurement (ex. 900 jobs successfully enqueued to the Job Controller from the Workload Manager between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
- Jobs->Condor: each point reports the mean rate of jobs enqueued to Condor from the Job Controller since the previous measurement (ex. 900 jobs successfully enqueued to Condor from the Job Controller between 2008-20-01 10:00:00 and 2008-20-01 10:15:00, i.e. 15 min = 900 sec, will produce a 1 Hz point at time 20-01 10:15).
*Daily Statistics*
Three charts are available under the tab "Daily Statistics", reporting info respectively on:
1 - Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported. In particular, the chart reports:
- Jobs->WMProxy: each point reports the total number of jobs successfully submitted to the WMS on the corresponding day.
- Jobs->WM: each point reports the total number of jobs successfully enqueued to the Workload Manager from WMProxy on the corresponding day.
- Jobs Resub->WM: each point reports the total number of jobs enqueued to the Workload Manager from the Job Controller (i.e. the total number of resubmissions) on the corresponding day.
2 - Jobs Final State: for each day in the selected interval, the total numbers of jobs with final state "Done Successfully" and "Aborted" are reported.
3 - Jobs Flow: for each day in the selected interval, the total number of jobs processed by each component is reported. In particular, the chart reports:
- Total Jobs->WM: each point reports the total number of jobs enqueued to the Workload Manager from both WMProxy and the Job Controller (i.e. both submitted and resubmitted jobs) on the corresponding day.
- Jobs->JC: each point reports the total number of jobs enqueued to the Job Controller from the Workload Manager on the corresponding day.
- Jobs->Condor: each point reports the total number of jobs enqueued to Condor from the Job Controller on the corresponding day.
Notice that:
- A short automatic help appears when the mouse pointer is positioned over the main buttons and variables.
- Passing the mouse pointer over the charts pops up a window with the corresponding variable values and date of measurement.
- A link to the Lemon monitoring of the specific instance (both WMS and LB if on separate machines) is available. | |||||||
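The 1 Hz example given for the Job Flow Rates charts is simply the number of jobs divided by the length of the measurement interval in seconds. A minimal Python sketch of that arithmetic, assuming the job count has already been obtained from the LB query (the function name `mean_rate_hz` is hypothetical and is not part of WMSMonitor):

```python
from datetime import datetime


def mean_rate_hz(n_jobs, t1, t2):
    """Mean job rate in Hz between two measurement times, with t1 < t2."""
    interval_s = (t2 - t1).total_seconds()
    if interval_s <= 0:
        raise ValueError("t2 must be later than t1")
    return n_jobs / interval_s


# Example from the text: 900 jobs in 15 min (900 s) on 20 January 2008.
t1 = datetime(2008, 1, 20, 10, 0, 0)
t2 = datetime(2008, 1, 20, 10, 15, 0)
print(mean_rate_hz(900, t1, t2))  # 1.0
```

The resulting point would be plotted at time t2, the end of the interval, as in the example above.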
User Documentation | ||||||||
Added: | ||||||||
> > | 2) data are collected by sensors on WMS and LB and sent to the server using a Python SOAP module --> | |||||||
Interaction with Related Tools | ||||||||
Changed: | ||||||||
< < | Future work:
-Packaging for current release deployment | |||||||
> > | Future work:
-Packaging for current release deployment | |||||||
-- DaniloDongiovanni - 25 Jan 2008 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Added: | ||||||||
> > |
Main Features of the tool
WMSMonitor monitors two kinds of variables:
- Instant values of WMSLB service and HW status
- Mean job flow rates between WMS components (WMProxy, Workload Manager, Job Controller, Condor) over a given time interval (15 min)
1) data collector and sensor written in Python with a MySQL db as backend on a dedicated server, web pages written in PHP
2) data are collected by sensors on WMS and LB and sent to the server using a Python SOAP module
3) flow rates are based on data derived from LB queries and reported as mean values over 15 min intervals
4) daily cumulative values are also reported (click on the tab "Daily Statistics")
5) a short automatic help appears when the mouse pointer is positioned over the main buttons and variables
6) passing the mouse pointer over the charts pops up a window with the corresponding variable values and date of measurement
7) link to the Lemon monitoring of the specific instance
User Documentation<--
User Requirements
Architecture-->
Interaction with Related Tools
Future work:
-Packaging for current release deployment |