Difference: WmsProbe (1 vs. 6)

Revision 6 2011-11-07 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios WMS probe

Line: 80 to 80
 --timeout-job-waiting <sec>  Time allowed for a job to stay in Waiting with 'no compatible
                              resources'. (Default: 2700)
 --hosts <h1,h2,..>           Comma-separated list of CE hostnames to run monitor on.
Added:
>
>

Test

To test the probe you have to create a valid proxy.

State + Monit

First you have to "submit" a jdl:

/usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo <vo> -x <path of the proxy> -H <WMS hostname> -m emi.wms.WMS-JobState

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobState
OK:
OK:
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL


Connecting to the service https://cream-45.pd.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ

The job identifier has been saved in the following file:
/var/lib/gridprobes/dteam/emi.wms/WMS/cream-45.pd.infn.it/jobID

==========================================================================
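
The invocations in this test section all share the same shape, so they can be generated rather than typed. The following is only a minimal sketch: the helper name build_probe_command is hypothetical, and it uses just the options shown on this page (--vo, -x, -H, -m, plus any extra arguments).

```python
import shlex

# Probe path as used in the examples on this page.
PROBE = "/usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe"


def build_probe_command(vo, proxy, host, metric, extra=()):
    """Return the argv list for one probe invocation (hypothetical helper)."""
    return [PROBE, "--vo", vo, "-x", proxy, "-H", host, "-m", metric, *extra]


cmd = build_probe_command(
    "dteam", "/tmp/x509up_u501", "cream-45.pd.infn.it", "emi.wms.WMS-JobState"
)
# Print a shell-safe command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with the proxy path.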

Then you can "monitor" the job:

/usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo <vo> -x <path of the proxy> -H <WMS hostname> -m emi.wms.WMS-JobMonit --pass-check-dest active

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobMonit --pass-check-dest active
metric results >>> <cream-45.pd.infn.it,emi.wms.WMS-JobState-dteam>
OK: [Scheduled] https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ
OK: [Scheduled] https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ
glite-wms-job-status https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ
Current Status:     Scheduled
Status Reason:      unavailable
Destination:        cccreamceli09.in2p3.fr:8443/cream-sge-long
Submitted:          Mon Nov  7 16:38:01 2011 CET
==========================================================================
OK: Jobs processed - 1
OK: Jobs processed - 1
[Scheduled] : 1|jobs_processed=1;; DONE=0;; RUNNING=0;; SCHEDULED=1;; SUBMITTED=0;; READY=0;; WAITING=0;; ABORTED=0;; CANCELLED=0;; CLEARED=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2
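
The last line of the output follows the usual Nagios plugin layout: human-readable text, a "|" separator, then performance data as label=value pairs. As a rough sketch for checking the counters in a script, the function below (parse_perfdata is a hypothetical name) assumes every value is an integer, as in the output above.

```python
def parse_perfdata(line):
    """Split a plugin output line into (text, {label: value})."""
    text, _, perf = line.partition("|")
    counters = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        # The first ';'-separated field is the value; the rest are thresholds.
        counters[label] = int(rest.split(";")[0])
    return text.strip(), counters


line = ("[Scheduled] : 1|jobs_processed=1;; DONE=0;; RUNNING=0;; SCHEDULED=1;; "
        "SUBMITTED=0;; READY=0;; WAITING=0;; ABORTED=0;; CANCELLED=0;; "
        "CLEARED=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2")
text, counters = parse_perfdata(line)
print(text, counters["SCHEDULED"])
```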

If you run the emi.wms.WMS-JobState metric again, its output shows the last status seen by the emi.wms.WMS-JobMonit metric:

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobState 
OK: Active job - Scheduled [2011-11-07T15:38:28Z]
OK: Active job - Scheduled [2011-11-07T15:38:28Z]
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
Active job - Scheduled [2011-11-07T15:38:28Z]

Finally, once the job has finished, running the emi.wms.WMS-JobMonit metric should also trigger retrieval of the job output (get_output):

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobMonit --pass-check-dest active
metric results >>> <cream-45.pd.infn.it,emi.wms.WMS-JobSubmit-dteam>
OK: success.

glite-wms-job-output --noint --nosubdir --dir /var/lib/gridprobes/dteam/emi.wms/WMS/cream-45.pd.infn.it/jobOutput https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ 2>&1

Connecting to the service https://cream-45.pd.infn.it:7443/glite_wms_wmproxy_server


Warning - option --nosubdir specified: 
output files with same name will be overridden


Warning - Directory already exists: 
/var/lib/gridprobes/dteam/emi.wms/WMS/cream-45.pd.infn.it/jobOutput


================================================================================

         JOB GET OUTPUT OUTCOME

Output sandbox files for the job:
https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ
have been successfully retrieved and stored in the directory:
/var/lib/gridprobes/dteam/emi.wms/WMS/cream-45.pd.infn.it/jobOutput

================================================================================
metric results >>> <cream-45.pd.infn.it,emi.wms.WMS-JobState-dteam>
OK: success.

OK: Jobs processed - 1
OK: Jobs processed - 1
Done : 1|jobs_processed=1;; DONE=1;; RUNNING=0;; SCHEDULED=0;; SUBMITTED=0;; READY=0;; WAITING=0;; ABORTED=0;; CANCELLED=0;; CLEARED=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2

You can verify that it worked by checking the status of the job:

[ale@cream-48 ~]$ glite-wms-job-status https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ
Current Status:     Cleared 
Status Reason:      user retrieved output sandbox
Destination:        cccreamceli09.in2p3.fr:8443/cream-sge-long
Submitted:          Mon Nov  7 16:38:01 2011 CET
==========================================================================
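
To automate this final check, the bookkeeping fields can be extracted from the glite-wms-job-status output. This is a sketch only: parse_job_status is a hypothetical helper, and it assumes the "Field: value" layout shown above.

```python
import re

# Field names taken from the bookkeeping output shown on this page.
_FIELD = re.compile(
    r"^(Current Status|Status Reason|Destination|Submitted):\s*(.*?)\s*$"
)


def parse_job_status(output):
    """Return the bookkeeping fields found in glite-wms-job-status output."""
    fields = {}
    for line in output.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


sample = """\
Status info for the Job : https://cream-45.pd.infn.it:9000/o9ELufQPDtyIxkJ0_YGuKQ
Current Status:     Cleared
Status Reason:      user retrieved output sandbox
Destination:        cccreamceli09.in2p3.fr:8443/cream-sge-long
Submitted:          Mon Nov  7 16:38:01 2011 CET
"""
status = parse_job_status(sample)
print(status["Current Status"])
```

After a successful State + Monit cycle the expected value of "Current Status" is Cleared, since the probe has already retrieved the output sandbox.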

State + Monit + Cancel

First you have to "submit" a jdl:

/usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo <vo> -x <path of the proxy> -H <WMS hostname> -m emi.wms.WMS-JobState

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobState
OK:
OK:
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL


Connecting to the service https://cream-45.pd.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://cream-45.pd.infn.it:9000/apDAgqvkXls1HNLfRxPO0A

The job identifier has been saved in the following file:
/var/lib/gridprobes/dteam/emi.wms/WMS/cream-45.pd.infn.it/jobID

==========================================================================

As before you can "monitor" the job:

/usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo <vo> -x <path of the proxy> -H <WMS hostname> -m emi.wms.WMS-JobMonit --pass-check-dest active

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobMonit --pass-check-dest active
metric results >>> <cream-45.pd.infn.it,emi.wms.WMS-JobState-dteam>
OK: [Scheduled] https://cream-45.pd.infn.it:9000/apDAgqvkXls1HNLfRxPO0A
OK: [Scheduled] https://cream-45.pd.infn.it:9000/apDAgqvkXls1HNLfRxPO0A
glite-wms-job-status https://cream-45.pd.infn.it:9000/apDAgqvkXls1HNLfRxPO0A


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://cream-45.pd.infn.it:9000/apDAgqvkXls1HNLfRxPO0A
Current Status:     Scheduled 
Status Reason:      unavailable
Destination:        t2-ce-01.lnl.infn.it:8443/cream-lsf-cert1
Submitted:          Mon Nov  7 16:51:29 2011 CET
==========================================================================
OK: Jobs processed - 1
OK: Jobs processed - 1
[Scheduled] : 1|jobs_processed=1;; DONE=0;; RUNNING=0;; SCHEDULED=1;; SUBMITTED=0;; READY=0;; WAITING=0;; ABORTED=0;; CANCELLED=0;; CLEARED=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2

Before it finishes you can "cancel" it:

/usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo <vo> -x <path of the proxy> -H <WMS hostname> -m emi.wms.WMS-JobCancel

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobCancel
OK: job cancelled
OK: job cancelled
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
Job cancellation request sent:
glite-wms-job-cancel --noint  -i /var/lib/gridprobes/dteam/emi.wms/WMS/cream-45.pd.infn.it/jobID
Job bookkeeping files deleted.
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.wms/WMS-probe --vo dteam -x /tmp/x509up_u501 -H cream-45.pd.infn.it -m emi.wms.WMS-JobMonit --pass-check-dest active
OK: no active jobs [2011-11-07T16:51:53Z]
OK: no active jobs [2011-11-07T16:51:53Z]|jobs_processed=0;;

You can verify that it worked by checking the status of the job:

[ale@cream-48 ~]$ glite-wms-job-status https://cream-45.pd.infn.it:9000/apDAgqvkXls1HNLfRxPO0A


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://cream-45.pd.infn.it:9000/apDAgqvkXls1HNLfRxPO0A
Current Status:     Cancelled 
Destination:        t2-ce-01.lnl.infn.it:8443/cream-lsf-cert1
Submitted:          Mon Nov  7 16:51:29 2011 CET
==========================================================================

Revision 5 2011-11-07 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios WMS probe

Line: 9 to 9
 

Installation

This probe needs to be installed on a WMS User Interface because it uses the wms-cli commands to monitor the WMS.
Added:
>
>

Dependencies

python >= 2.4
python-ldap  
openssl >= 0.9.8e-12
nagios-submit-conf >= 0.2
python-GridMon >= 1.1.10

The last two RPMs can be installed from this repository:

[egi-sam]
name=EGI SAM repo
baseurl=http://repository.egi.eu/sw/production/sam/1/$basearch
enabled=1
gpgcheck=0
protect=1
priority=10
 

WMS Metrics

Revision 4 2011-11-04 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios WMS probe

Changed:
<
<
Test WMS service with job submission to CEs. It is based on the python-GridMon library developed by the SAM team. Details about the command line parameters can be found here.
>
>
Test WMS service with job submission to CEs. It is based on the python-GridMon library developed by the SAM team. Details about the command line parameters can be found here.
 
Added:
>
>

Installation

This probe needs to be installed on a WMS User Interface because it uses the wms-cli commands to monitor the WMS.

 

WMS Metrics

  1. emi.wms.WMS-JobState. Submits grid job to CE(s) via WMS under test. Accepts passive check updates from emi.wms.WMS-JobMonit.

Revision 3 2011-09-26 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios WMS probe

Line: 9 to 9
 
  1. emi.wms.WMS-JobState. Submits grid job to CE(s) via WMS under test. Accepts passive check updates from emi.wms.WMS-JobMonit.
  2. emi.wms.WMS-JobMonit. Monitors submitted grid jobs.
Added:
>
>
  3. emi.wms.WMS-JobCancel. Cancels the grid job.
 
  4. emi.wms.WMS-JobSubmit. Passive check. Holds terminal status of job submission.

emi.wms.WMS-JobState

Revision 2 2011-09-26 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Deleted:
<
<
 

Nagios WMS probe

Added:
>
>
Test WMS service with job submission to CEs. It is based on the python-GridMon library developed by the SAM team. Details about the command line parameters can be found here.

WMS Metrics

  1. emi.wms.WMS-JobState. Submits grid job to CE(s) via WMS under test. Accepts passive check updates from emi.wms.WMS-JobMonit.
  2. emi.wms.WMS-JobMonit. Monitors submitted grid jobs.
  3. emi.wms.WMS-JobSubmit. Passive check. Holds terminal status of job submission.

emi.wms.WMS-JobState

This metric is used to submit jobs through the WMS under test. It accepts these parameters:

--jdl-templ <file>              JDL template file (full path). 
                                Default: <emi.wms.ProbesLocation>/WMS-jdl.template
--jdl-retrycount <val>          JDL RetryCount (Default: 0).
--jdl-shallowretrycount <val>   JDL ShallowRetryCount (Default: 1).
--ces-file <file>               File with list of CEs. Two schemes [file:] or http: 
                                (Default: /var/lib/gridprobes/<vo>/GoodCEs)
--prev-status <0-3>             Previous Nagios status of the metric.
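
The 0-3 range of --prev-status corresponds to the standard Nagios plugin return codes. As a small illustration, the sketch below validates such a value; parse_prev_status is a hypothetical helper and is not part of the probe.

```python
# Standard Nagios plugin return codes.
NAGIOS_STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_prev_status(value):
    """Validate a --prev-status argument and return (code, name)."""
    code = int(value)
    if code not in NAGIOS_STATUS:
        raise ValueError("previous status must be in 0-3, got %r" % value)
    return code, NAGIOS_STATUS[code]


print(parse_prev_status("2"))
```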

GoodCEs

If the file "GoodCEs" exists, all CEs listed in it are OR'ed in the resulting Requirements ClassAd expression, e.g.:

Requirements = (other.GlueCEInfoHostName == "ce106.cern.ch") || (other.GlueCEInfoHostName == "creamce.gina.sara.nl")

This file can be populated automatically using the script gather_healthy_nodes from the hr.srce grid-monitoring probes.

Otherwise, a very general requirement, Requirements = true, is used for match-making.
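
The rule for GoodCEs can be sketched as follows. This is an illustration, not the probe's actual code: build_requirements is a hypothetical name, and the input is assumed to be a list of CE hostnames, one per entry, since the file format is not described here.

```python
def build_requirements(hostnames):
    """Build the JDL Requirements expression from a list of CE hostnames."""
    # Ignore blank entries so that an empty file falls back to "true".
    hosts = [h.strip() for h in hostnames if h.strip()]
    if not hosts:
        return "true"
    return " || ".join(
        '(other.GlueCEInfoHostName == "%s")' % host for host in hosts
    )


print("Requirements = " + build_requirements(["ce106.cern.ch", "creamce.gina.sara.nl"]))
print("Requirements = " + build_requirements([]))
```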

JDL template

This is the default jdl template used for submission:

Type="Job";
JobType="Normal";
Executable = "<jdlExecutable>";
StdError = "gridjob.out";
StdOutput = "gridjob.out";
OutputSandbox = {"gridjob.out"};
RetryCount = <jdlRetryCount>;
ShallowRetryCount = <jdlShallowRetryCount>;
Requirements = <jdlReqCEInfoHostName>;

At the moment the jdlExecutable parameter is hardcoded, and /bin/hostname is used.
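
To see what the submitted JDL looks like, the placeholders in the template can be filled in by plain string substitution. The sketch below assumes that this is all the substitution amounts to; render_jdl is a hypothetical helper, and the values are the defaults documented above.

```python
# Default template as listed on this page.
TEMPLATE = """\
Type="Job";
JobType="Normal";
Executable = "<jdlExecutable>";
StdError = "gridjob.out";
StdOutput = "gridjob.out";
OutputSandbox = {"gridjob.out"};
RetryCount = <jdlRetryCount>;
ShallowRetryCount = <jdlShallowRetryCount>;
Requirements = <jdlReqCEInfoHostName>;
"""


def render_jdl(template, values):
    """Replace each <name> placeholder with its value."""
    jdl = template
    for name, value in values.items():
        jdl = jdl.replace("<%s>" % name, str(value))
    return jdl


jdl = render_jdl(TEMPLATE, {
    "jdlExecutable": "/bin/hostname",
    "jdlRetryCount": 0,         # --jdl-retrycount default
    "jdlShallowRetryCount": 1,  # --jdl-shallowretrycount default
    "jdlReqCEInfoHostName": "true",
})
print(jdl)
```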

emi.wms.WMS-JobMonit

 
Deleted:
<
<
-- AlessioGianelle - 2011-09-16
 \ No newline at end of file
Added:
>
>
Monitors submitted grid jobs. Threaded implementation with one thread per monitored resource, up to a maximum of 10 threads.

While a job is not in a terminal state, the metric passively updates emi.wms.WMS-JobState with the latest state of the job according to the WMS. When the job enters a terminal state or is cancelled, the metric updates both emi.wms.WMS-JobState and emi.wms.WMS-JobSubmit with the final job status. These two metrics are updated (as passive checks) either via the Nagios command file or via NSCA. emi.wms.WMS-JobSubmit is the metric that goes to the Metric Store Database.

It accepts these parameters:

--timeout-job-global <sec>   Global timeout for jobs. Job will be cancelled and dropped 
                             if it is not in terminal state by that time. (Default: 3600)
--timeout-job-waiting <sec>  Time allowed for a job to stay in Waiting with 'no compatible 
                             resources'. (Default: 2700)
--hosts <h1,h2,..>           Comma-separated list of CE hostnames to run monitor on.
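
How the two timeouts interact can be illustrated with a small decision function. This is a sketch under stated assumptions, not the probe's implementation: the set of terminal states, the strict ">" comparison and the name timeout_exceeded are all assumptions made for this example.

```python
# Assumed terminal states (not confirmed by this page).
TERMINAL_STATES = {"DONE", "ABORTED", "CANCELLED", "CLEARED"}


def timeout_exceeded(status, age_sec, waiting_sec=0, no_compatible_resources=False,
                     timeout_global=3600, timeout_waiting=2700):
    """Return "global", "waiting" or None for a monitored job."""
    state = status.upper()
    if state in TERMINAL_STATES:
        return None  # finished jobs are never timed out
    if age_sec > timeout_global:
        return "global"  # --timeout-job-global exceeded
    if state == "WAITING" and no_compatible_resources and waiting_sec > timeout_waiting:
        return "waiting"  # --timeout-job-waiting exceeded
    return None


print(timeout_exceeded("Scheduled", age_sec=120))
print(timeout_exceeded("Waiting", age_sec=3000, waiting_sec=2800,
                       no_compatible_resources=True))
```

With the defaults, the waiting timeout (2700 s) fires before the global one (3600 s), so a job stuck in Waiting with no compatible resources is flagged earlier than a job that is merely slow.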

Revision 1 2011-09-16 - AlessioGianelle

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

Nagios WMS probe

-- AlessioGianelle - 2011-09-16

 