CREAM-CE direct job submission metrics
These metrics probe a CREAM CE using the CREAM CLI commands.
- emi.cream.CREAMCEDJS-DirectJobState. Direct job submission to CREAM-CE.
- emi.cream.CREAMCEDJS-DirectJobMonit. Babysit submitted grid jobs.
- emi.cream.CREAMCEDJS-DelegateProxy. Delegate proxy to CREAM CE.
- emi.cream.CREAMCEDJS-DirectJobCancel. Cancel active job.
- emi.cream.CREAMCEDJS-ServiceInfo. Get CREAM CE service info.
- emi.cream.CREAMCEDJS-SubmitAllowed. Check if submission to the CREAM CE is allowed.
- emi.cream.CREAMCEDJS-DirectJobSubmit. Passive. Final status of direct job submission to CREAM CE.
emi.cream.CREAMCEDJS-DirectJobState
Direct submission to a CREAM CE, which can be chosen using these parameters:
| --resource <URI> | CREAM CE to send the job to. Format: <host>[:<port>]/cream-<lrms-system-name>-<queue-name>. If not given, resource discovery is performed. |
| --ldap-uri <URI> | LDAP server used for resource discovery. Format: [ldap://]hostname[:port[/]] (default: ldap://sam-bdii.cern.ch:2170). |
| --prev-status <0-3> | Previous Nagios status of the metric (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). |
As noted above, if the destination CREAM CE is not given explicitly, resource discovery is performed against the given LDAP server.
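The --resource format can be illustrated with a small parser. This is only a sketch: the names CreamResource and parse_resource are hypothetical and not part of the probe, and it assumes that the LRMS name contains no hyphen, so the first hyphen after "cream-" separates the LRMS from the queue name.

```python
import re
from typing import NamedTuple, Optional


class CreamResource(NamedTuple):
    host: str
    port: Optional[int]  # None when the port was omitted
    lrms: str
    queue: str


# <host>[:<port>]/cream-<lrms-system-name>-<queue-name>
# Assumption: the LRMS name has no hyphen; the queue name may contain hyphens.
_RESOURCE_RE = re.compile(
    r"^(?P<host>[^:/\s]+)(?::(?P<port>\d+))?"
    r"/cream-(?P<lrms>[^-/\s]+)-(?P<queue>[^/\s]+)$"
)


def parse_resource(value: str) -> CreamResource:
    """Split a --resource value into host, port, LRMS and queue."""
    match = _RESOURCE_RE.match(value)
    if match is None:
        raise ValueError(f"not a valid CREAM CE resource: {value!r}")
    port = match.group("port")
    return CreamResource(
        host=match.group("host"),
        port=int(port) if port is not None else None,
        lrms=match.group("lrms"),
        queue=match.group("queue"),
    )
```

For the resource used in the tests below, cream-30.pd.infn.it:8443/cream-pbs-cert, this yields host cream-30.pd.infn.it, port 8443, LRMS pbs and queue cert.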
At the moment the JDL template used for submission is very simple:
Type="Job";
JobType="Normal";
Executable = "<jdlExecutable>";
Arguments = "<jdlArguments>";
StdOutput = "cream.out";
StdError = "cream.out";
OutputSandbox = {"cream.out"};
OutputSandboxBaseDestUri="gsiftp://localhost";
where the Executable is the command "/bin/hostname".
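How the placeholders in the template might be filled in can be sketched as follows. The probe's actual substitution code is not shown here, so render_jdl is a hypothetical helper and plain string replacement is an assumption. The enclosing brackets follow the template file listed in the Test section.

```python
JDL_TEMPLATE = """[
Type="Job";
JobType="Normal";
Executable = "<jdlExecutable>";
Arguments = "<jdlArguments>";
StdOutput = "cream.out";
StdError = "cream.out";
OutputSandbox = {"cream.out"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]
"""


def render_jdl(executable: str, arguments: str = "") -> str:
    """Return the JDL text with both placeholders substituted."""
    for value in (executable, arguments):
        # Quoting rules are out of scope for this sketch.
        if '"' in value:
            raise ValueError("double quotes are not handled by this sketch")
    return (
        JDL_TEMPLATE
        .replace("<jdlExecutable>", executable)
        .replace("<jdlArguments>", arguments)
    )
```

Calling render_jdl("/bin/hostname") gives the default job, while render_jdl("/bin/sleep", "100") gives a longer-running variant.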
emi.cream.CREAMCEDJS-DirectJobMonit
Monitors submitted grid jobs. The implementation is threaded, with one thread per monitored resource and a maximum of 10 threads. While a job is not in a terminal state, the metric passively updates emi.cream.CREAMCEDJS-DirectJobState with the latest job state reported by CREAM. When the job enters a terminal state or is cancelled, the metric updates both emi.cream.CREAMCEDJS-DirectJobState and emi.cream.CREAMCEDJS-DirectJobSubmit with the final job status. Both of these metrics are updated as passive checks, either via the Nagios command file or via NSCA. emi.cream.CREAMCEDJS-DirectJobSubmit is the metric that goes to the Metric Store database.
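The threading scheme and the passive update rule can be sketched like this. It is not the probe's code: monitor and poll_state are hypothetical names, the pool is a standard ThreadPoolExecutor, and the set of terminal states is an assumption based on the state names that appear in the probe output.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

MAX_THREADS = 10  # upper bound stated in the text

# Assumption: states treated as terminal in this sketch.
TERMINAL_STATES = {"DONE", "CANCELLED", "ABORTED"}

STATE_METRIC = "emi.cream.CREAMCEDJS-DirectJobState"
SUBMIT_METRIC = "emi.cream.CREAMCEDJS-DirectJobSubmit"


def monitor(
    resources: Iterable[str],
    poll_state: Callable[[str], str],
) -> Dict[str, Tuple[str, List[str]]]:
    """Poll every resource in a worker thread, with at most MAX_THREADS workers.

    Returns, per resource, the job state and the passive metrics to update.
    """
    resources = list(resources)
    if not resources:
        return {}

    def check(resource: str) -> Tuple[str, List[str]]:
        state = poll_state(resource)
        metrics = [STATE_METRIC]
        if state in TERMINAL_STATES:
            # A finished job also reports its final status.
            metrics.append(SUBMIT_METRIC)
        return state, metrics

    workers = min(MAX_THREADS, len(resources))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(resources, pool.map(check, resources)))
```

Here poll_state stands in for whatever queries CREAM for the job state of a resource; delivering the results through the Nagios command file or NSCA is left out.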
Test
To test the probe you have to create a valid proxy.
First you have to "submit" a job:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobState --resource <CREAM CE url>
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobState --resource cream-30.pd.infn.it:8443/cream-pbs-cert
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM126240562].
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM126240562].
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
https://cream-30.pd.infn.it:8443/CREAM126240562
Then you can monitor the job:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
OK: DONE.
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobSubmit-dteam>
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobState-dteam>
OK: Jobs processed - 1
OK: Jobs processed - 1
DONE : 1|jobs_processed=1;; DONE=1;; REALLY-RUNNING=0;; RUNNING=0;; REGISTERED=0;; PENDING=0;; IDLE=0;; HELD=0;; CANCELLED=0;; ABORTED=0;; UNKNOWN=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2
When the job finishes, the output file is retrieved and stored in /var/lib/gridprobes/<VO or FQAN>/emi.cream/CREAMCEDJS/<hostname>/jobOutput. The output file should contain the hostname of the worker node where the job ran.
[ale@cream-48 ~]$ cat /var/lib/gridprobes/dteam/emi.cream/CREAMCEDJS/cream-30.pd.infn.it/jobOutput/cream-30.pd.infn.it_8443_CREAM126240562/cream.out
cream-wn-030.pn.pd.infn.it
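The location of the output file can be derived from the job ID, as sketched below. The naming rule for the per-job directory (<host>_<port>_<id>) is inferred from the single example above and is therefore an assumption, as is the hypothetical helper name job_output_file. The sketch handles a plain VO name only, since the way an FQAN is mapped to a directory is not shown.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def job_output_file(vo: str, job_id: str,
                    base: str = "/var/lib/gridprobes") -> PurePosixPath:
    """Return the expected path of cream.out for a given CREAM job ID."""
    if "/" in vo:
        raise ValueError("FQANs are not handled by this sketch")
    url = urlsplit(job_id)
    name = url.path.strip("/")
    if not url.hostname or url.port is None or not name:
        raise ValueError(f"unexpected job ID: {job_id!r}")
    # Assumption: the per-job directory is <host>_<port>_<id>.
    job_dir = f"{url.hostname}_{url.port}_{name}"
    return PurePosixPath(
        base, vo, "emi.cream", "CREAMCEDJS",
        url.hostname, "jobOutput", job_dir, "cream.out",
    )
```

With the VO dteam and the job ID https://cream-30.pd.infn.it:8443/CREAM126240562 this reproduces the path used in the cat command above.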
To test the "Cancel" metric easily, modify the JDL template so that the job runs longer:
[ale@cream-48 ~]$ cat /usr/libexec/grid-monitoring/probes/emi.cream/CREAMDJS-jdl.template
[
Type="Job";
JobType="Normal";
#Executable = "<jdlExecutable>";
Executable = "/bin/sleep";
#Arguments = "<jdlArguments>";
Arguments = "100";
StdOutput = "cream.out";
StdError = "cream.out";
OutputSandbox = {"cream.out"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]
Then you have to "submit" the job:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobState --resource <CREAM CE url>
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobState --resource cream-30.pd.infn.it:8443/cream-pbs-cert
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM226348631].
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM226348631].
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
https://cream-30.pd.infn.it:8443/CREAM226348631
Monitor the job until it reaches the RUNNING state:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobState-dteam>
OK: [RUNNING] https://cream-30.pd.infn.it:8443/CREAM226348631
OK: [RUNNING] https://cream-30.pd.infn.it:8443/CREAM226348631
glite-ce-job-status https://cream-30.pd.infn.it:8443/CREAM226348631
****** JobID=[https://cream-30.pd.infn.it:8443/CREAM226348631]
Status = [RUNNING]
OK: Jobs processed - 1
OK: Jobs processed - 1
[RUNNING] : 1|jobs_processed=1;; DONE=0;; REALLY-RUNNING=0;; RUNNING=1;; REGISTERED=0;; PENDING=0;; IDLE=0;; HELD=0;; CANCELLED=0;; ABORTED=0;; UNKNOWN=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2
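The last line of the monitor output carries Nagios performance data after the "|" character, with one label=value;warn;crit item per counter. A minimal sketch for reading the counters follows; parse_perfdata is a hypothetical helper, and it assumes plain integer values without units, as in the sample output.

```python
from typing import Dict


def parse_perfdata(line: str) -> Dict[str, int]:
    """Extract the per-state job counters from a probe output line."""
    _, separator, perfdata = line.partition("|")
    if not separator:
        return {}  # the line carries no performance data
    counters: Dict[str, int] = {}
    for item in perfdata.split():
        label, equals, rest = item.partition("=")
        if not equals:
            continue
        # Keep the value only; drop the warn/crit thresholds after it.
        counters[label] = int(rest.split(";", 1)[0])
    return counters
```

Applied to the line above, the result maps RUNNING and jobs_processed to 1 and every other counter to 0.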
Then you can cancel it:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobCancel
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobCancel
OK: job cancelled
OK: job cancelled
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
Job cancellation request sent:
glite-ce-job-cancel --noint https://cream-30.pd.infn.it:8443/CREAM226348631
Job bookkeeping files deleted.
You can then check manually that the final status of the job is CANCELLED, as expected:
[ale@cream-48 ~]$ glite-ce-job-status https://cream-30.pd.infn.it:8443/CREAM226348631
****** JobID=[https://cream-30.pd.infn.it:8443/CREAM226348631]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
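To check the final status from a script rather than by eye, the "name = [value]" lines printed by glite-ce-job-status can be collected into a dictionary. This is a sketch based only on the output format shown above; parse_job_status is a hypothetical helper, not part of the CREAM CLI.

```python
import re
from typing import Dict

# Matches lines such as "Status = [CANCELLED]" or "****** JobID=[https://...]".
_FIELD_RE = re.compile(r"^\W*(?P<key>\w+)\s*=\s*\[(?P<value>.*)\]\s*$")


def parse_job_status(text: str) -> Dict[str, str]:
    """Collect the fields printed by glite-ce-job-status."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = _FIELD_RE.match(line)
        if match:
            fields[match.group("key")] = match.group("value")
    return fields
```

For the output above, parse_job_status(...)["Status"] is "CANCELLED" and the empty ExitCode is returned as an empty string.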
To test delegation use this command:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DelegateProxy
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DelegateProxy
OK: [Delegated]
OK: [Delegated]
glite-ce-delegate-proxy -e cream-30.pd.infn.it:8443 dteam-551a6
2011-11-10 13:40:24,178 NOTICE - Proxy with delegation id [dteam-551a6] succesfully delegated to endpoint [https://cream-30.pd.infn.it:8443//ce-cream/services/gridsite-delegation]
You can check that the delegation worked by submitting a job with the returned delegation ID:
[ale@cream-48 ~]$ glite-ce-job-submit -D dteam-551a6 -r cream-30.pd.infn.it:8443/cream-pbs-cert test.jdl
https://cream-30.pd.infn.it:8443/CREAM290551353
[ale@cream-48 ~]$ glite-ce-job-status https://cream-30.pd.infn.it:8443/CREAM290551353
****** JobID=[https://cream-30.pd.infn.it:8443/CREAM290551353]
Status = [DONE-OK]
ExitCode = [0]
To verify the ServiceInfo metric, simply issue this command:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-ServiceInfo
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-ServiceInfo
OK: success
OK: success
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
success
description = CREAM 2
doesAcceptNewJobSubmissions = True
interfaceVersion = 2.1
property = [(Property){
name = "cemon_url"
value = "NA"
}]
serviceVersion = 1.13
startupTime = 2011-10-10 14:44:12.000638
status = RUNNING
To verify the SubmitAllowed metric, simply issue this command:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-SubmitAllowed
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-SubmitAllowed
OK: [Submission Allowed]
OK: [Submission Allowed]
glite-ce-allowed-submission cream-30.pd.infn.it:8443
Job Submission to this CREAM CE is enabled
You can also disable submission to the CREAM CE (you MUST be an admin for this CE):
[ale@cream-48 ~]$ glite-ce-disable-submission cream-30.pd.infn.it
Operation for disabling new submissions succeeded
Then verify that the metric returns the correct message:
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-SubmitAllowed
OK: [Submission Allowed]
OK: [Submission Allowed]
glite-ce-allowed-submission cream-30.pd.infn.it:8443
Job Submission to this CREAM CE is disabled
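The detail line printed by glite-ce-allowed-submission is what distinguishes the two cases, so a script can key on it. The sketch below relies only on the two messages shown above; submission_enabled is a hypothetical helper name.

```python
def submission_enabled(output: str) -> bool:
    """Return True or False according to the glite-ce-allowed-submission message."""
    prefix = "job submission to this cream ce is"
    for line in output.splitlines():
        text = line.strip().lower()
        if not text.startswith(prefix):
            continue
        if text.endswith("disabled"):
            return False
        if text.endswith("enabled"):
            return True
    raise ValueError("no submission status message found")
```

The function raises ValueError when neither message is present, so an unexpected output is not silently treated as "enabled".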