Test
To test the probe, you first need a valid VOMS proxy for the VO you want to test with.
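The probe receives the proxy file through the -x option. As a minimal sketch, the helper below prints where the proxy is expected to be. It assumes the usual convention that X509_USER_PROXY, when set, overrides the default /tmp/x509up_u<uid> (the form used in the examples on this page). The function name proxy_path is hypothetical, and the commented usage assumes the VOMS clients are installed.

```shell
# Hypothetical helper: print the path where the proxy file is expected.
# Assumption: X509_USER_PROXY, when set, overrides /tmp/x509up_u<uid>.
proxy_path() {
    if [ -n "${X509_USER_PROXY:-}" ]; then
        printf '%s\n' "$X509_USER_PROXY"
    else
        printf '/tmp/x509up_u%s\n' "$(id -u)"
    fi
}

# Typical use (assumes the VOMS clients are installed):
#   voms-proxy-init --voms dteam
#   ls -l "$(proxy_path)"
```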
State + Monit
First you have to "submit" a job:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobState --resource <CREAM CE url>
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobState --resource cream-30.pd.infn.it:8443/cream-pbs-cert
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM126240562].
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM126240562].
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
https://cream-30.pd.infn.it:8443/CREAM126240562
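If you want to reuse the job ID in later commands, it can be pulled out of the probe output. The sketch below assumes job IDs always look like https://<host>:<port>/CREAM<digits>, as in the output above. The function name extract_job_id is hypothetical, and grep -o is assumed to be available (GNU or BSD grep).

```shell
# Hypothetical helper: read probe output on stdin, print the first CREAM job ID.
# Assumption: job IDs have the form https://<host>:<port>/CREAM<digits>.
extract_job_id() {
    grep -o 'https://[^][ ]*/CREAM[0-9][0-9]*' | head -n 1
}

# Typical use:
#   JOB_ID=$(<probe command from above> | extract_job_id)
```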
Then you can monitor the job:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
OK: DONE.
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobSubmit-dteam>
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobState-dteam>
OK: Jobs processed - 1
OK: Jobs processed - 1
DONE : 1|jobs_processed=1;; DONE=1;; REALLY-RUNNING=0;; RUNNING=0;; REGISTERED=0;; PENDING=0;; IDLE=0;; HELD=0;; CANCELLED=0;; ABORTED=0;; UNKNOWN=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2
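The last line of the output follows the usual Nagios plugin layout: a status text, then "|", then label=value performance data. A minimal sketch for reading one counter from that line is shown below. The function name perf_value is hypothetical, and the parsing assumes the labels are separated by spaces exactly as in the line above.

```shell
# Hypothetical helper: perf_value <LABEL>
# Reads probe output on stdin and prints the value of one performance-data label.
perf_value() {
    label=$1
    # Keep only the text after the first "|", then put one label=value token per line.
    sed -n 's/^[^|]*|//p' | tr ' ' '\n' |
        awk -F'[=;]' -v l="$label" '$1 == l { print $2; exit }'
}

# Typical use:
#   <probe monitoring command> | perf_value DONE
```

The label is compared exactly, so RUNNING does not match REALLY-RUNNING, and UNKNOWN and unknown are treated as different counters.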
When the job finishes, the output file is retrieved and stored in /var/lib/gridprobes/<VO or FQAN>/emi.cream/CREAMCEDJS/<hostname>/jobOutput. The output file should contain the hostname of the worker node where the job ran.
[ale@cream-48 ~]$ cat /var/lib/gridprobes/dteam/emi.cream/CREAMCEDJS/cream-30.pd.infn.it/jobOutput/cream-30.pd.infn.it_8443_CREAM126240562/cream.out
cream-wn-030.pn.pd.infn.it
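The per-job directory name in the example appears to be the job ID with the https:// prefix removed and ":" and "/" replaced by "_". The sketch below builds the directory path on that assumption. The mapping is inferred from this single example only, the function name job_output_dir is hypothetical, and only a plain VO name (not an FQAN) is covered.

```shell
# Hypothetical helper: job_output_dir <vo> <CREAM hostname> <job ID>
# Assumption: https://host:port/CREAMnnn maps to host_port_CREAMnnn,
# as inferred from the example above.
job_output_dir() {
    vo=$1
    host=$2
    job_id=$3
    name=$(printf '%s\n' "$job_id" | sed -e 's|^https://||' -e 's|[:/]|_|g')
    printf '/var/lib/gridprobes/%s/emi.cream/CREAMCEDJS/%s/jobOutput/%s\n' \
        "$vo" "$host" "$name"
}

# Typical use:
#   cat "$(job_output_dir dteam cream-30.pd.infn.it "$JOB_ID")/cream.out"
```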
State + Cancel
To test the "Cancel" metric easily, modify the JDL template so that the job runs long enough to be cancelled:
[ale@cream-48 ~]$ cat /usr/libexec/grid-monitoring/probes/emi.cream/CREAMDJS-jdl.template
[
Type="Job";
JobType="Normal";
#Executable = "<jdlExecutable>";
Executable = "/bin/sleep";
#Arguments = "<jdlArguments>";
Arguments = "100";
StdOutput = "cream.out";
StdError = "cream.out";
OutputSandbox = {"cream.out"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]
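Since this changes a file shipped with the probe, it may be worth keeping a copy of the original so it can be put back after the test. The helpers below are a minimal sketch of that idea. Their names are hypothetical, and the template path is passed as an argument rather than assumed.

```shell
# Hypothetical helpers; pass the template path as the only argument.
backup_template() {
    # Keep the first backup: never overwrite an existing .orig copy.
    [ -e "$1.orig" ] || cp -p "$1" "$1.orig"
}

restore_template() {
    # Put the original back and drop the backup, if there is one.
    if [ -e "$1.orig" ]; then
        mv "$1.orig" "$1"
    fi
}

# Typical use (run with sufficient privileges to write the template):
#   backup_template <path of the JDL template>
#   ... edit the template and run the Cancel test ...
#   restore_template <path of the JDL template>
```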
Then you have to "submit" the job:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobState --resource <CREAM CE url>
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobState --resource cream-30.pd.infn.it:8443/cream-pbs-cert
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM226348631].
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM226348631].
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
https://cream-30.pd.infn.it:8443/CREAM226348631
Monitor the job until it reaches the RUNNING state:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobState-dteam>
OK: [RUNNING] https://cream-30.pd.infn.it:8443/CREAM226348631
OK: [RUNNING] https://cream-30.pd.infn.it:8443/CREAM226348631
glite-ce-job-status https://cream-30.pd.infn.it:8443/CREAM226348631
****** JobID=[https://cream-30.pd.infn.it:8443/CREAM226348631]
Status = [RUNNING]
OK: Jobs processed - 1
OK: Jobs processed - 1
[RUNNING] : 1|jobs_processed=1;; DONE=0;; REALLY-RUNNING=0;; RUNNING=1;; REGISTERED=0;; PENDING=0;; IDLE=0;; HELD=0;; CANCELLED=0;; ABORTED=0;; UNKNOWN=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2
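Because the monitoring command may have to be repeated several times before the job is RUNNING, the wait can be scripted. The sketch below reruns a given command until its output contains the wanted state in square brackets, as in the "OK: [RUNNING] ..." line above. The function name wait_for_state and its arguments are hypothetical, and the match relies on that output format.

```shell
# Hypothetical helper: wait_for_state <STATE> <max tries> <delay seconds> <command...>
# Reruns the command until its output contains "[STATE]"; returns 1 if it never does.
wait_for_state() {
    wanted=$1
    tries=$2
    delay=$3
    shift 3
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" 2>&1 | grep -qF "[$wanted]"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Typical use:
#   wait_for_state RUNNING 20 30 <probe monitoring command from above>
```

If the job goes straight to a final state without being seen as RUNNING, the function simply gives up after the given number of tries.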
Then you can cancel it:
/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobCancel
[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobCancel
OK: job cancelled
OK: job cancelled
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
Job cancellation request sent:
glite-ce-job-cancel --noint https://cream-30.pd.infn.it:8443/CREAM226348631
Job bookkeeping files deleted.
You can then check manually that the final status of the job is CANCELLED, as expected:
[ale@cream-48 ~]$ glite-ce-job-status https://cream-30.pd.infn.it:8443/CREAM226348631
****** JobID=[https://cream-30.pd.infn.it:8443/CREAM226348631]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
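The same check can be scripted by extracting the Status field from the glite-ce-job-status output. The sketch below assumes the "Status = [...]" layout shown above, and the function name job_status is hypothetical.

```shell
# Hypothetical helper: read glite-ce-job-status output on stdin, print the status.
# Assumption: the status appears on a line of the form "Status = [STATE]".
job_status() {
    sed -n 's/^[[:space:]]*Status[[:space:]]*=[[:space:]]*\[\([^]]*\)].*/\1/p' |
        head -n 1
}

# Typical use:
#   [ "$(glite-ce-job-status "$JOB_ID" | job_status)" = "CANCELLED" ] && echo "cancel OK"
```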