
CREAM-CE direct job submission metrics

These metrics probe a CREAM CE using the CREAM CLI commands.

  1. emi.cream.CREAMCEDJS-DirectJobState. Direct job submission to CREAM-CE.
  2. emi.cream.CREAMCEDJS-DirectJobMonit. Babysit submitted grid jobs.
  3. emi.cream.CREAMCEDJS-DelegateProxy. Delegate proxy to CREAM CE.
  4. emi.cream.CREAMCEDJS-DirectJobCancel. Cancel active job.
  5. emi.cream.CREAMCEDJS-ServiceInfo. Get CREAM CE service info.
  6. emi.cream.CREAMCEDJS-SubmitAllowed. Check if submission to the CREAM CE is allowed.
  7. emi.cream.CREAMCEDJS-DirectJobSubmit. Passive. Final status of direct job submission to CREAM CE.

emi.cream.CREAMCEDJS-DirectJobState

Direct job submission to a CREAM CE, which can be chosen using these parameters:

--resource <URI> CREAM CE to send job to. Format : <host>[:<port>]/cream-<lrms-system-name>-<queue-name>
If not given - resource discovery will be performed.
--ldap-uri <URI> Format [ldap://]hostname[:port[/]] (Default: ldap://sam-bdii.cern.ch:2170)
--prev-status <0-3> Previous Nagios status of the metric.

As noted above, if the destination CREAM CE is not given explicitly, resource discovery is performed using the given LDAP server.
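
The --resource format described above can be checked before calling the probe. The following is only a minimal sketch in Python, not part of the probe itself: the function name parse_resource is hypothetical, and it assumes that the LRMS name contains no hyphen while the queue name may contain one.

```python
import re

# Pattern for <host>[:<port>]/cream-<lrms-system-name>-<queue-name>.
# Assumption: the LRMS name has no hyphen; the queue name may have one.
_RESOURCE_RE = re.compile(
    r"^(?P<host>[^:/\s]+)(?::(?P<port>\d+))?"
    r"/cream-(?P<lrms>[^-/\s]+)-(?P<queue>[^/\s]+)$"
)


def parse_resource(resource):
    """Split a --resource value into host, port, LRMS and queue."""
    match = _RESOURCE_RE.match(resource)
    if match is None:
        raise ValueError("not a valid CREAM CE resource: %r" % resource)
    port = match.group("port")
    return {
        "host": match.group("host"),
        # The port is optional in the format, so it may be missing.
        "port": int(port) if port is not None else None,
        "lrms": match.group("lrms"),
        "queue": match.group("queue"),
    }


if __name__ == "__main__":
    print(parse_resource("cream-30.pd.infn.it:8443/cream-pbs-cert"))
```

A value that does not match the format raises ValueError, so a wrapper script can fail early instead of submitting to a malformed resource.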

At the moment the template jdl used for submission is very simple:

Type="Job";
JobType="Normal";
Executable = "<jdlExecutable>";
Arguments = "<jdlArguments>";
StdOutput = "cream.out";
StdError = "cream.out";
OutputSandbox = {"cream.out"};
OutputSandboxBaseDestUri="gsiftp://localhost";

where the Executable is the command "/bin/hostname".
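
To illustrate how the placeholders in the template could be filled, here is a minimal Python sketch. It assumes plain string substitution of <jdlExecutable> and <jdlArguments>; how the probe actually renders the template is not documented here, and the function name render_jdl is hypothetical. The enclosing square brackets are taken from the template file shown later on this page.

```python
# Template text as shown on this page.
JDL_TEMPLATE = """[
Type="Job";
JobType="Normal";
Executable = "<jdlExecutable>";
Arguments = "<jdlArguments>";
StdOutput = "cream.out";
StdError = "cream.out";
OutputSandbox = {"cream.out"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]"""


def render_jdl(template, executable, arguments=""):
    """Replace the two placeholders (assumed to be plain text markers)."""
    return (template
            .replace("<jdlExecutable>", executable)
            .replace("<jdlArguments>", arguments))


if __name__ == "__main__":
    # Passing no arguments to /bin/hostname is an assumption.
    print(render_jdl(JDL_TEMPLATE, "/bin/hostname"))
```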

emi.cream.CREAMCEDJS-DirectJobMonit

Monitors submitted grid jobs. The implementation is threaded, with one thread per monitored resource and a maximum of 10 threads. While a job is not in a terminal state, the metric passively updates emi.cream.CREAMCEDJS-DirectJobState with the latest state of the job according to CREAM. When a job enters a terminal state or has been cancelled, the metric updates both emi.cream.CREAMCEDJS-DirectJobState and emi.cream.CREAMCEDJS-DirectJobSubmit with the final job status. These two metrics are updated (as passive checks) either via the Nagios command file or via NSCA. emi.cream.CREAMCEDJS-DirectJobSubmit is the metric that goes to the Metric Store Database.
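
As an illustration of a passive update via the Nagios command file, the sketch below builds one line in the standard Nagios external command format PROCESS_SERVICE_CHECK_RESULT. It is a minimal Python sketch, not the probe's code: the function name passive_check_line is hypothetical, and the service name with the VO suffix ("-dteam") is taken from the probe output shown later on this page.

```python
import time


def passive_check_line(host, service, status, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if status not in (0, 1, 2, 3):
        # Nagios statuses: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
        raise ValueError("status must be in the range 0-3")
    if timestamp is None:
        timestamp = int(time.time())
    # An external command is a single line, so newlines are flattened here.
    text = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, status, text)


if __name__ == "__main__":
    print(passive_check_line(
        "cream-30.pd.infn.it",
        "emi.cream.CREAMCEDJS-DirectJobSubmit-dteam",
        0,
        "OK: DONE."))
```

Writing such a line to the Nagios command file is how Nagios accepts a passive result in general; sending the same result through NSCA uses a different transport and is not covered by this sketch.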

Test

To test the probe you have to create a valid proxy.

State + Monit

First you have to "submit" a job:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobState --resource <CREAM CE url>

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobState --resource cream-30.pd.infn.it:8443/cream-pbs-cert
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM126240562].
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM126240562].
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL

https://cream-30.pd.infn.it:8443/CREAM126240562

Then you can monitor the job:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
OK: DONE.
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobSubmit-dteam>



metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobState-dteam>



OK: Jobs processed - 1
OK: Jobs processed - 1
DONE : 1|jobs_processed=1;; DONE=1;; REALLY-RUNNING=0;; RUNNING=0;; REGISTERED=0;; PENDING=0;; IDLE=0;; HELD=0;; CANCELLED=0;; ABORTED=0;; UNKNOWN=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2
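
The last line of the output above follows the usual Nagios plugin layout: status text, a "|" separator, then performance data as label=value;warn;crit items. A minimal Python sketch for reading the counters from such a line follows; the function name parse_summary is hypothetical and the sketch assumes that all values are integers, as in the sample.

```python
def parse_summary(line):
    """Return (status text, {label: value}) from a summary line."""
    text, _, perfdata = line.partition("|")
    counters = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        # Keep only the value; warning and critical thresholds are ignored.
        counters[label] = int(rest.split(";")[0])
    return text.strip(), counters


if __name__ == "__main__":
    sample = ("DONE : 1|jobs_processed=1;; DONE=1;; REALLY-RUNNING=0;; "
              "RUNNING=0;; REGISTERED=0;; PENDING=0;; IDLE=0;; HELD=0;; "
              "CANCELLED=0;; ABORTED=0;; UNKNOWN=0;; MISSED=0;; "
              "UNDETERMINED=0;; unknown=0;1;2")
    text, counters = parse_summary(sample)
    print(text, counters["jobs_processed"], counters["DONE"])
```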

When the job finishes, the output file is retrieved and stored in /var/lib/gridprobes/<VO or FQAN>/emi.cream/CREAMCEDJS/<hostname>/jobOutput. The output file should contain the hostname of the worker node where the job ran.

[ale@cream-48 ~]$ cat /var/lib/gridprobes/dteam/emi.cream/CREAMCEDJS/cream-30.pd.infn.it/jobOutput/cream-30.pd.infn.it_8443_CREAM126240562/cream.out 
cream-wn-030.pn.pd.infn.it
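
The directory name in the example above appears to be built from the job ID as <host>_<port>_<id>. The following Python sketch reproduces that path from a job ID. The naming rule is inferred from this single example and is therefore an assumption, as is using the CREAM host from the job ID for the <hostname> component; the function name job_output_dir is hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def job_output_dir(job_id, vo, base="/var/lib/gridprobes"):
    """Guess the jobOutput directory for a job ID (inferred naming rule)."""
    url = urlparse(job_id)
    if url.hostname is None or url.port is None:
        raise ValueError("job ID must look like https://<host>:<port>/<id>")
    # Example: cream-30.pd.infn.it_8443_CREAM126240562
    job_dir = "%s_%d_%s" % (url.hostname, url.port, url.path.lstrip("/"))
    return posixpath.join(base, vo, "emi.cream", "CREAMCEDJS",
                          url.hostname, "jobOutput", job_dir)


if __name__ == "__main__":
    print(job_output_dir(
        "https://cream-30.pd.infn.it:8443/CREAM126240562", "dteam"))
```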

State + Cancel

To test the "Cancel" metric easily, you need to modify the JDL template so that the job runs longer:

[ale@cream-48 ~]$ cat /usr/libexec/grid-monitoring/probes/emi.cream/CREAMDJS-jdl.template
[
Type="Job";
JobType="Normal";
#Executable = "<jdlExecutable>";
Executable = "/bin/sleep";
#Arguments = "<jdlArguments>";
Arguments = "100";
StdOutput = "cream.out";
StdError = "cream.out";
OutputSandbox = {"cream.out"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]

Then you have to "submit" the job:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobState --resource <CREAM CE url>

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobState --resource cream-30.pd.infn.it:8443/cream-pbs-cert
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM226348631].
OK: Job was submitted [https://cream-30.pd.infn.it:8443/CREAM226348631].
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL

https://cream-30.pd.infn.it:8443/CREAM226348631

Monitor the job until it reaches the RUNNING state:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobMonit --pass-check-dest active
metric results >>> <cream-30.pd.infn.it,emi.cream.CREAMCEDJS-DirectJobState-dteam>
OK: [RUNNING] https://cream-30.pd.infn.it:8443/CREAM226348631
OK: [RUNNING] https://cream-30.pd.infn.it:8443/CREAM226348631

glite-ce-job-status https://cream-30.pd.infn.it:8443/CREAM226348631

******  JobID=[https://cream-30.pd.infn.it:8443/CREAM226348631]
  Status        = [RUNNING]
OK: Jobs processed - 1
OK: Jobs processed - 1
[RUNNING] : 1|jobs_processed=1;; DONE=0;; REALLY-RUNNING=0;; RUNNING=1;; REGISTERED=0;; PENDING=0;; IDLE=0;; HELD=0;; CANCELLED=0;; ABORTED=0;; UNKNOWN=0;; MISSED=0;; UNDETERMINED=0;; unknown=0;1;2

Then you can cancel it:

/usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo <vo> -x <path of the proxy> -H <CREAM hostname> -m emi.cream.CREAMCEDJS-DirectJobCancel

[ale@cream-48 ~]$ /usr/libexec/grid-monitoring/probes/emi.cream/CREAMCEDJS-probe --vo dteam -x /tmp/x509up_u501 -H cream-30.pd.infn.it -m emi.cream.CREAMCEDJS-DirectJobCancel
OK: job cancelled
OK: job cancelled
Testing from: cream-48.pd.infn.it
DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy
VOMS FQANs: /dteam/Role=NULL/Capability=NULL, /dteam/NGI_IT/Role=NULL/Capability=NULL
Job cancellation request sent:
glite-ce-job-cancel --noint https://cream-30.pd.infn.it:8443/CREAM226348631
Job bookkeeping files deleted.

You can then check manually that the final status of the job is CANCELLED, as expected:

[ale@cream-48 ~]$ glite-ce-job-status https://cream-30.pd.infn.it:8443/CREAM226348631

******  JobID=[https://cream-30.pd.infn.it:8443/CREAM226348631]
   Status        = [CANCELLED]
   ExitCode      = []
   Description   = [Cancelled by user]
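
The manual check above can also be scripted. The following is a minimal Python sketch that reads the Name = [value] lines of the glite-ce-job-status output shown on this page. It only parses text that has already been captured; the function name parse_job_status is hypothetical, and the sketch assumes the output layout is exactly as in the example.

```python
import re

# Matches lines such as "Status        = [CANCELLED]".
_FIELD_RE = re.compile(r"^(\w+)\s*=\s*\[(.*)\]\s*$")


def parse_job_status(text):
    """Return a dict of the Name = [value] fields found in the output."""
    fields = {}
    for line in text.splitlines():
        # The JobID line starts with asterisks; strip them and any spaces.
        match = _FIELD_RE.match(line.lstrip("* \t"))
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


if __name__ == "__main__":
    sample = """
******  JobID=[https://cream-30.pd.infn.it:8443/CREAM226348631]
   Status        = [CANCELLED]
   ExitCode      = []
   Description   = [Cancelled by user]
"""
    print(parse_job_status(sample)["Status"])
```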

Topic revision: r4 - 2011-11-10 - AlessioGianelle
 
