
New CREAM-CE direct job submission metrics

The following metrics are a restructured version of the existing ones and provide a better approach for probing a CREAM CE.

  1. cream_serviceInfo.py - get CREAM CE service info
  2. cream_allowedSubmission.py - check if the submission to the selected CREAM CE is allowed
  3. cream_jobSubmit.py - submit a job directly to the selected CREAM CE
  4. cream_jobCancel.py - cancel an active job
  5. cream_jobPurge.py - purge a terminated job

All of them are implemented in Python and are based on the cream-cli commands. They share the same logic structure and report their version and usage (i.e. help), including the list of options and their meaning. For example:

$ ./cream_serviceInfo.py 
Usage: cream_serviceInfo.py [options]

cream_serviceInfo.py: error: Specify either option -u URL or option -H HOSTNAME (and -p PORT) or read the help (-h)

$ ./cream_serviceInfo.py --help
Usage: cream_serviceInfo.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOSTNAME, --hostname=HOSTNAME
                        The hostname of the CREAM service.
  -p PORT, --port=PORT  The port of the service. [default: none]
  -u URL, --url=URL     The status endpoint URL of the service. Example:
                        https://<host>[:<port>]
  -x PROXY, --proxy=PROXY
                        The proxy path
  -v, --verbose         verbose mode [default: False]

$ ./cream_serviceInfo.py --version
cream_serviceInfo v.1.0

The interaction with the CREAM CE requires a valid VOMS proxy, specified either through the X509_USER_PROXY environment variable or through the --proxy option. All metrics check that the proxy file exists and compute its remaining lifetime. If the check fails, the corresponding error message is printed:

$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 --proxy /tmp/x509up_u0 --verbose
Proxy file not found or not readable
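
The proxy check described above could be sketched as follows. This is only an illustration: the function names are hypothetical, and the precedence of --proxy over X509_USER_PROXY is an assumption, not something this page states. Only the error message and the use of voms-proxy-info -timeleft (which prints the remaining lifetime in seconds) come from the examples here.

```python
import os


def resolve_proxy_path(proxy_option=None, environ=None):
    """Return the --proxy value if given, else X509_USER_PROXY (assumed precedence)."""
    environ = os.environ if environ is None else environ
    return proxy_option or environ.get("X509_USER_PROXY")


def check_proxy_file(path):
    """Raise IOError when the proxy file is missing or not readable."""
    if not path or not os.path.isfile(path) or not os.access(path, os.R_OK):
        raise IOError("Proxy file not found or not readable")
    return path


def parse_timeleft(output):
    """Convert the output of 'voms-proxy-info -timeleft' (seconds) to an int."""
    return int(output.strip())
```

A probe built this way would call check_proxy_file() before running any glite-ce-* command, so that a missing proxy is reported without contacting the CE.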

Verbose mode (--verbose) can be enabled for every metric. It gives details about the probe execution by showing the internal commands that are run:

$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
invoking service info
executing command: /usr/bin/glite-ce-service-info cream-41.pd.infn.it:8443
Interface Version  = [2.1]
Service Version    = [1.16.1 - EMI version: 3.5.1-1.el6]
Description        = [CREAM 2]
Started at         = [Thu Jan  2 19:06:51 2014]
Submission enabled = [YES]
Status             = [RUNNING]

If an option or its value is wrong, the probe tries to explain what is wrong. For example, cream_serviceInfo.py does not support the --queue option:

$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 --queue creamtest1 --verbose
Usage: cream_serviceInfo.py [options]

cream_serviceInfo.py: error: no such option: --queue
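
The usage and error messages shown above have the layout produced by Python's standard optparse module. Assuming the metrics use it (this page does not say so), the option handling of cream_serviceInfo.py might look roughly like the sketch below. The function names build_parser and parse are hypothetical; the option names, help strings and the "Specify either option" message are copied from the output above.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with the options listed in the --help output above."""
    parser = OptionParser(prog="cream_serviceInfo.py",
                          version="cream_serviceInfo v.1.0")
    parser.add_option("-H", "--hostname", dest="hostname",
                      help="The hostname of the CREAM service.")
    parser.add_option("-p", "--port", dest="port",
                      help="The port of the service. [default: none]")
    parser.add_option("-u", "--url", dest="url",
                      help="The status endpoint URL of the service. "
                           "Example: https://<host>[:<port>]")
    parser.add_option("-x", "--proxy", dest="proxy", help="The proxy path")
    parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                      default=False, help="verbose mode [default: %default]")
    return parser


def parse(argv):
    """Parse argv; optparse itself rejects unknown options such as --queue."""
    parser = build_parser()
    options, _args = parser.parse_args(argv)
    if not options.url and not options.hostname:
        # parser.error() prints the usage line plus the message and exits with 2.
        parser.error("Specify either option -u URL or option -H HOSTNAME "
                     "(and -p PORT) or read the help (-h)")
    return options
```

With this structure, an unsupported option is rejected by the parser before any command is sent to the CE.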

If an error occurs while interacting with the CREAM CE, details about the failure are reported:

$ ./cream_allowedSubmission.py --url https://cream-43.pd.infn.it:8443
command '/usr/bin/glite-ce-allowed-submission cream-43.pd.infn.it:8443' failed: return_code=1
details: ['2014-01-16 15:59:57,085 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection refused]\n']
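
A minimal sketch of how a probe might wrap a cream-cli command and produce a failure report in the format shown above. The helper names run_command and main are hypothetical, and the actual probes may be structured differently; only the message layout ("executing command", "failed: return_code", "details") is taken from this page.

```python
import subprocess
import sys


def run_command(argv, verbose=False):
    """Run a command and return its output lines; raise RuntimeError on failure."""
    if verbose:
        print("executing command: " + " ".join(argv))
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    out, _ = proc.communicate()
    lines = out.splitlines(True)  # keep line endings, as in the 'details' list
    if proc.returncode != 0:
        raise RuntimeError("command '%s' failed: return_code=%d\ndetails: %s"
                           % (" ".join(argv), proc.returncode, lines))
    return lines


def main(argv):
    """Print the command output, or the failure report, and return an exit code."""
    try:
        for line in run_command(argv):
            sys.stdout.write(line)
    except RuntimeError as err:
        print(err)
        return 1
    return 0
```

For example, main(["/usr/bin/glite-ce-allowed-submission", "cream-43.pd.infn.it:8443"]) would print either the command output or the failure report.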

cream_serviceInfo.py

cream_serviceInfo.py retrieves information about the status of the CREAM CE. The help shows how the probe must be invoked:

$ ./cream_serviceInfo.py --help
Usage: cream_serviceInfo.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOSTNAME, --hostname=HOSTNAME
                        The hostname of the CREAM service.
  -p PORT, --port=PORT  The port of the service. [default: none]
  -x PROXY, --proxy=PROXY
                        The proxy path
  -v, --verbose         verbose mode [default: False]
  -u URL, --url=URL     The status endpoint URL of the service. Example:
                        https://<host>[:<port>]

To get information about the CREAM service at https://cream-41.pd.infn.it:8443, use the following command:

$ ./cream_serviceInfo.py --url https://cream-41.pd.infn.it:8443
Interface Version  = [2.1]
Service Version    = [1.16.1 - EMI version: 3.5.1-1.el6]
Description        = [CREAM 2]
Started at         = [Thu Jan  2 19:06:51 2014]
Submission enabled = [YES]
Status             = [RUNNING]

or, equivalently:

$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 
Interface Version  = [2.1]
Service Version    = [1.16.1 - EMI version: 3.5.1-1.el6]
Description        = [CREAM 2]
Started at         = [Thu Jan  2 19:06:51 2014]
Submission enabled = [YES]
Status             = [RUNNING]

cream_allowedSubmission.py

This simple metric checks whether submission to the selected CREAM CE is allowed. Its usage is analogous to that of cream_serviceInfo.py:

$ ./cream_allowedSubmission.py --help
Usage: cream_allowedSubmission.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOSTNAME, --hostname=HOSTNAME
                        The hostname of the CREAM service.
  -p PORT, --port=PORT  The port of the service. [default: none]
  -u URL, --url=URL     The status endpoint URL of the service. Example:
                        https://<host>[:<port>]
  -x PROXY, --proxy=PROXY
                        The proxy path
  -v, --verbose         verbose mode [default: False]

Note: using the --url option is equivalent to specifying both the --hostname and --port options:

$ ./cream_allowedSubmission.py --hostname cream-41.pd.infn.it --port 8443
ENABLED

$ ./cream_allowedSubmission.py --url https://cream-41.pd.infn.it:8443
ENABLED
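
Both invocations end up passing the same host:port endpoint to glite-ce-allowed-submission, as the verbose output shows. A small Python 3 sketch of that normalisation, with hypothetical helper names (the real probes may do this differently):

```python
from urllib.parse import urlparse


def endpoint_from_url(url):
    """Return 'host[:port]' from a URL of the form https://<host>[:<port>]."""
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError("invalid URL: %r" % url)
    if parsed.port is None:
        return parsed.hostname
    return "%s:%d" % (parsed.hostname, parsed.port)


def endpoint_from_host(hostname, port=None):
    """Return 'host[:port]' from the --hostname and --port values."""
    return hostname if port is None else "%s:%s" % (hostname, port)
```

Here endpoint_from_url("https://cream-41.pd.infn.it:8443") and endpoint_from_host("cream-41.pd.infn.it", 8443) both give the string cream-41.pd.infn.it:8443.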

The verbose mode highlights the internal commands:

$ ./cream_allowedSubmission.py --url https://cream-41.pd.infn.it:8443 --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
invoking allowedSubmission
executing command: /usr/bin/glite-ce-allowed-submission cream-41.pd.infn.it:8443
ENABLED

cream_jobSubmit.py

This metric submits a job directly to the selected CREAM CE, waits for the job to terminate, reports its final status and finally purges the job. The stage-in and stage-out phases are both performed automatically by the CE; for the stage-out, the JDL must set OutputSandboxBaseDestUri="gsiftp://localhost".

$ ./cream_jobSubmit.py --help
Usage: cream_jobSubmit.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOSTNAME, --hostname=HOSTNAME
                        The hostname of the CREAM service.
  -p PORT, --port=PORT  The port of the service. [default: none]
  -u URL, --url=URL     The status endpoint URL of the service. Example:
                        https://<host>[:<port>]/cream-<lrms>-<queue>
  -x PROXY, --proxy=PROXY
                        The proxy path
  -v, --verbose         verbose mode [default: False]
  -l LRMS, --lrms=LRMS  The LRMS name (e.g.: 'lsf', 'pbs' etc)
  -q QUEUE, --queue=QUEUE
                        The queue name (e.g.: 'creamtest')
  -j JDL, --jdl=JDL     The jdl path

The --url (-u) option targets the probe at a specific CREAM CE by its identifier (i.e. the CREAM CE ID). Alternatively, it is possible to specify the CREAM CE identifier with the --hostname, --port, --lrms and --queue options, which are mutually exclusive with the --url option.
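
The relation between the two ways of naming the CE can be sketched in Python 3 as below. The function names and error messages are hypothetical; the <host>:<port>/cream-<lrms>-<queue> layout follows the --url help text, and the handling of the mutual exclusion is only one possible interpretation.

```python
from urllib.parse import urlparse


def ce_id_from_options(hostname, port, lrms, queue):
    """Build '<host>:<port>/cream-<lrms>-<queue>' from the separate options."""
    return "%s:%s/cream-%s-%s" % (hostname, port, lrms, queue)


def ce_id_from_url(url):
    """Extract the CE ID from https://<host>[:<port>]/cream-<lrms>-<queue>."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if not parsed.hostname or not path.startswith("cream-"):
        raise ValueError("not a CREAM CE URL: %r" % url)
    return "%s/%s" % (parsed.netloc, path)


def resolve_ce_id(url=None, hostname=None, port=None, lrms=None, queue=None):
    """Accept either --url or the four separate options, never both."""
    separate = (hostname, port, lrms, queue)
    if url and any(separate):
        raise ValueError("--url cannot be combined with "
                         "--hostname/--port/--lrms/--queue")
    if url:
        return ce_id_from_url(url)
    if not all(separate):
        raise ValueError("--hostname, --port, --lrms and --queue are all required")
    return ce_id_from_options(*separate)
```

Either form yields an identifier such as cream-41.pd.infn.it:8443/cream-lsf-creamtest1, which is the kind of value passed to glite-ce-job-submit with -r.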

Consider the JDL file hostname.jdl with the following content:

$ cat ./hostname.jdl
[
Type="Job";
JobType="Normal";
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
OutputSandboxBaseDestUri="gsiftp://localhost"
]

If verbose mode is disabled, the output should look like this:

$ ./cream_jobSubmit.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl 
Job terminated with status DONE-OK

Note: using the --url option is equivalent to specifying all of the --hostname, --port, --lrms and --queue options:

$ ./cream_jobSubmit.py --hostname cream-41.pd.infn.it --port 8443 --lrms lsf --queue creamtest1 --jdl ./hostname.jdl 
Job terminated with status DONE-OK

If verbose mode is enabled, the output of the above command should look like this:

$ ./cream_jobSubmit.py --hostname cream-41.pd.infn.it --port 8443 --lrms lsf --queue creamtest1 --jdl ./hostname.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r cream-41.pd.infn.it:8443/cream-lsf-creamtest1 ./hostname.jdl
['2014-01-16 13:52:57,305 DEBUG - Using certificate proxy file [/tmp/x509up_u0]\n', '2014-01-16 13:52:57,324 DEBUG - VO from certificate=[dteam]\n', '2014-01-16 13:52:57,324 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2014-01-16 13:52:57,324 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_root_20140116-135257.log]\n', '2014-01-16 13:52:57,328 INFO - certUtil::generateUniqueID() - Generated DelegationID: [99b19aafc98e11bb7956cc58d901bd860697227d]\n', '2014-01-16 13:52:59,645 DEBUG - Registering to [https://cream-41.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "std.out"; BatchSystem = "lsf"; QueueName = "creamtest1"; Executable = "/bin/hostname"; Type = "Job"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "std.out","std.err" }; StdError = "std.err" ] - JDL File=[./hostname.jdl]\n', '2014-01-16 13:53:00,022 DEBUG - Will invoke JobStart for JobID [CREAM304354901]\n', 'https://cream-41.pd.infn.it:8443/CREAM304354901\n']
job id: https://cream-41.pd.infn.it:8443/CREAM304354901
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM304354901
job status: REALLY-RUNNING
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM304354901
job status: DONE-OK
invoking getOutputSandbox
executing command: /usr/bin/glite-ce-job-output --noint https://cream-41.pd.infn.it:8443/CREAM187996258
output sandbox dir: ./cream-41.pd.infn.it_8443_CREAM187996258
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://cream-41.pd.infn.it:8443/CREAM187996258
Job terminated with status DONE-OK

Note the line "output sandbox dir: ./cream-41.pd.infn.it_8443_CREAM187996258" in the output. This is the output sandbox directory, which contains all the files produced by the job:

$ ls -la ./cream-41.pd.infn.it_8443_CREAM187996258
total 12
drwxr-xr-x   2 root root 4096 17 gen 16:20 .
dr-xr-x---. 23 root root 4096 17 gen 16:20 ..
-rw-------   1 root root    0 17 gen 16:20 std.err
-rw-------   1 root root   26 17 gen 16:20 std.out
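
The submit-and-wait steps visible in the verbose output can be approximated by the sketch below. It is not the probe's actual code: the helper names are hypothetical, the status lookup is passed in as a callable so that the logic can be tried without a CE, and the set of terminal states is an assumption based on the usual CREAM job states. The job id extraction and the sandbox directory naming simply mirror what the transcripts above show.

```python
import time

# Assumed terminal CREAM job states; adjust if the CE reports others.
TERMINAL_STATES = ("DONE-OK", "DONE-FAILED", "ABORTED", "CANCELLED")


def extract_job_id(submit_output):
    """Take the last non-empty line printed by glite-ce-job-submit as the job id."""
    lines = [line.strip() for line in submit_output if line.strip()]
    return lines[-1]


def sandbox_dir(job_id):
    """Map https://<host>:<port>/<id> to ./<host>_<port>_<id>, as in the listing."""
    without_scheme = job_id.split("://", 1)[1]
    return "./" + without_scheme.replace(":", "_").replace("/", "_")


def wait_for_termination(get_status, job_id, interval=10, max_polls=60,
                         sleep=time.sleep):
    """Poll get_status(job_id) until a terminal state is reported."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
    raise RuntimeError("job %s did not terminate in time" % job_id)
```

In a real probe, get_status would wrap /usr/bin/glite-ce-job-status and parse the status out of its output.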

cream_jobCancel.py

This metric submits a job directly to the selected CREAM CE, waits until the job reaches the REALLY-RUNNING state and then tries to cancel it. Finally it checks whether the job has been correctly cancelled.

$ ./cream_jobCancel.py --help
Usage: cream_jobCancel.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOSTNAME, --hostname=HOSTNAME
                        The hostname of the CREAM service.
  -p PORT, --port=PORT  The port of the service. [default: none]
  -x PROXY, --proxy=PROXY
                        The proxy path
  -v, --verbose         verbose mode [default: False]
  -u URL, --url=URL     The status endpoint URL of the service. Example:
                        https://<host>[:<port>]/cream-<lrms>-<queue>
  -l LRMS, --lrms=LRMS  The LRMS name (e.g.: 'lsf', 'pbs' etc)
  -q QUEUE, --queue=QUEUE
                        The queue name (e.g.: 'creamtest')
  -j JDL, --jdl=JDL     The jdl path

The job must run long enough to allow the probe to check the current job status and invoke the glite-ce-job-cancel command. For example, consider the job of the previous metric (i.e. hostname.jdl). In this case the probe fails because the job has already terminated before the glite-ce-job-cancel command is executed:

$ ./cream_jobCancel.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl 
job already terminated

Now consider the following job:

$ cat ./sleep.jdl 
[
Type="Job";
JobType="Normal";
Executable = "/bin/sleep";
Arguments = "200";
StdOutput = "cream.out";
StdError = "cream.out";
]

The output of the probe should look like this:

$ ./cream_jobCancel.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./sleep.jdl 
job cancelled

or like this with the --verbose option:

$ ./cream_jobCancel.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./sleep.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r cream-41.pd.infn.it:8443/cream-lsf-creamtest1 ./sleep.jdl
['2014-01-16 17:22:49,728 DEBUG - Using certificate proxy file [/tmp/x509up_u0]\n', '2014-01-16 17:22:49,744 DEBUG - VO from certificate=[dteam]\n', '2014-01-16 17:22:49,745 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2014-01-16 17:22:49,745 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_root_20140116-172249.log]\n', '2014-01-16 17:22:49,747 INFO - certUtil::generateUniqueID() - Generated DelegationID: [69e3bdaf4e818e1f71f1b7ff442f74583c869b84]\n', '2014-01-16 17:22:52,165 DEBUG - Registering to [https://cream-41.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "cream.out"; BatchSystem = "lsf"; QueueName = "creamtest1"; Executable = "/bin/sleep"; Type = "Job"; Arguments = "200"; JobType = "Normal"; StdError = "cream.out" ] - JDL File=[./sleep.jdl]\n', '2014-01-16 17:22:52,513 DEBUG - Will invoke JobStart for JobID [CREAM437649288]\n', 'https://cream-41.pd.infn.it:8443/CREAM437649288\n']
job id: https://cream-41.pd.infn.it:8443/CREAM437649288
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM437649288
job status: REALLY-RUNNING
invoking jobCancel
executing command: /usr/bin/glite-ce-job-cancel --noint https://cream-41.pd.infn.it:8443/CREAM437649288
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM437649288
job status: REALLY-RUNNING
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM437649288
job status: CANCELLED
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://cream-41.pd.infn.it:8443/CREAM437649288
job cancelled
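
The cancel sequence in the verbose output (wait for REALLY-RUNNING, cancel, poll until CANCELLED) could be written along these lines. This is a sketch under assumptions: cancel_probe is a hypothetical name, the status and cancel operations are injected as callables instead of calling the glite-ce-* commands, and the list of terminal states is assumed.

```python
import time

# Assumed terminal CREAM job states.
TERMINAL_STATES = ("DONE-OK", "DONE-FAILED", "ABORTED", "CANCELLED")


def cancel_probe(job_id, get_status, cancel, interval=5, max_polls=60,
                 sleep=time.sleep):
    """Wait for REALLY-RUNNING, cancel the job, then poll until CANCELLED."""
    # Phase 1: wait until the job is really running on the worker node.
    for _ in range(max_polls):
        status = get_status(job_id)
        if status == "REALLY-RUNNING":
            break
        if status in TERMINAL_STATES:
            # Too short a job (e.g. hostname.jdl): nothing left to cancel.
            return "job already terminated"
        sleep(interval)
    else:
        raise RuntimeError("timeout waiting for REALLY-RUNNING")

    cancel(job_id)

    # Phase 2: the status may stay REALLY-RUNNING for a while after the cancel.
    for _ in range(max_polls):
        status = get_status(job_id)
        if status == "CANCELLED":
            return "job cancelled"
        if status in TERMINAL_STATES:
            raise RuntimeError("job ended with status %s" % status)
        sleep(interval)
    raise RuntimeError("timeout waiting for CANCELLED")
```

The second loop matters because, as the transcript shows, the first status check after the cancel request can still report REALLY-RUNNING.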

cream_jobPurge.py

This metric is analogous to cream_jobCancel.py. It submits a short job (e.g. hostname.jdl), waits for its termination (e.g. DONE-OK) and then tries to purge it. Finally, to verify that the purge was successful, the probe checks the job status by executing the glite-ce-job-status command which, in this scenario only, must fail because the job no longer exists.

$ ./cream_jobPurge.py --help
Usage: cream_jobPurge.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOSTNAME, --hostname=HOSTNAME
                        The hostname of the CREAM service.
  -p PORT, --port=PORT  The port of the service. [default: none]
  -x PROXY, --proxy=PROXY
                        The proxy path
  -v, --verbose         verbose mode [default: False]
  -u URL, --url=URL     The status endpoint URL of the service. Example:
                        https://<host>[:<port>]/cream-<lrms>-<queue>
  -l LRMS, --lrms=LRMS  The LRMS name (e.g.: 'lsf', 'pbs' etc)
  -q QUEUE, --queue=QUEUE
                        The queue name (e.g.: 'creamtest')
  -j JDL, --jdl=JDL     The jdl path

$ ./cream_jobPurge.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl 
job purged

$ ./cream_jobPurge.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r cream-41.pd.infn.it:8443/cream-lsf-creamtest1 ./hostname.jdl
['2014-01-16 14:02:48,282 DEBUG - Using certificate proxy file [/tmp/x509up_u0]\n', '2014-01-16 14:02:48,301 DEBUG - VO from certificate=[dteam]\n', '2014-01-16 14:02:48,301 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2014-01-16 14:02:48,302 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_root_20140116-140248.log]\n', '2014-01-16 14:02:48,305 INFO - certUtil::generateUniqueID() - Generated DelegationID: [85a80615fc8046baf6dce2a27b708fc82ecbce55]\n', '2014-01-16 14:02:51,105 DEBUG - Registering to [https://cream-41.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "std.out"; BatchSystem = "lsf"; QueueName = "creamtest1"; Executable = "/bin/hostname"; Type = "Job"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "std.out","std.err" }; StdError = "std.err" ] - JDL File=[./hostname.jdl]\n', '2014-01-16 14:02:51,504 DEBUG - Will invoke JobStart for JobID [CREAM691625071]\n', 'https://cream-41.pd.infn.it:8443/CREAM691625071\n']
job id: https://cream-41.pd.infn.it:8443/CREAM691625071
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM691625071
job status: DONE-OK
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://cream-41.pd.infn.it:8443/CREAM691625071
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM691625071
job purged
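
The inverted check at the end of the purge metric, where a failing status query means success, can be illustrated as follows. The function name purge_probe is hypothetical, and the sketch assumes the status helper raises RuntimeError when glite-ce-job-status exits with an error; the real probe may detect the failure differently.

```python
def purge_probe(job_id, get_status, purge):
    """Purge a terminated job and verify that its status can no longer be queried."""
    purge(job_id)
    try:
        status = get_status(job_id)
    except RuntimeError:
        # Expected: the CE no longer knows the job, so the purge worked.
        return "job purged"
    # The status query succeeded, so the job still exists on the CE.
    raise RuntimeError("job %s still present after purge (status %s)"
                       % (job_id, status))
```

Note that any other cause of a failing status query (for example an unreachable CE) would also be read as success by this simple version, so a stricter probe might inspect the error details.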

-- LisaZangrando - 2014-01-16

Topic revision: r3 - 2014-01-17 - LisaZangrando
 
