New CREAM-CE direct job submission metrics
The following metrics are a restructured version of the existing ones and provide a better approach for probing a CREAM CE.
- cream_serviceInfo.py - get CREAM CE service info
- cream_allowedSubmission.py - check if the submission to the selected CREAM CE is allowed
- cream_jobSubmit.py - submit a job directly to the selected CREAM CE
- cream_jobCancel.py - cancel an active job
- cream_jobPurge.py - purge a terminated job
All of them are implemented in Python and are based on the cream-cli commands. They share the same logical structure and provide useful information about their version and usage (i.e. help), including the list of options and their meaning.
For example:
$ ./cream_serviceInfo.py
Usage: cream_serviceInfo.py [options]
cream_serviceInfo.py: error: Specify either option -u URL or option -H HOSTNAME (and -p PORT) or read the help (-h)
$ ./cream_serviceInfo.py --help
Usage: cream_serviceInfo.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]
-x PROXY, --proxy=PROXY
The proxy path
-v, --verbose verbose mode [default: False]
$ ./cream_serviceInfo.py --version
cream_serviceInfo v.1.0
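The usage and error messages above have the shape produced by Python's optparse module. The following is a minimal sketch of how the common options could be declared, assuming optparse (the probes' source is not shown here, and build_parser, parse and USAGE_ERROR are hypothetical names):

```python
from optparse import OptionParser

USAGE_ERROR = ("Specify either option -u URL or option -H HOSTNAME "
               "(and -p PORT) or read the help (-h)")

def build_parser(prog="cream_serviceInfo.py", version="cream_serviceInfo v.1.0"):
    # Hypothetical helper: declares the options shown in the --help output above.
    parser = OptionParser(prog=prog, version=version)
    parser.add_option("-H", "--hostname", dest="hostname",
                      help="The hostname of the CREAM service.")
    parser.add_option("-p", "--port", dest="port", default=None,
                      help="The port of the service. [default: none]")
    parser.add_option("-u", "--url", dest="url",
                      help="The status endpoint URL of the service. "
                           "Example: https://<host>[:<port>]")
    parser.add_option("-x", "--proxy", dest="proxy", help="The proxy path")
    parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                      default=False, help="verbose mode [default: %default]")
    return parser

def parse(argv):
    parser = build_parser()
    options, _args = parser.parse_args(argv)
    # Either --url or --hostname is required; parser.error() exits with status 2.
    if not options.url and not options.hostname:
        parser.error(USAGE_ERROR)
    return options
```

With optparse, parser.error() prints the usage line and the error message to stderr and exits with status 2; an unknown option such as --queue is rejected the same way, with the message "no such option: --queue".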
Interacting with the CREAM CE requires a valid VOMS proxy, specified either through the X509_USER_PROXY environment variable or through the --proxy option. All metrics check that the proxy file exists and compute its remaining lifetime. In case of error, the related error message is reported:
$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 --proxy /tmp/x509up_u0 --verbose
Proxy file not found or not readable
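The proxy check can be sketched as follows. This is a minimal illustration, assuming that --proxy takes precedence over X509_USER_PROXY and that voms-proxy-info -timeleft prints the remaining lifetime in seconds; resolve_proxy and parse_timeleft are hypothetical names, and the "Proxy expired" message is invented for the example.

```python
import os

def resolve_proxy(option_value=None, environ=None):
    """Return the proxy path given by --proxy or, failing that, X509_USER_PROXY."""
    if environ is None:
        environ = os.environ
    # Assumption: the --proxy option wins over the environment variable.
    path = option_value or environ.get("X509_USER_PROXY")
    if not path or not os.path.isfile(path) or not os.access(path, os.R_OK):
        raise RuntimeError("Proxy file not found or not readable")
    return path

def parse_timeleft(output):
    """Parse the seconds printed by 'voms-proxy-info -timeleft' (assumed format)."""
    seconds = int(output.strip())
    if seconds <= 0:
        raise RuntimeError("Proxy expired")  # hypothetical message
    return seconds
```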
Verbose mode (--verbose) can be enabled for every metric. It provides details about the probe execution by showing the internal commands being executed:
$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
invoking service info
executing command: /usr/bin/glite-ce-service-info cream-41.pd.infn.it:8443
Interface Version = [2.1]
Service Version = [1.16.1 - EMI version: 3.5.1-1.el6]
Description = [CREAM 2]
Started at = [Thu Jan 2 19:06:51 2014]
Submission enabled = [YES]
Status = [RUNNING]
If an option or its value is wrong, the probe tries to explain what the problem is.
For example, cream_serviceInfo.py does not support the --queue option:
$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 --queue creamtest1 --verbose
Usage: cream_serviceInfo.py [options]
cream_serviceInfo.py: error: no such option: --queue
If an error occurs while interacting with the CREAM CE, details about the failure are reported:
$ ./cream_allowedSubmission.py --url https://cream-43.pd.infn.it:8443
command '/usr/bin/glite-ce-allowed-submission cream-43.pd.infn.it:8443' failed: return_code=1
details: ['2014-01-16 15:59:57,085 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection refused]\n']
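Both the verbose "executing command:" lines and the failure report above suggest a common helper that runs each cream-cli command. A minimal sketch with Python's subprocess module, assuming stderr is merged into stdout so that the error details end up in the report; execute and CommandError are hypothetical names:

```python
import subprocess

class CommandError(Exception):
    """Raised when a cream-cli command exits with a non-zero status."""

def execute(cmd, verbose=False):
    """Run cmd (a list of arguments) and return its output as a list of lines."""
    if verbose:
        print("executing command: " + " ".join(cmd))
    # stderr is merged into stdout so error details are captured together.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    output, _ = proc.communicate()
    lines = output.splitlines(True)  # keep the trailing newlines, as in the report
    if proc.returncode != 0:
        raise CommandError("command '%s' failed: return_code=%d\ndetails: %s"
                           % (" ".join(cmd), proc.returncode, lines))
    return lines
```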
cream_serviceInfo.py
The cream_serviceInfo.py metric retrieves information about the status of the CREAM CE. The help shows how the probe must be invoked:
$ ./cream_serviceInfo.py --help
Usage: cream_serviceInfo.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]
In order to get information about the CREAM service on the host https://cream-41.pd.infn.it:8443, use the following command:
$ ./cream_serviceInfo.py --url https://cream-41.pd.infn.it:8443
Interface Version = [2.1]
Service Version = [1.16.1 - EMI version: 3.5.1-1.el6]
Description = [CREAM 2]
Started at = [Thu Jan 2 19:06:51 2014]
Submission enabled = [YES]
Status = [RUNNING]
or, equivalently:
$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443
Interface Version = [2.1]
Service Version = [1.16.1 - EMI version: 3.5.1-1.el6]
Description = [CREAM 2]
Started at = [Thu Jan 2 19:06:51 2014]
Submission enabled = [YES]
Status = [RUNNING]
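The service info output is a list of "Key = [value]" lines. If a caller wants to turn it into a check (for example, Status must be RUNNING), the lines could be parsed as in this sketch; parse_service_info is a hypothetical helper and is not claimed to be part of the probe:

```python
import re

# Matches lines such as "Status = [RUNNING]".
_LINE = re.compile(r"^\s*(.+?)\s*=\s*\[(.*)\]\s*$")

def parse_service_info(lines):
    """Return a dict mapping each key to the text between the square brackets."""
    info = {}
    for line in lines:
        match = _LINE.match(line)
        if match:
            info[match.group(1)] = match.group(2)
    return info

sample = [
    "Interface Version = [2.1]",
    "Service Version = [1.16.1 - EMI version: 3.5.1-1.el6]",
    "Submission enabled = [YES]",
    "Status = [RUNNING]",
]
info = parse_service_info(sample)
print(info["Status"])  # RUNNING
```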
cream_allowedSubmission.py
This simple metric checks whether job submission to the selected CREAM CE is allowed. Its usage is analogous to that of the previous metric:
$ ./cream_allowedSubmission.py --help
Usage: cream_allowedSubmission.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]
-x PROXY, --proxy=PROXY
The proxy path
-v, --verbose verbose mode [default: False]
Notice: using the --url option is equivalent to specifying both the --hostname and --port options:
$ ./cream_allowedSubmission.py --hostname cream-41.pd.infn.it --port 8443
ENABLED
$ ./cream_allowedSubmission.py --url https://cream-41.pd.infn.it:8443
ENABLED
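The equivalence between --url and --hostname/--port amounts to a conversion to the host:port endpoint passed to glite-ce-allowed-submission. A sketch using urllib.parse from Python 3 (the probes date from 2014 and may target Python 2); endpoint_from_options is a hypothetical name:

```python
from urllib.parse import urlparse

def endpoint_from_options(url=None, hostname=None, port=None):
    """Return "host:port" (or just "host") from either --url or --hostname/--port."""
    if url:
        parsed = urlparse(url)
        hostname, port = parsed.hostname, parsed.port
    if not hostname:
        raise ValueError("Specify either option -u URL or option -H HOSTNAME")
    return "%s:%s" % (hostname, port) if port else hostname

# Both calls yield the endpoint used in the commands above.
print(endpoint_from_options(url="https://cream-41.pd.infn.it:8443"))
print(endpoint_from_options(hostname="cream-41.pd.infn.it", port="8443"))
```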
The verbose mode highlights the internal commands:
$ ./cream_allowedSubmission.py --url https://cream-41.pd.infn.it:8443 --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
invoking allowedSubmission
executing command: /usr/bin/glite-ce-allowed-submission cream-41.pd.infn.it:8443
ENABLED
cream_jobSubmit.py
This metric submits a job directly to the selected CREAM CE, waits until the job terminates and reports its final status. Finally, the job is purged.
Moreover, the stage-in and stage-out phases are both performed automatically by the CE. In particular, the stage-out requires OutputSandboxBaseDestUri="gsiftp://localhost" to be set in the JDL.
$ ./cream_jobSubmit.py --help
Usage: cream_jobSubmit.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]/cream-<lrms>-<queue>
-x PROXY, --proxy=PROXY
The proxy path
-v, --verbose verbose mode [default: False]
-l LRMS, --lrms=LRMS The LRMS name (e.g.: 'lsf', 'pbs' etc)
-q QUEUE, --queue=QUEUE
The queue name (e.g.: 'creamtest')
-j JDL, --jdl=JDL The jdl path
The --url (-u) option targets the probe at a specific CREAM CE, identified by its CREAM CE ID. Alternatively, the CREAM CE ID can be specified through the --hostname, --port, --lrms and --queue options, which are mutually exclusive with the --url option.
Consider the JDL file hostname.jdl with the following content:
$ cat ./hostname.jdl
[
Type="Job";
JobType="Normal";
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
OutputSandboxBaseDestUri="gsiftp://localhost"
]
If verbose mode is disabled, the output should look like this:
$ ./cream_jobSubmit.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl
Job terminated with status DONE-OK
Notice: using the --url option is equivalent to specifying all of the --hostname, --port, --lrms and --queue options:
$ ./cream_jobSubmit.py --hostname cream-41.pd.infn.it --port 8443 --lrms lsf --queue creamtest1 --jdl ./hostname.jdl
Job terminated with status DONE-OK
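The CREAM CE ID used by cream_jobSubmit.py has the form <host>:<port>/cream-<lrms>-<queue>, as visible in the -r argument of the verbose output. A sketch of building it from either form of input, including the mutual-exclusion rule; all function names are hypothetical:

```python
from urllib.parse import urlparse

def ce_id_from_parts(hostname, port, lrms, queue):
    """Build the CREAM CE ID from --hostname, --port, --lrms and --queue."""
    return "%s:%s/cream-%s-%s" % (hostname, port, lrms, queue)

def ce_id_from_url(url):
    """Build the CREAM CE ID from https://<host>[:<port>]/cream-<lrms>-<queue>."""
    parsed = urlparse(url)
    resource = parsed.path.strip("/")
    if not resource.startswith("cream-") or resource.count("-") < 2:
        raise ValueError("URL path must look like /cream-<lrms>-<queue>: %r" % url)
    endpoint = "%s:%s" % (parsed.hostname, parsed.port) if parsed.port else parsed.hostname
    return "%s/%s" % (endpoint, resource)

def resolve_ce_id(url=None, hostname=None, port=None, lrms=None, queue=None):
    """Accept either --url or the four separate options, but not both."""
    parts = (hostname, port, lrms, queue)
    if url and any(parts):
        raise ValueError("--url is mutually exclusive with --hostname/--port/--lrms/--queue")
    if url:
        return ce_id_from_url(url)
    if not all(parts):
        raise ValueError("specify --url or all of --hostname, --port, --lrms and --queue")
    return ce_id_from_parts(*parts)
```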
With verbose mode enabled, the output of the above command looks like this:
$ ./cream_jobSubmit.py --hostname cream-41.pd.infn.it --port 8443 --lrms lsf --queue creamtest1 --jdl ./hostname.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r cream-41.pd.infn.it:8443/cream-lsf-creamtest1 ./hostname.jdl
['2014-01-16 13:52:57,305 DEBUG - Using certificate proxy file [/tmp/x509up_u0]\n', '2014-01-16 13:52:57,324 DEBUG - VO from certificate=[dteam]\n', '2014-01-16 13:52:57,324 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2014-01-16 13:52:57,324 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_root_20140116-135257.log]\n', '2014-01-16 13:52:57,328 INFO - certUtil::generateUniqueID() - Generated DelegationID: [99b19aafc98e11bb7956cc58d901bd860697227d]\n', '2014-01-16 13:52:59,645 DEBUG - Registering to [https://cream-41.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "std.out"; BatchSystem = "lsf"; QueueName = "creamtest1"; Executable = "/bin/hostname"; Type = "Job"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "std.out","std.err" }; StdError = "std.err" ] - JDL File=[./hostname.jdl]\n', '2014-01-16 13:53:00,022 DEBUG - Will invoke JobStart for JobID [CREAM304354901]\n', 'https://cream-41.pd.infn.it:8443/CREAM304354901\n']
job id: https://cream-41.pd.infn.it:8443/CREAM304354901
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM304354901
job status: REALLY-RUNNING
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM304354901
job status: DONE-OK
invoking getOutputSandbox
executing command: /usr/bin/glite-ce-job-output --noint https://cream-41.pd.infn.it:8443/CREAM187996258
output sandbox dir: ./cream-41.pd.infn.it_8443_CREAM187996258
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://cream-41.pd.infn.it:8443/CREAM187996258
Job terminated with status DONE-OK
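The verbose trace shows the overall flow: the job ID is taken from the last line printed by glite-ce-job-submit, then glite-ce-job-status is invoked repeatedly until a final state is reached. A sketch of those two steps, assuming the set of final states listed below and a fixed polling interval and timeout (neither is documented here); get_status stands for a hypothetical function that runs glite-ce-job-status and returns the status string:

```python
import time

# Assumed set of final CREAM job states; the probe's own list is not documented here.
TERMINAL_STATES = ("DONE-OK", "DONE-FAILED", "CANCELLED", "ABORTED")

def extract_job_id(submit_output):
    """Return the last output line starting with https://, assumed to be the job ID."""
    for line in reversed(submit_output):
        line = line.strip()
        if line.startswith("https://"):
            return line
    raise ValueError("no job id found in the glite-ce-job-submit output")

def wait_for_status(job_id, get_status, targets=TERMINAL_STATES,
                    interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status(job_id) until it returns one of targets or timeout expires."""
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in targets:
            return status
        if waited >= timeout:
            raise RuntimeError("job %s still %s after %d s" % (job_id, status, waited))
        sleep(interval)
        waited += interval
```

Passing targets=("REALLY-RUNNING",) makes the same loop usable for waiting until a job starts, which is what a cancel test needs.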
Notice the line "output sandbox dir: ./cream-41.pd.infn.it_8443_CREAM187996258": this is the output sandbox directory containing all the files produced by the job:
$ ls -la ./cream-41.pd.infn.it_8443_CREAM187996258
total 12
drwxr-xr-x 2 root root 4096 17 gen 16:20 .
dr-xr-x---. 23 root root 4096 17 gen 16:20 ..
-rw------- 1 root root 0 17 gen 16:20 std.err
-rw------- 1 root root 26 17 gen 16:20 std.out
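In the example, the sandbox directory name is derived from the job ID as <host>_<port>_<CREAM job number>. Assuming that convention holds in general (it is inferred from this single example), a probe could locate and check the retrieved files as in this hypothetical sketch:

```python
import os
from urllib.parse import urlparse

def sandbox_dir_for(job_id, base="."):
    """Map https://<host>:<port>/<CREAMid> to <base>/<host>_<port>_<CREAMid> (assumed naming)."""
    parsed = urlparse(job_id)
    name = "%s_%s_%s" % (parsed.hostname, parsed.port, parsed.path.strip("/"))
    return os.path.join(base, name)

def missing_outputs(sandbox_dir, expected=("std.out", "std.err")):
    """Return the expected OutputSandbox files that were not retrieved."""
    present = set(os.listdir(sandbox_dir)) if os.path.isdir(sandbox_dir) else set()
    return [name for name in expected if name not in present]
```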
cream_jobCancel.py
This metric submits a job directly to the selected CREAM CE, waits until the job reaches the REALLY-RUNNING state and then tries to cancel it. Finally, it checks whether the job has been correctly cancelled.
$ ./cream_jobCancel.py --help
Usage: cream_jobCancel.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]/cream-<lrms>-<queue>
-l LRMS, --lrms=LRMS The LRMS name (e.g.: 'lsf', 'pbs' etc)
-q QUEUE, --queue=QUEUE
The queue name (e.g.: 'creamtest')
-j JDL, --jdl=JDL The jdl path
The job must run long enough to allow the probe to check the current job status and invoke glite-ce-job-cancel.
For example, consider the job used by the previous metric (i.e. hostname.jdl). In this case the probe fails because the job has already terminated before the glite-ce-job-cancel command is executed:
$ ./cream_jobCancel.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl
job already terminated
Now consider the following job:
$ cat ./sleep.jdl
[
Type="Job";
JobType="Normal";
Executable = "/bin/sleep";
Arguments = "200";
StdOutput = "cream.out";
StdError = "cream.out";
]
The output of the probe should look like this:
$ ./cream_jobCancel.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./sleep.jdl
job cancelled
or like this with the --verbose option specified:
$ ./cream_jobCancel.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./sleep.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r cream-41.pd.infn.it:8443/cream-lsf-creamtest1 ./sleep.jdl
['2014-01-16 17:22:49,728 DEBUG - Using certificate proxy file [/tmp/x509up_u0]\n', '2014-01-16 17:22:49,744 DEBUG - VO from certificate=[dteam]\n', '2014-01-16 17:22:49,745 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2014-01-16 17:22:49,745 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_root_20140116-172249.log]\n', '2014-01-16 17:22:49,747 INFO - certUtil::generateUniqueID() - Generated DelegationID: [69e3bdaf4e818e1f71f1b7ff442f74583c869b84]\n', '2014-01-16 17:22:52,165 DEBUG - Registering to [https://cream-41.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "cream.out"; BatchSystem = "lsf"; QueueName = "creamtest1"; Executable = "/bin/sleep"; Type = "Job"; Arguments = "200"; JobType = "Normal"; StdError = "cream.out" ] - JDL File=[./sleep.jdl]\n', '2014-01-16 17:22:52,513 DEBUG - Will invoke JobStart for JobID [CREAM437649288]\n', 'https://cream-41.pd.infn.it:8443/CREAM437649288\n']
job id: https://cream-41.pd.infn.it:8443/CREAM437649288
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM437649288
job status: REALLY-RUNNING
invoking jobCancel
executing command: /usr/bin/glite-ce-job-cancel --noint https://cream-41.pd.infn.it:8443/CREAM437649288
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM437649288
job status: REALLY-RUNNING
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM437649288
job status: CANCELLED
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://cream-41.pd.infn.it:8443/CREAM437649288
job cancelled
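The cancel test follows the sequence visible in the verbose trace: wait for REALLY-RUNNING, cancel, poll until a final state, then purge. A minimal sketch, assuming the final states listed below; get_status, cancel and purge stand for hypothetical wrappers around glite-ce-job-status, glite-ce-job-cancel and glite-ce-job-purge, and purging an already terminated job is an assumption not shown in the traces. Only the "job already terminated" and "job cancelled" messages come from the examples above.

```python
import time

# Assumed final CREAM job states.
TERMINAL_STATES = ("DONE-OK", "DONE-FAILED", "CANCELLED", "ABORTED")

def poll(job_id, get_status, targets, max_checks=60, interval=10.0, sleep=time.sleep):
    """Call get_status until it returns one of targets, at most max_checks times."""
    for _ in range(max_checks):
        status = get_status(job_id)
        if status in targets:
            return status
        sleep(interval)
    raise RuntimeError("job %s did not reach any of %s" % (job_id, targets))

def cancel_probe(job_id, get_status, cancel, purge, sleep=time.sleep):
    # Step 1: wait until the job is really running (or has already finished).
    status = poll(job_id, get_status, ("REALLY-RUNNING",) + TERMINAL_STATES, sleep=sleep)
    if status in TERMINAL_STATES:
        purge(job_id)  # assumption: clean up even though the cancel could not be tested
        return "job already terminated"
    # Step 2: cancel, wait for a final state, then purge.
    cancel(job_id)
    status = poll(job_id, get_status, TERMINAL_STATES, sleep=sleep)
    purge(job_id)
    if status != "CANCELLED":
        return "job not cancelled: final status %s" % status  # hypothetical message
    return "job cancelled"
```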
cream_jobPurge.py
This metric is analogous to cream_jobCancel.py. It submits a short job (e.g. hostname.jdl), waits for its termination (e.g. DONE-OK) and then tries to purge it. Finally, to verify that the purge operation succeeded, the probe checks the job status by executing the glite-ce-job-status command which, in this scenario, must fail because the job no longer exists.
$ ./cream_jobPurge.py --help
Usage: cream_jobPurge.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]/cream-<lrms>-<queue>
-l LRMS, --lrms=LRMS The LRMS name (e.g.: 'lsf', 'pbs' etc)
-q QUEUE, --queue=QUEUE
The queue name (e.g.: 'creamtest')
-j JDL, --jdl=JDL The jdl path
$ ./cream_jobPurge.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl
job purged
$ ./cream_jobPurge.py --url https://cream-41.pd.infn.it:8443/cream-lsf-creamtest1 --jdl ./hostname.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r cream-41.pd.infn.it:8443/cream-lsf-creamtest1 ./hostname.jdl
['2014-01-16 14:02:48,282 DEBUG - Using certificate proxy file [/tmp/x509up_u0]\n', '2014-01-16 14:02:48,301 DEBUG - VO from certificate=[dteam]\n', '2014-01-16 14:02:48,301 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2014-01-16 14:02:48,302 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_root_20140116-140248.log]\n', '2014-01-16 14:02:48,305 INFO - certUtil::generateUniqueID() - Generated DelegationID: [85a80615fc8046baf6dce2a27b708fc82ecbce55]\n', '2014-01-16 14:02:51,105 DEBUG - Registering to [https://cream-41.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "std.out"; BatchSystem = "lsf"; QueueName = "creamtest1"; Executable = "/bin/hostname"; Type = "Job"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "std.out","std.err" }; StdError = "std.err" ] - JDL File=[./hostname.jdl]\n', '2014-01-16 14:02:51,504 DEBUG - Will invoke JobStart for JobID [CREAM691625071]\n', 'https://cream-41.pd.infn.it:8443/CREAM691625071\n']
job id: https://cream-41.pd.infn.it:8443/CREAM691625071
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM691625071
job status: DONE-OK
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://cream-41.pd.infn.it:8443/CREAM691625071
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM691625071
job purged
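The verification step relies on glite-ce-job-status failing once the job has been purged. A sketch of that logic, assuming an injectable run function that returns the exit code of a command (subprocess.call by default); purge_probe is a hypothetical name and the failure messages other than "job purged" are invented for the example:

```python
import subprocess

def purge_probe(job_id, run=subprocess.call):
    """Purge a terminated job and verify that the CE no longer knows it."""
    if run(["/usr/bin/glite-ce-job-purge", "--noint", job_id]) != 0:
        return "purge failed"  # hypothetical message
    # A successful status query after the purge means the job still exists.
    if run(["/usr/bin/glite-ce-job-status", job_id]) == 0:
        return "job still exists after purge"  # hypothetical message
    return "job purged"
```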
--
LisaZangrando - 2014-01-16