Difference: JdlGuide (1 vs. 2)

Revision 2, 2011-04-12 - MassimoSgaravatto


CREAM JDL guide

1 Introduction

The Job Description Language (JDL) is a high-level, user-oriented language based on Condor classified advertisements (classads) for describing jobs to be submitted to the CREAM CE service. Because JDL is an extensible language, the user may use any attribute in the description of a request without incurring errors from the JDL parser. However, only a certain set of attributes, referred to from now on as “supported attributes”, is taken into account by the CREAM CE service.

Some of the attributes in the JDL are mandatory. If the user does not specify them, CREAM cannot handle the request. For the other attributes the system may find a default value if they are necessary for processing the request.

Before starting with the detailed attribute description, please note that a request description is composed of entries that are strings of the form

attribute = expression;

and are terminated by the semicolon character. The whole description has to be included between square brackets, i.e. [ ]. The termination with the semicolon is not mandatory for the last attribute before the closing square bracket ].

Attribute expressions can span several lines provided the semicolon is put only at the end of the whole expression. Single-line comments must begin with a hash character (#) or a double slash (//). Comments spanning multiple lines can be specified by enclosing the text between “/*” and “*/”.
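The syntax rules above can be put together in a small sketch of a complete description. The file names (myscript.sh, data/event1.txt, myjobOutput, myjobError) are placeholders chosen for illustration, and the attributes used are the ones described in the rest of this guide:

```
[
  # Request and job type (both optional, shown here with their defaults)
  Type = "Job";
  JobType = "Normal";

  // Executable is the only mandatory attribute among those used here
  Executable = "myscript.sh";

  /* An expression may span several lines;
     the semicolon comes only at its end. */
  InputSandbox = {
    "myscript.sh",
    "data/event1.txt"
  };

  StdOutput = "myjobOutput";
  StdError = "myjobError"
]
```

The last entry is deliberately left without a semicolon to show that the terminator is optional before the closing bracket.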

Please note that since CREAM exposes a publicly available WSDL interface, no assumption is made in the document (unless explicitly specified) about the client tool used to submit the JDL description of the job.

2 Request and Job Types

2.1 Type

This is a string representing the type of the request described by the JDL, e.g.:

Type = "Job";

For the time being the only possible value is: Job

The value for this attribute is case insensitive. If this attribute is not specified in the JDL description, the default value (“Job”) will be considered.

Default: “Job”

3 Job Attributes Description

This section reports the detailed description of the JDL attributes that can be specified for describing Job requests. A sub-section for each attribute is provided.

3.1 JobType

This is a string representing the type of the job described by the JDL, e.g.:

JobType = "Normal";

At least for the time being the only possible value is: Normal.

This attribute only makes sense when the Type attribute equals “Job”. The value for this attribute is case insensitive. If not specified in the JDL, it will be set to “Normal”.

Default: “Normal”

3.2 Executable

This is a string representing the executable/command name. The user can specify an executable that already resides on the remote CE; in this case the absolute path of the file, possibly including environment variables, should be specified, e.g.:

Executable = "/usr/local/java/j2sdk1.4.0_01/bin/java";

Or:

Executable = "$JAVA_HOME/bin/java";

The other possibility is to provide an executable located either on the local file system of the client machine or on a remote gridFTP server accessible by the user (HTTPS servers are also supported, but this requires the GridSite htcp client command to be installed on the WN). In both cases the executable file is staged from its original location to the WN of the Computing Element, and only the file name has to be specified as the executable. The URI of the executable must then be listed in the InputSandbox attribute expression so that it is transferred. E.g. respectively:

Executable = "cms_sim.exe";
InputSandbox = {"file:///home/edguser/sim/cms_sim.exe", ……… };

Or:

Executable = "cms_sim.exe";
InputSandbox = {"gsiftp://neo.datamat.it:5678/tmp/cms_sim.exe", ……… };

Descriptions such as the following:

Executable = "cms_sim.exe";
InputSandbox = {"/home/edguser/sim/cms_sim.exe", ……… };

are also accepted and interpreted as in the first case, i.e. the executable file is available on the local file system.

Note that any command line arguments needed by the job must be specified through the Arguments attribute, not as part of Executable. The Executable attribute is mandatory.

Mandatory: Yes

Default: No

3.3 Arguments

This is a string containing all the job command line arguments. E.g. an executable sum that has to be started as:

$ sum N1 N2 -out result.out

is described by:

Executable = "sum";
Arguments = "N1 N2 -out result.out";

If you want to specify a quoted string inside the Arguments then you have to escape quotes with the \ character. E.g. when describing a job like:

$ grep -i "my name" *.txt

you will have to specify:

Executable = "/bin/grep";
Arguments = "-i \"my name\" *.txt";

Analogously, if the job takes as argument a string containing a special character (e.g. the job is the tail command issued on a file whose name contains the ampersand character, say file1&file2), since on the shell line you would have to write:

$ tail -f file1\&file2

in the JDL you will have to write:

Executable = "/usr/bin/tail";
Arguments = "-f file1\\\&file2";

i.e. three \ characters before each special character. In general, special characters such as:

&, |, >, <

are only allowed if specified inside a quoted string or preceded by a triple \. The character ` cannot be specified in the JDL.

Mandatory: No

Default: No

3.4 StdInput

This is a string representing the standard input of the job. This means that the job is run as follows:

$ executable < <standard input file>

It can be an absolute path, possibly including environment variables (wildcards are not allowed), which means that the file is already available on the CE, e.g.

StdInput = "/var/tpm/jobInput";

or just a file name, e.g.

StdInput = "myjobInput";

which means that the file needs to be made available on the WN where the job is run. The standard input file therefore has to be added to the InputSandbox file list so that it is downloaded to the WN. The same rules described for the Executable attribute apply to StdInput.
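As an illustration of the second case, the following sketch stages a standard input file from the client machine together with the job. The local path /home/edguser/myjobInput is a hypothetical location used only for this example:

```
[
  Executable = "/bin/grep";
  Arguments = "-i \"my name\"";

  # Only the file name is given here ...
  StdInput = "myjobInput";

  # ... so the file itself must be listed in the InputSandbox
  # (hypothetical path on the client machine)
  InputSandbox = { "/home/edguser/myjobInput" }
]
```

Under these assumptions the job would be run on the WN as grep -i "my name" < myjobInput.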

Mandatory: No

Default: No

3.5 StdOutput

This is a string representing the file name where the standard output of the job is saved. The user can specify either a file name or a relative path (with respect to the job working directory on the WN), e.g.:

StdOutput = "myjobOutput";

StdOutput = "event1/myjobOutput";

Wildcards are not allowed. The value specified for StdError can be the same as the one for StdOutput: this means that the two standard streams of the job are saved in the same file. The user can choose to have this file staged automatically on a GridFTP server by specifying a URI for that file in the OutputSandbox attribute expression. E.g.:

StdOutput = "myjobOutput";
OutputSandbox = {
"gsiftp://fox.infn.it:5678/home/gftp/myjobOutput",
…
};

indicates that, when the job has completed its execution, myjobOutput has to be transferred to the /home/gftp directory on gsiftp://fox.infn.it:5678.

Mandatory: No

Default: No

StdError

This is a string representing the file name where the standard error of the job is saved. The user can specify either a file name or a relative path (with respect to the job working directory on the WN), e.g.:

StdError = "myjobError";

StdError = "event1/myjobError";

Wildcards are not allowed. The value specified for StdError can be the same as the one for StdOutput: this means that the two standard streams of the job are saved in the same file. The user can choose to have this file staged automatically on a GridFTP server by specifying a URI for that file in the OutputSandboxDestURI attribute expression. The same rules as for StdOutput apply to StdError.
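A minimal sketch of the case where both streams share one file is given below. The file names are placeholders, and the use of gsiftp://localhost as destination follows the OutputSandboxDestURI example given later in this guide:

```
[
  Executable = "myscript.sh";
  InputSandbox = { "myscript.sh" };

  # Both standard streams are written to the same file on the WN
  StdOutput = "myjobOutput";
  StdError = "myjobOutput";

  # Stage the combined file at job completion
  OutputSandbox = { "myjobOutput" };
  OutputSandboxDestURI = { "gsiftp://localhost/myjobOutput" }
]
```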

Mandatory: No

Default: No

3.6 InputSandbox

This is a string or a list of strings identifying the list of files, needed by the job for running, that are available on the file system of the client (UI) machine and/or on an accessible gridFTP server (HTTPS servers are also supported, but this requires the GridSite htcp client command to be installed on the WN). These files therefore have to be transferred to the WN before the job is started. Wildcards and environment variables are admitted in the specification of this attribute only if the submission takes place through a client able to resolve them locally before passing the JDL to the CREAM service (this is the case, for example, for the CREAM CLI). Admitted wildcard patterns are the ones supported by the Linux glob function. One can remove the special meaning of the characters:
'?', '*' and '['
by preceding them with a backslash.

File names can be provided as URIs on a gridFTP/HTTPS server, simple file names, absolute paths and relative paths with respect to the current UI working directory. The InputSandbox file list cannot contain two or more files having the same name (even if in different paths), because they would overwrite each other when transferred to the job’s working directory on the WN. This attribute can also be used to stage the executable and the standard input to the CE where job execution takes place, as explained above. The meaning of the InputSandbox attribute is strictly coupled with the value of the InputSandboxBaseURI attribute described below, which specifies a common location on a gridFTP/HTTPS server where the InputSandbox files not specified as URIs are located.
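The restriction on duplicate file names can be illustrated with a short sketch. The paths below are placeholders invented for this example:

```
# Not allowed: both entries would end up as event1.txt
# in the job's working directory on the WN
# InputSandbox = { "run1/event1.txt", "run2/event1.txt" };

# Allowed: the file names differ, whatever their paths
InputSandbox = { "run1/event1.txt", "run2/event2.txt" };
```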

File transfer from gridftp servers running with user credentials instead of host credentials is also supported. In this case the distinguished name of the user credentials must be specified in the URI using:

 ?DN=<distinguished name>

as shown in the example below.

The following is an example of an InputSandbox setting:

InputSandbox = {
"/tmp/ns.log",
"mytest.exe",
"myscript.sh",
"data/event1.txt",
"gsiftp://neo.datamat.it:5678/home/fpacini/cms_sim.exe",
"file:///tmp/myconf",
"gsiftp://lxsgaravatto.pd.infn.it:47320/etc/fstab?DN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto"
};
InputSandboxBaseURI = "gsiftp://matrix.datamat.it:5432/tmp";

This means that:

  • /tmp/ns.log is located on the UI machine local file system
  • mytest.exe, myscript.sh and data/event1.txt are available on gsiftp://matrix.datamat.it:5432 in the /tmp directory
  • /tmp/myconf is located on the user local file system (explicitly specified using the file:// prefix)
  • /etc/fstab is available on gsiftp://lxsgaravatto.pd.infn.it:47320, which is a gridftp server running with user credentials (with the specified distinguished name)

If InputSandboxBaseURI is not specified, then mytest.exe, myscript.sh and data/event1.txt are also interpreted as being located on the user local file system.

Mandatory: No

Default: No

3.7 InputSandboxBaseURI

This is a string representing the URI on a gridFTP server (HTTPS servers are also supported, but this requires the GridSite htcp client command to be installed on the WN) where the InputSandbox files that have been specified as simple file names or absolute/relative paths are available for transfer to the WN before the job is started. E.g.

InputSandbox = {
 …
 "data/event1.txt",
 …
 };
InputSandboxBaseURI = "gsiftp://matrix.datamat.it:5432/tmp";

makes CREAM consider

"gsiftp://matrix.datamat.it:5432/tmp/data/event1.txt" 

for the transfer on the WN.

File transfer from gridftp servers running with user credentials instead of host credentials is also supported. In this case the distinguished name of the user credentials must be specified in the URI using:

 ?DN=<distinguished name>

E.g.

InputSandbox = {
 …
 "data/event2.txt",
 …
 };
InputSandboxBaseURI = "gsiftp://lxsgaravatto.pd.infn.it:47320/tmp?DN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto";

makes CREAM consider

"gsiftp://lxsgaravatto.pd.infn.it:47320/tmp/data/event2.txt" 

for the transfer on the WN, where that gridftp server has been started using user credentials (with the specified distinguished name).

Mandatory: No

Default: No


OutputSandbox

This is a string or a list of strings identifying the list of files generated by the job on the WN at runtime, which the user wants to save. This attribute can be combined with OutputSandboxDestURI or OutputSandboxBaseDestURI to have the output copied directly, upon job completion, to specified locations running a gridFTP server (HTTPS servers are also supported, but this requires the GridSite htcp client command to be installed on the WN).

Wildcards are admitted in the specification of this attribute only if the OutputSandboxBaseDestURI attribute is used along with the OutputSandbox attribute. Admitted wildcard patterns are the ones supported by the Linux glob function. One can remove the special meaning of the characters:

 '?', '*' and '['
by preceding them with a backslash.

File names can be provided as simple file names or relative paths with respect to the current working directory on the executing WN. The OutputSandbox file list should not contain two or more files having the same name (even if in different paths).
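The sketch below shows a wildcard used in the OutputSandbox together with OutputSandboxBaseDestURI, as required by the rule above. The file names are placeholders, and the value given to OutputSandboxBaseDestURI assumes that it takes a single URI string in the same way as InputSandboxBaseURI, which this section does not state explicitly:

```
[
  Executable = "myscript.sh";
  InputSandbox = { "myscript.sh" };

  # The wildcard entry is admitted only because
  # OutputSandboxBaseDestURI is used with the OutputSandbox
  OutputSandbox = { "myjobOutput", "run1/*.out" };

  # Assumed form: a single base URI, by analogy with InputSandboxBaseURI
  OutputSandboxBaseDestURI = "gsiftp://matrix.datamat.it:5432/tmp"
]
```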

Mandatory: No

Default: No


OutputSandboxDestURI

This is a string or a list of strings representing the URI(s) on a gridFTP/HTTPS server where the files listed in the OutputSandbox attribute have to be transferred at job completion.

The OutputSandboxDestURI list contains, for each of the files specified in the OutputSandbox list, the URI (including the file name) where it has to be transferred at job completion. File transfer to gridftp servers running with user credentials instead of host credentials is also supported. In this case the distinguished name of the user credentials must be specified in the URI using:

 ?DN=<distinguished name>

E.g.

OutputSandbox = {
"myjobOutput",
"run1/event1",
"run2/event2"
};

OutputSandboxDestURI = {
"gsiftp://matrix.datamat.it:5432/tmp/myjobOutput",
"gsiftp://grid003.ct.infn.it:6789/home/cms/event1",
"gsiftp://lxsgaravatto.pd.infn.it:47320/tmp/event2?DN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto"
};

makes CREAM transfer respectively:

  • myjobOutput to matrix.datamat.it in the directory /tmp
  • event1 to grid003.ct.infn.it in the directory /home/cms
  • event2 to lxsgaravatto.pd.infn.it (gridftp server running with user credentials, with the specified distinguished name) in the directory /tmp

If the URI gsiftp://localhost is specified, the OutputSandbox file is saved on the gridftp server of the CREAM CE, as shown in the following example:

OutputSandbox = {
"file1",
"file2"
};
OutputSandboxDestURI = {
"gsiftp://localhost/file1",
"gsiftp://grid003.ct.infn.it:6789/home/cms/file2"
};

In the above example file1 is saved on the gridftp server of the CREAM CE, while file2 is saved on grid003.ct.infn.it (directory /home/cms).


3.8 OutputData

This attribute allows the user to ask for the automatic upload and registration to the Replica Catalog of datasets produced by the job on the WN. Through this attribute it is possible to indicate for each output file the LFN (Logical File Name) to be used for registration and the SE (Storage Element) on which the file has to be uploaded. The OutputData attribute is not mandatory.

OutputData is a list of classads where each classad contains the following three attributes:

  • OutputFile
  • StorageElement
  • LogicalFileName

These three attributes are only admitted as members of one of the classads composing OutputData. They cannot be specified independently in the job JDL.

The following is an example of the OutputData attribute:

OutputData = {
    [
        OutputFile = "dataset_1.out";
        LogicalFileName = "lfn:/test/result1";
    ],
    [
        OutputFile = "dataset_2.out";
        StorageElement = "se001.cnaf.infn.it";
    ],
    [
        OutputFile = "cms/dataset_3.out";
        StorageElement = "se012.to.infn.it";
        LogicalFileName = "lfn:/cms/outfile1";
    ],
    [
        OutputFile = "dataset_4.out";
    ]
};

If the attribute OutputData is found in the JDL then the JobWrapper at the end of the job calls the Data Management service that copies the file from the WN onto the specified SE and registers it with the given LFN. If the specified LFN is already in use, the DM service registers the file with a newly generated identifier GUID (Grid Unique Identifier).

During this process the JobWrapper creates a file (named DSUpload_.out) with the results of the operation that is put automatically in the OutputSandbox attribute list by the UI and can then be retrieved by the user.

Mandatory: No

Default: No

3.8.1 OutputFile

This is a string attribute representing the name of the output file, generated by the job on the WN, which has to be automatically uploaded and registered by the WMS. Wildcards are not admitted in the specification of this attribute. File names can be provided as simple file names, absolute paths or relative paths with respect to the current working directory.

Mandatory: Yes (only if OutputData has been specified)

Default: No

3.8.2 StorageElement

This is a string representing the URI of the Storage Element where the output file specified in the corresponding OutputFile attribute has to be uploaded by the WMS.

Mandatory: No

Default: No

3.8.3 LogicalFileName

This is a string representing the logical file name (LFN) the user wants to associate to the output file when registering it to the Replica Catalogue. The specified name has to be prefixed by “lfn:” (lowercase). If this attribute is not specified then the corresponding output file is registered with a GUID that is assigned automatically by the Data Management services.

Mandatory: No

Default: No (If not specified a GUID is assigned by DM services)

  -- MassimoSgaravatto - 2011-04-07

Revision 1, 2011-04-07 - MassimoSgaravatto
