System Administrator Guide for WMS for EMI

1 Installation and configuration

1.1 Prerequisites

1.1.1 Operating System

A standard x86_64 SL(C)5 distribution must be properly installed. The EPEL repository must also be configured on the machine.

1.1.2 Node synchronization

A general requirement for the Grid nodes is that they are synchronized. This requirement may be fulfilled in several ways. One of the most common is to use the NTP protocol with a time server.
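
On SL(C)5 this can typically be achieved with the ntp package; a minimal sketch (the package and service names below are the usual SL5 ones, assumed here):

```shell
# Install, enable and start the NTP daemon, then verify synchronization.
yum install ntp
chkconfig ntpd on
service ntpd start
ntpq -p    # lists the configured time servers and the current offset
```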

1.1.3 Cron and logrotate

Many components deployed on the WMS rely on the presence of cron (including support for the /etc/cron.* directories) and logrotate. Make sure these utilities are available on your system.
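
A quick way to verify this is to check that the corresponding commands are on the PATH; a minimal sketch:

```shell
# Verify that the cron and logrotate utilities are installed.
checked=0
for tool in crontab logrotate; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: NOT FOUND, please install it" >&2
    fi
    checked=$((checked+1))
done
```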

1.2 Installation

1.2.1 Repositories

For a successful installation, you will need to configure your package manager to reference a number of repositories (in addition to your OS ones):

  • the EPEL repository
  • the EMI middleware repository
  • the CA repository
and to REMOVE or DEACTIVATE:

  • the DAG repository

1.2.1.1 The EPEL repository

You can install the EPEL repository by issuing:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm

1.2.1.2 The EMI middleware repository


The EMI-1 repository can be found under http://emisoft.web.cern.ch/emisoft/dist/EMI/1/sl5/x86_64/

To use yum, the repo file to be installed in /etc/yum.repos.d can be found at http://emisoft.web.cern.ch/emisoft/

The packages are signed with the EMI GPG key, which can be downloaded from http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RPM-GPG-KEY-emi. To import it:

[root@emi-demo11 ~]# wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RPM-GPG-KEY-emi -O /tmp/emi-key_gd.asc
[root@emi-demo11 ~]# rpm --import /tmp/emi-key_gd.asc

1.2.1.3 The Certification Authority repository

The most up-to-date version of the list of trusted Certification Authorities (CA) is needed on your node. The relevant yum repo can be installed issuing:

wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo -O /etc/yum.repos.d/egi-trustanchors.repo

1.2.1.4 Important note on automatic updates

An update of an RPM not followed by configuration can cause problems. Therefore WE STRONGLY RECOMMEND NOT TO USE ANY KIND OF AUTOMATIC UPDATE PROCEDURE.


Yum autoupdate can be disabled by running the script available at http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh (implemented by Giuseppe Platania, INFN Catania).

1.2.2 Installation of a WMS node

First of all, install the yum-protectbase rpm:

yum install yum-protectbase.noarch

Then proceed with the installation of the CA certificates.

1.2.2.1 Installation of the CA certificates

The CA certificate can be installed issuing:

yum install ca-policy-egi-core 

1.2.2.2 Installation of the WMS software

Install the WMS metapackage:

yum install emi-wms

1.3 Configuration

1.3.1 Using the YAIM configuration tool


For a detailed description on how to configure the middleware with YAIM, please check the YAIM guide.


The necessary YAIM modules needed to configure a certain node type are automatically installed with the middleware.

1.3.2 Configuration of a WMS node

1.3.2.1 Install host certificate


The WMS node requires the host certificate/key files to be installed. Contact your national Certification Authority (CA) to understand how to obtain a host certificate if you do not have one already.


Once you have obtained a valid certificate:

  • hostcert.pem - containing the machine public key
  • hostkey.pem - containing the machine private key
make sure to place the two files in the /etc/grid-security directory of the target node. Then set the proper mode and ownership:

chown root.root /etc/grid-security/hostcert.pem
chown root.root /etc/grid-security/hostkey.pem
chmod 600 /etc/grid-security/hostcert.pem
chmod 400 /etc/grid-security/hostkey.pem

1.3.2.2 Configure the siteinfo.def file

Set your siteinfo.def file, which is the input file used by yaim. The yaim variables relevant for WMS are the following:

  • $WMS_HOST -> the WMS hostname, e.g. 'egee-rb-01.$MY_DOMAIN'
  • $PX_HOST -> the hostname of a MyProxy server, e.g. 'myproxy.$MY_DOMAIN'
  • $BDII_HOST -> the hostname of the site BDII to be used, e.g. 'sitebdii.$MY_DOMAIN'
  • $LB_HOST -> the hostname of the LB server to be used, e.g. 'lb-server.$MY_DOMAIN:9000'. This variable is set as a service-specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located.

If an LB has to be installed co-located on the same server (LBProxy = both), the following parameters have to be set in the siteinfo/services/glite-wms file:
  • LB_HOST = "WMS hostname:port", e.g. "devel11.cnaf.infn.it:9000"
  • GLITE_LB_TYPE = both
and the following parameters have to be set in siteinfo.def file:
  • GLITE_LB_AUTHZ_REGISTER_JOBS = ".*"
  • GLITE_LB_WMS_DN = "the WMS DN" ex: "/C=IT/O=INFN/OU=Host/L=CNAF/CN=devel09.cnaf.infn.it"

1.3.2.3 Configure SELinux

In order for the httpd section of yaim to run correctly, SELinux should be disabled or, as an alternative, the certificates should be correctly labelled; see:

http://docs.fedoraproject.org/en-US/Fedora/13/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-SELinux_Contexts_Labeling_Files.html

On WMS EMI 3 SL6 (glite-wms-interface-3.5.0-6.sl6.x86_64), in order for the httpd section of yaim to run correctly, SELinux should be disabled or, as an alternative, the workaround described below must be applied.

  • Context (EMI 3 SL6):
       # cat /etc/emi-version
       3.0.0-1
       # rpm -qa |grep gridsite
       gridsite-2.0.4-1.el6.x86_64
       gridsite-libs-2.0.4-1.el6.x86_64
       # rpm -qa |grep selinux
       libselinux-2.0.94-5.3.el6.x86_64
       libselinux-utils-2.0.94-5.3.el6.x86_64
       libselinux-python-2.0.94-5.3.el6.x86_64
       selinux-policy-3.7.19-155.el6_3.14.noarch
       selinux-policy-targeted-3.7.19-155.el6_3.14.noarch
       libselinux-ruby-2.0.94-5.3.el6.x86_64
       

SELINUX enabled:

   # getenforce
   Enforcing
   # ls -Z /var/lib/glite/.certs/host*.pem
   -rw-r--r--. glite glite user_u:object_r:var_lib_t:s0 /var/lib/glite/.certs/hostcert.pem
   -r--------. glite glite user_u:object_r:var_lib_t:s0 /var/lib/glite/.certs/hostkey.pem
   

  • Symptom of the problem
       # /etc/init.d/glite-wms-wmproxy restart
       Restarting /usr/bin/glite_wms_wmproxy_server... ko
       

WORKAROUND:

   /usr/sbin/setsebool httpd_can_network_connect=1
   /usr/sbin/semanage port -a -t http_port_t -p tcp 7443

   # /etc/init.d/glite-wms-wmproxy restart
   Restarting /usr/bin/glite_wms_wmproxy_server... ok
   

1.3.2.4 Run yaim

After having filled the siteinfo.def file, run yaim:

/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n WMS

1.3.3 Configuration of the WMS CLI

The WMS CLI is part of the EMI-UI. To configure it please refer to xxx.

2 Operating the system

2.1 How to start the WMS service

A system administrator can start the WMS service by issuing:

service gLite start

A system administrator can stop the WMS service by issuing:

service gLite stop

2.2 Daemons

Scripts to check the daemons' status and to start/stop them are located in the ${GLITE_WMS_LOCATION}/etc/init.d/ directory (e.g. ${GLITE_WMS_LOCATION}/etc/init.d/glite-wms-wm start/stop/status). gLite production installations also provide a more generic service, called gLite, to manage all of them simultaneously: try service gLite status/start/stop. On a typical WMS node the following services must be running:

  • glite-lb-locallogger:
     glite-lb-logd running
    glite-lb-interlogd running
  • glite-lb-proxy:
     glite-lb-proxy running as 4137
  • glite-proxy-renewald:
     glite-proxy-renewd running
  • globus-gridftp:
     globus-gridftp-server (pid 3107) is running...
  • glite-wms-jc:
     JobController running in pid: 10008
    CondorG master running in pid: 10063 10062
    CondorG schedd running in pid: 10070
  • glite-wms-lm:
     Logmonitor running...
  • glite-wms-wm:
     /opt/glite/bin/glite-wms-workload_manager (pid 9957) is running...
  • glite-wms-wmproxy:
    WMProxy httpd listening on port 7443
    httpd (pid 22223 22222 22221 22220 22219 22218 22217) is running ....
    ===
    WMProxy Server running instances:
    UID PID PPID C STIME TTY TIME CMD
  • glite-wms-ice:
    /opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...
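
The daemons listed above can be checked in one pass with a short loop over the init scripts from section 2.3 (the service names are assumed to match the init script names):

```shell
# Report the status of every WMS-related daemon; warn about the ones not running.
for svc in globus-gridftp glite-wms-wmproxy glite-wms-wm glite-wms-lm \
           glite-wms-jc glite-wms-ice glite-proxy-renewald glite-lb-locallogger; do
    service "$svc" status || echo "WARNING: $svc does not appear to be running" >&2
done
```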

2.3 Init scripts

The init scripts are located under /etc/init.d and are the following:

/etc/init.d/globus-gridftp
/etc/init.d/glite-wms-wmproxy
/etc/init.d/glite-wms-wm
/etc/init.d/glite-wms-lm
/etc/init.d/glite-wms-jc
/etc/init.d/glite-wms-ice
/etc/init.d/glite-proxy-renewald
/etc/init.d/glite-lb-locallogger
/etc/init.d/glite-lb-bkserverd

2.4 Configuration Files

The configuration files are located under /etc/glite-wms and are the following:

/etc/glite-wms/glite_wms.conf
/etc/glite-wms/glite_wms_wmproxy.gacl
/etc/glite-wms/glite_wms_wmproxy_httpd.conf
/etc/glite-wms/wmproxy_logrotate.conf
/etc/glite-wms/.drain

For configuration files related to other services running on the WMS node, please refer to the Service Reference Card.

2.4.1 glite_wms.conf

This is the general configuration file for the WMS. The syntax is based on the ClassAd language. The parameter names are case insensitive. It is organised in sections: one for every running service plus a common section.

[
    Common = [...];
    JobController = [...];
    LogMonitor = [...];
    NetworkServer = [...];
    WorkloadManager = [...];
    WorkloadManagerProxy = [...];
    ICE = [...]
]

The value of a parameter can be expressed in terms of environment variables, with the typical UNIX shell syntax: a $ sign followed by the name of the variable in curly braces (e.g. ${HOME}).
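
For instance, a sandbox staging path relative to an environment variable could be written as follows (a purely illustrative fragment; the variable name and path are assumptions):

```
WorkloadManagerProxy = [
    SandboxStagingPath = "${GLITE_LOCATION_VAR}/SandboxDir";
];
```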

2.4.1.1 Common section

In general there is no need to change this section.

DGUser: the user under which a WMS process runs

LBProxy: Boolean attribute to switch between LB and LBProxy. If the value of this attribute is true, LBProxy is used for logging and query operations about jobs

HostProxyFile (no default): the host proxy certificate file

2.4.1.2 WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter (OperationsLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested WMS and LB deployment is to have them on two separate physical machines, a fundamental parameter is also LBServer.

The relevant parameters available in this section are the following:

MaxInputSandboxSize = 10000000; this puts a PER FILE limit on the size of the input sandbox of the JDL. The unit is bytes.

LogFile: String attribute containing the path of the WMProxy log file

LogLevel: Integer attribute containing a value from 0 to 6 (Optional). The integer value represents the WMProxy log file verbosity level: from 0 (fatal) to 6 (debug: maximum verbosity)

SandboxStagingPath: Root directory where job sandboxes are stored. It MUST be in the form: <DocumentRoot>/<single directory name>, where DocumentRoot is set as inside glite_wms_wmproxy_httpd.conf configuration file. The directory MUST be accessible by the user under which WMProxy is running (usually it is the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, if not differently set with User directive inside glite_wms_wmproxy_httpd.conf configuration file

ListMatchRootPath: Directory path where temporary pipes for list-match operations are created. The directory MUST be accessible by the user under which WMProxy is running (usually it is the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, if not differently set with User directive inside glite_wms_wmproxy_httpd.conf configuration file

GridFTPPort: Port number where gridFTP server is listening

MinPerusalTimeInterval: Integer value representing the time interval (in seconds) between two savings of a job's partial execution output. This attribute affects the behaviour of WMProxy and other components only if the perusal functionality is explicitly requested by the user via the JDL; see the EnableFilePerusal JDL attribute

LBServer: Address or list of addresses of the LB Server(s) to be used for storing job information, in the format <host>[:<port>] (the default port is 9000). The LB Server to use is selected randomly from the list by the WMProxy for each service request. WMProxy maintains a list of weights associated with the available LB Servers, so that failing LB Servers have a decreasing probability of being selected. If Service Discovery is enabled, the LB Servers found via Service Discovery are added to the list.

Note that the following lines have the same meaning:

LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};
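
A configuration with several LB Servers uses the same list syntax (the hostnames here are illustrative):

```
LBServer = {"lb1.example.org:9000", "lb2.example.org:9000"};
```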

WeightsCacheValidity: Time in seconds (n) indicating the validity of the weights (i.e. the probability of being selected) associated with the available LB Servers. When the last weights update (i.e. the last received request) occurred more than n seconds ago, the weights are restored to the same value for all LB Servers

DISMISSED IN EMI2 LBLocalLogger: address of LB Local Logger in the format of <host>[:<port>] (default value for port is 9002). This attribute is needed only if LB Local Logger runs on another host and LBProxy is not enabled. Removed starting from EMI2 releases.

AsyncJobStart: Boolean attribute used to switch between synchronous and asynchronous job start behavior. When set to true, during a job start operation the control is returned to the user immediately after the request has been received, while the actual execution of the operation (which could be quite time consuming) is performed asynchronously

EnableServiceDiscovery: Boolean attribute to enable Service Discovery. If the value of this attribute is true, the Service Discovery is enabled, i.e. WMProxy invokes Service Discovery for finding available LB Servers

ServiceDiscoveryInfoValidity: Time in seconds (n) indicating the validity of the information provided by the Service Discovery. A call to Service Discovery for updated information is done every n seconds

LBServiceDiscoveryType: Type key for LB Servers to be discovered by Service Discovery

MaxServedRequests: Long attribute limiting the number of operations served by each WMProxy instance before exiting and releasing possibly allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value lower than or equal to zero

OperationsLoadScripts: ClassAd type attribute where an internal attribute can be specified for any WMProxy-provided operation. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes are used to provide the path and the name of the script to be executed to verify the load of the WMProxy server for any provided operation. If the server load is too high, the requested operation is refused. The path and the name of the script can be followed by user-defined options and parameters, depending on the specific script's needs for arguments.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the glite-wms-wmproxy rpm in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:

OperationsLoadScripts = [
   jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor 
      --oper jobSubmit --load1 10 --load5 10 --load15 10 
      --memusage 95 --diskusage 95 --fdnum 500";
]

Any kind of load script file can be used. If a custom user script is used, the only rule to follow is that the script's exit value must be 0 if the operation can continue execution, and 1 in the opposite case (operation refused - server load too high).
The script files must be executable and must have the proper access permissions
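
As an illustration of this convention, the following sketch refuses an operation when the 1-minute load average exceeds a threshold (the threshold and the use of /proc/loadavg are assumptions for this example, not part of the stock template):

```shell
#!/bin/sh
# Custom WMProxy limiter sketch: exit/return 0 = operation may proceed,
# exit/return 1 = operation refused (server load too high).
check_load() {
    threshold=${1:-10}
    # Integer part of the 1-minute load average from /proc/loadavg.
    load1=$(cut -d' ' -f1 /proc/loadavg | cut -d. -f1)
    if [ "$load1" -ge "$threshold" ]; then
        echo "Server load too high (load1=$load1 >= $threshold)" >&2
        return 1
    fi
    return 0
}
check_load "$@"
```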

2.4.1.3 Workload Manager section

Important parameters in this section are:

DISMISSED IN EMI2 EnableBulkMM = true; // enable bulk matchmaking for collections

NEW IN EMI2 EnableReplanner = false; // the job replanner can now be toggled by configuration. The replanning feature is not always used and, in case of high load, it can show problems with queries to the LB. For this reason it is now disabled by default.

IsmUpdateRate = 600; // Information Supermarket update rate (in seconds)
WorkerThreads = 5; // enables multithreading for the WM component and speeds up the matchmaking process. 5 is a good compromise between machine load and speed.

CeForwardParameters: the parameters forwarded by the WM to the CE

CeMonitorAsynchPort: the port used to listen to notification arriving from CEMon's. A value of -1 means that listening is disabled

CeMonitorServices: the list of CEMon's the WM listens to

DISMISSED IN EMI2 DispatcherType: the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir". Removed starting from EMI2 releases, only jobdir is supported.

EnableRecovery: specifies if at startup the WM should perform a special recovery procedure for the requests that it finds already in its input.

ExpiryPeriod: the maximum time, expressed in seconds, a submitted job is kept in the overall system, from the time it arrives for the first time at the WM

Input: the input source of new requests. If DispatcherType is "filelist" the source is a file; if DispatcherType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the WM starts. A JobDir structure consists of a base dir under which lie three subdirectories, named tmp, new and old

IsmBlackList: a list of CEs that have to be excluded in the ISM

IsmDump: if the ISM dump is enabled, the dump, in ClassAd format, will be written to this file. In order to avoid file corruption, the contents of a dump are built in a temporary file, whose name is the value of this parameter with the prefix ".tmp", which is renamed to the specified file only at the end of the operation

IsmIiPurchasingRate: the period between two ISM purchases from the BDII, in seconds

IsmThreads: whether the threads related to ISM management are taken from the thread pool or created separately

IsmUpdateRate: the period between two updates of the ISM, in seconds. Note that conceptually purchasing just retrieves the list of available resources, whereas an ISM update gathers the resource information for each resource.

JobWrapperTemplateDir: the job wrapper sent to the CE and then executed on Worker Node is based on a bash template which is augmented by the WM with job-specific information. This is the location where all the templates - one at the moment - are stored

LogFile: the name of the file where messages are logged

LogLevel: each logging statement in the code specifies a log level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The levels go from 1 (minimum verbosity) to 6 (maximum verbosity)

MatchRetryPeriod: once a job becomes pending, meaning that there are no resources available, this parameter represents the period between successive match-making attempts, in seconds

MaxOutputSandboxSize: the maximum size of the output sandbox, in bytes. The limit is currently enforced by the job wrapper running on the Worker Node, which doesn't upload more data than what specified here. If the value is -1 there is no limit.

MaxRetryCount: the system limit to the number of deep resubmissions for a job. The actual limit is the minimum between this value and the one specified in the job description

QueueSize: (def=1000) Size of the queue of events "ready" to be managed by the workers thread pool

RuntimeMalloc: allows the use of an alternative malloc library (examples are nedmalloc, google performance tools, ccmalloc) by specifying the path to the shared object, to be loaded with LD_PRELOAD. Example: RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so".

WorkerThreads: the number of request handler threads

ReplanGracePeriod (3600): the minimum time a job should be in status 'scheduled' after being evaluated for replanning

MaxReplansCount (5): the maximum number of replans a job should undergo before being terminated

NEW IN EMI2 SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed or hanging transfers of ISB or OSB in the jobwrapper. See also bug https://savannah.cern.ch/bugs/?48479

WmsRequirements: This expression is appended in && to the user requirements. It contains both WMS typical requirements and queue requirements, such as authZ checks.

Example: requirements = (userrequirements) && (wmsrequirements); The default value for this attribute (set by yaim) is:

WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);

PropagateToLRMS: this expression is propagated to the LRMS

2.4.1.4 LogMonitor section

Usually there is no need to change the default parameters, with the exception of RemoveJobFiles = true;
which by default is set to false.
Setting it to true will force Condor to remove unused internal files when the jobs are in a final state.
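
For example, in glite_wms.conf this would appear in the LogMonitor section (the surrounding attributes, elided here, are left at their defaults):

```
LogMonitor = [
    ...
    RemoveJobFiles = true;
];
```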

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the LogMonitor log file

LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorLogDir: Path of the directory where LogMonitor stores the CondorG log files

CondorLogRecycleDir: Path of the directory where old CondorG log files are saved

JobsPerCondorLog: Max number of jobs logged in the same CondorG log file

GlobusDownTimeout: LogMonitor waits this number of seconds before considering as failed a Condor job which has lost contact with the CE, and then resubmitting it if possible

MainLoopDuration: LogMonitor loops over the CondorG log files every this many seconds

MonitorInternalDir: Path of the directory where LogMonitor stores its own files

IdRepositoryName: Name of the file containing pieces of information about the jobs used by LogMonitor and JobController

ExternalLogFile: Path of the directory where extra log files are stored

RemoveJobFiles: If set to true, all files used to submit jobs to Condor are removed when they are no longer necessary. Set it to "false" only for debugging purposes. Files are stored in the SubmitFileDir directory as set in the JobController section

2.4.1.5 Job Controller section

Usually there is no need to change the default parameters.

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the JobController log file

LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorSubmit: Path of the "condor_submit" command

CondorRemove: Path of the "condor_remove" command

CondorDagman: Path of the "condor_dagman" command

DagmanMaxPre: Sets the maximum number of PRE scripts within the DAG that may be running at one time; it is the "-maxpre" parameter of the condor_dagman command

MaximumTimeAllowedForCondorMatch: Sets the number of seconds that a job can wait in the condor queue to be matched before being resubmitted

ContainerRefreshThreshold: Number of jobs that JobController can keep in memory before resynchronizing its container with the one physically saved in the file "IdRepositoryName" (see the LogMonitor section)

DISMISSED IN EMI2 InputType: The JobController can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir". Removed starting from EMI2 releases, only jobdir is supported.

Input: The input source of new requests. If InputType is "filelist" the source is a file; if InputType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the JobController starts. A JobDir structure consists of a base dir under which lie three subdirectories, named tmp, new and old

SubmitFileDir: Path of the directory where the submit files for condor are stored

OutputFileDir: Path of the directory where the standard error/output files of the jobs (e.g. the JobWrapper) are stored

2.4.1.6 ICE section

The parameters for ICE are available in the ICE Configuration Guide

2.4.1.7 Network Server section

Although the Network Server is no longer installed on WMS nodes, some configuration parameters in its section of the global configuration file are still needed.

The important parameters in this section are those regarding the contact with the information system. In particular:

  • II_Contact = "egee-bdii.cnaf.infn.it"; sets the hostname of the BDII to be contacted
  • II_Port = 2170; sets the port on which the BDII is contacted
  • Gris_DN = "mds-vo-name=local, o=grid"; sets the path where the BDII publishes information
  • II_Timeout = 100; sets the timeout for BDII queries. It is important that this value is not too small: it is very dangerous if many BDII queries fail for timeout reasons. The risk is that all the information in the Information Supermarket expires, making jobs in Waiting status match no CE (they remain in Waiting status for a long time, until a query to the BDII succeeds). By default this value is set to 30, but 100 is safer.
  • MaxInputSandboxSize = 10000000; # NOT USED

2.4.2 glite_wms_wmproxy.gacl

WMS user authentication is performed by the WMProxy component, based on a GACL module.
The fundamental file used to manage WMS authentication is /etc/glite-wms/glite_wms_wmproxy.gacl.
This file contains the names of the VOs that are allowed to use the WMS. A .gacl file example that allows the dteam and ops VOs is the following:

<gacl version='0.0.1'>
 <entry>
   <voms>
     <fqan>/ops/Role=NULL/Capability=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
 <entry>
   <voms>
     <fqan>/dteam/Role=NULL/Capability=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
</gacl>

There must be an exact match between the fqan expressed in the gacl file and the one in the user proxy.

The gacl file can also contain the DNs of single users allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in any of the VOs allowed to use the WMS:

<entry>
  <person>
    <dn>/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Daniele Cesini/Email=daniele.cesini@cnaf.infn.it/</dn>
  </person>
  <allow><exec/></allow>
</entry>

An entry with a DENY tag can be used to ban users or VOs:

<entry>
  <voms>
    <fqan>/dteam/Role=admin/Capability=NULL</fqan>
  </voms>
  <deny><exec/></deny>
</entry>

As the previous examples show, it is possible to allow/ban users and VOs on the basis of their FQANs (i.e. those returned by the voms-proxy-info --fqan command).

User mapping on a WMS node is done through LCMAPS, as in any other gLite service, so the fundamental places to look in case of mapping problems are:

  • the gridmap file: /etc/grid-security/grid-mapfile;
  • the lcmaps log: /var/log/glite/lcmaps.log;
  • the gridmapdir: /etc/grid-security/gridmapdir/;
  • the existing pool accounts for a VO or a VO group/role

2.4.3 glite_wms_wmproxy_httpd.conf

This file is a WMProxy-specific configuration file configuring the HTTP daemon and FastCGI.


2.4.4 wmproxy_logrotate.conf

This file configures the logrotate tool, which performs the rotation of the httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files. DISMISSED IN EMI2: starting from EMI2 releases, all the log files are rotated in the same way and by the same tool. The tool remains logrotate, but the configuration is handled by yaim in /etc/logrotate.d

2.4.5 /etc/glite-wms/.drain

This file is used to put the WMS in draining mode, so that it does not accept new submission requests but still allows any other operation, like output retrieval. The content should be the following:

<gacl>
    <entry>
    <any-user/>
      <deny><exec/></deny>
    </entry>
 </gacl>
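
Drain mode can thus be toggled with a couple of shell functions; this is a minimal sketch in which the file location defaults to the configuration files directory listed in section 2.4 and can be overridden via a hypothetical GLITE_WMS_LOCATION_ETC variable:

```shell
#!/bin/sh
# Toggle WMS drain mode by creating/removing the .drain GACL file.
# GLITE_WMS_LOCATION_ETC is a hypothetical override; the default matches
# the configuration files directory listed in section 2.4.
DRAIN_FILE="${GLITE_WMS_LOCATION_ETC:-/etc/glite-wms}/.drain"

drain_on() {
    cat > "$DRAIN_FILE" <<'EOF'
<gacl>
    <entry>
    <any-user/>
      <deny><exec/></deny>
    </entry>
 </gacl>
EOF
}

drain_off() {
    rm -f "$DRAIN_FILE"
}
```

Invoke drain_on before maintenance and drain_off afterwards.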

2.5 Log files

The WMS log files, located under /var/log/wms, are the following:

  • workload_manager_events.log contains logs of the workload manager component
  • wmproxy.log contains logs of the wmproxy component
  • httpd-wmproxy-access.log wmproxy httpd access log
  • httpd-wmproxy-errors.log wmproxy httpd error log
  • glite-wms-wmproxy.restart.cron.log log of the /etc/cron.d/glite-wms-wmproxy.restart.cron cron
  • glite-wms-wmproxy-purge-proxycache.log log of the /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron cron
  • wmproxy_logrotate.log contains logs about the rotation of wmproxy httpd log files DISMISSED IN EMI2
  • renewal.log proxy renewal service log
  • logmonitor_events.log contains logs of the logmonitor component
  • jobcontroller_events.log contains logs of the jobcontroller component
  • ice.log contains logs of the ice component
  • glite-wms-purgeStorage.log log of the /etc/cron.d/glite-wms-purger.cron cron
For information on log files of other services running on the wms node, please refer to Service Reference Card

2.6 Network ports

Information about network ports is available in the Service Reference Card

2.7 Cron jobs

Information about cron jobs is available in the Service Reference Card

2.8 Security related operations

2.8.1 How authorization works

In the WMS, two different authorization mechanisms can be used. The default is local, based on the GridSite GACL (http://www.gridsite.org/wiki/GACL). The relevant entries are basically DN and FQAN, which can be used to set permissions on single users and roles (i.e. user banning and so on). FQANs support wildcards to allow for easier handling. The GACL file is "${WMS_LOCATION_ETC}/glite_wms_wmproxy.gacl". It is a structurally simple XML file where policies are specified, either directly or through the siteinfo.def.
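
For instance, a single entry can cover a whole VO by means of a wildcard FQAN (the VO name and the exact wildcard form are illustrative; check the GridSite GACL documentation for the supported syntax):

```
<entry>
  <voms>
    <fqan>/dteam/*</fqan>
  </voms>
  <allow><exec/></allow>
</entry>
```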

Another way to perform authorization is to use Argus as a site service. Argus is typically enabled via siteinfo.def, through the following variables:

USE_ARGUS=<boolean>
ARGUS_PEPD_ENDPOINTS="list_of_space_separated_URLs" # i.e.: "https://argus01.lcg.cscs.ch:8154/authz https://argus02.lcg.cscs.ch:8154/authz https://argus03.lcg.cscs.ch:8154/authz"

On the Argus server side, the policies to be defined will have to specify an action and a resource id. The WMS automatically sets the resource id to its service endpoint. The actions are the following:

getVersion
getJDL
getMaxInputSandboxSize
getSandboxDestURI
getSandboxBulkDestURI
getQuota
getFreeQuota
getOutputFileList
getJobTemplate
getDAGTemplate
getCollectionTemplate
getIntParametricJobTemplate
getStringParametricJobTemplate
getDelegationVersion
getProxyReq
putProxy
renewProxyReq
getNewProxyReq
destroyProxy
getProxyTerminationTime
getACLItems
addACLItems
removeACLItem
getProxyInfo
enableFilePerusal
getPerusalFiles
getTransferProtocols
getJobStatusOp
jobStart
jobSubmit
jobSubmitJSDL
jobRegister
jobListMatch
jobCancel
jobPurge

The profile used for creating the request complies with the gLite profile.
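On the Argus server side, a policy permitting submission by one VO could be sketched as follows in the Argus simplified policy language (the endpoint and VO name are placeholders; depending on the profile version, the action names may need to be expressed with the gLite XACML namespace prefix, so verify against your Argus installation):

```
resource "https://wms.example.org:7443/glite_wms_wmproxy_server" {
    action "jobRegister" {
        rule permit { vo = "cms" }
    }
    action "jobSubmit" {
        rule permit { vo = "cms" }
    }
}
```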

2.8.2 How to filter out unwanted VOs

It can be useful to force the WMS to only select resources belonging to a certain VO, since the matchmaking time is consequently reduced.

This can be enabled by providing an additional LDAP clause, which is added to the search filter used at purchasing time.

The default search filter is:

(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)(objectclass=gluevoview)(objectclass=gluece))

The idea is to let system administrators specify an additional LDAP clause, which is added in logical AND to the last two clauses of the default filter, in order to match attributes specific to the gluece/gluevoview object classes.
To this aim, the configuration file provides:

IsmIILDAPCEFilterExt, for handling the additional search filter used while purchasing information about CEs from the BDII

As an example, by specifying the following:

IsmIILDAPCEFilterExt = "(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))"

the search filter during the purchasing would be:

(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)(&(|(objectclass=gluevoview)(objectclass=gluece))
(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))))

and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.
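The composition of the extended filter can be reproduced with a small shell sketch (illustrative only; the filter strings are those shown above, and the composition mirrors the way the additional clause is ANDed with the last two object classes of the default filter):

```shell
# Default filter, split into the part that is kept as-is and the
# last two object-class clauses that get the extra AND condition.
PREFIX='(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)'
CE_VIEW='(|(objectclass=gluevoview)(objectclass=gluece))'
# The value of IsmIILDAPCEFilterExt from the example above.
EXT='(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))'
# The extra clause is ANDed with the CE/VOView clauses only.
FILTER="${PREFIX}(&${CE_VIEW}${EXT}))"
echo "$FILTER"
```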

2.8.3 How to block/ban a VO

To ban a VO, it is suggested to reconfigure the service via YAIM without that VO in siteinfo.def.

2.9 Job purging

Purging a WMS job means removing from the WMS node any relevant information about the job (e.g. the job sandbox area).

A job can be purged:

  • Explicitly by the administrator by invoking the command /usr/sbin/glite-wms-purgeStorage

  • Automatically by the WMS services, when a job is aborted or when the output of a completed job is retrieved by the user
  • Automatically by the following cron:
    3 */6 * * mon-sat glite . /usr/libexec/grid-env.sh ; /usr/sbin/glite-wms-purgeStorage.sh -l /var/log/wms/glite-wms-purgeStorage.log -p /var/SandboxDir -t 604800 > /dev/null 2>&1
    0 1 * * sun glite . /usr/libexec/grid-env.sh ; /usr/sbin/glite-wms-purgeStorage.sh -l /var/log/wms/glite-wms-purgeStorage.log -p /var/SandboxDir -o -s -t 1296000 > /dev/null 2>&1

For jobs submitted to a CREAM CE through the WMS, the purging is done by the ICE component of the WMS when it detects that the job has reached a terminal status. The purging operation is not done if, in the WMS configuration file (/etc/glite-wms/glite_wms.conf), the attribute purge_jobs in the ICE section is set to false.
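To disable this behaviour, the relevant excerpt of /etc/glite-wms/glite_wms.conf would look like the following sketch (section layout abridged; check the exact section name and attributes in your own configuration file):

```
Ice = [
    ...
    purge_jobs = false;
];
```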

3 Service Migration

During operation, the WMS retains information about what follows:

1) Job requests (submit, cancel, etc.)

2) Job sandboxes (user data)

3) Job tracking metadata (log events, statuses, statistics)

Provided that this data is consistently preserved, a WMS instance can be migrated to another machine with no extra overhead. The new and the old instance must not be running at the same time for this whole process to work, and the host certificate must be present at the standard location (/etc/grid-security). Unless otherwise specified, the new instance may even install an updated version of both WMS and L&B.

1)

Job requests are stored in the form of ASCII files inside maildir-like directories. As taken from the configuration, they are typically:

Input = "${WMS_LOCATION_VAR}/workload_manager/jobdir";
Input = "${WMS_LOCATION_VAR}/jobcontrol/jobdir/";
Input = "${WMS_LOCATION_VAR}/ice/jobdir";
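Since each jobdir follows a maildir-like layout, with incoming requests appearing as files under a "new" subdirectory, the backlog of queued requests can be inspected with a small shell sketch (the subdirectory name and the WMS_LOCATION_VAR default used here are assumptions; adjust them to your installation):

```shell
# Print the number of pending request files in each jobdir.
count_jobdirs() {
  base=$1
  for d in workload_manager/jobdir jobcontrol/jobdir ice/jobdir; do
    # Requests not yet taken over by the service sit under "new".
    if [ -d "$base/$d/new" ]; then
      printf '%s: %d pending\n' "$d" "$(ls "$base/$d/new" | wc -l)"
    fi
  done
}

count_jobdirs "${WMS_LOCATION_VAR:-/var/glite}"
```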

2)

User sandbox directories are stored in a directory hierarchy whose root path is SandboxStagingPath, as set in the WorkloadManagerProxy section of the configuration (typically /var/SandboxDir).

3) The L&B service, which is always installed on a WMS node (though in two different modes, 'proxy' and 'both'), stores job tracking information about both processed and unprocessed jobs in a MySQL database named lbserver20. This database typically resides in /var/lib/mysql/lbserver20.
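Putting the three items together, the copy to the new host could be sketched with commands like the following (a command sketch only: the hostname "newwms" is a placeholder, WMS_LOCATION_VAR and the sandbox path must match your configuration, and both instances must be stopped while copying):

```shell
# 1) job requests (maildir-like jobdirs), preserving full paths
rsync -aR "$WMS_LOCATION_VAR"/workload_manager/jobdir \
          "$WMS_LOCATION_VAR"/jobcontrol/jobdir \
          "$WMS_LOCATION_VAR"/ice/jobdir newwms:/
# 2) job sandboxes
rsync -a /var/SandboxDir newwms:/var/
# 3) L&B job tracking database
mysqldump lbserver20 | ssh newwms mysql lbserver20
```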

-- FabioCapannini - 2011-04-28

Topic revision: r41 - 2018-03-08 - DiegoMichelotto