Difference: WMSSystemAdministratorGuide (1 vs. 41)

Revision 412018-03-08 - DiegoMichelotto

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 508 to 508
 
<gacl>
    <entry>
    <any-user/>
Changed:
<
<
a
>
>
 

Revision 402014-05-19 - AlviseDorigo

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 463 to 463
  There must be an exact match between the fqan expressed in the gacl file and the one in the user proxy.
Changed:
<
<
The gacl file can also contain the DNs of single users that are allow to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:
>
>
The gacl file can also contain the DNs of single users allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:
 
<pre><entry>
Changed:
<
<
>
>
  /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Daniele Cesini/Email=daniele.cesini@cnaf.infn.it/
Changed:
<
<
>
>
 

Revision 392013-03-26 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 289 to 289
  MaxServedRequests: Long attribute limiting the number of operations served by each WMProxy instance before exiting and releasing possibly allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value lower than or equal to zero
Changed:
<
<
OperationsLoadScripts: ClassAd type attribute where an internal attribute can be specified for any WMProxy provided operation. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes are used to provide the path and the name of the script to be executed to verify the load of the WMProxy server for any provided operation. If the server load is too high, the requested operation is refused. The path and the name of the script can be followed by user-defined options and parameters, depending on the specific script's needs for arguments.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the rpm glite-wms-wmproxy in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:
>
>
OperationsLoadScripts: ClassAd type attribute where an internal attribute can be specified for any WMProxy provided operation. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes are used to provide the path and the name of the script to be executed to verify the load of the WMProxy server for any provided operation. If the server load is too high, the requested operation is refused. The path and the name of the script can be followed by user-defined options and parameters, depending on the specific script's needs for arguments.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the rpm glite-wms-wmproxy in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:
 
OperationsLoadScripts [
   jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor 
      --oper jobSubmit --load1 10 --load5 10 --load15 10 
Line: 536 to 538
 

0.1 Security related operations

Changed:
<
<

0.0.1 How to select specific VO resources

>
>

0.0.1 How authorization works

In the WMS, two different authorization mechanisms can be utilized. The default is local, based on the Gridsite GACL http://www.gridsite.org/wiki/GACL. The relevant entries are basically DN and FQAN, which can be used to set permissions on single users and roles (i.e. user banning and so on). FQANs support wildcards to allow for easier handling. The GACL file is "${WMS_LOCATION_ETC}/glite_wms_wmproxy.gacl". It is a structurally simple xml file where policies are specified, either directly or through the siteinfo.def.

Another way to perform authorization is to use Argus as a site service. Argus is typically enabled via siteinfo.def, through the following variables.

USE_ARGUS=<boolean>
ARGUS_PEPD_ENDPOINTS="list_of_space_separated_URLs" # i.e.: "https://argus01.lcg.cscs.ch:8154/authz https://argus02.lcg.cscs.ch:8154/authz https://argus03.lcg.cscs.ch:8154/authz"

On the Argus server side, the policies to be defined will have to specify an action and a resource id. The WMS automatically sets the resource id to its service endpoint. The actions are the following:

getVersion
getJDL
getMaxInputSandboxSize
getSandboxDestURI
getSandboxBulkDestURI
getQuota
getFreeQuota
getOutputFileList
getJobTemplate
getDAGTemplate
getCollectionTemplate
getIntParametricJobTemplate
getStringParametricJobTemplate
getDelegationVersion
getProxyReq
putProxy
renewProxyReq
getNewProxyReq
destroyProxy
getProxyTerminationTime
getACLItems
addACLItems
removeACLItem
getProxyInfo
enableFilePerusal
getPerusalFiles
getTransferProtocols
getJobStatusOp
jobStart
jobSubmit
jobSubmitJSDL
jobRegister
jobListMatch
jobCancel
jobPurge

The profile used for creating the request complies with the glite profile.
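On the Argus server side, a policy granting one of the actions above can be sketched in the Argus simplified policy language. This is only an illustrative sketch: the endpoint URL and the VO name are placeholders, and the resource id must match the service endpoint that the WMS sends automatically.

```text
resource "https://wms.example.org:7443/glite_wms_wmproxy_server" {
    action "jobSubmit" {
        rule permit { vo = "dteam" }
    }
}
```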

0.0.2 How to filter out unwanted VOs

  It can be useful to force the WMS to only select resources specific to a certain VO, as the matchmaking time is consequently reduced.
Line: 560 to 617
  and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.
Deleted:
<
<

0.0.1 How to block/ban a user

Information about how to ban users is available in https://twiki.cnaf.infn.it/twiki/bin/view/EgeeJra1It/WMSServiceRefCard#How_to_block_ban_a_user

 

0.0.1 How to block/ban a VO

To ban a VO, it is suggested to reconfigure the service via yaim without that VO in the siteinfo.def

Revision 382013-03-04 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 580 to 581
  For jobs submitted to a CREAM CE through the WMS, the purging is done by the ICE component of the WMS when it detects that the job has reached a terminal status. The purging operation is not done if the purge_jobs attribute in the ICE section of the WMS configuration file (/etc/glite-wms/glite_wms.conf) is set to false.
Added:
>
>

1 Service Migration

During operation, the WMS retains the following information:

1) Job requests (submit, cancel, etc.)

2) Job sandboxes (user data)

3) Job tracking metadata (log events, statuses, statistics)

Provided that this data is consistently preserved, a WMS instance can be migrated to another machine with no extra overhead. The old and new instances must not be running at the same time for this whole process to work, and the host certificate must be present at the standard location (/etc/grid-security). Unless otherwise specified, the new instance may even install an updated version of both WMS and L&B.

1)

Job requests are stored in the form of ASCII files inside maildir-like directories. As taken from the configuration, they are typically:

Input = "${WMS_LOCATION_VAR}/workload_manager/jobdir";
Input = "${WMS_LOCATION_VAR}/jobcontrol/jobdir/";
Input = "${WMS_LOCATION_VAR}/ice/jobdir";

2)

User sandbox directories are stored in a directory hierarchy whose root path is SandboxStagingPath, as set in the WorkloadManagerProxy section of the configuration (typically /var/SandboxDir).

3) The L&B service, which is always installed on a WMS node (though in two different modes, 'proxy' and 'both'), stores job tracking information about both processed and unprocessed jobs in a MySQL database named lbserver20. This database typically resides in /var/lib/mysql/lbserver20.
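Under the assumption that points 1) to 3) above cover all persistent state, the file-based part of the copy can be sketched as follows; scratch directories stand in for the old and new hosts (on a real migration the destination is the new machine, e.g. via rsync over ssh, and the lbserver20 database is moved with mysqldump/restore rather than a file copy):

```shell
# Mock the old instance's data locations (1: job requests, 2: sandboxes);
# both WMS daemons are assumed stopped while the copy runs.
SRC=$(mktemp -d); DST=$(mktemp -d)
mkdir -p "$SRC/workload_manager/jobdir" "$SRC/jobcontrol/jobdir" "$SRC/ice/jobdir" "$SRC/SandboxDir"
echo dummy-request > "$SRC/workload_manager/jobdir/req.1"
# Job requests and sandboxes are plain files: an archive copy preserves them.
cp -a "$SRC/." "$DST/"
# 3) job tracking data lives in the lbserver20 MySQL database (mysqldump/restore).
ls "$DST/workload_manager/jobdir/req.1"
```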

 -- FabioCapannini - 2011-04-28

Revision 372013-02-26 - SaraBertocco

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 127 to 127
 In order for the httpd section of yaim to be run correctly, SELinux should be disabled, or, as an alternative, the certificates should be correctly labelled, see:
http://docs.fedoraproject.org/en-US/Fedora/13/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-SELinux_Contexts_Labeling_Files.html
Added:
>
>
In WMS EMI 3 SL6 (glite-wms-interface-3.5.0-6.sl6.x86_64), in order for the httpd section of yaim to be run correctly, SELinux should be disabled or, as an alternative, the workaround described below must be applied.
  • Context (EMI 3 SL6):
       # cat /etc/emi-version
       3.0.0-1
       # rpm -qa |grep gridsite
       gridsite-2.0.4-1.el6.x86_64
       gridsite-libs-2.0.4-1.el6.x86_64
       # rpm -qa |grep selinux
       libselinux-2.0.94-5.3.el6.x86_64
       libselinux-utils-2.0.94-5.3.el6.x86_64
       libselinux-python-2.0.94-5.3.el6.x86_64
       selinux-policy-3.7.19-155.el6_3.14.noarch
       selinux-policy-targeted-3.7.19-155.el6_3.14.noarch
       libselinux-ruby-2.0.94-5.3.el6.x86_64
       

SELINUX enabled:

   # getenforce
   Enforcing
   # ls -Z /var/lib/glite/.certs/host*.pem
   -rw-r--r--. glite glite user_u:object_r:var_lib_t:s0 /var/lib/glite/.certs/hostcert.pem
   -r--------. glite glite user_u:object_r:var_lib_t:s0 /var/lib/glite/.certs/hostkey.pem
   

  • Symptom of the problem
       # /etc/init.d/glite-wms-wmproxy restart
       Restarting /usr/bin/glite_wms_wmproxy_server... ko
       

WORKAROUND:

   /usr/sbin/setsebool httpd_can_network_connect=1
   /usr/sbin/semanage port -a -t http_port_t -p tcp 7443

   # /etc/init.d/glite-wms-wmproxy restart
   Restarting /usr/bin/glite_wms_wmproxy_server... ok
   
 

0.0.0.1 Run yaim

After having filled the siteinfo.def file, run yaim:

Revision 362013-02-07 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 124 to 124
 
  • GLITE_LB_WMS_DN = "the WMS DN" ex: "/C=IT/O=INFN/OU=Host/L=CNAF/CN=devel09.cnaf.infn.it"

0.0.0.1 Configure SELinux

Changed:
<
<
In order for the httpd section of yaim to be run correctly, SELinux should be disabled, or, as an alternative, the certificates should be laballed correctly, see:
>
>
In order for the httpd section of yaim to be run correctly, SELinux should be disabled, or, as an alternative, the certificates should be correctly labelled, see:
 
http://docs.fedoraproject.org/en-US/Fedora/13/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-SELinux_Contexts_Labeling_Files.html

0.0.0.1 Run yaim

Revision 352012-10-16 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 217 to 217
  The relevant parameters available in this section are the following:
Added:
>
>
* MaxInputSandboxSize = 10000000; this puts a PER FILE limit on the size of the input sandbox files in the jdl. Units are bytes
 LogFile: String attribute containing the path of the WMProxy log file

LogLevel: Integer attribute containing a value from 0 to 6 (Optional). The integer value represents the WMProxy log file verbosity level: from 0 (fatal) to 6 (debug: maximum verbosity)

Line: 392 to 394
 
  • II_Port = 2170; set the port on which the bdii is contacted
  • Gris_DN = "mds-vo-name=local, o=grid"; set the path where the bdii is publishing information
  • II_Timeout = 100; Set the timeout for the bdii query. It is important that this value is not too small; it is very dangerous if many bdii queries fail for timeout reasons. The risk is that all the information in the InformationSupermarket expires, making all jobs in Waiting status match no CE (they remain in Waiting status for a long time, until a query to the bdii is successful). By default the value is set to 30, but 100 is safer.
Changed:
<
<
  • MaxInputSandboxSize = 10000000; this puts a limit in the dimension of the input sandbox of the jdl. Units are byte
>
>
 

0.0.1 glite_wms_wmproxy.gacl

WMS User Authentication is performed by the WMProxy component based on a GACL module.
The fundamental file used to manage the WMS authentication is the /etc/glite-wms/glite_wms_wmproxy.gacl file.
This file contains the names of the VOs that are allowed to use the WMS. A .gacl file example that allows the dteam and ops VOs is the following:
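As an illustrative sketch (the FQAN values and exact layout should be checked against a yaim-generated file), a GACL allowing the dteam and ops VOs could look like:

```xml
<gacl>
  <entry>
    <voms>
      <fqan>dteam</fqan>
    </voms>
    <allow><exec/></allow>
  </entry>
  <entry>
    <voms>
      <fqan>ops</fqan>
    </voms>
    <allow><exec/></allow>
  </entry>
</gacl>
```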

Revision 342012-10-11 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 260 to 260
  DISMISSED IN EMI2 EnableBulkMM = true; //enable the bulk matchmaking for collection
Changed:
<
<
NEW IN EMI2 EnabledReplanner = ; // The job replanner can now be toggled by configuration. The replanning feature is not always used, and in some cases it can show problems with queries to the LB, in case of high load. For this reason it is now disabled by default.

IsmBlackList // allow to set a list of CEs that are banned

>
>
NEW IN EMI2 EnableReplanner = ; // The job replanner can now be toggled by configuration. The replanning feature is not always used, and in some cases it can show problems with queries to the LB, in case of high load. For this reason it is now disabled by default.
  IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enables multithreading for the WM component. Speeds up the matchmaking process. 5 is a good compromise between machine load and speed.

Revision 332012-09-28 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 260 to 260
  DISMISSED IN EMI2 EnableBulkMM = true; //enable the bulk matchmaking for collection
Added:
>
>
NEW IN EMI2 EnabledReplanner = ; // The job replanner can now be toggled by configuration. The replanning feature is not always used, and in some cases it can show problems with queries to the LB, in case of high load. For this reason it is now disabled by default.
 IsmBlackList // allow to set a list of CEs that are banned

IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enables multithreading for the WM component. Speeds up the matchmaking process. 5 is a good compromise between machine load and speed.

Line: 310 to 312
  MaxReplansCount (5): the maximum number of replans a job should undergo before being terminated
Changed:
<
<
SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed or hanging transfers of ISB or OSB in the jobwrapper. See also bug https://savannah.cern.ch/bugs/?48479
>
>
NEW IN EMI2 SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed or hanging transfers of ISB or OSB in the jobwrapper. See also bug https://savannah.cern.ch/bugs/?48479

WmsRequirements: This expression is appended in && to the user requirements. It contains both WMS typical requirements and queue requirements, such as authZ checks.

 
Changed:
<
<
WmsRequirements: This expression is appended in && to the user requirements: requirements = (userrequirements) && (wmsrequirements); The default value for this attribute (set by yaim) is:
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);
>
>
Example: requirements = (userrequirements) && (wmsrequirements); The default value for this attribute (set by yaim) is:
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);
  PropagateToLRMS: this expression is propagated to the LRMS

Revision 322012-05-14 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 310 to 310
  MaxReplansCount (5): the maximum number of replans a job should undergo before being terminated
Changed:
<
<
SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed or hanging transfers of ISB or OSB in the jobwrapper
>
>
SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed or hanging transfers of ISB or OSB in the jobwrapper. See also bug https://savannah.cern.ch/bugs/?48479
  WmsRequirements: This expression is appended in && to the user requirements: requirements = (userrequirements) && (wmsrequirements); The default value for this attribute (set by yaim) is:
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);

Revision 312012-05-14 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 310 to 310
  MaxReplansCount (5): the maximum number of replans a job should undergo before being terminated
Changed:
<
<
SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed transfers of ISB or OSB in the jobwrapper
>
>
SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed or hanging transfers of ISB or OSB in the jobwrapper
  WmsRequirements: This expression is appended in && to the user requirements: requirements = (userrequirements) && (wmsrequirements); The default value for this attribute (set by yaim) is:
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);

Revision 302012-02-14 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 233 to 233
  WeightsCacheValidity: Time in seconds (n) indicating the validity of the weights (i.e. the probability to be selected) associated with the available LB Servers. When the last weights update (i.e. the last received request) occurred more than n seconds ago, the weights are restored to the same value for all LB Servers
Changed:
<
<
LBLocalLogger: Address of LB Local Logger in the format of <host>[:<port>] (default value for port is 9002). This attribute is needed only if LB Local Logger runs on another host and LBProxy is not enabled
>
>
DISMISSED IN EMI2 LBLocalLogger: address of LB Local Logger in the format of <host>[:<port>] (default value for port is 9002). This attribute is needed only if LB Local Logger runs on another host and LBProxy is not enabled. Removed starting from EMI2 releases.
  AsyncJobStart: Boolean attribute used to switch between synchronous and asynchronous job start behavior. When set to true, during the job start operation the control is returned to the user immediately after the request has been received, while the actual execution of the operation (which could be quite time consuming) is performed asynchronously
Line: 258 to 258
  Important parameters in this section are:
Changed:
<
<
EnableBulkMM = true; //enable the bulk matchmaking for collection
IsmBlackList // allow to set a list of CEs that are banned
IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enable the multithread for the WM component. Speed up the matchmaking process. 5 is a god compromise between machine load and speed.
>
>
DISMISSED IN EMI2 EnableBulkMM = true; //enable the bulk matchmaking for collection

IsmBlackList // allow to set a list of CEs that are banned

IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enables multithreading for the WM component. Speeds up the matchmaking process. 5 is a good compromise between machine load and speed.

  CeForwardParameters: the parameters forwarded by the WM to the CE
Line: 266 to 270
  CeMonitorServices: the list of CEMon's the WM listens to
Changed:
<
<
DispatcherType: the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"
>
>
DISMISSED IN EMI2 DispatcherType: the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir". Removed starting from EMI2 releases, only jobdir is supported.
  EnableRecovery: specifies if at startup the WM should perform a special recovery procedure for the requests that it finds already in its input.
Line: 365 to 369
  ContainerRefreshThreshold: Number of jobs that JobController can take in memory before resynchronizing its container with the one physically saved in the file "IdRepositoryName" (see LM section)
Changed:
<
<
InputType: The JobController can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"
>
>
DISMISSED IN EMI2 InputType: The JobController can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir". Removed starting from EMI2 releases, only jobdir is supported.
  Input: The input source of new requests. If InputType is "filelist" the source is a file; if InputType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the JobController starts. A JobDir structure consists of a base dir under which lie other three subdirectories, named tmp, new, old
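The JobDir layout just described can be sketched with a few commands (the base path here is a scratch directory for illustration; on a real node it is the configured Input dir):

```shell
# Create an empty JobDir structure with the three subdirectories JobController
# expects to find already in place at startup.
base=$(mktemp -d)
mkdir -p "$base/tmp" "$base/new" "$base/old"
ls "$base"
```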
Line: 442 to 446
  This file is a WMProxy specific configuration file configuring the HTTP daemon and Fast CGI
Added:
>
>
Removed starting from EMI2 releases, only jobdir is supported.
 

0.0.1 wmproxy_logrotate.conf

Changed:
<
<
This file configures the logrotate tool, that performs the rotation of httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files
>
>
This file configures the logrotate tool, which performs the rotation of the httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files. DISMISSED IN EMI2 Starting from EMI2 releases, all the log files are rotated in the same way and by the same tool: it remains logrotate, but the configuration is handled by yaim in /etc/logrotate.d
 

0.0.1 /var/.drain

Line: 453 to 460
 
<gacl>
    <entry>
    <any-user/>
Changed:
<
<
>
>
a
 

0.1 Log files

Line: 465 to 472
 
  • httpd-wmproxy-errors.log wmproxy httpd error log
  • glite-wms-wmproxy.restart.cron.log log of the /etc/cron.d/glite-wms-wmproxy.restart.cron cron
  • glite-wms-wmproxy-purge-proxycache.log log of the /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron cron
Changed:
<
<
  • wmproxy_logrotate.log contains logs about the rotation of wmproxy httpd log files
>
>
  • wmproxy_logrotate.log contains logs about the rotation of wmproxy httpd log files DISMISSED IN EMI2
 
  • renewal.log proxy renewal service log
  • logmonitor_events.log contains logs of the logmonitor component
  • jobcontroller_events.log contains logs of the jobcontroller component

Revision 292012-01-11 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 38 to 38
 

0.0.0.1 The EMI middleware repository

Changed:
<
<

The EMI-1 RC4 repository can be found under:
http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RC4/sl5/x86_64
>
>

The EMI-1 repository can be found under http://emisoft.web.cern.ch/emisoft/dist/EMI/1/sl5/x86_64/
 
Changed:
<
<
To use yum, the yum repo to be installed in /etc/yum.repos.d can be found at https://twiki.cern.ch/twiki/pub/EMI/EMI-1/rc4.repo
>
>
To use yum, the yum repo to be installed in /etc/yum.repos.d can be found at http://emisoft.web.cern.ch/emisoft/
 
Added:
>
>
The packages are signed with the EMI gpg key, that can be downloaded from http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RPM-GPG-KEY-emi. To import it:
[root@emi-demo11 ~]# wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RPM-GPG-KEY-emi -O /tmp/emi-key_gd.asc
[root@emi-demo11 ~]# rpm --import /tmp/emi-key_gd.asc
 

0.0.0.1 The Certification Authority repository

The most up-to-date version of the list of trusted Certification Authorities (CA) is needed on your node. The relevant yum repo can be installed issuing:

Revision 282011-12-15 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 226 to 226
  MinPerusalTimeInterval: Integer value representing the time interval (in seconds) between two savings of the job's partial execution output. This attribute affects the behaviour of the WMProxy and other components only if the perusal functionality is explicitly requested by the user via the JDL; see the EnableFilePerusal JDL attribute
Changed:
<
<
LBServer: Address or list of addresses of the LB Server[s] to be used for storing job's information in the format of <host>[:<port>] (default value for port is 9000). This attribute is needed only if LB Server is not running in the WMProxy server host, or if more than one LB Servers must be used. Selection of the LB Server to use is made randomically from the list by the WMProxy, for any different service request. WMproxy maintains a list of weights associated to the available LB Servers so that failing LB Servers have decreasing probability of being selected. If the Service Discovery is enabled, the LB Servers found using the Service Discovery are added in the list.

Note that the following lines have same meaning:

LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};
>
>
LBServer: Address or list of addresses of the LB Server[s] to be used for storing job information in the format of <host>[:<port>] (default value for port is 9000). Selection of the LB Server to use is made randomly from the list by the WMProxy, for each service request. WMProxy maintains a list of weights associated with the available LB Servers, so that failing LB Servers have a decreasing probability of being selected. If the Service Discovery is enabled, the LB Servers found using the Service Discovery are added to the list.

Note that the following lines have same meaning:

LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};
  WeightsCacheValidity: Time in seconds (n) indicating the validity of the weights (i.e. the probability to be selected) associated with the available LB Servers. When the last weights update (i.e. the last received request) occurred more than n seconds ago, the weights are restored to the same value for all LB Servers

Revision 272011-11-28 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 112 to 112
 
  • $PX_HOST -> the hostname of a server myproxy, ex.: 'myproxy.$MY_DOMAIN'
  • $BDII_HOST -> the hostname of the site bdii to be used, ex: 'sitebdii.$MY_DOMAIN'
  • $LB_HOST -> the hostname of the LB server to be used, ex: 'lb-server.$MY_DOMAIN:9000' This variable is set as a service specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located
Added:
>
>

If an LB has to be installed in colocation on the same server (LBProxy = both), the following parameters have to be set in the siteinfo/services/glite-wms file:
  • LB_HOST = "WMS hostname:port" ex: "devel11.cnaf.infn.it:9000"
  • GLITE_LB_TYPE = both
and the following parameters have to be set in siteinfo.def file:
  • GLITE_LB_AUTHZ_REGISTER_JOBS = ".*"
  • GLITE_LB_WMS_DN = "the WMS DN" ex: "/C=IT/O=INFN/OU=Host/L=CNAF/CN=devel09.cnaf.infn.it"
 

0.0.0.1 Configure SELinux

In order for the httpd section of yaim to be run correctly, SELinux should be disabled, or, as an alternative, the certificates should be laballed correctly, see:

Revision 262011-11-17 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 300 to 300
  WmsRequirements: This expression is appended in && to the user requirements: requirements = (userrequirements) && (wmsrequirements); The default value for this attribute (set by yaim) is:
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);
Added:
>
>
PropagateToLRMS: this expression is propagated to the LRMS
 

0.0.0.1 LogMonitor section

Usually there is no need to change the default parameters, with the exception of: RemoveJobFiles = true;
By default this is set to false.
Setting it to true will force Condor to remove unused internal files when the jobs are in a final state.

Revision 252011-11-17 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 198 to 198
 DGUser: the user under which a WMS process runs

LBProxy: Boolean attribute to switch between LB and LBProxy. If the value of this attribute is true, LBProxy is used for logging and query operations about jobs

Added:
>
>
HostProxyFile (no default): the host proxy certificate file
 

0.0.0.1 WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission in case some system load limits are hit.
Since the suggested WMS and LB deployment is to have them on two separate physical machines, a fundamental parameter is also LBServer.

Line: 293 to 296
  MaxReplansCount (5): the maximum number of replans a job should undergo before being terminated
Added:
>
>
SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed transfers of ISB or OSB in the jobwrapper

WmsRequirements: This expression is appended in && to the user requirements: requirements = (userrequirements) && (wmsrequirements); The default value for this attribute (set by yaim) is:

WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);
 

0.0.0.1 LogMonitor section

Revision 242011-11-15 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 288 to 288
RuntimeMalloc: allows the use of an alternative malloc library (examples are nedmalloc, google performance tools, ccmalloc), by specifying the path to the shared object to be loaded with LD_PRELOAD. Example: RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so".

WorkerThreads: the number of request handler threads

Added:
>
>
ReplanGracePeriod (3600): the minimum time a job should be in status 'scheduled' after being evaluated for replanning

MaxReplansCount (5): the maximum number of replans a job should undergo before being terminated

 

0.0.0.1 LogMonitor section

Usually there is no need to change the default parameters, with the exception of: RemoveJobFiles = true;
By default this is set to false.
Setting it to true will force Condor to remove unused internal files when the jobs are in a final state.

Revision 232011-11-15 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 422 to 422
  This file configures the logrotate tool, that performs the rotation of httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files
Changed:
<
<

0.0.1 .drain

>
>

0.0.1 /var/.drain

 
Changed:
<
<
This file is used to put the WMS in draining mode, so that it does not accept new submission requests but allows any other operations like output retrieval
>
>
This file is used to put the WMS in draining mode, so that it does not accept new submission requests but allows any other operations like output retrieval. The content should be the following:
<gacl>
    <entry>
    <any-user/>
      <deny><exec/></deny>
    </entry>
 </gacl>
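A small helper sketch for toggling the drain file; drain_wms is an illustrative name, not a shipped command, and the path argument is parameterized only so the sketch can be tried safely (on a real node, run as root with the default path).

```shell
# Illustrative helper: write the deny rule shown above into the drain file.
drain_wms() {
    local drain_file=${1:-/var/.drain}
    cat > "$drain_file" <<'EOF'
<gacl>
    <entry>
    <any-user/>
      <deny><exec/></deny>
    </entry>
 </gacl>
EOF
    echo "drain file written to $drain_file"
}
# Enable draining:    drain_wms
# Resume submissions: rm -f /var/.drain
```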
 

0.1 Log files

The WMS log files, located under /var/log/wms, are the following:

Revision 222011-10-27 - TWikiAdminUser

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Changed:
<
<

1 Operations/Installation and configuration

>
>

1 Installation and configuration

 

0.1 Prerequisites

Line: 58 to 58
 
Running the script available at http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh (implemented by Giuseppe Platania, INFN Catania), yum autoupdate will be disabled
Changed:
<
<

0.0.1 Operations/Installation of a WMS node

>
>

0.0.1 Installation of a WMS node

  First of all, install the yum-protectbase rpm:
Line: 67 to 67
  Then proceed with the installation of the CA certificates.
Changed:
<
<

0.0.0.1 Operations/Installation of the CA certificates

>
>

0.0.0.1 Installation of the CA certificates

  The CA certificate can be installed issuing:
yum install ca-policy-egi-core 
Changed:
<
<

0.0.0.1 Operations/Installation of the WMS software

>
>

0.0.0.1 Installation of the WMS software

  Install the WMS metapackage:
yum install emi-wms

Revision 212011-10-24 - TWikiAdminUser

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Changed:
<
<

1 Installation and configuration

>
>

1 Operations/Installation and configuration

 

0.1 Prerequisites

Line: 58 to 58
 
Running the script available at http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh (implemented by Giuseppe Platania, INFN Catania), yum autoupdate will be disabled
Changed:
<
<

0.0.1 Installation of a WMS node

>
>

0.0.1 Operations/Installation of a WMS node

  First of all, install the yum-protectbase rpm:
Line: 67 to 67
  Then proceed with the installation of the CA certificates.
Changed:
<
<

0.0.0.1 Installation of the CA certificates

>
>

0.0.0.1 Operations/Installation of the CA certificates

  The CA certificate can be installed issuing:
yum install ca-policy-egi-core 
Changed:
<
<

0.0.0.1 Installation of the WMS software

>
>

0.0.0.1 Operations/Installation of the WMS software

  Install the WMS metapackage:
yum install emi-wms

Revision 202011-10-24 - TWikiAdminUser

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 232 to 232
MaxServedRequests: Long attribute limiting the number of operations served by each WMProxy instance before exiting and releasing possibly allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value lower than or equal to zero
Changed:
<
<
OperationLoadScripts: ClassAd type attribute where an internal attribute can be specified for any WMProxy provided operation. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes provide the path and name of the script to be executed to verify the load of the WMProxy server for the given operation. If the server load is too high the requested operation is refused. The path and name of the script can be followed by user-defined options and parameters, depending on the specific script's argument needs.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the rpm file glite-wms-wmproxy in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:
OperationLoadScripts [
>
>
OperationsLoadScripts: ClassAd type attribute where an internal attribute can be specified for any WMProxy provided operation. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes provide the path and name of the script to be executed to verify the load of the WMProxy server for the given operation. If the server load is too high the requested operation is refused. The path and name of the script can be followed by user-defined options and parameters, depending on the specific script's argument needs.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the rpm file glite-wms-wmproxy in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:
OperationsLoadScripts [
  jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobSubmit --load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500";

Revision 192011-08-30 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 112 to 112
 
  • $PX_HOST -> the hostname of a server myproxy, ex.: 'myproxy.$MY_DOMAIN'
  • $BDII_HOST -> the hostname of the site bdii to be used, ex: 'sitebdii.$MY_DOMAIN'
  • $LB_HOST -> the hostname of the LB server to be used, ex: 'lb-server.$MY_DOMAIN:9000' This variable is set as a service specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located
Added:
>
>

0.0.0.1 Configure SELinux

In order for the httpd section of yaim to run correctly, SELinux should be disabled or, as an alternative, the certificates should be labelled correctly, see:

http://docs.fedoraproject.org/en-US/Fedora/13/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-SELinux_Contexts_Labeling_Files.html
 

0.0.0.1 Run yaim

After having filled the siteinfo.def file, run yaim:

Revision 182011-07-05 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 470 to 470
  and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.
Added:
>
>

0.0.1 How to block/ban a user

Information about how to ban users is available in https://twiki.cnaf.infn.it/twiki/bin/view/EgeeJra1It/WMSServiceRefCard#How_to_block_ban_a_user

0.0.2 How to block/ban a VO

To ban a VO, it is suggested to reconfigure the service via yaim without that VO in the siteinfo.def

0.1 Job purging

Purging a WMS job means removing from the WMS node any relevant information about the job (e.g. the job sandbox area).

A job can be purged:

  • Explicitly by the administrator by invoking the command /usr/sbin/glite-wms-purgeStorage

  • Automatically by the wms services when a job is aborted and when the output of a completed job is retrieved by the user
  • Automatically by the following cron:
    3 */6 * * mon-sat glite . /usr/libexec/grid-env.sh ; /usr/sbin/glite-wms-purgeStorage.sh -l /var/log/wms/glite-wms-purgeStorage.log -p /var/SandboxDir -t 604800 > /dev/null 2>&1
    0 1 * * sun glite . /usr/libexec/grid-env.sh ; /usr/sbin/glite-wms-purgeStorage.sh -l /var/log/wms/glite-wms-purgeStorage.log -p /var/SandboxDir -o -s -t 1296000 > /dev/null 2>&1

For jobs submitted to a CREAM CE through the WMS, the purging is done by the ICE component of the WMS when it detects the job has reached a terminal status. The purging operation is not done if in the WMS conf file ( /etc/glite-wms/glite_wms.conf) the attribute purge_jobs in the ICE section is set to false.
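A sketch of the corresponding fragment in /etc/glite-wms/glite_wms.conf, assuming the section layout shown elsewhere in this guide; set purge_jobs to false only if job information must survive the terminal status:

```
ICE = [
    // let ICE purge jobs that reached a terminal status (default behaviour)
    purge_jobs = true;
];
```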

 -- FabioCapannini - 2011-04-28 \ No newline at end of file

Revision 172011-07-04 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI

Line: 435 to 435
 
  • jobcontroller_events.log contains logs of the jobcontroller component
  • ice.log contains logs of the ice component
  • glite-wms-purgeStorage.log log of the /etc/cron.d/glite-wms-purger.cron cron
Added:
>
>
For information on log files of other services running on the wms node, please refer to Service Reference Card
 

0.1 Network ports

Information about network ports is available in the Service Reference Card

Line: 442 to 443
 

0.1 Cron jobs

Information about cron jobs is available in the Service Reference Card

Deleted:
<
<

System Administrator Guide for WMS for EMI


Installation and configuration

Prerequisites

Operating System

A standard x86_64 SL(C)5 distribution is supposed to be properly installed. An EPEL repository must be installed on the machine.

Node synchronization

A general requirement for the Grid nodes is that they are synchronized. This requirement may be fulfilled in several ways. One of the most common is using the NTP protocol with a time server.

Cron and logrotate

Many components deployed on the WMS rely on the presence of cron (including support for /etc/cron.* directories) and logrotate. You should make sure these utils are available on your system.

Installation

Repositories

For a successful installation, you will need to configure your package manager to reference a number of repositories (in addition to your OS):

  • the EPEL repository
  • the EMI middleware repository
  • the CA repository
and to REMOVE or DEACTIVATE (!!!)

  • the DAG repository

The EPEL repository

You can install the EPEL repository, issuing:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm

The EMI middleware repository


The EMI-1 RC4 repository can be found under:

http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RC4/sl5/x86_64

To use yum, the yum repo to be installed in /etc/yum.repos.d can be found at https://twiki.cern.ch/twiki/pub/EMI/EMI-1/rc4.repo

The Certification Authority repository

The most up-to-date version of the list of trusted Certification Authorities (CA) is needed on your node. The relevant yum repo can be installed issuing:

wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo -O /etc/yum.repos.d/egi-trustanchors.repo

Important note on automatic updates

An update of an RPM not followed by configuration can cause problems. Therefore WE STRONGLY RECOMMEND NOT TO USE AUTOMATIC UPDATE PROCEDURE OF ANY KIND.


Running the script available at http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh (implemented by Giuseppe Platania, INFN Catania), yum autoupdate will be disabled

Installation of a WMS node

First of all, install the yum-protectbase rpm:

yum install yum-protectbase.noarch

Then proceed with the installation of the CA certificates.

Installation of the CA certificates

The CA certificate can be installed issuing:

yum install ca-policy-egi-core 

Installation of the WMS software

Install the WMS metapackage:

yum install emi-wms

Configuration

Using the YAIM configuration tool


For a detailed description on how to configure the middleware with YAIM, please check the YAIM guide.


The necessary YAIM modules needed to configure a certain node type are automatically installed with the middleware.

Configuration of a WMS node

Install host certificate


The WMS node requires the host certificate/key files to be installed. Contact your national Certification Authority (CA) to understand how to obtain a host certificate if you do not have one already.


Once you have obtained a valid certificate:

  • hostcert.pem - containing the machine public key
  • hostkey.pem - containing the machine private key
make sure to place the two files on the target node in the /etc/grid-security directory. Then set the proper mode and ownership doing:
chown root.root /etc/grid-security/hostkey.pem

chmod 600 /etc/grid-security/hostcert.pem

chmod 400 /etc/grid-security/hostkey.pem

chown root.root /etc/grid-security/hostcert.pem
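A quick verification sketch (check_host_credentials is an illustrative name, not a shipped tool): it reports the mode of each file so the expected values set above can be confirmed. The directory argument is parameterized only so the sketch can be exercised outside a real node.

```shell
# Illustrative check: print mode and file name for the host credentials.
check_host_credentials() {
    local dir=${1:-/etc/grid-security}
    stat -c '%a %n' "$dir/hostkey.pem" "$dir/hostcert.pem"
}
# Expected on a configured node: 400 .../hostkey.pem and 600 .../hostcert.pem
```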

Configure the siteinfo.def file

Set your siteinfo.def file, which is the input file used by yaim. The yaim variables relevant for WMS are the following:

  • $WMS_HOST -> the WMS hostname, ex. : 'egee-rb-01.$MY_DOMAIN'
  • $PX_HOST -> the hostname of a server myproxy, ex.: 'myproxy.$MY_DOMAIN'
  • $BDII_HOST -> the hostname of the site bdii to be used, ex: 'sitebdii.$MY_DOMAIN'
  • $LB_HOST -> the hostname of the LB server to be used, ex: 'lb-server.$MY_DOMAIN:9000' This variable is set as a service specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located

Run yaim

After having filled the siteinfo.def file, run yaim:

/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n WMS

Configuration of the WMS CLI

The WMS CLI is part of the EMI-UI. To configure it please refer to xxx.

Operating the system

How to start the WMS service

A system administrator can start the WMS service by issuing:

service gLite start

A system administrator can stop the WMS service by issuing:

service gLite stop

Daemons

Scripts to check the daemons' status and to start/stop them are located in the ${GLITE_WMS_LOCATION}/etc/init.d/ directory (i.e. ${GLITE_WMS_LOCATION}/etc/init.d/glite-wms-wm start/stop/status). The gLite production installation also provides a more generic service, called gLite, to manage all of them simultaneously; try service gLite status/start/stop On a typical WMS node the following services must be running:

  • glite-lb-locallogger:
     glite-lb-logd running
    glite-lb-interlogd running
  • glite-lb-proxy:
     glite-lb-proxy running as 4137
  • glite-proxy-renewald:
     glite-proxy-renewd running
  • globus-gridftp:
     globus-gridftp-server (pid 3107) is running...
  • glite-wms-jc:
     JobController running in pid: 10008
    CondorG master running in pid: 10063 10062
    CondorG schedd running in pid: 10070
  • glite-wms-lm:
     Logmonitor running...
  • glite-wms-wm:
     /opt/glite/bin/glite-wms-workload_manager (pid 9957) is running...
  • glite-wms-wmproxy:
    WMProxy httpd listening on port 7443
    httpd (pid 22223 22222 22221 22220 22219 22218 22217) is running ....
    ===
    WMProxy Server running instances:
    UID PID PPID C STIME TTY TIME CMD
  • glite-wms-ice:
    /opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...

Init scripts

The init scripts are located under /etc/init.d and are the following:

/etc/init.d/globus-gridftp
/etc/init.d/glite-wms-wmproxy
/etc/init.d/glite-wms-wm
/etc/init.d/glite-wms-lm
/etc/init.d/glite-wms-jc
/etc/init.d/glite-wms-ice
/etc/init.d/glite-proxy-renewald
/etc/init.d/glite-lb-locallogger
/etc/init.d/glite-lb-bkserverd

Configuration Files

The configuration files are located under /etc/glite-wms and are the following:

/etc/glite-wms/glite_wms.conf
/etc/glite-wms/glite_wms_wmproxy.gacl
/etc/glite-wms/glite_wms_wmproxy_httpd.conf
/etc/glite-wms/wmproxy_logrotate.conf
/etc/glite-wms/.drain

For configuration files related to other services running on the wms node, please refer to Service Reference Card.

glite_wms.conf

This is the general configuration file for the WMS. The syntax is based on the ClassAd language. The parameter names are case insensitive. It is organised in sections: one for every running service plus a common section.

[
    Common = [...];
    JobController = [...];
    LogMonitor = [...];
    NetworkServer = [...];
    WorkloadManager = [...];
    WorkloadManagerProxy = [...];
    ICE = [...]
]

The value of a parameter can be expressed in terms of environment variables, with the typical UNIX shell syntax: a $ sign followed by the name of the variable in curly braces (e.g. ${HOME}).

Common section

In general there is no need to change this section.

DGUser: the user under which a WMS process runs

LBProxy: Boolean attribute to switch between LB and LBProxy. If the value of this attribute is true, LBProxy is used for logging and query operations about jobs

WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter ( OperationLoadScripts) used to inhibit submission when certain system load limits are hit.
Since the suggested deployment is to run the WMS and the LB on two separate physical machines, a fundamental parameter is also LBServer.

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the WMProxy log file

LogLevel: Integer attribute containing a value from 0 to 6 (Optional). The integer value represents the WMProxy log file verbosity level: from 0 (fatal) to 6 (debug: maximum verbosity)

SandboxStagingPath: Root directory where job sandboxes are stored. It MUST be in the form: <DocumentRoot>/<single directory name>, where DocumentRoot is set as inside glite_wms_wmproxy_httpd.conf configuration file. The directory MUST be accessible by the user under which WMProxy is running (usually it is the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, if not differently set with User directive inside glite_wms_wmproxy_httpd.conf configuration file

ListMatchRootPath: Directory path where temporary pipes for list-match operations are created. The directory MUST be accessible by the user under which WMProxy is running (usually it is the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, if not differently set with User directive inside glite_wms_wmproxy_httpd.conf configuration file

GridFTPPort: Port number where gridFTP server is listening

MinPerusalTimeInterval: Integer value representing the time interval (in seconds) between two savings of job partial execution output. This attribute affects the WMProxy and other components' behaviour only if the perusal functionality is explicitly requested by the user via the JDL, see the EnableFilePerusal JDL attribute

LBServer: Address or list of addresses of the LB Server[s] to be used for storing job's information in the format of <host>[:<port>] (default value for port is 9000). This attribute is needed only if the LB Server is not running in the WMProxy server host, or if more than one LB Server must be used. Selection of the LB Server to use is made randomly from the list by the WMProxy, for any different service request. WMProxy maintains a list of weights associated with the available LB Servers so that failing LB Servers have decreasing probability of being selected. If the Service Discovery is enabled, the LB Servers found using the Service Discovery are added to the list.

Note that the following lines have the same meaning:

LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};
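For instance, to spread the load over two LB Servers a list can be given; the hostnames here are hypothetical placeholders:

```
LBServer = {"lb1.example.org:9000", "lb2.example.org:9000"};
```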

WeightsCacheValidity: Time in seconds (n) indicating the validity of the weights (i.e. the probability of being selected) associated with the available LB Servers. When the last weights update (i.e. the last received request) occurred more than n seconds ago, the weights are reset to the same value for all LB Servers

LBLocalLogger: Address of LB Local Logger in the format of <host>[:<port>] (default value for port is 9002). This attribute is needed only if LB Local Logger runs on another host and LBProxy is not enabled

AsyncJobStart: Boolean attribute used to switch from synchronous/asynchronous job start behavior. When set to true, during job start operation the control is returned to user immediately after the request has been received, while the actual execution of the operation (that could be quite time consuming) is performed asynchronously

EnableServiceDiscovery: Boolean attribute to enable Service Discovery. If the value of this attribute is true, the Service Discovery is enabled, i.e. WMProxy invokes Service Discovery for finding available LB Servers

ServiceDiscoveryInfoValidity: Time in seconds (n) indicating the validity of the information provided by the Service Discovery. A call to Service Discovery for updated information is done every n seconds

LBServiceDiscoveryType: Type key for LB Servers to be discovered by Service Discovery

MaxServedRequests: Long attribute limiting the number of operations served by each WMProxy instance before exiting and releasing possibly allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value lower than or equal to zero

OperationLoadScripts: ClassAd type attribute where an internal attribute can be specified for any WMProxy provided operation. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes provide the path and name of the script to be executed to verify the load of the WMProxy server for the given operation. If the server load is too high the requested operation is refused. The path and name of the script can be followed by user-defined options and parameters, depending on the specific script's argument needs.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the rpm file glite-wms-wmproxy in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:

OperationLoadScripts [
   jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor 
      --oper jobSubmit --load1 10 --load5 10 --load15 10 
      --memusage 95 --diskusage 95 --fdnum 500";
]

Any kind of load script file can be used. If a custom user script is used, the only rule to follow is that the script's exit value must be 0 if the operation can continue execution, and 1 in the opposite case (operation refused: server load too high).
The script files must be executable and must have the proper access permissions
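As an illustration of that exit-code contract, a hypothetical custom limiter (the check_load name and the threshold are made up for this example, not part of the shipped script) could refuse operations based on the 1-minute load average:

```shell
# Hypothetical limiter honouring the contract above:
# exit/return 0 lets the operation proceed, 1 refuses it.
check_load() {
    local threshold=${1:-10}
    local load1
    # first field of /proc/loadavg is the 1-minute load average
    load1=$(awk '{print int($1)}' /proc/loadavg)
    if [ "$load1" -gt "$threshold" ]; then
        echo "Operation refused: 1-minute load $load1 exceeds $threshold"
        return 1
    fi
    return 0
}
```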

Workload Manager section

Important parameters in this section are:

EnableBulkMM = true; // enable bulk matchmaking for collections
IsmBlackList // allows setting a list of CEs that are banned
IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enable multithreading for the WM component. This speeds up the matchmaking process; 5 is a good compromise between machine load and speed.

CeForwardParameters: the parameters forwarded by the WM to the CE

CeMonitorAsynchPort: the port used to listen for notifications arriving from CEMons. A value of -1 means that listening is disabled

CeMonitorServices: the list of CEMons the WM listens to

DispatcherType: the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"

EnableRecovery: specifies if at startup the WM should perform a special recovery procedure for the requests that it finds already in its input.

ExpiryPeriod: the maximum time, expressed in seconds, a submitted job is kept in the overall system, from the time it arrives for the first time at the WM

Input: the input source of new requests. If DispatcherType is "filelist" the source is a file; if DispatcherType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the WM starts. A JobDir structure consists of a base dir under which lie three other subdirectories, named tmp, new, old

IsmBlackList: a list of CEs that have to be excluded in the ISM

IsmDump: if the ISM dump is enabled, the dump, in ClassAd format, will be written to this file. In order to avoid file corruption, the contents of a dump are built in a temporary file, whose name is the value of this parameter with the prefix ".tmp", which is renamed to the specified file only at the end of the operation

IsmIiPurchasingRate: the period between two ISM purchases from the BDII, in seconds

IsmThreads: whether the threads related to ISM management are taken from the thread pool or created separately

IsmUpdateRate: the period between two updates of the ISM, in seconds. Note that conceptually purchasing just retrieves the list of available resources, whereas an ISM update gathers the resource information for each resource.

JobWrapperTemplateDir: the job wrapper sent to the CE and then executed on Worker Node is based on a bash template which is augmented by the WM with job-specific information. This is the location where all the templates - one at the moment - are stored

LogFile: the name of the file where messages are logged

LogLevel: each logging statement in the code specifies a log level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The levels go from 1 (minimum verbosity) to 6 (maximum verbosity)

MatchRetryPeriod: once a job becomes pending, meaning that there are no resources available, this parameter represents the period between successive match-making attempts, in seconds

MaxOutputSandboxSize: the maximum size of the output sandbox, in bytes. The limit is currently enforced by the job wrapper running on the Worker Node, which doesn't upload more data than specified here. If the value is -1 there is no limit.

MaxRetryCount: the system limit to the number of deep resubmissions for a job. The actual limit is the minimum between this value and the one specified in the job description

QueueSize: (def=1000) Size of the queue of events "ready" to be managed by the workers thread pool

RuntimeMalloc: allows the use of an alternative malloc library (examples are nedmalloc, google performance tools, ccmalloc), specifying the path to the shared object to be loaded with LD_PRELOAD. Example: RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so".

WorkerThreads: the number of request handler threads

LogMonitor section

Usually there is no need to change the default parameters, with the exception of: RemoveJobFiles = true;
which by default is set to false.
Setting it to true will force Condor to remove unused internal files once jobs are in a final state.

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the JobController log file

LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorLogDir: Path of the directory where LogMonitor stores the CondorG log files

CondorLogRecycleDir: Path of the directory where old CondorG log files are saved

JobsPerCondorLog: Max number of jobs logged in the same CondorG log file

GlobusDownTimeout: the LogMonitor waits this number of seconds before considering failed a condor job which has lost contact with the CE, and then resubmits it if possible

MainLoopDuration: LogMonitor loops between the CondorLog files every this number of seconds

MonitorInternalDir: Path of the directory where LogMonitor stores its own files

IdRepositoryName: Name of the file containing pieces of information about the jobs used by LogMonitor and JobController

ExternalLogFile: Path of the directory where extra log files are stored

RemoveJobFiles: If set to true, all files used to submit jobs to condor are removed when they are no longer necessary. Set it to "false" only for debugging purposes. Files are stored in the SubmitFileDir directory as set in the JobController section

Job Controller section

Usually there is no need to change the default parameters.

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the JobController log file

LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorSubmit: Path of the "condor_submit" command

CondorRemove: Path of the "condor_remove" command

CondorDagman: Path of the "condor_dagman" command

DagmanMaxPre: Sets the maximum number of PRE scripts within the DAG that may be running at one time; it is the "-maxpre" parameter of the condor_dagman command

MaximumTimeAllowedForCondorMatch: Sets the number of seconds that a job can wait in the condor queue to be matched before being resubmitted

ContainerRefreshThreshold: Number of jobs that JobController can keep in memory before resynchronizing its container with the one physically saved in the file "IdRepositoryName" (see LM section)

InputType: The JobController can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"

Input: The input source of new requests. If InputType is "filelist" the source is a file; if InputType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the JobController starts. A JobDir structure consists of a base dir under which lie three other subdirectories, named tmp, new, old

SubmitFileDir: Path of the directory where the submit files for condor are stored

OutputFileDir: Path of the directory where the standard error/output files of the jobs (e.g. the JobWrapper) are stored

ICE section

The parameters for ICE are available in the ICE Configuration Guide

Network Server section

Although the Network Server is no longer installed on WMS nodes, some configuration parameters in its section of the global conf file are still needed.

The important parameters in this section are those regarding the contact with the information system. In particular:

  • II_Contact = "egee-bdii.cnaf.infn.it"; set the hostname of the bdii to be contacted
  • II_Port = 2170; set the port on which the bdii is contacted
  • Gris_DN = "mds-vo-name=local, o=grid"; set the path where the bdii is publishing information
  • II_Timeout = 100; Set the timeout for the bdii query. It is important that this value is not too small: it is very dangerous if many bdii queries fail for timeout reasons. The risk is that all the information in the InformationSupermarket expires, making all jobs in Waiting status not match any CE (they remain in Waiting status for a long time, until a query to the bdii succeeds). By default the value is set to 30, but 100 is safer.
  • MaxInputSandboxSize = 10000000; this puts a limit on the size of the input sandbox declared in the jdl. Units are bytes
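Putting the parameters above together, a NetworkServer section might look like the following sketch (the values are the ones quoted in this guide):

```
NetworkServer = [
    II_Contact = "egee-bdii.cnaf.infn.it";
    II_Port = 2170;
    Gris_DN = "mds-vo-name=local, o=grid";
    II_Timeout = 100;
    MaxInputSandboxSize = 10000000;
];
```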

glite_wms_wmproxy.gacl

WMS user authentication is performed by the WMProxy component based on a GACL module.
The fundamental file used to manage WMS authentication is the /etc/glite-wms/glite_wms_wmproxy.gacl file.
This file contains the names of the VOs that are allowed to use the WMS. A .gacl file example that allows the dteam and ops VOs is the following:

<pre><gacl version='0.0.1'>
 <entry>
   <voms>
     <fqan>/ops/Role=NULL/Capability=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
 <entry>
   <voms>
     <fqan>/dteam/Role=NULL/Capability=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
</gacl>
</pre>

There must be an exact match between the fqan expressed in the gacl file and the one in the user proxy.
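A sketch for checking that exact match; fqan_listed is an illustrative helper, not part of the WMS, and the FQAN would typically come from voms-proxy-info --fqan:

```shell
# Illustrative check: an FQAN is authorized only if it appears verbatim
# between <fqan> tags in the gacl file (grep -F: fixed-string match,
# so even a differing Role or Capability suffix means no match).
fqan_listed() {
    local fqan=$1
    local gacl=${2:-/etc/glite-wms/glite_wms_wmproxy.gacl}
    grep -qF "<fqan>${fqan}</fqan>" "$gacl"
}
# Example: fqan_listed "/dteam/Role=NULL/Capability=NULL" && echo allowed
```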

The gacl file can also contain the DNs of single users that are allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:

<pre><entry>
      <user>
        <dn>/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Daniele Cesini/Email=daniele.cesini@cnaf.infn.it/</dn>
      </user>
      <allow><exec/></allow>
    </entry>
</pre>

An entry with a DENY tag can be used to ban users or VOs:

<pre><entry>
      <voms>
        <fqan>/dteam/Role=admin/Capability=NULL</fqan>
      </voms>
      <deny><exec/></deny>
    </entry>
</pre>

As the previous examples show, it is possible to allow/ban users and VOs on the basis of their FQAN (i.e. those returned by the voms-proxy-info --fqan command).

User mapping on a WMS node is done through lcmaps, as in any other gLite service, so the fundamental places to look in case of mapping problems are:

  • the gridmap file: /etc/grid-security/grid-mapfile;
  • the lcmaps log: /var/log/glite/lcmaps.log;
  • the gridmapdir: /etc/grid-security/gridmapdir/;
  • the existing pool accounts for a VO or a VO group/role
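As a side note, the gridmapdir lease mechanism can be inspected with plain filesystem tools: a pool-account file with a hard-link count of 1 is free, while a leased one gains a second link named after the URL-encoded user DN. The sketch below simulates this in a temporary directory (the account names and encoded DN are made up for illustration):

```shell
# Simulate a gridmapdir with two dteam pool accounts (hypothetical names)
tmp=$(mktemp -d)
touch "$tmp/dteam001" "$tmp/dteam002"

# Leasing an account creates a hard link named after the encoded user DN
ln "$tmp/dteam001" "$tmp/%2fc%3dit%2fo%3dinfn%2fcn%3dsome%20user"

# Free accounts are exactly those whose link count is still 1
free=$(find "$tmp" -name 'dteam*' -links 1 | wc -l | tr -d ' ')
echo "free dteam pool accounts: $free"

rm -rf "$tmp"
```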

glite_wms_wmproxy_httpd.conf

This file is a WMProxy-specific configuration file, configuring the HTTP daemon and FastCGI

wmproxy_logrotate.conf

This file configures the logrotate tool, which performs the rotation of the httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files

.drain

This file is used to put the WMS in draining mode, so that it does not accept new submission requests but still allows all other operations, such as output retrieval
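The guide does not show the payload of the .drain file; on typical installations it is a small GACL document denying execution to every user, along the lines of the sketch below. Check the file shipped with your WMS version before relying on this exact form:

```
<gacl version="0.0.1">
  <entry>
    <any-user/>
    <deny><exec/></deny>
  </entry>
</gacl>
```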

Log files

The WMS log files, located under /var/log/wms, are the following:

  • workload_manager_events.log contains logs of the workload manager component
  • wmproxy.log contains logs of the wmproxy component
  • httpd-wmproxy-access.log wmproxy httpd access log
  • httpd-wmproxy-errors.log wmproxy httpd error log
  • glite-wms-wmproxy.restart.cron.log log of the /etc/cron.d/glite-wms-wmproxy.restart.cron cron
  • glite-wms-wmproxy-purge-proxycache.log log of the /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron cron
  • wmproxy_logrotate.log contains logs about the rotation of wmproxy httpd log files
  • renewal.log proxy renewal service log
  • logmonitor_events.log contains logs of the logmonitor component
  • jobcontoller_events.log contains logs of the jobcontroller component
  • ice.log contains logs of the ice component
  • glite-wms-purgeStorage.log log of the /etc/cron.d/glite-wms-purger.cron cron
For information about log files related to other services running on the wms node, please refer to Service Reference Card.

Network ports

Information about network ports is available in the Service Reference Card

Cron jobs

Information about cron jobs is available in the Service Reference Card

 

0.1 Security related operations

Revision 162011-07-04 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

System Administrator Guide for WMS

>
>

System Administrator Guide for WMS for EMI

 
Line: 170 to 170
 /etc/glite-wms/.drain
Added:
>
>
For configuration files related to other services running on the wms node, please refer to Service Reference Card.
 

0.0.1 glite_wms.conf

This is the general configuration file for the WMS. The syntax is based on the ClassAd language. The parameter names are case insensitive. It is organised in sections: one for every running service plus a common section.

Line: 433 to 435
 
  • jobcontoller_events.log contains logs of the jobcontroller component
  • ice.log contains logs of the ice component
  • glite-wms-purgeStorage.log log of the /etc/cron.d/glite-wms-purger.cron cron
Added:
>
>

0.1 Network ports

Information about network ports is available in the Service Reference Card

0.2 Cron jobs

Information about cron jobs is available in the Service Reference Card

System Administrator Guide for WMS for EMI


Installation and configuration

Prerequisites

Operating System

A standard x86_64 SL(C)5 distribution is supposed to be properly installed. An EPEL repository must be installed on the machine.

Node synchronization

A general requirement for the Grid nodes is that they are synchronized. This requirement may be fulfilled in several ways. One of the most common is to use the NTP protocol with a time server.

Cron and logrotate

Many components deployed on the WMS rely on the presence of cron (including support for /etc/cron.* directories) and logrotate. You should make sure these utilities are available on your system.

Installation

Repositories

For a successful installation, you will need to configure your package manager to reference a number of repositories (in addition to your OS):

  • the EPEL repository
  • the EMI middleware repository
  • the CA repository
and to REMOVE or DEACTIVATE:

  • the DAG repository

The EPEL repository

You can install the EPEL repository, issuing:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm

The EMI middleware repository


The EMI-1 RC4 repository can be found under:

http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RC4/sl5/x86_64

To use yum, the yum repo to be installed in /etc/yum.repos.d can be found at https://twiki.cern.ch/twiki/pub/EMI/EMI-1/rc4.repo

The Certification Authority repository

The most up-to-date version of the list of trusted Certification Authorities (CA) is needed on your node. The relevant yum repo can be installed issuing:

wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo -O /etc/yum.repos.d/egi-trustanchors.repo

Important note on automatic updates

An update of an RPM not followed by configuration can cause problems. Therefore WE STRONGLY RECOMMEND NOT TO USE AUTOMATIC UPDATE PROCEDURE OF ANY KIND.


Running the script available at http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh (implemented by Giuseppe Platania, INFN Catania), yum autoupdate will be disabled.

Installation of a WMS node

First of all, install the yum-protectbase rpm:

yum install yum-protectbase.noarch

Then proceed with the installation of the CA certificates.

Installation of the CA certificates

The CA certificate can be installed issuing:

yum install ca-policy-egi-core 

Installation of the WMS software

Install the WMS metapackage:

yum install emi-wms

Configuration

Using the YAIM configuration tool


For a detailed description on how to configure the middleware with YAIM, please check the YAIM guide.


The necessary YAIM modules needed to configure a certain node type are automatically installed with the middleware.

Configuration of a WMS node

Install host certificate


The WMS node requires the host certificate/key files to be installed. Contact your national Certification Authority (CA) to understand how to obtain a host certificate if you do not have one already.


Once you have obtained a valid certificate:

  • hostcert.pem - containing the machine public key
  • hostkey.pem - containing the machine private key
make sure to place the two files on the target node in the /etc/grid-security directory. Then set the proper modes and ownership:
chown root.root /etc/grid-security/hostkey.pem

chmod 600 /etc/grid-security/hostcert.pem

chmod 400 /etc/grid-security/hostkey.pem

chown root.root /etc/grid-security/hostcert.pem

Configure the siteinfo.def file

Set your siteinfo.def file, which is the input file used by yaim. The yaim variables relevant for WMS are the following:

  • $WMS_HOST -> the WMS hostname, ex. : 'egee-rb-01.$MY_DOMAIN'
  • $PX_HOST -> the hostname of a server myproxy, ex.: 'myproxy.$MY_DOMAIN'
  • $BDII_HOST -> the hostname of the site bdii to be used, ex: 'sitebdii.$MY_DOMAIN'
  • $LB_HOST -> the hostname of the LB server to be used, ex: 'lb-server.$MY_DOMAIN:9000'. This variable is set as a service-specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located

Run yaim

After having filled the siteinfo.def file, run yaim:

/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n WMS

Configuration of the WMS CLI

The WMS CLI is part of the EMI-UI. To configure it please refer to xxx.

Operating the system

How to start the WMS service

A system administrator can start the WMS service by issuing:

service gLite start

A system administrator can stop the WMS service by issuing:

service gLite stop

Daemons

Scripts to check the daemons' status and to start/stop them are located in the ${GLITE_WMS_LOCATION}/etc/init.d/ directory (e.g. ${GLITE_WMS_LOCATION}/etc/init.d/glite-wms-wm start/stop/status). gLite production installations also provide a more generic service, called gLite, to manage all of them simultaneously: try service gLite status/start/stop. On a typical WMS node the following services must be running:

  • glite-lb-locallogger:
     glite-lb-logd running
    glite-lb-interlogd running
  • glite-lb-proxy:
     glite-lb-proxy running as 4137
  • glite-proxy-renewald:
     glite-proxy-renewd running
  • globus-gridftp:
     globus-gridftp-server (pid 3107) is running...
  • glite-wms-jc:
     JobController running in pid: 10008
    CondorG master running in pid: 10063 10062
    CondorG schedd running in pid: 10070
  • glite-wms-lm:
     Logmonitor running...
  • glite-wms-wm:
     /opt/glite/bin/glite-wms-workload_manager (pid 9957) is running...
  • glite-wms-wmproxy:
    WMProxy httpd listening on port 7443
    httpd (pid 22223 22222 22221 22220 22219 22218 22217) is running ....
    ===
    WMProxy Server running instances:
    UID PID PPID C STIME TTY TIME CMD
  • glite-wms-ice:
    /opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...

Init scripts

The init scripts are located under /etc/init.d and are the following:

/etc/init.d/globus-gridftp
/etc/init.d/glite-wms-wmproxy
/etc/init.d/glite-wms-wm
/etc/init.d/glite-wms-lm
/etc/init.d/glite-wms-jc
/etc/init.d/glite-wms-ice
/etc/init.d/glite-proxy-renewald
/etc/init.d/glite-lb-locallogger
/etc/init.d/glite-lb-bkserverd

Configuration Files

The configuration files are located under /etc/glite-wms and are the following:

/etc/glite-wms/glite_wms.conf
/etc/glite-wms/glite_wms_wmproxy.gacl
/etc/glite-wms/glite_wms_wmproxy_httpd.conf
/etc/glite-wms/wmproxy_logrotate.conf
/etc/glite-wms/.drain

For configuration files related to other services running on the wms node, please refer to Service Reference Card.

glite_wms.conf

This is the general configuration file for the WMS. The syntax is based on the ClassAd language. The parameter names are case insensitive. It is organised in sections: one for every running service plus a common section.

[
    Common = [...];
    JobController = [...];
    LogMonitor = [...];
    NetworkServer = [...];
    WorkloadManager = [...];
    WorkloadManagerProxy = [...];
    ICE = [...]
]

The value of a parameter can be expressed in terms of environment variables, with the typical UNIX shell syntax: a $ sign followed by the name of the variable in brackets (e.g. ${HOME}).

Common section

In general there is no need to change this section.

DGUser: the user under which a WMS process runs

LBProxy: Boolean attribute to switch between LB and LBProxy. If the value of this attribute is true, LBProxy is used for logging and query operations about jobs
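Combining the two attributes above with the environment-variable syntax described earlier, a Common section might look like the fragment below (the DGUser value is a hypothetical illustration):

```
Common = [
    DGUser = "${GLITE_USER}";   // expands to the value of the GLITE_USER environment variable
    LBProxy = true;
];
```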

WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested deployment is to have the WMS and LB on two separate physical machines, a fundamental parameter is also LBServer.

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the WMProxy log file

LogLevel: Integer attribute containing a value from 0 to 6 (Optional). The integer value represents the WMProxy log file verbosity level: from 0 (fatal) to 6 (debug: maximum verbosity)

SandboxStagingPath: Root directory where job sandboxes are stored. It MUST be in the form: <DocumentRoot>/<single directory name>, where DocumentRoot is set as inside glite_wms_wmproxy_httpd.conf configuration file. The directory MUST be accessible by the user under which WMProxy is running (usually it is the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, if not differently set with User directive inside glite_wms_wmproxy_httpd.conf configuration file

ListMatchRootPath: Directory path where temporary pipes for list-match operations are created. The directory MUST be accessible by the user under which WMProxy is running (usually it is the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, if not differently set with User directive inside glite_wms_wmproxy_httpd.conf configuration file

GridFTPPort: Port number where gridFTP server is listening

MinPerusalTimeInterval: Integer value representing the time interval (in seconds) between two savings of job partial execution output. This attribute affects the WMProxy and other components behaviour only if perusal functionality are explicity requested by the user via the JDL, see EnableFilePerusal JDL attribute

LBServer: Address or list of addresses of the LB Server[s] to be used for storing job information, in the format <host>[:<port>] (default value for port is 9000). This attribute is needed only if the LB Server is not running on the WMProxy server host, or if more than one LB Server must be used. The LB Server to use is selected randomly from the list by the WMProxy for each service request. WMProxy maintains a list of weights associated with the available LB Servers, so that failing LB Servers have a decreasing probability of being selected. If Service Discovery is enabled, the LB Servers found using Service Discovery are added to the list.

Note that the following lines have same meaning:

LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};
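Following the same syntax, a failover setup with more than one LB Server can be expressed as a list (the second hostname here is invented for illustration):

```
LBServer = {"ghemon.cnaf.infn.it:9000", "lb2.example.org:9000"};
```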

WeightsCacheValidity: Time in seconds (n) indicating the validity of the weights (i.e. probability to be selected) associated to the available LB Servers. When last weights update (i.e. last received request) has occurred more than n seconds ago then the weights are restored to the same value for all LB Servers

LBLocalLogger: Address of LB Local Logger in the format of <host>[:<port>] (default value for port is 9002). This attribute is needed only if LB Local Logger runs on another host and LBProxy is not enabled

AsyncJobStart: Boolean attribute used to switch from synchronous/asynchronous job start behavior. When set to true, during job start operation the control is returned to user immediately after the request has been received, while the actual execution of the operation (that could be quite time consuming) is performed asynchronously

EnableServiceDiscovery: Boolean attribute to enable Service Discovery. If the value of this attribute is true, the Service Discovery is enabled, i.e. WMProxy invokes Service Discovery for finding available LB Servers

ServiceDiscoveryInfoValidity: Time in seconds (n) indicating the validity of the information provided by the Service Discovery. A call to Service Discovery for updated information is done every n seconds

LBServiceDiscoveryType: Type key for LB Servers to be discovered by Service Discovery

MaxServedRequests: Long attribute limiting the number of operations served by each WMProxy instance before exiting and releasing any allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value lower than or equal to zero

OperationLoadScripts: ClassAd type attribute where an internal attribute can be specified for any operation provided by WMProxy. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes provide the path and name of the script to be executed to verify the load of the WMProxy server for each operation. If the server load is too high the requested operation is refused. The path and name of the script can be followed by user-defined options and parameters, depending on the arguments the specific script needs.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the rpm file glite-wms-wmproxy in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:

OperationLoadScripts [
   jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor 
      --oper jobSubmit --load1 10 --load5 10 --load15 10 
      --memusage 95 --diskusage 95 --fdnum 500";
]

Any kind of load script file can be used. If a custom user script is used, the only rule to follow is that the script must exit with 0 if the operation can continue, and with 1 otherwise (operation refused: server load too high).
The script files must be executable and must have the proper access permissions
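As a sketch of such a custom script, the fragment below refuses the operation when the one-minute load average exceeds a threshold. The threshold value and the Linux-specific /proc/loadavg source are illustrative assumptions, and a real limiter would end with exit $status:

```shell
#!/bin/sh
# Hypothetical custom limiter honouring the 0 = accept / 1 = refuse contract
threshold=10

check_load() {
    # integer part of the 1-minute load average (Linux-specific source)
    load1=$(cut -d ' ' -f 1 /proc/loadavg | cut -d . -f 1)
    [ "$load1" -le "$threshold" ]
}

if check_load; then
    status=0    # load acceptable: operation may proceed
else
    status=1    # load too high: operation refused
fi
echo "limiter status: $status"
# a real limiter script would finish with: exit $status
```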

Workload Manager section

Important parameters in this section are:

EnableBulkMM = true; // enable bulk matchmaking for collections
IsmBlackList // allows setting a list of banned CEs
IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enable multithreading in the WM component; speeds up the matchmaking process. 5 is a good compromise between machine load and speed.

CeForwardParameters: the parameters forwarded by the WM to the CE

CeMonitorAsynchPort: the port used to listen to notification arriving from CEMon's. A value of -1 means that listening is disabled

CeMonitorServices: the list of CEMon's the WM listens to

DispatcherType: the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"

EnableRecovery: specifies if at startup the WM should perform a special recovery procedure for the requests that it finds already in its input.

ExpiryPeriod: the maximum time, expressed in seconds, a submitted job is kept in the overall system, from the time it arrives for the first time at the WM

Input: the input source of new requests. If DispatcherType is "filelist" the source is a file; if DispatcherType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the WM starts. A JobDir structure consists of a base dir under which lie other three subdirectories, named tmp, new, old

IsmBlackList: a list of CEs that have to be excluded in the ISM

IsmDump: if the ISM dump is enabled, the dump, in ClassAd format, will be written to this file. In order to avoid file corruption, the contents of a dump are built in a temporary file, whose name is the value of this parameter with the prefix ".tmp", which is renamed to the specified file only at the end of the operation

IsmIiPurchasingRate: the period between two ISM purchases from the BDII, in seconds

IsmThreads: controls whether the threads related to ISM management are taken from the thread pool or created separately

IsmUpdateRate: the period between two updates of the ISM, in seconds. Note that conceptually purchasing just retrieves the list of available resources, whereas an ISM update gathers the resource information for each resource.

JobWrapperTemplateDir: the job wrapper sent to the CE and then executed on Worker Node is based on a bash template which is augmented by the WM with job-specific information. This is the location where all the templates - one at the moment - are stored

LogFile: the name of the file where messages are logged

LogLevel: each logging statement in the code specifies a log level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The levels go from 1 (minimum verbosity) to 6 (maximum verbosity)

MatchRetryPeriod: once a job becomes pending, meaning that there are no resources available, this parameter represents the period between successive match-making attempts, in seconds

MaxOutputSandboxSize: the maximum size of the output sandbox, in bytes. The limit is currently enforced by the job wrapper running on the Worker Node, which doesn't upload more data than what specified here. If the value is -1 there is no limit.

MaxRetryCount: the system limit to the number of deep resubmissions for a job. The actual limit is the minimum between this value and the one specified in the job description

QueueSize: (def=1000) Size of the queue of events "ready" to be managed by the workers thread pool

RuntimeMalloc: allows the use of an alternative malloc library (examples are nedmalloc, google performance tools, ccmalloc), specifying the path to the shared object to be loaded with LD_PRELOAD. Example: RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so".

WorkerThreads: the number of request handler threads

LogMonitor section

Usually there is no need to change the default parameters, with the exception of: RemoveJobFiles = true;
which by default is set to false.
Setting it to true will force Condor to remove unused internal files when the jobs are in a final state.
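In glite_wms.conf this corresponds to a fragment along these lines:

```
LogMonitor = [
    RemoveJobFiles = true;   // default is false; true removes Condor's internal files for finished jobs
];
```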

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the JobController log file

LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorLogDir: Path of the directory where LogMonitor stores the CondorG log files

CondorLogRecycleDir: Path of the directory where old CondorG log files are saved

JobsPerCondorLog: Max number of job logged in the same CondorG log file

GlobusDownTimeout: number of seconds the LogMonitor waits before considering failed a Condor job that has lost contact with the CE, resubmitting it if possible

MainLoopDuration: the LogMonitor cycles through the CondorG log files every this number of seconds

MonitorInternalDir: Path of the directory where LogMonitor stores its own files

IdRepositoryName: Name of the file containing pieces of information about the jobs used by LogMonitor and JobController

ExternalLogFile: Path of the directory where extra log files are stored

RemoveJobFiles: if set to true, all files used to submit jobs to Condor are removed when they are no longer necessary. Set it to "false" only for debugging purposes. Files are stored in the SubmitFileDir directory as set in the JobController section

Job Controller section

Usually there is no need to change the default parameters.

The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the JobController log file

LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorSubmit: Path of the "condor_submit" command

CondorRemove: Path of the "condor_remove" command

CondorDagman: Path of the "condor_dagman" command

DagmanMaxPre: Sets the maximum number of PRE scripts within the DAG that may be running at one time; it is the "-maxpre" parameter of the condor_dagman command

MaximumTimeAllowedForCondorMatch: Sets the number of seconds that a job can wait in the condor queue to be matched before being resubmitted

ContainerRefreshThreshold: Number of jobs that JobController can keep in memory before resynchronizing its container with the one physically saved in the file "IdRepositoryName" (see LM section)

InputType: The JobController can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"

Input: The input source of new requests. If InputType is "filelist" the source is a file; if InputType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the JobController starts. A JobDir structure consists of a base dir under which lie other three subdirectories, named tmp, new, old

SubmitFileDir: Path of the directory where the submit files for condor are stored

OutputFileDir: Path of the directory where the standard error/output files of the jobs (e.g. the JobWrapper) are stored

ICE section

The parameters for ICE are available in the ICE Configuration Guide

Network Server section

Although the Network Server is no longer installed on WMS nodes, some configuration parameters in its section of the global configuration file are still needed.

The important parameters in this section are those regarding the contact with the information system. In particular:

  • II_Contact = "egee-bdii.cnaf.infn.it"; set the hostname of the bdii to be contacted
  • II_Port = 2170; set the port on which the bdii is contacted
  • Gris_DN = "mds-vo-name=local, o=grid"; set the path where the bdii is publishing information
  • II_Timeout = 100; set the timeout (in seconds) for the bdii query. It is important that this value is not too small: if many bdii queries fail because of timeouts, all the information in the InformationSupermarket may expire, leaving all jobs in Waiting status unable to match any CE (they remain in Waiting status for a long time, until a query to the bdii succeeds). The default is 30, but 100 is safer.
  • MaxInputSandboxSize = 10000000; limits the size of the input sandbox specified in the jdl, in bytes

glite_wms_wmproxy.gacl

WMS user authentication is performed by the WMProxy component, based on a GACL module.
The fundamental file used to manage the WMS authentication is the /etc/glite-wms/glite_wms_wmproxy.gacl file.
This file contains the names of the VOs that are allowed to use the WMS. A .gacl file example that allows the dteam and ops VOs is the following:

<pre><gacl version='0.0.1'>
 <entry>
   <voms>
     <fqan>/ops/Role=NULL/Capability=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
 <entry>
   <voms>
     <fqan>/dteam/Role=NULL/Capability=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
</pre>

There must be an exact match between the fqan expressed in the gacl file and the one in the user proxy.

The gacl file can also contain the DNs of single users that are allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:

<pre><entry>
      <user>
        <dn>/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Daniele Cesini/Email=daniele.cesini@cnaf.infn.it/</dn>
      </user>
      <allow><exec/></allow>
    </entry>
</pre>

An entry with a DENY tag can be used to ban users or VOs:

<pre><entry>
      <voms>
        <fqan>/dteam/Role=admin/Capability=NULL</fqan>
      </voms>
      <deny><exec/></deny>
    </entry>
</pre>

As the previous examples show, it is possible to allow/ban users and VOs on the basis of their FQAN (i.e. those returned by the voms-proxy-info --fqan command).

User mapping on a WMS node is done through lcmaps, as in any other gLite service, so the fundamental places to look in case of mapping problems are:

  • the gridmap file: /etc/grid-security/grid-mapfile;
  • the lcmaps log: /var/log/glite/lcmaps.log;
  • the gridmapdir: /etc/grid-security/gridmapdir/;
  • the existing pool accounts for a VO or a VO group/role

glite_wms_wmproxy_httpd.conf

This file is a WMProxy-specific configuration file, configuring the HTTP daemon and FastCGI

wmproxy_logrotate.conf

This file configures the logrotate tool, which performs the rotation of the httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files

.drain

This file is used to put the WMS in draining mode, so that it does not accept new submission requests but still allows all other operations, such as output retrieval

Log files

The WMS log files, located under /var/log/wms, are the following:

  • workload_manager_events.log contains logs of the workload manager component
  • wmproxy.log contains logs of the wmproxy component
  • httpd-wmproxy-access.log wmproxy httpd access log
  • httpd-wmproxy-errors.log wmproxy httpd error log
  • glite-wms-wmproxy.restart.cron.log log of the /etc/cron.d/glite-wms-wmproxy.restart.cron cron
  • glite-wms-wmproxy-purge-proxycache.log log of the /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron cron
  • wmproxy_logrotate.log contains logs about the rotation of wmproxy httpd log files
  • renewal.log proxy renewal service log
  • logmonitor_events.log contains logs of the logmonitor component
  • jobcontoller_events.log contains logs of the jobcontroller component
  • ice.log contains logs of the ice component
  • glite-wms-purgeStorage.log log of the /etc/cron.d/glite-wms-purger.cron cron
For information about log files related to other services running on the wms node, please refer to Service Reference Card.

Network ports

Information about network ports is available in the Service Reference Card

Cron jobs

Information about cron jobs is available in the Service Reference Card

0.3 Security related operations

0.3.1 How to select specific VO resources

It can be useful to force the WMS to select only resources specific to a certain VO, as the matchmaking time is consequently reduced.

This can be enabled by providing an additional LDAP clause, which will be added to the search filter at purchasing time.

The default search filter is:

(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)(objectclass=gluevoview)(objectclass=gluece))

The idea is to give system administrators the possibility to specify an additional LDAP clause, which is ANDed with the last two clauses of the default filter in order to match attributes specific to the gluece/gluevoview object classes.
To this aim the configuration file provides:

IsmIILDAPCEFilterExt for handling the additional search filter while purchasing information about CE from the BDII

As an example, by specifying the following:

IsmIILDAPCEFilterExt = "(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))"

the search filter during the purchasing would be:

(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)(&(|(objectclass=gluevoview)(objectclass=gluece))
(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))))

and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.

 -- FabioCapannini - 2011-04-28

Revision 152011-06-17 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

System Administrator Guide for WMS for EMI-1 release

>
>

System Administrator Guide for WMS

 

Revision 142011-06-16 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 166 to 166
 
/etc/glite-wms/glite_wms.conf
/etc/glite-wms/glite_wms_wmproxy.gacl
/etc/glite-wms/glite_wms_wmproxy_httpd.conf
Added:
>
>
/etc/glite-wms/wmproxy_logrotate.conf /etc/glite-wms/.drain
 

0.0.1 glite_wms.conf

Line: 405 to 407
 
  • the lcmaps log: /var/log/glite/lcmaps.log;
  • the gridmapdir: /etc/grid-security/gridmapdir/;
  • the existing pool accounts for a VO or a VO group/role
Added:
>
>

0.0.1 glite_wms_wmproxy_httpd.conf

This file is a WMProxy specific configuration file configuring the HTTP daemon and Fast CGI

0.0.2 wmproxy_logrotate.conf

This file configures the logrotate tool, that performs the rotation of httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files

0.0.3 .drain

This file is used to put the WMS in draining mode, so that it does not accept new submission requests but still allows other operations, such as output retrieval

 

0.1 Log files

The WMS log files, located under /var/log/wms, are the following:

Revision 132011-06-10 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 187 to 187
 In general there is no need to change this section.

DGUser: the user under which a WMS process runs

Added:
>
>
LBProxy: Boolean attribute to switch between LB and LBProxy. If the value of this attribute is true, LBProxy is used for logging and query operations about jobs
 

0.0.0.1 WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested deployment is to have the WMS and LB on two separate physical machines, a fundamental parameter is also LBServer.

Added:
>
>
The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the WMProxy log file

LogLevel: Integer attribute containing a value from 0 to 6 (Optional). The integer value represents the WMProxy log file verbosity level: from 0 (fatal) to 6 (debug: maximum verbosity)

SandboxStagingPath: Root directory where job sandboxes are stored. It MUST be in the form <DocumentRoot>/<single directory name>, where DocumentRoot is set in the glite_wms_wmproxy_httpd.conf configuration file. The directory MUST be accessible by the user under which WMProxy is running (usually the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, unless set differently with the User directive in the glite_wms_wmproxy_httpd.conf configuration file

ListMatchRootPath: Directory path where temporary pipes for list-match operations are created. The directory MUST be accessible by the user under which WMProxy is running (usually the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, unless set differently with the User directive in the glite_wms_wmproxy_httpd.conf configuration file

GridFTPPort: Port number where gridFTP server is listening

MinPerusalTimeInterval: Integer value representing the time interval (in seconds) between two savings of the job's partial execution output. This attribute affects the behaviour of WMProxy and other components only if the perusal functionality is explicitly requested by the user via the JDL; see the EnableFilePerusal JDL attribute

LBServer: Address or list of addresses of the LB Server(s) to be used for storing job information, in the format <host>[:<port>] (the default port is 9000). This attribute is needed only if the LB Server is not running on the WMProxy server host, or if more than one LB Server must be used. The LB Server to use is selected randomly from the list by the WMProxy for each service request. WMProxy maintains a list of weights associated with the available LB Servers, so that failing LB Servers have a decreasing probability of being selected. If Service Discovery is enabled, the LB Servers found through Service Discovery are added to the list.

Note that the following lines have the same meaning:

LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};

WeightsCacheValidity: Time in seconds (n) indicating the validity of the weights (i.e. the probability of being selected) associated with the available LB Servers. When the last weights update (i.e. the last received request) occurred more than n seconds ago, the weights are restored to the same value for all LB Servers
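
The selection policy described above can be sketched as follows. This is illustrative only: the function names and the penalty scheme are assumptions, not the actual WMProxy code. A server is drawn at random with probability proportional to its weight, failing servers are penalized, and all weights are restored after WeightsCacheValidity seconds of inactivity.

```python
import random

def pick_lb_server(weights):
    """Draw one LB Server at random, biased by its current weight."""
    servers = list(weights)
    return random.choices(servers, weights=[weights[s] for s in servers])[0]

def penalize(weights, server, factor=0.5):
    """Reduce a failing server's weight so it is less likely to be chosen."""
    weights[server] *= factor

def restore(weights, value=1.0):
    """Reset every weight, as happens after WeightsCacheValidity seconds."""
    for server in weights:
        weights[server] = value

# Simulate one LB Server failing repeatedly: its weight drops to zero,
# so only the healthy server can be selected.
weights = {"lb1.example.org:9000": 1.0, "lb2.example.org:9000": 1.0}
penalize(weights, "lb2.example.org:9000", factor=0.0)
chosen = pick_lb_server(weights)
```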

LBLocalLogger: Address of LB Local Logger in the format of <host>[:<port>] (default value for port is 9002). This attribute is needed only if LB Local Logger runs on another host and LBProxy is not enabled

AsyncJobStart: Boolean attribute used to switch between synchronous and asynchronous job start behavior. When set to true, during the job start operation control is returned to the user immediately after the request has been received, while the actual execution of the operation (which can be quite time consuming) is performed asynchronously

EnableServiceDiscovery: Boolean attribute to enable Service Discovery. If the value of this attribute is true, the Service Discovery is enabled, i.e. WMProxy invokes Service Discovery for finding available LB Servers

ServiceDiscoveryInfoValidity: Time in seconds (n) indicating the validity of the information provided by the Service Discovery. A call to Service Discovery for updated information is done every n seconds

LBServiceDiscoveryType: Type key for LB Servers to be discovered by Service Discovery

MaxServedRequests: Long attribute limiting the number of operations served by each WMProxy instance before exiting and releasing any allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value lower than or equal to zero

OperationLoadScripts: ClassAd type attribute where an internal attribute can be specified for any operation provided by WMProxy. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes provide the path and name of the script to be executed to verify the load of the WMProxy server for each operation. If the server load is too high, the requested operation is refused. The path and name of the script can be followed by user-defined options and parameters, depending on the arguments the specific script needs.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the glite-wms-wmproxy rpm in the directory ${WMS_LOCATION_SBIN}.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:

OperationLoadScripts [
   jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor 
      --oper jobSubmit --load1 10 --load5 10 --load15 10 
      --memusage 95 --diskusage 95 --fdnum 500";
]

Any kind of load script file can be used. If a custom user script is used, the only rule to follow is that the script exit value must be 0 if the operation can continue its execution, and 1 otherwise (operation refused: server load too high).
The script files must be executable and must have the proper access permissions
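
The exit-code contract can be sketched in a few lines. The threshold and the use of the one-minute load average are illustrative assumptions; the stock glite_wms_wmproxy_load_monitor script checks several more metrics (memory, disk usage, open file descriptors).

```python
import os

def server_overloaded(load1_threshold=10.0):
    """Return True when the host 1-minute load average exceeds the threshold.

    A real limiter script translates this decision into its exit status:
    0 means the operation can continue, 1 means it is refused
    (server load too high).
    """
    load1, _load5, _load15 = os.getloadavg()
    return load1 > load1_threshold
```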

 

0.0.0.1 Workload Manager section

Important parameters in this section are:

Line: 298 to 339
 SubmitFileDir: Path of the directory where the submit files for condor are stored

OutputFileDir: Path of the directory where the standard error/output files of the jobs (e.g. the JobWrapper) are stored

Added:
>
>

0.0.0.1 ICE section

The parameters for ICE are available in the ICE Configuration Guide

 

0.0.0.1 Network Server section

Although the Network Server is no longer installed on WMS nodes, some configuration parameters in its section of the global configuration file are still needed.

Revision 122011-06-10 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 186 to 186
  In general there is no need to change this section.
Changed:
<
<
DGUser: the user under which a WMS process runs
>
>
DGUser: the user under which a WMS process runs
 

0.0.0.1 WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested deployment is to have the WMS and LB on two separate physical machines, a fundamental parameter is also LBServer.

0.0.0.2 Workload Manager section

Changed:
<
<
Important parameters in this section are:
EnableBulkMM = true; //enable the bulk matchmaking for collection
IsmBlackList // allow to set a list of CEs that are banned
IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enable multithreading for the WM component. Speeds up the matchmaking process. 5 is a good compromise between machine load and speed.
>
>
Important parameters in this section are:
 
Added:
>
>
EnableBulkMM = true; //enable the bulk matchmaking for collection
IsmBlackList // allow to set a list of CEs that are banned
IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enable multithreading for the WM component. Speeds up the matchmaking process. 5 is a good compromise between machine load and speed.

CeForwardParameters: the parameters forwarded by the WM to the CE

CeMonitorAsynchPort: the port used to listen to notifications arriving from CEMons. A value of -1 means that listening is disabled

CeMonitorServices: the list of CEMon's the WM listens to

DispatcherType: the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"

EnableRecovery: specifies if at startup the WM should perform a special recovery procedure for the requests that it finds already in its input.

ExpiryPeriod: the maximum time, expressed in seconds, a submitted job is kept in the overall system, from the time it arrives for the first time at the WM

Input: the input source of new requests. If DispatcherType is "filelist" the source is a file; if DispatcherType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the WM starts. A JobDir structure consists of a base dir under which lie other three subdirectories, named tmp, new, old
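
The JobDir layout the WM expects to find at startup can be prepared with a few lines (a sketch; the directory names come directly from the description above):

```python
import os

def make_jobdir(base):
    """Create the JobDir structure described above: a base directory
    containing 'tmp', 'new' and 'old' subdirectories."""
    for sub in ("tmp", "new", "old"):
        os.makedirs(os.path.join(base, sub), exist_ok=True)
```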

IsmBlackList: a list of CEs that have to be excluded in the ISM

IsmDump: if the ISM dump is enabled, the dump, in ClassAd format, will be written to this file. In order to avoid file corruption, the contents of a dump are built in a temporary file, whose name is the value of this parameter with the prefix ".tmp", which is renamed to the specified file only at the end of the operation
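
The write-then-rename pattern relied on by IsmDump can be sketched as follows. This is a generic illustration, not the WM's actual code; here the temporary name is formed by appending ".tmp" to the target path.

```python
import os

def dump_ism(contents, dump_path):
    """Build the dump in a temporary file first, then rename it over the
    target, so readers never observe a partially written dump."""
    tmp_path = dump_path + ".tmp"
    with open(tmp_path, "w") as dump_file:
        dump_file.write(contents)
    os.rename(tmp_path, dump_path)  # atomic on POSIX filesystems
```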

IsmIiPurchasingRate: the period between two ISM purchases from the BDII, in seconds

IsmThreads: whether all the threads related to ISM management are taken from the thread pool or created separately

IsmUpdateRate: the period between two updates of the ISM, in seconds. Note that conceptually purchasing just retrieves the list of available resources, whereas an ISM update gathers the resource information for each resource.

JobWrapperTemplateDir: the job wrapper sent to the CE and then executed on Worker Node is based on a bash template which is augmented by the WM with job-specific information. This is the location where all the templates - one at the moment - are stored

LogFile: the name of the file where messages are logged

LogLevel: each logging statement in the code specifies a log level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The levels go from 1 (minimum verbosity) to 6 (maximum verbosity)

MatchRetryPeriod: once a job becomes pending, meaning that there are no resources available, this parameter represents the period between successive match-making attempts, in seconds

MaxOutputSandboxSize: the maximum size of the output sandbox, in bytes. The limit is currently enforced by the job wrapper running on the Worker Node, which doesn't upload more data than specified here. If the value is -1 there is no limit.

MaxRetryCount: the system limit to the number of deep resubmissions for a job. The actual limit is the minimum between this value and the one specified in the job description

QueueSize: (def=1000) Size of the queue of events "ready" to be managed by the workers thread pool

RuntimeMalloc: allows the use of an alternative malloc library (examples are nedmalloc, google performance tools, ccmalloc), by specifying the path to the shared object to be loaded with LD_PRELOAD. Example: RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so".

WorkerThreads: the number of request handler threads

 

0.0.0.1 LogMonitor section

Usually there is no need to change the default parameters, with the exception of: RemoveJobFiles = true;
which by default is set to false.
Setting it to true will force condor to remove unused internal files when the jobs are in a final state.

Added:
>
>
The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the JobController log file

LogLevel: Log verbosity level. If the level of a logging statement is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorLogDir: Path of the directory where LogMonitor stores the CondorG log files

CondorLogRecycleDir: Path of the directory where old CondorG log files are saved

JobsPerCondorLog: Max number of job logged in the same CondorG log file

GlobusDownTimeout: the LogMonitor waits this number of seconds before considering failed a condor job that has lost contact with the CE, and then resubmits it if possible

MainLoopDuration: LogMonitor cycles through the CondorLog files every this number of seconds

MonitorInternalDir: Path of the directory where LogMonitor stores its own files

IdRepositoryName: Name of the file containing pieces of information about the jobs used by LogMonitor and JobController

ExternalLogFile: Path of the directory where extra log files are stored

RemoveJobFiles: If set to true, all files used to submit jobs to condor are removed when they are no longer necessary. Set it to "false" only for debugging purposes. Files are stored in the SubmitFileDir directory as set in the JobController section
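
For instance, to enable the cleanup, the LogMonitor section of glite_wms.conf would contain a fragment like this (other attributes omitted):

```
LogMonitor = [
    RemoveJobFiles = true;
];
```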

 

0.0.0.1 Job Controller section

Usually there is no need to change the default parameters.

Added:
>
>
The relevant parameters available in this section are the following:

LogFile: String attribute containing the path of the JobController log file

LogLevel: Log verbosity level. If the level of a logging statement is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)

LockFile: Path of the lock file for the service

CondorSubmit: Path of the "condor_submit" command

CondorRemove: Path of the "condor_remove" command

CondorDagman: Path of the "condor_dagman" command

DagmanMaxPre: Sets the maximum number of PRE scripts within the DAG that may be running at one time; it is the "-maxpre" parameter of the condor_dagman command

MaximumTimeAllowedForCondorMatch: Sets the number of seconds that a job can wait in the condor queue to be matched before being resubmitted

ContainerRefreshThreshold: Number of jobs that JobController can keep in memory before resynchronizing its container with the one physically saved in the file "IdRepositoryName" (see LM section)

InputType: The JobController can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir"

Input: The input source of new requests. If InputType is "filelist" the source is a file; if InputType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the JobController starts. A JobDir structure consists of a base dir under which lie other three subdirectories, named tmp, new, old

SubmitFileDir: Path of the directory where the submit files for condor are stored

OutputFileDir: Path of the directory where the standard error/output files of the jobs (e.g. the JobWrapper) are stored

 

0.0.0.1 Network Server section

Although the Network Server is no longer installed on WMS nodes, some configuration parameters in its section of the global configuration file are still needed.

Revision 112011-06-08 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 118 to 118
 
/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n WMS
Changed:
<
<

0.0.1 Configuration of the CREAM CLI

>
>

0.0.1 Configuration of the WMS CLI

 
Changed:
<
<
The CREAM CLI is part of the EMI-UI. To configure it please refer to xxx.
>
>
The WMS CLI is part of the EMI-UI. To configure it please refer to xxx.
 

1 Operating the system

Line: 170 to 170
 

0.0.1 glite_wms.conf

Changed:
<
<
This is the general configuration file for the WMS. It is organised in sections: one for every running service plus a common section.
>
>
This is the general configuration file for the WMS. The syntax is based on the ClassAd language. The parameter names are case insensitive. It is organised in sections: one for every running service plus a common section.
[
    Common = [...];
    JobController = [...];
    LogMonitor = [...];
    NetworkServer = [...];
    WorkloadManager = [...];
    WorkloadManagerProxy = [...];
    ICE = [...]
]
 
Added:
>
>
The value of a parameter can be expressed in terms of environment variables, with the typical UNIX shell syntax: a $ sign followed by the name of the variable in braces (e.g. ${HOME}).
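
The substitution rule can be illustrated with os.path.expandvars, which mimics the shell-style syntax described above (the variable name and its value here are examples, not actual configuration):

```python
import os

# Hypothetical environment variable used in a configuration value.
os.environ["WMS_LOCATION_SBIN"] = "/usr/sbin"
value = os.path.expandvars("${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor")
```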
 

0.0.0.1 Common section

In general there is no need to change this section.

Added:
>
>
DGUser: the user under which a WMS process runs
 

0.0.0.1 WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested deployment is to have the WMS and LB on two separate physical machines, a fundamental parameter is also LBServer.

Revision 102011-06-08 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 208 to 208
 
<pre><gacl version='0.0.1'>
 <entry>
   <voms>
Changed:
<
<
ops/Role=NULL
>
>
/ops/Role=NULL/Capability=NULL
 
Line: 216 to 216
 
Changed:
<
<
dteam/Role=NULL
>
>
/dteam/Role=NULL/Capability=NULL
 
Line: 225 to 225
 
Changed:
<
<
This file can also contain the DNs of single users that are allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:
>
>
There must be an exact match between the fqan expressed in the gacl file and the one in the user proxy.

The gacl file can also contain the DNs of single users allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:

 
<pre><entry>
      <user>
        <dn>/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Daniele Cesini/Email=daniele.cesini@cnaf.infn.it/</dn>

Revision 92011-05-23 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 252 to 252
 
  • the lcmaps log: /var/log/glite/lcmaps.log;
  • the gridmapdir: /etc/grid-security/gridmapdir/;
  • the existing pool accounts for a VO or a VO group/role
Added:
>
>

0.1 Log files

The WMS log files, located under /var/log/wms, are the following:

  • workload_manager_events.log contains logs of the workload manager component
  • wmproxy.log contains logs of the wmproxy component
  • httpd-wmproxy-access.log wmproxy httpd access log
  • httpd-wmproxy-errors.log wmproxy httpd error log
  • glite-wms-wmproxy.restart.cron.log log of the /etc/cron.d/glite-wms-wmproxy.restart.cron cron
  • glite-wms-wmproxy-purge-proxycache.log log of the /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron cron
  • wmproxy_logrotate.log contains logs about the rotation of wmproxy httpd log files
  • renewal.log proxy renewal service log
  • logmonitor_events.log contains logs of the logmonitor component
  • jobcontoller_events.log contains logs of the jobcontroller component
  • ice.log contains logs of the ice component
  • glite-wms-purgeStorage.log log of the /etc/cron.d/glite-wms-purger.cron cron
 -- FabioCapannini - 2011-04-28

Revision 82011-05-20 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 180 to 180
Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested deployment is to have the WMS and LB on two separate physical machines, a fundamental parameter is also LBServer.
Added:
>
>

0.0.0.1 Workload Manager section

Important parameters in this section are:
EnableBulkMM = true; //enable the bulk matchmaking for collection
IsmBlackList // allow to set a list of CEs that are banned
IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enable multithreading for the WM component. Speeds up the matchmaking process. 5 is a good compromise between machine load and speed.

0.0.0.2 LogMonitor section

Usually there is no need to change the default parameters, with the exception of: RemoveJobFiles = true;
which by default is set to false.
Setting it to true will force condor to remove unused internal files when the jobs are in a final state.

0.0.0.3 Job Controller section

Usually there is no need to change the default parameters.

0.0.0.4 Network Server section

Although the Network Server is no longer installed on WMS nodes, some configuration parameters in its section of the global configuration file are still needed.

The important parameters in this section are those regardig the contact with the information system. In particular:

  • II_Contact = "egee-bdii.cnaf.infn.it"; set the hostname of the bdii to be contacted
  • II_Port = 2170; set the port on which the bdii is contacted
  • Gris_DN = "mds-vo-name=local, o=grid"; set the path where the bdii is publishing information
  • II_Timeout = 100; set the timeout for the bdii query. It is important that this value is not too small: it is very dangerous if many bdii queries fail because of timeouts. The risk is that all the information in the InformationSupermarket expires, making all jobs in Waiting status match no CE (they remain in Waiting status for a long time, until a query to the bdii succeeds). By default this value is set to 30, but 100 is safer.
  • MaxInputSandboxSize = 10000000; this puts a limit on the size of the input sandbox of the jdl. Units are bytes
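
Putting the values above together, the NetworkServer section would contain a fragment like this (values taken from the bullets above; other attributes omitted):

```
NetworkServer = [
    II_Contact = "egee-bdii.cnaf.infn.it";
    II_Port = 2170;
    Gris_DN = "mds-vo-name=local, o=grid";
    II_Timeout = 100;
    MaxInputSandboxSize = 10000000;
];
```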

0.0.1 glite_wms_wmproxy.gacl

WMS User Authentication is performed by the WMProxy component based on a GACL module.
The fundamental file used to manage WMS authentication is the /etc/glite-wms/glite_wms_wmproxy.gacl file.
This file contains the names of the VOs that are allowed to use the WMS. A .gacl file example that allows the dteam and ops VOs is the following:

<pre><gacl version='0.0.1'>
 <entry>
   <voms>
     <fqan>ops/Role=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
 <entry>
   <voms>
     <fqan>dteam/Role=NULL</fqan>
   </voms>
   <allow>
     <exec/>
   </allow>
 </entry>
</gacl>
</pre>

This file can also contain the DNs of single users that are allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:

<pre><entry>
      <user>
        <dn>/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Daniele Cesini/Email=daniele.cesini@cnaf.infn.it/</dn>
      </user>
      <allow><exec/></allow>
    </entry>
</pre>

An entry with a DENY tag can be used to ban users or VOs:

<pre><entry>
      <voms>
        <fqan>/dteam/Role=admin/Capability=NULL</fqan>
      </voms>
      <deny><exec/></deny>
    </entry>
</pre>

As the previous examples show, it is possible to allow/ban users and VOs on the basis of their FQANs (i.e. those returned by the voms-proxy-info --fqan command).
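
The exact-match semantics can be sketched as follows. This is a simplified illustration of the evaluation, not the actual GACL library; that deny entries take precedence over allow entries is an assumption of the sketch.

```python
def is_authorized(proxy_fqan, entries):
    """Simplified GACL-style check: 'entries' is a list of
    (fqan, action) pairs, action being "allow" or "deny".
    FQANs must match exactly; deny is assumed to take precedence."""
    allowed = False
    for fqan, action in entries:
        if fqan == proxy_fqan:
            if action == "deny":
                return False
            allowed = True
    return allowed

entries = [
    ("/dteam/Role=NULL/Capability=NULL", "allow"),
    ("/dteam/Role=admin/Capability=NULL", "deny"),
]
```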

User mapping on a WMS node is done through lcmaps, as in any other gLite service, so the fundamental places to look in case of mapping problems are:

  • the gridmap file: /etc/grid-security/grid-mapfile;
  • the lcmaps log: /var/log/glite/lcmaps.log;
  • the gridmapdir: /etc/grid-security/gridmapdir/;
  • the existing pool accounts for a VO or a VO group/role
 -- FabioCapannini - 2011-04-28 \ No newline at end of file

Revision 72011-05-11 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 146 to 146
 
  • glite-wms-wm:
     /opt/glite/bin/glite-wms-workload_manager (pid 9957) is running...
  • glite-wms-wmproxy:
    WMProxy httpd listening on port 7443
    httpd (pid 22223 22222 22221 22220 22219 22218 22217) is running ....
    ===
    WMProxy Server running instances:
    UID PID PPID C STIME TTY TIME CMD
  • glite-wms-ice:
    /opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...
Added:
>
>

0.1 Init scripts

The init scripts are located under /etc/init.d and are the following:

/etc/init.d/globus-gridftp
/etc/init.d/glite-wms-wmproxy
/etc/init.d/glite-wms-wm
/etc/init.d/glite-wms-lm
/etc/init.d/glite-wms-jc
/etc/init.d/glite-wms-ice
/etc/init.d/glite-proxy-renewald
/etc/init.d/glite-lb-locallogger
/etc/init.d/glite-lb-bkserverd

0.2 Configuration Files

The configuration files are located under /etc/glite-wms and are the following:

/etc/glite-wms/glite_wms.conf
/etc/glite-wms/glite_wms_wmproxy.gacl
/etc/glite-wms/glite_wms_wmproxy_httpd.conf

0.2.1 glite_wms.conf

This is the general configuration file for the WMS. It is organised in sections: one for every running service plus a common section.

0.2.1.1 Common section

In general there is no need to change this section.

0.2.1.2 WorkloadManagerProxy section

Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested deployment is to have the WMS and LB on two separate physical machines, a fundamental parameter is also LBServer.

 -- FabioCapannini - 2011-04-28

Revision 62011-05-06 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

System Administrator Guide for WMS for EMI-1 release

Line: 112 to 112
 
  • $PX_HOST -> the hostname of a server myproxy, ex.: 'myproxy.$MY_DOMAIN'
  • $BDII_HOST -> the hostname of the site bdii to be used, ex: 'sitebdii.$MY_DOMAIN'
  • $LB_HOST -> the hostname of the LB server to be used, ex: 'lb-server.$MY_DOMAIN:9000'. This variable is set as a service-specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located
Changed:
<
<

0.0.0.1 Run yaim

>
>

0.0.0.1 Run yaim

  After having filled the siteinfo.def file, run yaim:
/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n WMS
Added:
>
>

0.0.1 Configuration of the CREAM CLI

The CREAM CLI is part of the EMI-UI. To configure it please refer to xxx.

1 Operating the system

1.1 How to start the WMS service

A system administrator can start the WMS service by issuing:

service gLite start

A system administrator can stop the WMS service by issuing:

service gLite stop

1.2 Daemons

Scripts to check the daemons status and to start/stop them are located in the ${GLITE_WMS_LOCATION}/etc/init.d/ directory (i.e. ${GLITE_WMS_LOCATION}/etc/init.d/glite-wms-wm start/stop/status). gLite production installations also provide a more generic service, called gLite, to manage all of them simultaneously; try service gLite status/start/stop. On a typical WMS node the following services must be running:

  • glite-lb-locallogger:
     glite-lb-logd running
    glite-lb-interlogd running
  • glite-lb-proxy:
     glite-lb-proxy running as 4137
  • glite-proxy-renewald:
     glite-proxy-renewd running
  • globus-gridftp:
     globus-gridftp-server (pid 3107) is running...
  • glite-wms-jc:
     JobController running in pid: 10008
    CondorG master running in pid: 10063 10062
    CondorG schedd running in pid: 10070
  • glite-wms-lm:
     Logmonitor running...
  • glite-wms-wm:
     /opt/glite/bin/glite-wms-workload_manager (pid 9957) is running...
  • glite-wms-wmproxy:
    WMProxy httpd listening on port 7443
    httpd (pid 22223 22222 22221 22220 22219 22218 22217) is running ....
    ===
    WMProxy Server running instances:
    UID PID PPID C STIME TTY TIME CMD
  • glite-wms-ice:
    /opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...
 -- FabioCapannini - 2011-04-28

Revision 52011-04-29 - FabioCapannini

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
+- WMS System Administrator Guide
>
>

System Administrator Guide for WMS for EMI-1 release

1 Installation and configuration

1.1 Prerequisites

1.1.1 Operating System

A standard x86_64 SL(C)5 distribution is supposed to be properly installed. An EPEL repository must be installed on the machine.

1.1.2 Node synchronization

A general requirement for the Grid nodes is that they are synchronized. This requirement may be fulfilled in several ways. One of the most common is to use the NTP protocol with a time server.

1.1.3 Cron and logrotate

Many components deployed on the WMS rely on the presence of cron (including support for /etc/cron.* directories) and logrotate. You should make sure these utils are available on your system.

1.2 Installation

1.2.1 Repositories

For a successful installation, you will need to configure your package manager to reference a number of repositories (in addition to your OS);

  • the EPEL repository
  • the EMI middleware repository
  • the CA repository
and to REMOVE or DEACTIVATE

  • the DAG repository

1.2.1.1 The EPEL repository

You can install the EPEL repository, issuing:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm

1.2.1.2 The EMI middleware repository


The EMI-1 RC4 repository can be found under:

http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RC4/sl5/x86_64

To use yum, the yum repo to be installed in /etc/yum.repos.d can be found at https://twiki.cern.ch/twiki/pub/EMI/EMI-1/rc4.repo

1.2.1.3 The Certification Authority repository

The most up-to-date version of the list of trusted Certification Authorities (CA) is needed on your node. The relevant yum repo can be installed issuing:

wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo -O /etc/yum.repos.d/egi-trustanchors.repo

1.2.1.4 Important note on automatic updates

An update of an RPM not followed by reconfiguration can cause problems. Therefore WE STRONGLY RECOMMEND NOT TO USE ANY KIND OF AUTOMATIC UPDATE PROCEDURE.


By running the script available at http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh (implemented by Giuseppe Platania, INFN Catania), yum autoupdate will be disabled

1.2.2 Installation of a WMS node

First of all, install the yum-protectbase rpm:

yum install yum-protectbase.noarch

Then proceed with the installation of the CA certificates.

1.2.2.1 Installation of the CA certificates

The CA certificate can be installed issuing:

yum install ca-policy-egi-core 

1.2.2.2 Installation of the WMS software

Install the WMS metapackage:

yum install emi-wms

1.3 Configuration

1.3.1 Using the YAIM configuration tool


For a detailed description on how to configure the middleware with YAIM, please check the YAIM guide.


The necessary YAIM modules needed to configure a certain node type are automatically installed with the middleware.

1.3.2 Configuration of a WMS node

1.3.2.1 Install host certificate


The WMS node requires the host certificate/key files to be installed. Contact your national Certification Authority (CA) to understand how to obtain a host certificate if you do not have one already.


Once you have obtained a valid certificate:

  • hostcert.pem - containing the machine public key
  • hostkey.pem - containing the machine private key
place the two files on the target node in the /etc/grid-security directory, then set the proper ownership and permissions:
chown root.root /etc/grid-security/hostcert.pem

chown root.root /etc/grid-security/hostkey.pem

chmod 600 /etc/grid-security/hostcert.pem

chmod 400 /etc/grid-security/hostkey.pem
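It is worth verifying that the certificate and key actually belong together before going on: a certificate and an RSA key match when their moduli are identical. The sketch below exercises the check on a disposable self-signed pair; on a real WMS you would point the two openssl commands at /etc/grid-security/hostcert.pem and hostkey.pem instead.

```shell
# Sketch: compare the RSA moduli of a certificate and its private key.
# A throwaway self-signed pair is generated so the example is self-contained.
dir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=example-wms" \
    -keyout "$dir/hostkey.pem" -out "$dir/hostcert.pem" 2>/dev/null
cert_mod=$(openssl x509 -noout -modulus -in "$dir/hostcert.pem")
key_mod=$(openssl rsa -noout -modulus -in "$dir/hostkey.pem" 2>/dev/null)
if [ "$cert_mod" = "$key_mod" ]; then
    echo "certificate and key match"
else
    echo "MISMATCH: certificate and key do not belong together"
fi
rm -rf "$dir"
```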

1.3.2.2 Configure the siteinfo.def file

Set your siteinfo.def file, which is the input file used by yaim. The yaim variables relevant for WMS are the following:

  • $WMS_HOST -> the WMS hostname, e.g. 'egee-rb-01.$MY_DOMAIN'
  • $PX_HOST -> the hostname of a MyProxy server, e.g. 'myproxy.$MY_DOMAIN'
  • $BDII_HOST -> the hostname of the site BDII to be used, e.g. 'sitebdii.$MY_DOMAIN'
  • $LB_HOST -> the hostname of the LB server to be used, e.g. 'lb-server.$MY_DOMAIN:9000'. This variable is set as a service-specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located.
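Since site-info.def is a plain shell-variable file sourced by yaim, a minimal fragment covering the variables above might look as follows (all hostnames are placeholders; remember that LB_HOST belongs in services/glite-wms):

```shell
# Sketch of the WMS-related yaim variables; every hostname is a placeholder.
MY_DOMAIN=example.org
WMS_HOST="wms01.$MY_DOMAIN"
PX_HOST="myproxy.$MY_DOMAIN"
BDII_HOST="sitebdii.$MY_DOMAIN"
# The following goes in services/glite-wms, one directory below site-info.def:
LB_HOST="lb-server.$MY_DOMAIN:9000"
```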

1.3.2.3 Run yaim

After filling in the siteinfo.def file, run yaim:

/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n WMS
  -- FabioCapannini - 2011-04-28