Difference: WMS_guide (1 vs. 11)

Revision 10 2009-05-22 - ElisabettaMolinari

Line: 1 to 1
 

The Workload Management System Admin Guide

Introduction

Line: 71 to 71
 This directory on a heavily used WMS can become quite big, on the order of tens of GB. Old rotated log files should be removed manually. The following default log files can be found on a typical WMS:
Changed:
<
<
  • wmproxy.log Used in case of authentication or submisison error
>
>
  • wmproxy.log Used in case of authentication or submission error
 
  • workload_manager_events.log Used to check the status of the matchmaking process (from Waiting to Ready status) and the query to the information system to fill in the InformationSuperMarket
Line: 107 to 107
 
  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.
  • MatchRetryPeriod: once a job becomes pending, meaning that no resources are available, this parameter sets the period, in seconds, between successive matchmaking attempts. The default value is '1000'. To decrease the number of periodic retries of unmatched jobs, increase this value. A suggested value, used on several production WMS instances, is '14400' (4 hours).
Added:
>
>
  • RuntimeMalloc: in the WM section, allows the use of an alternative malloc library (e.g. nedmalloc, Google Performance Tools and many more), redirected at run time with LD_PRELOAD. For example, set RuntimeMalloc = "/usr/lib/libtcmalloc_minimal.so" to use Google's malloc.
 

Select only specific VO resources

Sometimes it can be useful to force the WMS to select only resources specific to a given VO. This reduces the matchmaking time and can be achieved by providing an additional LDAP clause, which is added to the search filter at purchasing time. The default search filter used is:

Revision 9 2009-04-29 - ElisabettaMolinari

Line: 1 to 1
 

The Workload Management System Admin Guide

Introduction

Line: 61 to 61
 
/opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...

File Systems/Directories

Added:
>
>
  • ${GLITE_LOCATION_VAR}/sandboxdir: this is where the job sandboxes are located. They are purged automatically when the output of a job is retrieved (get-output). If a VO does not retrieve its job output from the WMS node, the sandbox dir can become a serious problem for disk usage. The sandbox dir can also become problematic when a job (or a number of jobs) has a huge output sandbox and the control on the OSB is not enabled in the glite_wms.conf file (see the WMProxy configuration parameter section). In any case it is good practice to purge the sandbox dir periodically.
  • ${GLITE_LOCATION_VAR}/workloadmanager/: this is where the input file list or the jobdir is located, depending on the value of the 'DispatcherType' attribute in the 'glite_wms.conf' file
  • ${GLITE_LOCATION_VAR}/logmonitor/: this directory contains mainly condor log files
  • ${GLITE_LOCATION_VAR}/jobcontrol/
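
As a rough aid for keeping an eye on the sandbox dir, the following Python sketch sums the size of all files under a directory and compares it with a limit. The function names and the threshold are hypothetical and not part of the WMS; the directory to check (for example the sandboxdir under ${GLITE_LOCATION_VAR}) is passed in by the caller.

```python
import os


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so that link targets are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def over_threshold(root, limit_gb):
    """Return True when the directory tree uses more than limit_gb gigabytes."""
    return dir_size_bytes(root) > limit_gb * 1024 ** 3
```

A call such as over_threshold("/var/glite/sandboxdir", 30) could be run from a monitoring script; both the path and the 30 GB limit are only illustrative and should be adapted to the local installation.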
 

Log files locations

Log files are located under ${GLITE_LOCATION_LOG}, typically being '/var/log/glite'.
Line: 97 to 101
 
  • ${GLITE_LOCATION}/etc/glite_wms_wmproxy.gacl: this file is used for authorization purposes, to check the rights of the requesting user.
  • ${GLITE_LOCATION}/etc/glite_wms_wmproxy_httpd.conf: this is a WMProxy-specific configuration file for the HTTP daemon and Fast CGI
  • ${GLITE_LOCATION}/etc/wmproxy_logrotate.conf: this file configures the logrotate tool, which rotates the httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files.
Changed:
<
<
>
>
  • ${GLITE_LOCATION_VAR}/.drain: this file is used to put the WMS in draining mode, so that it does not accept new submission requests but still allows any other operation, such as output retrieval.
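
A minimal Python sketch of toggling draining mode follows. It assumes that the mere presence of the .drain file is what enables draining, which the text above suggests but does not state explicitly; the function names are hypothetical, and the directory passed in stands for the value of ${GLITE_LOCATION_VAR}.

```python
from pathlib import Path


def set_draining(glite_location_var, draining):
    """Create or remove the .drain file under the given directory.

    Assumption: the WMS only checks whether the file exists.
    """
    drain_file = Path(glite_location_var) / ".drain"
    if draining:
        drain_file.touch()
    else:
        # missing_ok requires Python 3.8 or later.
        drain_file.unlink(missing_ok=True)
    return drain_file


def is_draining(glite_location_var):
    """Return True when the .drain file is present."""
    return (Path(glite_location_var) / ".drain").exists()
```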
 

Tuning of some configuration parameters

  • II_timeout: yaim sets the default to 30; increase it on low-memory machines.
Line: 126 to 130
  and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.
Changed:
<
<
>
>

Cron Jobs

Several cron jobs are installed by yaim:
  • /etc/cron.d/glite-wms-purger.cron: it periodically purges the sandbox dirs performing a check on the job status
  • glite-wms-wmproxy-purge-proxycache.cron: expired proxies are purged by this cron job
  • fetch-crl: cron job that retrieves the CRLs periodically
 

Troubleshooting

Some common error messages and troubleshooting operations that can be performed on a WMSLB instance are described here

Revision 8 2009-04-29 - ElisabettaMolinari

Line: 1 to 1
 

The Workload Management System Admin Guide

Introduction

Line: 90 to 90
 

Configuration Guide

Changed:
<
<
The general configuration file for the WMS is located in /opt/glite/etc/glite_wms.conf This file is organized in section, one for every running service plus a Common section For a general description of the glite_wms.conf configuration file, the configuration parameters and their default values see here: https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/WMSConfFile
>
>
The general configuration file for the WMS is located in ${GLITE_LOCATION}/etc/glite_wms.conf.
This file is organized in sections, one for every running service plus a Common section. For a general description of the glite_wms.conf configuration file, the configuration parameters and their default values see here: https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/WMSConfFile
Other configuration files useful to know for troubleshooting:
  • ${GLITE_LOCATION}/etc/glite_wms_wmproxy.gacl: this file is used for authorization purposes, to check the rights of the requesting user.
  • ${GLITE_LOCATION}/etc/glite_wms_wmproxy_httpd.conf: this is a WMProxy-specific configuration file for the HTTP daemon and Fast CGI
  • ${GLITE_LOCATION}/etc/wmproxy_logrotate.conf: this file configures the logrotate tool, which rotates the httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files.

 

Tuning of some configuration parameters

  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.
Line: 123 to 129
 

Troubleshooting

Added:
>
>
Some common error messages and troubleshooting operations that can be performed on a WMSLB instance are described here
 

WMS Monitoring

Changed:
<
<
>
>
A tool to monitor the glite-WMSLB instances is available; it has been developed and is currently maintained by INFN. For an extensive description of the tool go here
 

Service Monitoring Guide

Revision 7 2009-04-16 - ElisabettaMolinari

Line: 1 to 1
Changed:
<
<

The Workload Management System Guide

>
>

The Workload Management System Admin Guide

 
Changed:
<
<

Installation Guide

The following instructions describe how to install the latest WMS in certification using the native Linux installation tool yum and the configuration tool yaim

Update to a more recent version/patch

>
>

Introduction

Service Overview

The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources, in such a way that applications are conveniently, efficiently and effectively executed. The WMS is composed of the following sub-services:

  • Workload Management – WM: core component of the Workload Management System; its purpose is to accept and satisfy requests for job management coming from its clients
  • WMProxy: Web service interface to submit jobs to the WM
  • Job Controller – JC: acts as an interface to condor for the WM
  • Log Monitor – LM: directly connected to the JC, it acts as a job monitoring tool by parsing condor log files
  • Local Logger: copies events to be sent to the LB server into a local disk file
  • LBProxy: keeps a local view of the job state to be sent to the LB server
  • Proxy Renewal: service to renew the proxy of a long-running job
  • ICE: (Interface to CREAM Environment) the WMS service that handles the interaction with CREAM-based CEs.

Installation and configuration

Hardware Requirements

  • 4 GB RAM is the minimum suggested for memory
  • quad-core processor is recommended to better handle parallel matchmaking and all the different sub-services running on a WMS
  • Min disk space: depends on load and type of jobs submitted.
    • under '${GLITE_LOCATION_VAR}/sandboxdir', 30-40 GB is the minimum used on several production WMS instances to accommodate the submitted job sandbox dirs, with the cron job that purges job sandboxes once a week enabled.

Install & Configure

  • Install and configure OS and basic services according to the https://twiki.cern.ch/twiki/bin/view/LCG/GenericInstallGuide310
  • Install the glite-WMS metapackage from the appropriate gLite software repository
  • Configure the WMS node by running '/opt/glite/yaim/bin/yaim -c -s site-info.def -n WMS'. The following WMS-specific variables can be set in the 'site-info.def' file:
    • $WMS_HOST -> the WMS hostname, e.g. 'egee-rb-01.$MY_DOMAIN'
    • $PX_HOST -> the hostname of a MyProxy server, e.g. 'myproxy.$MY_DOMAIN'
    • $BDII_HOST -> the hostname of the site BDII to be used, e.g. 'sitebdii.$MY_DOMAIN'
    • $LB_HOST -> the hostname of the LB server to be used, e.g. 'lb-server.$MY_DOMAIN:9000'. This variable is set as a service-specific variable in the file services/glite-wms, located one directory below the one containing the 'site-info.def' file

Daemons and services running

Scripts to check the status of the daemons and to start or stop them are located in the ${GLITE_WMS_LOCATION}/etc/init.d/ directory (e.g. ${GLITE_WMS_LOCATION}/etc/init.d/glite-wms-wm start/stop/status). gLite production installations also provide a more generic service, called gLite, to manage all of them simultaneously: try 'service gLite status/start/stop'. On a typical WMS node the following services must be running:
  • glite-lb-locallogger:
     glite-lb-logd running
           glite-lb-interlogd running
  • glite-lb-proxy:
     glite-lb-proxy running as 4137
  • glite-proxy-renewald:
     glite-proxy-renewd running
  • globus-gridftp:
     globus-gridftp-server (pid 3107) is running...
  • glite-wms-jc:
     JobController running in pid: 10008
           CondorG master running in pid: 10063 10062
           CondorG schedd running in pid: 10070
  • glite-wms-lm:
     Logmonitor running...
  • glite-wms-wm:
     /opt/glite/bin/glite-wms-workload_manager (pid 9957) is running...
  • glite-wms-wmproxy:
    WMProxy httpd listening on port 7443
    httpd (pid 22223 22222 22221 22220 22219 22218 22217) is running ....
    ===
    WMProxy Server running instances:
    UID        PID  PPID  C STIME TTY          TIME CMD
  • glite-wms-ice:
    /opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...

File Systems/Directories

Log files locations

Log files are located under ${GLITE_LOCATION_LOG}, typically '/var/log/glite'. This directory on a heavily used WMS can become quite big, on the order of tens of GB. Old rotated log files should be removed manually. The following default log files can be found on a typical WMS:

  • wmproxy.log Used in case of authentication or submisison error

  • workload_manager_events.log Used to check the status of the matchmaking process (from Waiting to Ready status) and the query to the information system to fill in the InformationSuperMarket

  • ice.log used to check jobs that matched a CREAM based CE and are sent to it via ICE

  • jobcontroller_events.log Used to check the job events once the jobs have arrived on condor

  • httpd-wmproxy-errors.log Used in case of problems in contacting the WMProxy service

  • httpd-wmproxy-access.log
  • logmonitor_events.log Aggregate information about each job coming from various log files

  • glite-wms-wmproxy-purge-proxycache.log
  • lcmaps.log Used when there are problems in the mapping of remote users to local pool accounts

Other log files that can be useful in case of trouble are the condor log in:

  • /var/local/condor/log/
  • /var/glite/logmonitor/CondorG.log/
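
Since old rotated log files have to be removed manually, a small helper that lists removal candidates can be handy. The Python sketch below only lists files and deletes nothing. It assumes that rotated files carry a suffix after '.log' (for example 'wmproxy.log.1' or 'ice.log.2.gz'), which depends on the local logrotate configuration; the function name and the 30-day default are hypothetical.

```python
import time
from pathlib import Path


def old_rotated_logs(log_dir, min_age_days=30, now=None):
    """List rotated log files in log_dir last modified more than min_age_days ago.

    Assumption: a rotated file has '.log.' in its name, e.g. 'wmproxy.log.1'.
    Active log files such as 'wmproxy.log' are never returned.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    candidates = []
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file() and ".log." in path.name and path.stat().st_mtime < cutoff:
            candidates.append(path)
    return candidates
```

Review the returned list before deleting anything, for instance by printing each path first.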
 

Configuration Guide

Changed:
<
<

Configuration files

Following are the configuration files for the main services running on a WMS node:
>
>
The general configuration file for the WMS is located in /opt/glite/etc/glite_wms.conf This file is organized in section, one for every running service plus a Common section For a general description of the glite_wms.conf configuration file, the configuration parameters and their default values see here: https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/WMSConfFile
 

Tuning of some configuration parameters

  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.
  • MatchRetryPeriod: once a job becomes pending, meaning that no resources are available, this parameter sets the period, in seconds, between successive matchmaking attempts. The default value is '1000'. To decrease the number of periodic retries of unmatched jobs, increase this value. A suggested value, used on several production WMS instances, is '14400' (4 hours).
Changed:
<
<
Select only specific VO resources
>
>

Select only specific VO resources

 Sometimes it can be useful to force the WMS to select only resources specific to a given VO. This reduces the matchmaking time and can be achieved by providing an additional LDAP clause, which is added to the search filter at purchasing time. The default search filter used is:
Line: 45 to 120
  and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.
Added:
>
>

Troubleshooting

WMS Monitoring

 

Service Monitoring Guide

Revision 6 2008-05-13 - ElisabettaMolinari

Line: 1 to 1
 

The Workload Management System Guide

Installation Guide

Line: 17 to 17
 
Added:
>
>
  • '/opt/glite/etc/glite-lb-index.conf' is the LB indexes configuration file. To set different indexes, modify the glite-lb-index.conf configuration file and then run '/opt/glite/bin/glite-lb-bkindex -r /opt/glite/etc/glite-lb-index.conf', where -r means "really perform reindexing"
 

Tuning of some configuration parameters

  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.

Revision 5 2008-05-13 - ElisabettaMolinari

Line: 1 to 1
 

The Workload Management System Guide

Installation Guide

Line: 16 to 16
 Following are the configuration files for the main services running on a WMS node:
Changed:
<
<
  • '/opt/glite/etc/lcmaps/lcmaps.db' and '/opt/glite/etc/lcmaps/lcmaps.db.gridftp' are the lcmaps configuration files for wmproxy and gridftp respectively
>
>
  • '/opt/glite/etc/lcmaps/lcmaps.db' and '/opt/glite/etc/lcmaps/lcmaps.db.gridftp' are the lcmaps configuration files for wmproxy and gridftp respectively. Currently, after running the yaim configuration script, they have to be synchronized as explained in the bug report (https://savannah.cern.ch/bugs/?35244). This will be fixed in the near future.
 

Tuning of some configuration parameters

  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.

Revision 4 2008-05-13 - ElisabettaMolinari

Line: 1 to 1
 

The Workload Management System Guide

Installation Guide

Line: 12 to 12
 

Configuration Guide

Added:
>
>

Configuration files

Following are the configuration files for the main services running on a WMS node:
 

Tuning of some configuration parameters

Deleted:
<
<
For a general description of the glite_wms.conf configuration file, its configuration parameters and their default values, see: https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/WMS_configuration_file
 
  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.
  • MatchRetryPeriod: once a job becomes pending, meaning that no resources are available, this parameter sets the period, in seconds, between successive matchmaking attempts. The default value is '1000'. To decrease the number of periodic retries of unmatched jobs, increase this value. A suggested value, used on several production WMS instances, is '14400' (4 hours).

Revision 3 2008-05-06 - ElisabettaMolinari

Line: 1 to 1
 

The Workload Management System Guide

Installation Guide

Line: 8 to 8
  Get it from here https://grid-deployment.web.cern.ch/grid-deployment/certification/repos/glite-WMS.repo
  • Run 'yum update', 'yum install cert-glite-WMS' and 'yum install lcg-CA'
  • Run '/opt/glite/yaim/bin/yaim -c -s site-info.def -n WMS'
Added:
>
>

Update to a more recent version/patch

 

Configuration Guide

Tuning of some configuration parameters

For a general description of the glite_wms.conf configuration file, its configuration parameters and their default values, see: https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/WMS_configuration_file

Revision 2 2008-04-11 - SalvatoreMonforte

Line: 1 to 1
 

The Workload Management System Guide

Installation Guide

Line: 14 to 14
 
  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.
  • MatchRetryPeriod: once a job becomes pending, meaning that no resources are available, this parameter sets the period, in seconds, between successive matchmaking attempts. The default value is '1000'. To decrease the number of periodic retries of unmatched jobs, increase this value. A suggested value, used on several production WMS instances, is '14400' (4 hours).
Added:
>
>
Select only specific VO resources
Sometimes it can be useful to force the WMS to select only resources specific to a given VO. This reduces the matchmaking time and can be achieved by providing an additional LDAP clause, which is added to the search filter at purchasing time. The default search filter used is:

(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)(objectclass=gluevoview)(objectclass=gluece))

The idea is to let system administrators specify an additional LDAP clause that is combined in logical AND with the last two clauses of the default filter, in order to match attributes specific to the gluece/gluevoview objectclasses. For this purpose the configuration file provides:

  • IsmIILDAPCEFilterExt for handling the additional search filter while purchasing information about CE from the BDII

As an example, by specifying the following:

IsmIILDAPCEFilterExt = "(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))"

the search filter during the purchasing would be:

(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster) (&(|(objectclass=gluevoview)(objectclass=gluece)) (|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))))

and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.
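
The way the extension is combined with the default filter can be sketched in Python as follows. This is only an illustration of the string composition described above, not the WMS code itself; the function name build_search_filter is hypothetical.

```python
# Objectclasses of the default search filter, in the order shown above.
DEFAULT_OBJECTCLASSES = [
    "gluecesebind",
    "gluecluster",
    "gluesubcluster",
    "gluevoview",
    "gluece",
]


def build_search_filter(ce_filter_ext=""):
    """Return the LDAP search filter used at purchasing time.

    The extension, if any, is put in logical AND with the last two
    clauses (gluevoview and gluece) of the default filter.
    """
    clauses = ["(objectclass=%s)" % name for name in DEFAULT_OBJECTCLASSES]
    head = "".join(clauses[:3])
    tail = "".join(clauses[3:])
    if not ce_filter_ext:
        return "(|%s%s)" % (head, tail)
    return "(|%s(&(|%s)%s))" % (head, tail, ce_filter_ext)


cms_ext = "(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))"
print(build_search_filter(cms_ext))
```

With the CMS extension from the example, the printed filter matches the one shown above, apart from the spaces inserted there for readability.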

 

Service Monitoring Guide

Revision 1 2008-04-11 - ElisabettaMolinari

Line: 1 to 1
Added:
>
>

The Workload Management System Guide

Installation Guide

The following instructions describe how to install the latest WMS in certification using the native Linux installation tool yum and the configuration tool yaim

Configuration Guide

Tuning of some configuration parameters

For a general description of the glite_wms.conf configuration file, its configuration parameters and their default values, see: https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/WMS_configuration_file
  • II_timeout: yaim sets the default to 30; increase it on low-memory machines. 4 GB is the suggested minimum memory.
  • MatchRetryPeriod: once a job becomes pending, meaning that no resources are available, this parameter sets the period, in seconds, between successive matchmaking attempts. The default value is '1000'. To decrease the number of periodic retries of unmatched jobs, increase this value. A suggested value, used on several production WMS instances, is '14400' (4 hours).
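
To get a feeling for the effect of MatchRetryPeriod, the short Python sketch below computes how many matchmaking attempts per day a single pending job causes for a given period. It is plain arithmetic under the simplifying assumption that the job stays pending for the whole day; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retries_per_day(match_retry_period_s):
    """Matchmaking attempts per day for one job that stays pending all day."""
    return SECONDS_PER_DAY / match_retry_period_s


for period in (1000, 14400):
    print("MatchRetryPeriod=%d s -> %.1f attempts per day" % (period, retries_per_day(period)))
```

With the default of 1000 seconds this gives about 86 attempts per day per pending job, against 6 with the suggested value of 14400 seconds.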

Service Monitoring Guide

-- ElisabettaMolinari - 11 Apr 2008

 