The WMS configuration file

The behavior of most of the processes running on the WMS node is driven by parameters set in a common configuration file, usually /opt/glite/etc/glite_wms.conf. Currently the syntax is based on the ClassAd language. The parameter names are case insensitive.

The file contains multiple sections, one per service plus a common one:

[
    Common = [...];
    JobController = [...];
    LogMonitor = [...];
    NetworkServer = [...];
    WorkloadManager = [...];
    WorkloadManagerProxy = [...];
    ICE = [...]
]

The value of a parameter can be expressed in terms of environment variables, using the usual UNIX shell syntax: a $ sign followed by the name of the variable in braces (e.g. ${HOME}).
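
As a small illustration of this syntax, the fragment below sets one parameter through an environment variable. The value shown is the default documented for JobWrapperTemplateDir further down this page. The // comment assumes that the parser accepts C++-style ClassAd comments.

```classad
[
    WorkloadManager = [
        // ${GLITE_WMS_LOCATION} is taken from the environment of the WMS process
        JobWrapperTemplateDir = "${GLITE_WMS_LOCATION}/etc/templates";
    ]
]
```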

The following paragraphs describe the parameters available for each component. If a default value exists, it is shown in parentheses at the end of the description.

Common configuration

LogFile
the name of the file where messages are logged
LogLevel
each logging statement in the code specifies a log level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The levels go from 1 (minimum verbosity) to 6 (maximum verbosity)
DGUser
the user under which a WMS process runs

The parameters in the Common section may be overridden in the specific component sections. This usually happens with LogFile and LogLevel.
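
A minimal sketch of such an override is shown below. The user name, log level values and log file path are hypothetical and only illustrate the mechanism. The // comments assume that the parser accepts C++-style ClassAd comments.

```classad
[
    Common = [
        DGUser = "glite";      // assumed account name
        LogLevel = 5;          // hypothetical level in the range 1 to 6
    ];
    WorkloadManager = [
        // Hypothetical path; replaces any LogFile set in Common
        LogFile = "/var/log/glite/workload_manager.log";
        // More verbose than the Common setting, for this component only
        LogLevel = 6;
    ]
]
```

With this layout, components that do not set LogLevel themselves would be expected to fall back to the value in Common.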

WorkloadManager configuration

The attributes available in this section, in alphabetical order, are:

BrokerLib
the library implementing the brokering functionality. What is specified here is loaded with dlopen(). ("libglite_wms_helper_broker_ism.so")
CeForwardParameters
the parameters forwarded by the WM to the CE
CeMonitorAsynchPort
the port used to listen for notifications arriving from CEMon services. A value of -1 means that listening is disabled (-1)
CeMonitorServices
the list of CEMon services the WM listens to
DisablePurchasingFromGris
??? (false)
DispatcherType
the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir" ("filelist")
DliServiceName
??? ("data-location-interface")
EnableBulkMM
specifies if bulk matchmaking, i.e. matching multiple similar jobs in a collection in one shot, should be applied (false)
EnableIsmDump
specifies if a periodic dump of the ISM should be done. It also specifies if, at startup, the WM should read a dump produced during a previous run, if available (true)
EnablePurchasingFromRgma
purchase CE information from RGMA (false)
EnableRecovery
specifies if at startup the WM should perform a special recovery procedure for the requests that it finds already in its input. The recovery procedure is currently not reliable, so it should be disabled (false)
EnableStatusCheck
specifies if, for each request the WM reads from input, it should check the status of the request (e.g. for a submit the only acceptable status is WAITING). As for the recovery, this check is not very reliable, so it should be disabled (false)
ExpiryPeriod
the maximum time, expressed in seconds, a submitted job is kept in the overall system, from the time it arrives for the first time at the WM (86400, i.e. one day)
Input
the input source of new requests. If DispatcherType is "filelist" the source is a file; if DispatcherType is "jobdir" the source is the base dir of a JobDir structure, which is expected to be already in place when the WM starts. A JobDir structure consists of a base dir containing three subdirectories, named tmp, new and old ("${EDG_WL_TMP}/workload_manager/input.fl")
IsmBlackList
a list of CEs that have to be excluded from the ISM
IsmCEMonAsynchPurchasingRate
??? (30)
IsmCEMonPurchasingRate
??? (120)
IsmDump
if the ISM dump is enabled, the dump, in ClassAd format, will be written to this file. To avoid file corruption, the contents of a dump are built in a temporary file, whose name is the value of this parameter with the prefix ".tmp", and only at the end of the operation is that file renamed to the specified file ("${GLITE_WMS_TMP}/workload_manager/ismdump.fl" - but it's not in filelist format!)
IsmDumpRate
the period between two ISM dumps, in seconds. The default value is way too short (50)
IsmIiPurchasingRate
the period between two ISM purchases from the BDII, in seconds (240)
IsmRgmaPurchasingRate
the period between two ISM purchases from RGMA, in seconds (120)
IsmUpdateRate
the period between two updates of the ISM, in seconds. Note that conceptually purchasing just retrieves the list of available resources, whereas an ISM update gathers the resource information for each resource. The default value is too short (50)
JobWrapperTemplateDir
the job wrapper sent to the CE and then executed on the Worker Node is based on a bash template which is augmented by the WM with job-specific information. This is the location where all the templates - one at the moment - are stored ("${GLITE_WMS_LOCATION}/etc/templates")
LogFile
the name of the file where messages are logged
LogLevel
each logging statement in the code specifies a log level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The levels go from 1 (minimum verbosity) to 6 (maximum verbosity)
MatchRetryPeriod
once a job becomes pending, meaning that there are no resources available, this parameter represents the period between successive match-making attempts, in seconds (1000)
MaxOutputSandboxSize
the maximum size of the output sandbox, in bytes. The limit is currently enforced by the job wrapper running on the Worker Node, which doesn't upload more data than what is specified here. If the value is -1 there is no limit. Currently the mechanism doesn't work well, so it is suggested to set this parameter to -1 (100000000)
MaxRetryCount
the system limit to the number of deep resubmissions for a job. The actual limit is the minimum between this value and the one specified in the job description (10)
MaxShallowRetryCount
the system limit to the number of shallow resubmissions for a job. The actual limit is the minimum between this value and the one specified in the job description (10)
PboxHostName
the host where a G-PBox service runs
PboxPortNum
the port on PboxHostName where the G-PBox service listens (6699)
PboxSafeMode
??? (false)
PipeDepth
the WM internally is structured with one dispatcher thread and one or more request handlers, communicating through a bounded queue. This parameter specifies the upper bound to the size of that queue (10)
RgmaConsumerLifeCycle
??? (30)
RgmaConsumerTtl
??? (300)
RgmaQueryTimeout
??? (30)
SiServiceName
??? ("org.glite.SEIndex")
TokenFile
the shallow resubmission mechanism works by removing an empty file on the gridftp server running on the WMS machine from the job wrapper running on a Worker Node. This parameter specifies the name of that file ("token.txt")
WorkerThreads
the number of request handler threads (10)
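
Putting some of these attributes together, a WorkloadManager section could look like the sketch below. Most values are the defaults listed above. The two ISM rates are hypothetical replacements for defaults described above as too short, and the disabled checks and the unlimited output sandbox follow the suggestions in the descriptions. The // comments assume that the parser accepts C++-style ClassAd comments.

```classad
[
    WorkloadManager = [
        DispatcherType = "filelist";
        Input = "${EDG_WL_TMP}/workload_manager/input.fl";
        WorkerThreads = 10;
        PipeDepth = 10;

        // Both mechanisms are described above as unreliable
        EnableRecovery = false;
        EnableStatusCheck = false;

        // -1 removes the limit, as suggested above
        MaxOutputSandboxSize = -1;

        EnableIsmDump = true;
        IsmDump = "${GLITE_WMS_TMP}/workload_manager/ismdump.fl";
        // Hypothetical values, in seconds, longer than the default of 50
        IsmDumpRate = 600;
        IsmUpdateRate = 600;

        ExpiryPeriod = 86400;
        MatchRetryPeriod = 1000;
        MaxRetryCount = 10;
        MaxShallowRetryCount = 10;
    ]
]
```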

JobController configuration

LogMonitor configuration

NetworkServer configuration

WorkloadManagerProxy configuration

The attributes available in this section, grouped by function, are:

Logging:

LogFile
String attribute containing the path of the WMProxy log file (Optional)
LogLevel
Integer attribute containing a value from 0 to 6 (Optional). The integer value represents the WMProxy log file verbosity level: from 0 (fatal) to 6 (debug: maximum verbosity)

Sandbox:

SandboxStagingPath
Root directory where job sandboxes are stored. It MUST be in the form <DocumentRoot>/<directory name>, where DocumentRoot is the value set in the glite_wms_wmproxy_httpd.conf configuration file. The directory MUST be accessible by the user under which WMProxy is running (usually the "glite" user). That user is determined by the value of the environment variable GLITE_USER, unless overridden by the User directive in glite_wms_wmproxy_httpd.conf
MaxInputSandboxSize
Maximum number of bytes for input sandboxes, on a per-job basis (Optional - default value is 10000000). Even though it is optional, this attribute SHOULD be set according to the storage capacity of the WMS node (if quotas are not set for users on that node), in order to avoid filling up the WMS disk.

List Match:

ListMatchRootPath
Directory path where temporary pipes for list-match operations are created. The directory MUST be accessible by the user under which WMProxy is running (usually the "glite" user). That user is determined by the value of the environment variable GLITE_USER, unless overridden by the User directive in glite_wms_wmproxy_httpd.conf (Optional - default value is /tmp)

Ports:

HTTPSPort
The HTTPS port where the WMProxy service is listening
GridFTPPort
Port number where gridFTP server is listening (Optional - default value is gridFTP standard port 2811)
DefaultProtocol
The protocol used for input sandbox file transfer. Currently supported protocols are gsiftp and https (Optional - default value is gsiftp)

Perusal:

MinPerusalTimeInterval
Integer value representing the time interval (in seconds) between two successive saves of the partial execution output of a job. This attribute affects the behaviour of WMProxy and other components only if the perusal functionality is explicitly requested by the user via the JDL, see the EnableFilePerusal JDL attribute (Optional - default value is 10 seconds)

LB:

LBProxy
Boolean attribute to switch between LB and LBProxy. If the value of this attribute is true, LBProxy is used by WMProxy for logging and query operations about jobs (Optional - default value is true)
LBServer
Address or list of addresses of the LB Server[s] to be used for storing job information, in the format <host>[:<port>] (default value for port is 9000). This attribute is needed only if the LB Server is not running on the WMProxy server host, or if more than one LB Server must be used. WMProxy selects the LB Server to use randomly from the list for each service request. WMProxy maintains a list of weights associated with the available LB Servers so that failing LB Servers have a decreasing probability of being selected. If Service Discovery is enabled, the LB Servers found through Service Discovery are added to the list.

Note that the following lines have the same meaning:

LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};

WeightsCacheValidity
Time in seconds (n) indicating the validity of the weights (i.e. the probability of being selected) associated with the available LB Servers. When the last weights update (i.e. the last received request) occurred more than n seconds ago, the weights are reset to the same value for all LB Servers (Optional - default value is 21600 seconds)
WeightsCachePath
Location of the directory on the WMProxy node where the LB Servers weights file is stored (Optional - default value is directory /var/glite/wmproxy)
LBLocalLogger
Address of the LB Local Logger, in the format <host>[:<port>] (default value for port is 9002). This attribute is needed only if the LB Local Logger runs on another host and LBProxy is not enabled
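
As an illustration of the list form of LBServer, the sketch below configures two LB Servers. The host names are hypothetical, the second entry relies on the default port described above, and the remaining values are the documented defaults. The // comment assumes that the parser accepts C++-style ClassAd comments.

```classad
[
    WorkloadManagerProxy = [
        LBProxy = true;
        // Hypothetical hosts; an entry without a port is expected to use 9000
        LBServer = {"lb1.example.org:9000", "lb2.example.org"};
        WeightsCacheValidity = 21600;
        WeightsCachePath = "/var/glite/wmproxy";
    ]
]
```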

Job Start Options:

AsyncJobStart
Boolean attribute used to switch between synchronous and asynchronous job start behavior. When set to true, during a job start operation control is returned to the user immediately after the request has been received, while the actual execution of the operation (which could be quite time consuming) is performed asynchronously

Requirements:

SDJRequirements
This attribute contains an expression to be ANDed with the standard Requirements attribute in the case of Short Deadline Jobs (SDJ). If the JDL attribute ShortDeadlineJob is set to true, the SDJRequirements expression is ANDed as is; otherwise the NOT operator is applied to it first. The default value is RegExp("*sdj$", other.GlueCEUniqueID), which targets queues specifically configured for SDJ jobs
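
To make the combination concrete, here is a sketch that reads the description above as a logical AND of the user's Requirements with SDJRequirements. The user expression is hypothetical and the default is copied verbatim from the description. The comments show the expression one would expect to result, not output captured from WMProxy.

```classad
// JDL fragment submitted by the user (the Requirements expression is hypothetical)
[
    ShortDeadlineJob = true;
    Requirements = other.GlueCEStateStatus == "Production";
]

// Expected effective requirements with ShortDeadlineJob = true:
//   (other.GlueCEStateStatus == "Production")
//       && RegExp("*sdj$", other.GlueCEUniqueID)
//
// Expected effective requirements otherwise:
//   (other.GlueCEStateStatus == "Production")
//       && !RegExp("*sdj$", other.GlueCEUniqueID)
```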

Service Discovery:

EnableServiceDiscovery
Boolean attribute to enable Service Discovery. If the value of this attribute is true, the Service Discovery is enabled, i.e. WMProxy invokes Service Discovery for finding available LB Servers
ServiceDiscoveryInfoValidity
Time in seconds (n) indicating the validity of the information provided by the Service Discovery. A call to Service Discovery for updated information is done every n seconds.
LBServiceDiscoveryType
Type key for LB Servers to be discovered by Service Discovery (Optional - default value is org.glite.lb.server)

Served Requests:

MaxServedRequests
Long attribute limiting the number of operations served by each WMProxy instance before it exits and releases any allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value less than or equal to zero. (Optional - default value is 100 requests, minimum allowed value is 40)

Load Scripts:

OperationLoadScripts
ClassAd type attribute in which an internal attribute can be specified for each operation provided by WMProxy. The names of these attributes are the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). Each internal attribute gives the path and name of the script to be executed to check the load of the WMProxy server before the corresponding operation is served. If the server load is too high the requested operation is refused. The path and name of the script can be followed by user-defined options and parameters, depending on the arguments the specific script needs.

WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the glite-wms-wmproxy rpm in the directory ${GLITE_LOCATION}/sbin.

To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:

--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500

add the attribute:

OperationLoadScripts = [
   jobSubmit = "${GLITE_LOCATION}/sbin/glite_wms_wmproxy_load_monitor --oper jobSubmit --load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500";
];

Any kind of load script can be used. The only rule for a custom script is that its exit value must be 0 if the operation can proceed and 1 otherwise (operation refused because the server load is too high).

The script files must be executable and must have the proper access permissions.
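
Pulling several of the attributes described in this section together, a WorkloadManagerProxy section might look like the sketch below. Values marked as hypothetical or assumed are not taken from this page and must be adapted to the installation. In particular, SandboxStagingPath has to match the DocumentRoot of the local httpd configuration. All other values are the documented defaults. The // comments assume that the parser accepts C++-style ClassAd comments.

```classad
[
    WorkloadManagerProxy = [
        LogFile = "/var/log/glite/wmproxy.log";         // hypothetical path
        LogLevel = 5;                                   // hypothetical level

        SandboxStagingPath = "/var/glite/SandboxDir";   // hypothetical path
        MaxInputSandboxSize = 10000000;
        ListMatchRootPath = "/tmp";

        HTTPSPort = 7443;                               // assumed port number
        GridFTPPort = 2811;
        DefaultProtocol = "gsiftp";

        MinPerusalTimeInterval = 10;
        AsyncJobStart = true;                           // hypothetical choice
        MaxServedRequests = 100;
    ]
]
```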

ICE configuration

-- FrancescoGiacomini - 29 Oct 2007

Topic revision: r3 - 2007-10-30 - AlessandroMaraschini
 