glite_wms_R_3_1_100
WMPROXY - UI
- BUG #41072: after two consecutive failures due to the error "System load is too high",
the glite_wms_job_submit command returns this error:
-----------------------------------------
22 July 2008, 12:31:19 -I- PID: 16522 (Debug) - Calling the WMProxy delegationns__getProxyReq service
-----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-1" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><delegationns:getProxyReq><delegationID>RSXb2eAtvje_8qidBLPN4g</delegationID></delegationns:getProxyReq></SOAP-ENV:Body></SOAP-ENV:Envelope>
- Sometimes WMProxy returns this error message when submitting a job:
Connection failed: HTTP Error
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
[no address given] and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
Error code: SOAP-ENV:Server
To help debugging:
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ------------------------------- Fault Description --------------------------------
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Method: jobStart
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Code: 1502
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Description: requirements: unable to complete the operation: the attribute has not been initialised yet
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Stack:
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": AdEmptyException: requirements: unable to complete the operation: the attribute has not been initialised yet
at Ad::delAttribute(const string& attr_name)[Ad.cpp:180]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": at JobAd::delAttribute(const string& attr_name)[JobAd.cpp:292]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": at submit()[wmpcoreoperations.cpp:1936]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": at jobStart()[wmpcoreoperations.cpp:1105]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ----------------------------------------------------------------------------------
- One collection out of 60 submissions remains in "Waiting" status due to an LB problem
(its nodes are in "Submitted" status); these are the WMProxy logs (Bug #40982):
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": Quering LB Proxy...
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": LBProxy is enabled
Unable to query LB and LBProxy
edg_wll_QueryEvents[Proxy]
Exit code: 4
LB[Proxy] Error: Interrupted system call
(edg_wll_plain_read())
23 Jun, 18:58:18 -D- PID: 24834 - "wmpcoreoperations::jobStart": Logging LOG_ENQUEUE_FAIL
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::logEvent": Logging to LB Proxy...
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::logEvent": Logging Enqueue FAIL event...
23 Jun, 18:58:19 -D- PID: 24834 - "WMPEventlogger::testAndLog": LB call succeeded
23 Jun, 18:58:19 -D- PID: 24834 - "wmpcoreoperations::jobStart": Removing lock...
23 Jun, 18:58:19 -I- PID: 24834 - "wmpgsoapoperations::ns1__jobStart": jobStart operation completed
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": -------- Exiting Server Instance -------
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": Signal code received: 15
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": ----------------------------------------
WM
- Sometimes it is not possible to cancel a job; this is its LB logging information:
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://devel17.cnaf.infn.it:9000/b9w6Uh1EPfWm6ZFHqhiioQ
---
Event: RegJob
- Source = NetworkServer
- Timestamp = Wed Jul 23 11:50:42 2008 CEST
---
Event: RegJob
- Source = NetworkServer
- Timestamp = Wed Jul 23 11:50:42 2008 CEST
---
Event: UserTag
- Source = NetworkServer
- Timestamp = Wed Jul 23 11:50:42 2008 CEST
---
Event: Accepted
- Source = NetworkServer
- Timestamp = Wed Jul 23 11:50:43 2008 CEST
---
Event: EnQueued
- Result = START
- Source = NetworkServer
- Timestamp = Wed Jul 23 11:50:43 2008 CEST
---
Event: EnQueued
- Result = OK
- Source = NetworkServer
- Timestamp = Wed Jul 23 11:50:43 2008 CEST
---
Event: Cancel
- Source = NetworkServer
- Timestamp = Wed Jul 23 12:31:13 2008 CEST
---
Event: Cancel
- Source = WorkloadManager
- Timestamp = Wed Jul 23 12:31:13 2008 CEST
---
Event: Cancel
- Source = JobController
- Timestamp = Wed Jul 23 12:31:15 2008 CEST
---
Event: Cancel
- Source = JobController
- Timestamp = Wed Jul 23 12:31:15 2008 CEST
**********************************************************************
- After the recovery of a collection, the "pending" nodes are forgotten.
JC/LM
ICE
LB
- The status of the collection is not correctly computed
- glite-lb-interlogd doesn't stop (you need a "kill -9")
Configuration
- Add glite-wms-ice to the metapackage dependencies
- Add glite-info-provider-service to the metapackage dependencies
- Bug #39308
- Remove "rgma" from the environment variable GLITE_SD_PLUGIN
- Add "globus gridftp" to the gliteservices list (gLiteservices)
- Modify the cron purger file
# Execute the 'purger' command every six hours, every day except Sunday
3 */6 * * mon-sat glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 > /dev/null 2>&1
# Execute the 'purger' command each Sunday (sun), forcing removal of dag nodes and
# orphan dag nodes without performing any status checking (threshold 1296000 s = 15 days).
0 1 * * sun glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -o -s -n -t 1296000 > /dev/null 2>&1
- Set the ICE configuration section
- Remove the IsmBlackList parameter
- Set jobdir as the default for the DispatcherType parameter of the wm section
- Add the parameter EnableRecovery = true; to the wm configuration section
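Taken together, the wm configuration changes above would land in the WMS configuration file roughly as follows (a sketch only: the section name WorkloadManager and the surrounding layout are assumptions based on the usual ClassAd-style glite_wms.conf format):

```
WorkloadManager = [
    // jobdir becomes the default input mechanism
    DispatcherType = "jobdir";
    // newly added: re-process requests left over after a restart
    EnableRecovery = true;
    // IsmBlackList removed from this section
];
```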
- Add this cron to create the host proxy:
0 */6 * * * glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-create-proxy.sh /var/glite/wms.proxy /var/log/glite/create_proxy.log
- Rotate the logs of the purgeStorage
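For the purgeStorage log rotation, a minimal logrotate entry could look like this (a sketch: the weekly frequency and rotation count are assumptions; the log path is taken from the cron lines above):

```
/var/log/glite/glite-wms-purgeStorage.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```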
- Remove from the metapackage these rpms:
- checkpointing
- partitioner
- interactive (and thirdparty-bypass)
- rgma-* (and service-discovery-rgma-c)
- Check bug #35244
- BUG #30900: the default value in the WMS configuration file is again MinPerusalTimeInterval = 5;
Other
- BUG #39215: if the option -n is used in the cron purger, no check is done against the
option -t: e.g. nodes are purged even if they are in running state
- In the script glite-wms-purgeStorage.sh, the command glite-wms-create-proxy.sh has to be called with its full path:
[glite@devel19 ~]$ /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 302400
Certificate will expire
/opt/glite/sbin/glite-wms-purgeStorage.sh: line 27: glite-wms-create-proxy.sh: command not found
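A possible fix (a sketch, not the actual patch) is to resolve the helper through the script's own directory instead of relying on PATH:

```shell
#!/bin/sh
# Sketch of a fix for glite-wms-purgeStorage.sh: build an absolute path to the
# sibling helper from the script's own location instead of relying on PATH.
SCRIPT_DIR=$(dirname "$0")
PROXY_CMD="${SCRIPT_DIR}/glite-wms-create-proxy.sh"
# Invoke the helper only if it exists at the computed location.
if [ -x "$PROXY_CMD" ]; then
    "$PROXY_CMD" /var/glite/wms.proxy /var/log/glite/create_proxy.log
fi
```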
- In /opt/glite/sbin/glite-wms-create-proxy.sh it is probably better to use the --force
option for the mv command; otherwise the move command can block here:
[glite@devel19 ~]$ mv /var/glite/wms.proxy.14913 /var/glite/wms.proxy
mv: overwrite `/var/glite/wms.proxy', overriding mode 0400?
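Using mv --force (or mv -f) makes the overwrite non-interactive, so the proxy renewal cannot hang on the prompt. A minimal sketch with temporary files (the 0400 mode reproduces the condition that triggers the prompt):

```shell
#!/bin/sh
# Demonstrate that 'mv -f' replaces a mode-0400 target without prompting,
# which is what blocks glite-wms-create-proxy.sh when plain 'mv' is used.
tmpdir=$(mktemp -d)
echo "new proxy" > "$tmpdir/wms.proxy.14913"
echo "old proxy" > "$tmpdir/wms.proxy"
chmod 0400 "$tmpdir/wms.proxy"      # read-only target, as in the report
mv -f "$tmpdir/wms.proxy.14913" "$tmpdir/wms.proxy"
cat "$tmpdir/wms.proxy"             # prints "new proxy"
rm -rf "$tmpdir"
```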
- Bug #40967: problems in the script glite_wms_wmproxy_load_monitor
--
AlessioGianelle - 20 Jun 2008