%TOC%

---++ glite_wms_R_3_1_100

---+++ WMPROXY

   * Command =glite-wms-job-info --jdl= doesn't work with nodes of a collection [[https://savannah.cern.ch/bugs/?38689][#38689]]
<verbatim>
[ale@cream-15 UI]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/RAeQ1KgqrjHK4MYzB8t8Hw

Connecting to the service https://devel19.cnaf.infn.it:7443/glite_wms_wmproxy_server

Error - WMProxy Server Error
Unable to read file: /var/glite/SandboxDir/RA/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fRAeQ1KgqrjHK4MYzB8t8Hw/JDLToStart
(please contact server administrator)

Method: getJDL
</verbatim>
   * Sometimes wmproxy returns this error message when submitting a job:
<verbatim>
Connection failed: HTTP Error
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator, [no address given] and inform them of the time the error occurred, and anything you might have done that may have caused the error.</p>
<p>More information about this error may be available in the server error log.</p>
</body></html>

Error code: SOAP-ENV:Server
</verbatim>
   * Submitting a job with a proxy of a VO which is not enabled on the WMS gives an unclear error message:
<verbatim>
Warning - Unable to delegate the credential to the endpoint: https://devel19.cnaf.infn.it:7443/glite_wms_wmproxy_server
Unknown Soap fault
Method: Soap Error
</verbatim>
   * BUG [[https://savannah.cern.ch/bugs/?23443][#23443]]: %RED%NOT FIXED%ENDCOLOR%
      * Required documents are not put into the glite doc template in edms
      * For the R6 (JDL howto) document a broken link is given
   * Submitting a "parametric" job always fails with this error:
<verbatim>
*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel17.cnaf.infn.it:9000/wzOz1QuyFAEJ5JnwXHUhng
Current Status: Aborted
Status Reason: requirements: unable to complete the operation: the attribute has not been initialised yet
Submitted: Wed Jun 25 16:25:03 2008 CEST
*************************************************************
</verbatim>
   To help debugging:
<verbatim>
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ------------------------------- Fault Description --------------------------------
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Method: jobStart
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Code: 1502
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Description: requirements: unable to complete the operation: the attribute has not been initialised yet
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Stack:
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": AdEmptyException: requirements: unable to complete the operation: the attribute has not been initialised yet at Ad::delAttribute(const string& attr_name)[Ad.cpp:180]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": at JobAd::delAttribute(const string& attr_name)[JobAd.cpp:292]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": at submit()[wmpcoreoperations.cpp:1936]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": at jobStart()[wmpcoreoperations.cpp:1105]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ----------------------------------------------------------------------------------
</verbatim>
   * The status of a collection (or a dag) is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status.
<verbatim>
*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel17.cnaf.infn.it:9000/xe-J0mjeFjU86p2mstn8-A
Current Status: Cleared
Status Reason: user retrieved output sandbox
Destination: dagman
Submitted: Wed Jun 25 16:08:43 2008 CEST
*************************************************************

- Nodes information for:
    Status info for the Job : https://devel17.cnaf.infn.it:9000/6NeMTtefAmTCRI0N1waKzA
    Current Status: Done (Success)
    Logged Reason(s):
        - Job terminated successfully
    Exit code: 0
    Status Reason: Job terminated successfully
    Destination: ce-01.roma3.infn.it:2119/jobmanager-lcgpbs-cert
    Submitted: Wed Jun 25 16:08:43 2008 CEST
*************************************************************

    Status info for the Job : https://devel17.cnaf.infn.it:9000/OpPpNrh1EYNwPhjQQaVazQ
    Current Status: Done (Success)
    Logged Reason(s):
        - Job terminated successfully
    Exit code: 0
    Status Reason: Job terminated successfully
    Destination: atlasce01.na.infn.it:2119/jobmanager-lcgpbs-cert
    Submitted: Wed Jun 25 16:08:43 2008 CEST
*************************************************************
</verbatim>
   * One collection out of 60 submissions remains in "Waiting" status due to an LB problem (its nodes are in status "Submitted"); these are the wmproxy logs:
<verbatim>
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": Quering LB Proxy...
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": LBProxy is enabled
Unable to query LB and LBProxy
edg_wll_QueryEvents[Proxy]
Exit code: 4
LB[Proxy] Error: Interrupted system call
(edg_wll_plain_read())
23 Jun, 18:58:18 -D- PID: 24834 - "wmpcoreoperations::jobStart": Logging LOG_ENQUEUE_FAIL
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::logEvent": Logging to LB Proxy...
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::logEvent": Logging Enqueue FAIL event...
23 Jun, 18:58:19 -D- PID: 24834 - "WMPEventlogger::testAndLog": LB call succeeded
23 Jun, 18:58:19 -D- PID: 24834 - "wmpcoreoperations::jobStart": Removing lock...
23 Jun, 18:58:19 -I- PID: 24834 - "wmpgsoapoperations::ns1__jobStart": jobStart operation completed
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": -------- Exiting Server Instance -------
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": Signal code received: 15
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": ----------------------------------------
</verbatim>
   * wmproxy logs this warning when submitting collections:
<verbatim>
19 Jun, 14:58:32 -W- PID: 2746 - "wmpcoreoperations::submit": Unable to find SDJRequirements in configuration file
</verbatim>
   * Found a problem in the wmproxy. After some collection submissions it stops working, and the =top= command shows the glite-wms-proxy processes using all the CPU (the problem seems to be related to the porting to gsoap 2.7.10):
<verbatim>
[Thu Jun 19 15:43:50 2008] [warn] FastCGI: (dynamic) server "/opt/glite/bin/glite_wms_wmproxy_server" (pid 7938) terminated due to uncaught signal '11' (Segmentation fault)
</verbatim>

---+++ WM

   * If you remove a pending node the request is not removed from the "old" directory of the jobdir:
<verbatim>
/var/glite/workload_manager/jobdir/old:
total 16
-rw-r--r-- 1 glite glite 250 Jul 9 13:37 20080709T113710.627592_3086920576
-rw-r--r-- 1 glite glite 250 Jul 9 13:37 20080709T113711.225739_3086920576
-rw-r--r-- 1 glite glite 250 Jul 9 13:37 20080709T113711.794866_3086920576
-rw-r--r-- 1 glite glite 250 Jul 9 13:37 20080709T113712.361657_3086920576
</verbatim>
   * After the recovery of a collection the "pending" nodes are forgotten.
   * BUG [[https://savannah.cern.ch/bugs/?27215][#27215]]: %GREEN%FIXED%ENDCOLOR% (see also [[https://savannah.cern.ch/bugs/?38359][#38359]] %GREEN%FIXED%ENDCOLOR%)
      * If the sum of the sizes of the first two files is exactly equal to the limit, the job stays in state RUNNING forever.
   * BUG [[https://savannah.cern.ch/bugs/?38509][#38509]]: The WM's recovery procedure hangs if no relevant events are found for a given request
   * BUG [[https://savannah.cern.ch/bugs/?38366][#38366]]: Recovery doesn't work with a list-match request. %GREEN%FIXED%ENDCOLOR%
<verbatim>
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:757): Dispatcher: doing recovery
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(recovery.cpp:613): recovering https://localhost:6000/4xvi9qZqMbrNgQeuzOIGbA
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] main(main.cpp:292): running...
25 Jun, 15:31:50 -W: [Warning] single_request_recovery(recovery.cpp:317): cannot create LB context (2)
25 Jun, 15:31:50 -E: [Error] operator()(dispatcher.cpp:779): Dispatcher: cannot create LB context (2). Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] main(main.cpp:295): TaskQueue destroyed
</verbatim>

---+++ JC/LM

---+++ ICE

---+++ LB

   * The status of the collection is not correctly computed
   * glite-lb-interlogd doesn't stop (you need a "kill -9")

---+++ Configuration

   * Add glite-wms-ice to the metapackage dependencies
   * Remove "rgma" from the environment variable GLITE_SD_PLUGIN
   * Add "globus gridftp" to the gliteservices list (=gLiteservices=)
   * Remove from the metapackage these rpms:
      * checkpointing
      * partitioner
      * interactive (and thirdparty-bypass)
      * rgma-* (and service-discovery-rgma-c)
   * glite-lb-interlogd doesn't stop (you need a "kill -9")
   * Modify the cron purger file
<verbatim>
# Execute the 'purger' command at every day except on Sunday with a frequency of six hours
3 */6 * * mon-sat glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 > /dev/null 2>&1
# Execute the 'purger' command on each Sunday (sun) forcing removal of dag nodes,
# orphan dag nodes without performing any status checking (threshold of 2 weeks).
0 1 * * sun glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -o -s -n -t 1296000 > /dev/null 2>&1
</verbatim>
   * Set the ICE configuration section
   * Remove =IsmBlackList= parameter
   * Set =jobdir= as the default for the DispatcherType parameter of wm
   * Add the parameter =EnableRecovery = true;= to the wm configuration section
   * Add this cron to create the host proxy:
<verbatim>
0 */6 * * * glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-create-proxy.sh /var/glite/wms.proxy /var/log/glite/create_proxy.log
</verbatim>
   * Rotate the logs of the purgeStorage
   * Check bug [[https://savannah.cern.ch/bugs/?35244][#35244]]
   * BUG [[https://savannah.cern.ch/bugs/?30900][#30900]]: default value on WMS conf file is again: MinPerusalTimeInterval = 5;

---+++ Other

   * In the script =glite-wms-purgeStorage.sh= the command =glite-wms-create-proxy.sh= has to be called with its full path:
<verbatim>
[glite@devel19 ~]$ /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 302400
Certificate will expire
/opt/glite/sbin/glite-wms-purgeStorage.sh: line 27: glite-wms-create-proxy.sh: command not found
</verbatim>
   * In =/opt/glite/sbin/glite-wms-create-proxy.sh= it is probably better to use the =--force= option for the =mv= command; otherwise =mv= can block on a confirmation prompt, as here:
<verbatim>
[glite@devel19 ~]$ mv /var/glite/wms.proxy.14913 /var/glite/wms.proxy
mv: overwrite `/var/glite/wms.proxy', overriding mode 0400?
</verbatim>

-- Main.AlessioGianelle - 20 Jun 2008
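
The =mv= prompt reported under "Other" can be reproduced outside the WMS. The following is only a minimal sketch, not the content of =glite-wms-create-proxy.sh=: it uses a temporary directory and hypothetical file names as stand-ins for =/var/glite/wms.proxy=, and it assumes the new proxy is written to a temporary file and then moved over the old, read-only (mode 0400) one.

```shell
#!/bin/sh
# Sketch: replace a read-only (mode 0400) file without an interactive prompt.
# All paths are temporary stand-ins (assumptions), not the real /var/glite/wms.proxy.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

proxy="$workdir/wms.proxy"
new_proxy="$workdir/wms.proxy.$$"

# Existing proxy, readable only by its owner.
printf 'old proxy\n' > "$proxy"
chmod 0400 "$proxy"

# Freshly created proxy waiting to be moved into place.
printf 'new proxy\n' > "$new_proxy"
chmod 0400 "$new_proxy"

# -f (--force in GNU mv) suppresses the "overwrite ..., overriding mode 0400?" question.
mv -f "$new_proxy" "$proxy"

cat "$proxy"
```

Without =-f=, =mv= asks for confirmation when the target is not writable and standard input is a terminal, which is the situation shown in the report above.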
Topic revision: r15 - 2008-07-11 - AlessioGianelle
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.