Difference: WmsBugs3dot1dot100 (1 vs. 25)

Revision 25 - 2011-02-24 - AlessioGianelle

Line: 1 to 1
Changed:
<
<
META TOPICPARENT name="TestWokPlan"
>
>
META TOPICPARENT name="TestPage"
 

glite_wms_R_3_1_100

Revision 24 - 2008-10-03 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 214 to 214
 
  • After the recovery of a collection the "pending" nodes are forgotten.
Deleted:
<
<
  • BUG #27215: FIXED (see also #38359 FIXED)
    • If the sum of the dimensions of the first two files is exactly equal to the limit, the job stays in state RUNNING for ever and ever.
 
Deleted:
<
<
  • BUG #38509: The WM's recovery procedure hangs if no relevant events are found for a given request
 
Deleted:
<
<
  • BUG #38366: Recovery doesn't work with a list-match request. FIXED
    25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting
    25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:757): Dispatcher: doing recovery
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
    25 Jun, 15:31:50 -I: [Info] operator()(recovery.cpp:613): recovering https://localhost:6000/4xvi9qZqMbrNgQeuzOIGbA
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
    25 Jun, 15:31:50 -I: [Info] main(main.cpp:292): running...
    25 Jun, 15:31:50 -W: [Warning] single_request_recovery(recovery.cpp:317): cannot create LB context (2)
    25 Jun, 15:31:50 -E: [Error] operator()(dispatcher.cpp:779): Dispatcher: cannot create LB context (2). Exiting...
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
    25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
    25 Jun, 15:31:50 -I: [Info] main(main.cpp:295): TaskQueue destroyed
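When triaging a recovery log like the one above, it can help to extract only the warnings and errors. The following is a minimal Python sketch and not part of the WMS: the line pattern is inferred from the log lines quoted here (an assumption, not a documented format), and the names `parse_wm_log_line` and `problems` are hypothetical.

```python
import re

# Pattern inferred from the quoted WM log lines (assumption), e.g.
# "25 Jun, 15:31:50 -E: [Error] operator()(dispatcher.cpp:779): message"
LOG_LINE = re.compile(
    r"^(?P<stamp>\d{1,2} \w{3}, \d{2}:\d{2}:\d{2}) "
    r"-(?P<flag>[A-Z]): \[(?P<level>\w+)\] "
    r"(?P<func>.+)\((?P<file>[\w.]+):(?P<line>\d+)\): "
    r"(?P<message>.*)$"
)


def parse_wm_log_line(text):
    """Split one log line into its fields; return None if it does not match."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])
    return record


def problems(lines):
    """Keep only the records logged at Warning or Error level."""
    records = (parse_wm_log_line(line) for line in lines)
    return [r for r in records if r and r["level"] in ("Warning", "Error")]


SAMPLE = [
    "25 Jun, 15:31:50 -I: [Info] main(main.cpp:292): running...",
    "25 Jun, 15:31:50 -W: [Warning] single_request_recovery(recovery.cpp:317): cannot create LB context (2)",
    "25 Jun, 15:31:50 -E: [Error] operator()(dispatcher.cpp:779): Dispatcher: cannot create LB context (2). Exiting...",
]

for record in problems(SAMPLE):
    print(record["level"], record["file"], record["line"], record["message"])
```

Applied to the excerpt above, this keeps only the two "cannot create LB context (2)" lines, which are the ones relevant to the failed recovery.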
    
 

JC/LM

Revision 23 - 2008-09-04 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY - UI

Changed:
<
<
  • After two consecutive failures due to the error "System load is too high", the glite_wms_job_submit command returns this error:
>
>
  • BUG #41072: After two consecutive failures due to the error "System load is too high", the glite_wms_job_submit command returns this error:
 
22 July 2008, 12:31:19 -I- PID: 16522 (Debug) - Calling the WMProxy delegationns__getProxyReq service

Revision 22 - 2008-09-03 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 13 to 13
 <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-1" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body>RSXb2eAtvje_8qidBLPN4g</SOAP-ENV:Body></SOAP-ENV:Envelope>
Deleted:
<
<
  • If you run job-info on a parametric node, the command aborts:
    [ale@cream-15 src]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/xgTd_SELrAYKvwXShOuFXw
    terminate called after throwing an instance of 'glite::wms::wmproxyapi::BaseException*'
    Aborted
    
 
  • Command glite-wms-job-info --jdl doesn't work with nodes of a collection #38689
    [ale@cream-15 UI]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/RAeQ1KgqrjHK4MYzB8t8Hw
    
Line: 281 to 275
 
    • partitioner
    • interactive (and thirdparty-bypass)
    • rgma-* (and service-discovery-rgma-c)
Deleted:
<
<
  • glite-lb-interlogd doesn't stop (you need a "kill -9")
 

Revision 21 - 2008-09-03 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY - UI

Changed:
<
<
  • After two consecutive failures due to the error "System load is too high", the glite_wms_job_submit command returns this error:
>
>
  • After two consecutive failures due to the error "System load is too high", the glite_wms_job_submit command returns this error:
 
22 July 2008, 12:31:19 -I- PID: 16522 (Debug) - Calling the WMProxy delegationns__getProxyReq service
Line: 14 to 13
 <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-1" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body>RSXb2eAtvje_8qidBLPN4g</SOAP-ENV:Body></SOAP-ENV:Envelope>
Changed:
<
<
  • If you run job-info on a parametric node, the command aborts:
>
>
  • If you run job-info on a parametric node, the command aborts:
 [ale@cream-15 src]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/xgTd_SELrAYKvwXShOuFXw
 terminate called after throwing an instance of 'glite::wms::wmproxyapi::BaseException*'
 Aborted
Changed:
<
<
  • Command glite-wms-job-info --jdl doesn't work with nodes of a collection #38689
>
>
  • Command glite-wms-job-info --jdl doesn't work with nodes of a collection #38689
 [ale@cream-15 UI]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/RAeQ1KgqrjHK4MYzB8t8Hw

Connecting to the service https://devel19.cnaf.infn.it:7443/glite_wms_wmproxy_server

Line: 36 to 32
 Method: getJDL
Changed:
<
<
  • Sometimes wmproxy returns this error message when submitting a job:
>
>
  • Sometimes wmproxy returns this error message when submitting a job:
 Connection failed: HTTP Error 500 Internal Server Error
Line: 57 to 52
 Error code: SOAP-ENV:Server
Changed:
<
<
  • When submitting a job with a proxy of a VO that is not enabled on the WMS, the error message is unclear:
>
>
  • When submitting a job with a proxy of a VO that is not enabled on the WMS, the error message is unclear:
 Warning - Unable to delegate the credential to the endpoint: https://devel19.cnaf.infn.it:7443/glite_wms_wmproxy_server
 Unknown Soap fault
 Method: Soap Error
Changed:
<
<
  • BUG #23443: NOT FIXED
    • Required documents are not put into the glite doc template in edms
    • For the R6 (JDL howto) document a broken link is given

  • BUG #39217: Submitting a "parametric" job always fails with this error:
>
>
  • BUG #39217: Submitting a "parametric" job always fails with this error:
 ***********************************************************
 BOOKKEEPING INFORMATION:
Line: 80 to 69
 ***********************************************************
Changed:
<
<
To help debugging:
>
>
To help debugging:
 27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ------------------------------- Fault Description --------------------------------
 27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Method: jobStart
 27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Code: 1502
Line: 96 to 83
 27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ----------------------------------------------------------------------------------
Changed:
<
<
  • The status of a collection (or a dag) is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status.
>
>
  • The status of a collection (or a dag) is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status. (Bug #40951)
 ***********************************************************
 BOOKKEEPING INFORMATION:
Line: 130 to 116
 ***********************************************************
Changed:
<
<
  • One collection out of 60 submissions remains in "Waiting" status due to an LB problem (its nodes are in status "Submitted"); these are the wmproxy logs:
 
>
>
  • One collection out of 60 submissions remains in "Waiting" status due to an LB problem (its nodes are in status "Submitted"); these are the wmproxy logs (Bug #40982):
     
 23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": Quering LB Proxy...
 23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": LBProxy is enabled
 Unable to query LB and LBProxy
Line: 151 to 136
 23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": ----------------------------------------
Changed:
<
<
  • wmproxy logs this warning when submitting collections:
>
>
  • wmproxy logs this warning when submitting collections:
 19 Jun, 14:58:32 -W- PID: 2746 - "wmpcoreoperations::submit": Unable to find SDJRequirements in configuration file
Changed:
<
<
  • Found a problem in the wmproxy: after some collection submissions it stops working, and the top command shows the glite-wms-proxy processes using all the CPU (the problem seems to be related to the porting to gsoap 2.7.10):
>
>
  • Found a problem in the wmproxy: after some collection submissions it stops working, and the top command shows the glite-wms-proxy processes using all the CPU (the problem seems to be related to the porting to gsoap 2.7.10):
 [Thu Jun 19 15:43:50 2008] [warn] FastCGI: (dynamic) server "/opt/glite/bin/glite_wms_wmproxy_server" (pid 7938) terminated due to uncaught signal '11' (Segmentation fault)

WM

Changed:
<
<
  • If a job is in the WM's queue:
>
>
  • If a job is in the WM's queue:
 ***********************************************************
 BOOKKEEPING INFORMATION:
Line: 175 to 157
 ***********************************************************
Changed:
<
<
it is not possible to cancel it:
>
>
it is not possible to cancel it:
 ********************************************************************
 LOGGING INFORMATION:
Line: 229 to 209
 ********************************************************************
Changed:
<
<
  • If you remove a pending node the request is not removed from the "old" directory of the jobdir:
>
>
  • If you remove a pending node the request is not removed from the "old" directory of the jobdir:
 /var/glite/workload_manager/jobdir/old:
 total 16
 -rw-r--r--  1 glite glite 250 Jul  9 13:37 20080709T113710.627592_3086920576
Line: 246 to 225
 
  • BUG #38509: The WM's recovery procedure hangs if no relevant events are found for a given request
Changed:
<
<
  • BUG #38366: Recovery doesn't work with a list-match request. FIXED
>
>
  • BUG #38366: Recovery doesn't work with a list-match request. FIXED
 25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting
 25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:757): Dispatcher: doing recovery
 25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
Line: 282 to 260
 
  • Bug #39308
    • Remove "rgma" from the environment variable GLITE_SD_PLUGIN
    • Add "globus gridftp" to the gliteservices list (gLiteservices)
Changed:
<
<
    • Modify the cron purger file
>
>
    • Modify the cron purger file
 # Execute the 'purger' command at every day except on Sunday with a frequency of six hours
 3 */6 * * mon-sat glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 > /dev/null 2>&1
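To make the schedule and the `-t` value in this cron entry easier to read, here is a minimal Python sketch. It is only an illustration: it handles just the `*/N` form used in the hour field, the helper names are hypothetical, and it assumes that `-t` is a threshold expressed in seconds, which this page does not state explicitly.

```python
from datetime import timedelta

# Cron entry quoted above, truncated to the five schedule fields plus the user.
ENTRY = "3 */6 * * mon-sat glite"


def expand_step(field, upper):
    """Expand a '*/N' cron field into explicit values (only this form is handled)."""
    if not field.startswith("*/"):
        raise ValueError("only '*/N' fields are supported in this sketch")
    return list(range(0, upper, int(field[2:])))


def run_times(entry):
    """Return the (hour, minute) pairs at which the entry fires on an active day."""
    minute, hour = entry.split()[:2]
    return [(h, int(minute)) for h in expand_step(hour, 24)]


def retention(seconds):
    """Interpret a -t value as a duration (assumption: the unit is seconds)."""
    return timedelta(seconds=seconds)


print(run_times(ENTRY))
print(retention(604800))
print(retention(302400))
```

Under these assumptions the purger runs at 00:03, 06:03, 12:03 and 18:03 from Monday to Saturday, `-t 604800` corresponds to 7 days, and the `-t 302400` used in the manual test elsewhere on this page corresponds to 3.5 days.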
Line: 295 to 272
 
    • Remove IsmBlackList parameter
    • Set jobdir as the default for the DispatcherType parameter of wm
    • Add the parameter EnableRecovery = true; to the wm configuration section
Changed:
<
<
    • Add this cron to create the host proxy:
>
>
    • Add this cron to create the host proxy:
 0 */6 * * * glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-create-proxy.sh /var/glite/wms.proxy /var/log/glite/create_proxy.log
    • Rotate the logs of the purgeStorage
Line: 313 to 289
 
  • BUG #39215
    • If the cron purger is run with option -n, option -t is not checked: e.g. nodes are purged even if they are in running state
Changed:
<
<
    • In the script glite-wms-purgeStorage.sh, the command glite-wms-create-proxy.sh has to be called with its full path:
>
>
    • In the script glite-wms-purgeStorage.sh, the command glite-wms-create-proxy.sh has to be called with its full path:
 [glite@devel19 ~]$ /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 302400
 Certificate will expire
 /opt/glite/sbin/glite-wms-purgeStorage.sh: line 27: glite-wms-create-proxy.sh: command not found
Changed:
<
<
    • In /opt/glite/sbin/glite-wms-create-proxy.sh it is probably better to use the --force option of the mv command; otherwise the move can block here:
>
>
    • In /opt/glite/sbin/glite-wms-create-proxy.sh it is probably better to use the --force option of the mv command; otherwise the move can block here:
 [glite@devel19 ~]$ mv /var/glite/wms.proxy.14913 /var/glite/wms.proxy
 mv: overwrite `/var/glite/wms.proxy', overriding mode 0400?
Changed:
<
<
>
>
  • Bug #40967: Problems in script glite_wms_wmproxy_load_monitor
 

Revision 20 - 2008-09-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 277 to 277
 

Configuration

Changed:
<
<
  • Add glite-wms-ice at the metapackage dependencies
>
>
  • Add glite-wms-ice to the metapackage dependencies
  • Add glite-info-provider-service to the metapackage dependencies
  • Bug #39308
 
  • Remove "rgma" from the environment variable GLITE_SD_PLUGIN
  • Add "globus gridftp" to the gliteservices list (gLiteservices)
Deleted:
<
<
  • Remove from the metapackage these rpms:
    • checkpointing
    • partitioner
    • interactive (and thirdparty-bypass)
    • rgma-* (and service-discovery-rgma-c)
  • glite-lb-interlogd doesn't stop (you need a "kill -9")
 
  • Modify the cron purger file
# Execute the 'purger' command at every day except on Sunday with a frequency of six hours
Line: 304 to 300
 0 */6 * * * glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-create-proxy.sh /var/glite/wms.proxy /var/log/glite/create_proxy.log
  • Rotate the logs of the purgeStorage
Added:
>
>
  • Remove from the metapackage these rpms:
    • checkpointing
    • partitioner
    • interactive (and thirdparty-bypass)
    • rgma-* (and service-discovery-rgma-c)
  • glite-lb-interlogd doesn't stop (you need a "kill -9")
 

Revision 19 - 2008-07-28 - ElisabettaMolinari

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 309 to 309
 

Other

Changed:
<
<
>
>
 
    • If the cron purger is run with option -n, option -t is not checked: e.g. nodes are purged even if they are in running state
    • In the script glite-wms-purgeStorage.sh, the command glite-wms-create-proxy.sh has to be called with its full path:

Revision 18 - 2008-07-23 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 68 to 68
 
    • Required documents are not put into the glite doc template in edms
    • For the R6 (JDL howto) document a broken link is given
Changed:
<
<
  • Submitting a "parametric" job always fails with this error:
>
>
  • BUG #39217: Submitting a "parametric" job always fails with this error:
 
*************************************************************
BOOKKEEPING INFORMATION:
Line: 164 to 164
 

WM

Added:
>
>
  • If a job is in the WM's queue:
*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel17.cnaf.infn.it:9000/b9w6Uh1EPfWm6ZFHqhiioQ
Current Status:     Waiting
Submitted:          Wed Jul 23 11:50:42 2008 CEST
*************************************************************

it is not possible to cancel it:

**********************************************************************
LOGGING INFORMATION:

Printing info for the Job : https://devel17.cnaf.infn.it:9000/b9w6Uh1EPfWm6ZFHqhiioQ

        ---
Event: RegJob
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 11:50:42 2008 CEST
        ---
Event: RegJob
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 11:50:42 2008 CEST
        ---
Event: UserTag
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 11:50:42 2008 CEST
        ---
Event: Accepted
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 11:50:43 2008 CEST
        ---
Event: EnQueued
- Result                     =    START
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 11:50:43 2008 CEST
        ---
Event: EnQueued
- Result                     =    OK
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 11:50:43 2008 CEST
        ---
Event: Cancel
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 12:31:13 2008 CEST
        ---
Event: Cancel
- Source                     =    WorkloadManager
- Timestamp                  =    Wed Jul 23 12:31:13 2008 CEST
        ---
Event: Cancel
- Source                     =    JobController
- Timestamp                  =    Wed Jul 23 12:31:15 2008 CEST
        ---
Event: Cancel
- Source                     =    JobController
- Timestamp                  =    Wed Jul 23 12:31:15 2008 CEST

**********************************************************************
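To check quickly which components logged a given event in a dump like the one above, the listing can be parsed into records. This is a minimal Python sketch, not an LB tool: the format handling is inferred from the output quoted here, and `parse_events` and `sources_for` are hypothetical helper names.

```python
# Excerpt of the logging information quoted above.
SAMPLE = """
        ---
Event: EnQueued
- Result                     =    OK
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 11:50:43 2008 CEST
        ---
Event: Cancel
- Source                     =    NetworkServer
- Timestamp                  =    Wed Jul 23 12:31:13 2008 CEST
        ---
Event: Cancel
- Source                     =    WorkloadManager
- Timestamp                  =    Wed Jul 23 12:31:13 2008 CEST
"""


def parse_events(text):
    """Turn 'Event:' blocks with '- key = value' lines into dictionaries."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Event:"):
            current = {"Event": line.split(":", 1)[1].strip()}
            events.append(current)
        elif line.startswith("- ") and "=" in line and current is not None:
            key, value = line[2:].split("=", 1)
            current[key.strip()] = value.strip()
    return events


def sources_for(events, name):
    """List the Source of every event with the given name, in log order."""
    return [e.get("Source") for e in events if e["Event"] == name]


events = parse_events(SAMPLE)
print(sources_for(events, "Cancel"))
```

On the full listing above, the same call shows that the Cancel event was logged by NetworkServer, WorkloadManager and JobController, even though the job status stays Waiting.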
 
  • If you remove a pending node the request is not removed from the "old" directory of the jobdir:
/var/glite/workload_manager/jobdir/old:
Line: 244 to 309
 

Other

Added:
>
>
 
  • If the cron purger is run with option -n, option -t is not checked: e.g. nodes are purged even if they are in running state
Deleted:
<
<
 
  • In the script glite-wms-purgeStorage.sh, the command glite-wms-create-proxy.sh has to be called with its full path:
[glite@devel19 ~]$ /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 302400

Revision 17 - 2008-07-22 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

Changed:
<
<

WMPROXY

>
>

WMPROXY - UI

  • After two consecutive failures due to the error "System load is too high", the glite_wms_job_submit command returns this error:
-----------------------------------------
22 July 2008, 12:31:19 -I- PID: 16522 (Debug) - Calling the WMProxy delegationns__getProxyReq service
-----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-1" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><delegationns:getProxyReq><delegationID>RSXb2eAtvje_8qidBLPN4g</delegationID></delegationns:getProxyReq></SOAP-ENV:Body></SOAP-ENV:Envelope>
 
  • If you run job-info on a parametric node, the command aborts:
Line: 235 to 244
 

Other

Added:
>
>
  • If the cron purger is run with option -n, option -t is not checked: e.g. nodes are purged even if they are in running state
 
  • In the script glite-wms-purgeStorage.sh, the command glite-wms-create-proxy.sh has to be called with its full path:
[glite@devel19 ~]$ /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 302400

Revision 16 - 2008-07-21 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY

Added:
>
>
  • If you run job-info on a parametric node, the command aborts:

[ale@cream-15 src]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/xgTd_SELrAYKvwXShOuFXw
terminate called after throwing an instance of 'glite::wms::wmproxyapi::BaseException*'
Aborted
 
  • Command glite-wms-job-info --jdl doesn't work with nodes of a collection #38689
[ale@cream-15 UI]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/RAeQ1KgqrjHK4MYzB8t8Hw

Revision 15 - 2008-07-11 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

Revision 14 - 2008-07-09 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY

Changed:
<
<
  • Command glite-wms-job-info --jdl doesn't work with nodes of a collection
>
>
  • Command glite-wms-job-info --jdl doesn't work with nodes of a collection #38689
 
[ale@cream-15 UI]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/RAeQ1KgqrjHK4MYzB8t8Hw

Revision 13 - 2008-07-09 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY

Added:
>
>
  • Command glite-wms-job-info --jdl doesn't work with nodes of a collection
[ale@cream-15 UI]$ glite-wms-job-info --jdl https://devel17.cnaf.infn.it:9000/RAeQ1KgqrjHK4MYzB8t8Hw

Connecting to the service https://devel19.cnaf.infn.it:7443/glite_wms_wmproxy_server


Error - WMProxy Server Error
Unable to read file: /var/glite/SandboxDir/RA/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fRAeQ1KgqrjHK4MYzB8t8Hw/JDLToStart
(please contact server administrator)

Method: getJDL
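The path in the error above suggests how a job identifier maps to its sandbox directory. The following Python sketch reproduces that single observed path and nothing more: the escaping rule (":" becomes "_3a", "/" becomes "_2f") and the two-character subdirectory are inferred from this one example, not taken from the WMProxy sources, and `sandbox_dir` is a hypothetical name.

```python
def sandbox_dir(job_id, root="/var/glite/SandboxDir"):
    """Guess the sandbox directory of a job from its identifier.

    Inferred from one observed path (assumption): ':' and '/' are escaped as
    '_3a' and '_2f', and the directory is grouped under the first two
    characters of the unique part of the job identifier.
    """
    unique_part = job_id.rsplit("/", 1)[-1]
    escaped = job_id.replace(":", "_3a").replace("/", "_2f")
    return "/".join([root, unique_part[:2], escaped])


job = "https://devel17.cnaf.infn.it:9000/RAeQ1KgqrjHK4MYzB8t8Hw"
print(sandbox_dir(job) + "/JDLToStart")
```

This makes it easy to check on the server whether the JDLToStart file of a given node exists at all; other characters that may need escaping are not covered by this sketch.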
 
  • Sometimes wmproxy returns this error message when submitting a job:
Connection failed: HTTP Error
Line: 133 to 147
 

WM

Added:
>
>
  • If you remove a pending node the request is not removed from the "old" directory of the jobdir:
/var/glite/workload_manager/jobdir/old:
total 16
-rw-r--r--  1 glite glite 250 Jul  9 13:37 20080709T113710.627592_3086920576
-rw-r--r--  1 glite glite 250 Jul  9 13:37 20080709T113711.225739_3086920576
-rw-r--r--  1 glite glite 250 Jul  9 13:37 20080709T113711.794866_3086920576
-rw-r--r--  1 glite glite 250 Jul  9 13:37 20080709T113712.361657_3086920576
 
  • After the recovery of a collection the "pending" nodes are forgotten.

Revision 12 - 2008-07-08 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 135 to 135
 
  • After the recovery of a collection the "pending" nodes are forgotten.
Changed:
<
<
>
>
 
    • If the sum of the dimensions of the first two files is exactly equal to the limit, the job stays in state RUNNING for ever and ever.

  • BUG #38509: The WM's recovery procedure hangs if no relevant events are found for a given request
Line: 199 to 199
 
  • Rotate the logs of the purgeStorage
  • Check bugs #35244
Added:
>
>
 

Other

Revision 11 - 2008-07-08 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 140 to 140
 
  • BUG #38509: The WM's recovery procedure hangs if no relevant events are found for a given request
Changed:
<
<
  • BUG #38366: Recovery doesn't work with a list-match request
>
>
  • BUG #38366: Recovery doesn't work with a list-match request. FIXED
 
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:757): Dispatcher: doing recovery

Revision 10 - 2008-07-07 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 135 to 135
 
  • After the recovery of a collection the "pending" nodes are forgotten.
Changed:
<
<
>
>
 
    • If the sum of the dimensions of the first two files is exactly equal to the limit, the job stays in state RUNNING for ever and ever.

  • BUG #38509: The WM's recovery procedure hangs if no relevant events are found for a given request
Changed:
<
<
  • Recovery doesn't work with a list-match request: FIXED
>
>
  • BUG #38366: Recovery doesn't work with a list-match request
 
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:757): Dispatcher: doing recovery

Revision 9 - 2008-07-04 - ElisabettaMolinari

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 138 to 138
 
  • BUG #27215: NOT FIXED
    • If the sum of the dimensions of the first two files is exactly equal to the limit, the job stays in state RUNNING for ever and ever.
Changed:
<
<
  • Recovery doesn't work with a list-match request:
>
>
  • BUG #38509: The WM's recovery procedure hangs if no relevant events are found for a given request

  • Recovery doesn't work with a list-match request: FIXED
 
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:757): Dispatcher: doing recovery

Revision 8 - 2008-06-30 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 133 to 133
 

WM

Added:
>
>
  • After the recovery of a collection the "pending" nodes are forgotten.
 
  • BUG #27215: NOT FIXED
    • If the sum of the dimensions of the first two files is exactly equal to the limit, the job stays in state RUNNING for ever and ever.

Revision 7 - 2008-06-30 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY

Added:
>
>
  • Sometimes wmproxy returns this error message when submitting a job:
Connection failed: HTTP Error
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
 [no address given] and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

Error code: SOAP-ENV:Server
 
  • When submitting a job with a proxy of a VO that is not enabled on the WMS, the error message is unclear:
Warning - Unable to delegate the credential to the endpoint: https://devel19.cnaf.infn.it:7443/glite_wms_wmproxy_server
Line: 184 to 205
 /opt/glite/sbin/glite-wms-purgeStorage.sh: line 27: glite-wms-create-proxy.sh: command not found
Changed:
<
<
>
>
  • In /opt/glite/sbin/glite-wms-create-proxy.sh it is probably better to use the --force option of the mv command; otherwise the move can block here:
[glite@devel19 ~]$ mv /var/glite/wms.proxy.14913 /var/glite/wms.proxy
mv: overwrite `/var/glite/wms.proxy', overriding mode 0400? 
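The prompt above appears because the existing proxy has mode 0400 and mv asks for confirmation before overwriting it. As an illustration of the non-interactive behaviour that `mv --force` gives, here is a minimal Python sketch using `os.replace`, which overwrites the target without prompting. It is not the content of glite-wms-create-proxy.sh; `install_proxy`, `demo` and the temporary paths are hypothetical, and resetting the mode to 0400 is an assumption about the desired result.

```python
import os
import stat
import tempfile


def install_proxy(new_proxy_path, proxy_path):
    """Move new_proxy_path over proxy_path without any confirmation prompt."""
    # Assumption: the installed proxy should stay read-only for its owner (0400).
    os.chmod(new_proxy_path, stat.S_IRUSR)
    # os.replace overwrites an existing target silently, like `mv --force`.
    os.replace(new_proxy_path, proxy_path)


def demo():
    """Reproduce the situation above in a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        proxy = os.path.join(workdir, "wms.proxy")
        fresh = os.path.join(workdir, "wms.proxy.14913")
        with open(proxy, "w") as handle:
            handle.write("old proxy")
        os.chmod(proxy, 0o400)  # same mode that triggers the mv prompt
        with open(fresh, "w") as handle:
            handle.write("new proxy")
        install_proxy(fresh, proxy)
        with open(proxy) as handle:
            content = handle.read()
        mode = stat.S_IMODE(os.stat(proxy).st_mode)
        return content, mode, os.path.exists(fresh)


print(demo())
```

Because the replacement is a rename within the same directory, it only needs write permission on the directory, not on the read-only target file, so a cron job never waits on a prompt.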
 

Revision 6 - 2008-06-27 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY

Added:
>
>
  • When submitting a job with a proxy of a VO that is not enabled on the WMS, the error message is unclear:
Warning - Unable to delegate the credential to the endpoint: https://devel19.cnaf.infn.it:7443/glite_wms_wmproxy_server
Unknown Soap fault
Method: Soap Error
 
  • BUG #23443: NOT FIXED
    • Required documents are not put into the glite doc template in edms
    • For the R6 (JDL howto) document a broken link is given
Line: 104 to 111
 

WM

Added:
>
>
  • BUG #27215: NOT FIXED
    • If the sum of the dimensions of the first two files is exactly equal to the limit, the job stays in state RUNNING for ever and ever.
 
  • Recovery doesn't work with a list-match request:
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting

Revision 5 - 2008-06-27 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

glite_wms_R_3_1_100

WMPROXY

Added:
>
>
  • BUG #23443: NOT FIXED
    • Required documents are not put into the glite doc template in edms
    • For the R6 (JDL howto) document a broken link is given
 
  • Submitting a "parametric" job always fails with this error:
*************************************************************
Line: 17 to 21
 ***********************************************************
Added:
>
>
To help debugging:

27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ------------------------------- Fault Description --------------------------------
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Method: jobStart
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Code: 1502
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Description: requirements: unable to complete the operation: the attribute has not been initialised yet
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": Stack:
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": AdEmptyException: requirements: unable to complete the operation: the attribute has not been initialised yet
        at Ad::delAttribute(const string& attr_name)[Ad.cpp:180]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart":   at JobAd::delAttribute(const string& attr_name)[JobAd.cpp:292]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart":   at submit()[wmpcoreoperations.cpp:1936]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart":   at jobStart()[wmpcoreoperations.cpp:1105]
27 Jun, 13:42:53 -D- PID: 5326 - "wmpgsoapoperations::ns1__jobStart": ----------------------------------------------------------------------------------
 
  • The status of a collection (or a dag) is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status.
*************************************************************
Line: 146 to 166
 

Other

Changed:
<
<
>
>
  • In the script glite-wms-purgeStorage.sh, the command glite-wms-create-proxy.sh has to be called with its full path:
[glite@devel19 ~]$ /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 302400
Certificate will expire
/opt/glite/sbin/glite-wms-purgeStorage.sh: line 27: glite-wms-create-proxy.sh: command not found
 

Revision 4 - 2008-06-25 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Changed:
<
<

Version glite_wms_R_3_1_100

>
>
 
Changed:
<
<

2008-06-25 (Ale)

  • The status of the collection is not correctly computed
>
>

glite_wms_R_3_1_100

WMPROXY

  • Submitting a "parametric" job always fails with this error:
*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel17.cnaf.infn.it:9000/wzOz1QuyFAEJ5JnwXHUhng
Current Status:     Aborted
Status Reason:      requirements: unable to complete the operation: the attribute has not been initialised yet
Submitted:          Wed Jun 25 16:25:03 2008 CEST
*************************************************************
 
Changed:
<
<

2008-06-24 (Ale)

>
>
  • The status of a collection (or a dag) is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status.
*************************************************************
BOOKKEEPING INFORMATION:
 
Changed:
<
<
  • The status of a collection is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status.
  • One collection over 60 submissions remain in "Waiting" status due to a LB problem (its nodes are in status "Submitted"); this are the wmproxy logs:
>
>
Status info for the Job : https://devel17.cnaf.infn.it:9000/xe-J0mjeFjU86p2mstn8-A
Current Status:     Cleared
Status Reason:      user retrieved output sandbox
Destination:        dagman
Submitted:          Wed Jun 25 16:08:43 2008 CEST
***********************************************************

- Nodes information for:
Status info for the Job : https://devel17.cnaf.infn.it:9000/6NeMTtefAmTCRI0N1waKzA
Current Status:     Done (Success)
Logged Reason(s):
    - Job terminated successfully
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        ce-01.roma3.infn.it:2119/jobmanager-lcgpbs-cert
Submitted:          Wed Jun 25 16:08:43 2008 CEST
***********************************************************

Status info for the Job : https://devel17.cnaf.infn.it:9000/OpPpNrh1EYNwPhjQQaVazQ
Current Status:     Done (Success)
Logged Reason(s):
    - Job terminated successfully
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        atlasce01.na.infn.it:2119/jobmanager-lcgpbs-cert
Submitted:          Wed Jun 25 16:08:43 2008 CEST
***********************************************************

 
Added:
>
>
  • One collection out of 60 submissions remains in "Waiting" status due to a LB problem (its nodes are in status "Submitted"); these are the wmproxy logs:
 
 
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": Quering LB Proxy...
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": LBProxy is enabled
Line: 30 to 72
 23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": ----------------------------------------
Changed:
<
<

2008-06-20 (Ale)

  • There is this warning in wmproxy submitting collections:
>
>
  • There is this warning in wmproxy submitting collections:
 
19 Jun, 14:58:32 -W- PID: 2746 - "wmpcoreoperations::submit": Unable to find SDJRequirements in configuration file
Changed:
<
<
  • Found a problem in the wmproxy. After some collections submissions it stops working and with the top command you can see that the glite-wms-proxy process are using all the CPU:
>
>
  • Found a problem in the wmproxy. After some collection submissions it stops working, and the top command shows that the glite-wms-proxy processes are using all the CPU (the problem seems to be related to the porting to gsoap 2.7.10):
 
[Thu Jun 19 15:43:50 2008] [warn] FastCGI: (dynamic) server "/opt/glite/bin/glite_wms_wmproxy_server" (pid 7938) terminated due to uncaught signal '11' (Segmentation fault)
Changed:
<
<

2008-06-19 (Ale)

>
>

WM

  • Recovery doesn't work with a list-match request:
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:754): Dispatcher: starting
25 Jun, 15:31:50 -I: [Info] operator()(dispatcher.cpp:757): Dispatcher: doing recovery
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(recovery.cpp:613): recovering https://localhost:6000/4xvi9qZqMbrNgQeuzOIGbA
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:389): RequestHandler: starting
25 Jun, 15:31:50 -I: [Info] main(main.cpp:292): running...
25 Jun, 15:31:50 -W: [Warning] single_request_recovery(recovery.cpp:317): cannot create LB context (2)
25 Jun, 15:31:50 -E: [Error] operator()(dispatcher.cpp:779): Dispatcher: cannot create LB context (2). Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] operator()(RequestHandler.cpp:467): RequestHandler: End of input. Exiting...
25 Jun, 15:31:50 -I: [Info] main(main.cpp:295): TaskQueue destroyed

JC/LM

ICE

LB

  • The status of the collection is not correctly computed
  • glite-lb-interlogd doesn't stop (you need a "kill -9")

Configuration

 
  • Add glite-wms-ice to the metapackage dependencies
  • Remove "rgma" from the environment variable GLITE_SD_PLUGIN
Line: 73 to 144
 
  • Rotate the logs of the purgeStorage
  • Check bugs #35244
Changed:
<
<
>
>

Other

 

Revision 3 2008-06-25 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

Version glite_wms_R_3_1_100

Added:
>
>

2008-06-25 (Ale)

  • The status of the collection is not correctly computed
 

2008-06-24 (Ale)

  • The status of a collection is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status.
Line: 68 to 71
 0 */6 * * * glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-create-proxy.sh /var/glite/wms.proxy /var/log/glite/create_proxy.log
  • Rotate the logs of the purgeStorage
Added:
>
>
 

Revision 2 2008-06-24 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

Version glite_wms_R_3_1_100

Added:
>
>

2008-06-24 (Ale)

  • The status of a collection is correctly set to CLEARED after a "glite-wms-get-output", but its nodes remain in "Done (Success)" status.
  • One collection out of 60 submissions remain in "Waiting" status due to a LB problem (its nodes are in status "Submitted"); these are the wmproxy logs:

 
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": Quering LB Proxy...
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": LBProxy is enabled
Unable to query LB and LBProxy
edg_wll_QueryEvents[Proxy]
Exit code: 4
LB[Proxy] Error: Interrupted system call
(edg_wll_plain_read())
23 Jun, 18:58:18 -D- PID: 24834 - "wmpcoreoperations::jobStart": Logging LOG_ENQUEUE_FAIL
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::logEvent": Logging to LB Proxy...
23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::logEvent": Logging Enqueue FAIL event...
23 Jun, 18:58:19 -D- PID: 24834 - "WMPEventlogger::testAndLog": LB call succeeded
23 Jun, 18:58:19 -D- PID: 24834 - "wmpcoreoperations::jobStart": Removing lock...
23 Jun, 18:58:19 -I- PID: 24834 - "wmpgsoapoperations::ns1__jobStart": jobStart operation completed

23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": -------- Exiting Server Instance -------
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": Signal code received: 15
23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": ----------------------------------------
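When going through wmproxy excerpts like the one above, it can help to split each line into its fields so that entries can be filtered by level or by PID. The following is only a minimal sketch: the line layout is inferred from the excerpts on this page, and the names `LINE_RE`, `WmproxyEntry` and `parse_line` are hypothetical helpers, not part of any gLite tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred only from the excerpts on this page:
#   <day> <Mon>, <HH:MM:SS> -<L>- PID: <pid> - "<component::function>": <message>
LINE_RE = re.compile(
    r'^(?P<day>\d{1,2}) (?P<month>[A-Z][a-z]{2}), (?P<time>\d{2}:\d{2}:\d{2}) '
    r'-(?P<level>[A-Z])- PID: (?P<pid>\d+) - "(?P<source>[^"]+)": (?P<message>.*)$'
)


class WmproxyEntry(NamedTuple):
    day: int
    month: str
    time: str
    level: str    # single letter as it appears in the log, e.g. D, I, W
    pid: int
    source: str   # the quoted "component::function" part
    message: str


def parse_line(line: str) -> Optional[WmproxyEntry]:
    """Return the parsed entry, or None for continuation lines without a header."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return WmproxyEntry(
        int(m["day"]), m["month"], m["time"], m["level"],
        int(m["pid"]), m["source"], m["message"],
    )


if __name__ == "__main__":
    sample = [
        '23 Jun, 18:58:18 -D- PID: 24834 - "WMPEventlogger::isAborted": LBProxy is enabled',
        "Unable to query LB and LBProxy",
        '23 Jun, 18:58:19 -I- PID: 24834 - "wmproxy::main": Signal code received: 15',
    ]
    for raw in sample:
        entry = parse_line(raw)
        # Lines such as "Unable to query LB and LBProxy" carry no header and yield None.
        print(entry if entry is not None else f"(continuation) {raw}")
```

Unstructured lines (the LB error text between two headed entries) are returned as `None` on purpose, so that a caller can decide whether to attach them to the previous entry.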
 

2008-06-20 (Ale)

  • There is this warning in wmproxy submitting collections:

Revision 1 2008-06-20 - AlessioGianelle

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="TestWokPlan"

Version glite_wms_R_3_1_100

2008-06-20 (Ale)

  • There is this warning in wmproxy submitting collections:
19 Jun, 14:58:32 -W- PID: 2746 - "wmpcoreoperations::submit": Unable to find SDJRequirements in configuration file

  • Found a problem in the wmproxy. After some collections submissions it stops working and with the top command you can see that the glite-wms-proxy process are using all the CPU:
[Thu Jun 19 15:43:50 2008] [warn] FastCGI: (dynamic) server "/opt/glite/bin/glite_wms_wmproxy_server" (pid 7938) terminated due to uncaught signal '11' (Segmentation fault)

2008-06-19 (Ale)

  • Add glite-wms-ice to the metapackage dependencies
  • Remove "rgma" from the environment variable GLITE_SD_PLUGIN
  • Add "globus gridftp" to the gliteservices list (gLiteservices)
  • Remove from the metapackage these rpms:
    • checkpointing
    • partitioner
    • interactive (and thirdparty-bypass)
    • rgma-* (and service-discovery-rgma-c)
  • glite-lb-interlogd doesn't stop (you need a "kill -9")
  • Modify the cron purger file
# Execute the 'purger' command at every day except on Sunday with a frequency of six hours
3 */6 * * mon-sat glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -t 604800 > /dev/null 2>&1

# Execute the 'purger' command on each Sunday (sun) forcing removal of dag nodes,
# orphan dag nodes without performing any status checking (threshold of 2 weeks).
0 1 * * sun glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-purgeStorage.sh -l /var/log/glite/glite-wms-purgeStorage.log -p /var/glite/SandboxDir -o -s -n -t 1296000 > /dev/null 2>&1
  • Set the ICE configuration section
  • Remove IsmBlackList parameter
  • Set jobdir as the default for the DispatcherType parameter of wm
  • Add the parameter EnableRecovery = true; to the wm configuration section
  • Add this cron to create the host proxy:
0 */6 * * * glite . /opt/glite/etc/profile.d/grid-env.sh ; /opt/glite/sbin/glite-wms-create-proxy.sh /var/glite/wms.proxy /var/log/glite/create_proxy.log
  • Rotate the logs of the purgeStorage
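
As a quick check of the purger thresholds in the cron entries above, the `-t` values can be converted to days. This sketch assumes that `-t` is expressed in seconds, which the page does not state explicitly; `threshold_days` is a hypothetical helper used only for the arithmetic.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def threshold_days(seconds: int) -> float:
    """Convert a purger threshold to days (assumes the value is in seconds)."""
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Values copied from the two cron entries above.
    for label, seconds in (("mon-sat", 604800), ("sun", 1296000)):
        print(f"{label}: -t {seconds} = {threshold_days(seconds):g} days")
```

Under that assumption the weekday threshold is 7 days and the Sunday threshold is 15 days, slightly more than the "2 weeks" mentioned in the cron comment.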

-- AlessioGianelle - 20 Jun 2008

META TOPICMOVED by="AlessioGianelle" date="1213954054" from="EgeeJra1It.WmsTest3dot1dot100" to="EgeeJra1It.WmsBugs3dot1dot100"
 