Difference: WmsTestsP2876 (1 vs. 79)

Revision 79 - 2011-03-24 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestPage"
Line: 159 to 159
 

Bug #77055 : "MyProxyServer: wrong type caught for attribute" for parametric jobs TO CERTIFY

Added:
>
>

Bug #77124 : WMS UI doesn't put the DSUpload_.out file in the OutputSandbox attribute list. TO CERTIFY

Bug #77325 : Env variables, ~ character are not correctly expanded in the WMS UI TO CERTIFY

 

Bug #77694 : Resource BDII for WMS needs to be revisit TO CERTIFY

Revision 78 - 2011-03-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestPage"
Line: 176 to 176
 

Bug #78484 : [ YAIM_WMS ] Multiple parameter configuration added in condor_config.local TO CERTIFY

Added:
>
>

Bug #79141 : various bugs about parametric jobs TO CERTIFY

 

Preliminary tests

Pre Certification tests

Revision 77 - 2011-03-04 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestPage"
Line: 141 to 141
 

Bug #74577 : Wrong counter in ICE database is set at the job creation TO CERTIFY

Added:
>
>

Bug #74737 : glite-wms-wmproxy start removes fastcgi socket ! TO CERTIFY

 

Bug #75223 : wrong reason logged TO CERTIFY

Revision 76 - 2011-03-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestPage"
Line: 165 to 165
 

Bug #78030 : Alternative GLITE_WMS_LOG_DESTINATION in the jobwrapper TO CERTIFY

Added:
>
>

Bug #78047 : LB Query timeout TO CERTIFY

 

Bug #78406 : [ yaim-wms ] yaim should set IsmIiLDAPCEFilterExt according to the supported VO(s) TO CERTIFY

Revision 75 - 2011-02-24 - AlessioGianelle

Line: 1 to 1
Changed:
<
<
META TOPICPARENT name="TestWokPlan"
>
>
META TOPICPARENT name="TestPage"
 

Certification report patch 2876

Revision 74 - 2011-02-22 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 18 to 18
 

Check bugs

Added:
>
>

Bug #33342 : separate retry policies for ISB and OSB TO CERTIFY

Bug #36292 : Not all attributes of a SA/SE coul be used in a gangmatching TO CERTIFY

Bug #39636 : Unable to use GT4-style proxy with WMProxy service TO CERTIFY

Bug #40982 : When a collection is aborted the "Abort" event should be logged for the sub-nodes as well /2 TO CERTIFY

Bug #44599 : WMS should consider MaxTotalJobs TO CERTIFY

Bug #45876 : huge lbproxy database on a dedicated WMS host TO CERTIFY

Bug #45883 : Optimization of resubmission TO CERTIFY

Bug #48636 : job wrapper should log events for truncated files TO CERTIFY

Bug #48640 : glite-wms-wmproxy to support graceful command TO CERTIFY

Bug #48703 : jobwrapper should consider trying the sandbox using different protocols TO CERTIFY

Bug #49844 : WMProxy does not catch signal 25 TO CERTIFY

Bug #50009 : wmproxy.gacl person record allows anyone to pass TO CERTIFY

Bug #52617 : [ yaim-wms ] host{cert,key}.pem in /home/glite TO CERTIFY

Bug #53294 : WMS 3.2 WMProxy logs are useless below level 6 TO CERTIFY

Bug #54728 : WMP finds FQAN inconsistency only if GROUPS are different, not ROLES TO CERTIFY

Bug #55122 : WM running but not processing jobs TO CERTIFY

Bug #55814 : the amount of information logged to the LB needs to be reviewed TO CERTIFY

Bug #56034 : Matchmaking with JobType=Normal does not take NodeNumber into account TO CERTIFY

Bug #56734 : ListMatch should consider also SDJ specification TO CERTIFY

Bug #56933 : WMProxy Server: gSoap needs to be built with WITH_IPV6 flag TO CERTIFY

Bug #58878 : Request for a feature allowing propagation of generic parameters from JDL to LRMs TO CERTIFY

Bug #58968 : Request for handling SMPGranularity attribute in the JDL TO CERTIFY

Bug #59781 : limit maximum sleep time in job wrapper TO CERTIFY

Bug #61315 : [ yaim-wms ] CeForwardParameters should include several more parameters TO CERTIFY

Bug #61557 : user job is not killed when proxy expires TO CERTIFY

Bug #62211 : [ yaim-wms ] Enable Glue 2.0 publishing TO CERTIFY

Bug #62709 : glite_wms_wmproxy_load_monitor has a problem with lvm partitions TO CERTIFY

Bug #64416 : the proxycache purger needs to be made compatible with the latest gridsite releases TO CERTIFY

Bug #68944 : Bug in ICE's start/stop script TO CERTIFY

Bug #70061 : WMS hates collections with 192 nodes! TO CERTIFY

Bug #70331 : glite-wms-create-proxy "ambiguous redirect" TO CERTIFY

Bug #70824 : environment values in JDL cannot have spaces TO CERTIFY

Bug #71438 : proxycache purger fails on symlinked proxy cache TO CERTIFY

Bug #71863 : JobWrapper tries to use "test -eq" for string comparison TO CERTIFY

Bug #73192 : Submission failed due to a credential problem TO CERTIFY

Bug #73699 : Wrong retry count computation TO CERTIFY

Bug #73711 : edg_wll_SetLoggingJobProxy with empty sequence code returns "no state in DB" TO CERTIFY

Bug #73715 : missing ReallyRunning event from LogMonitor TO CERTIFY

Bug #74221 : Perusal doesn't work with nodes of a collection TO CERTIFY

Bug #74259 : Previous matches information is not taken into account if direct submission is used TO CERTIFY

Bug #74577 : Wrong counter in ICE database is set at the job creation TO CERTIFY

Bug #75223 : wrong reason logged TO CERTIFY

Bug #75368 : ICE should log a DONE_FAILED to LB every time the job is going to be resubmitted TO CERTIFY

Bug #75402 : Synchronization loss between real validity of proxy and exp. time saved in ICE's database TO CERTIFY

Bug #77004 : Wrong myproxyserver string processing in ICE TO CERTIFY

Bug #77055 : "MyProxyServer: wrong type caught for attribute" for parametric jobs TO CERTIFY

Bug #77694 : Resource BDII for WMS needs to be revisit TO CERTIFY

Bug #77876 : While purging DAGs/Collections the CLEAR event is only logged for the parent node TO CERTIFY

Bug #78030 : Alternative GLITE_WMS_LOG_DESTINATION in the jobwrapper TO CERTIFY

Bug #78406 : [ yaim-wms ] yaim should set IsmIiLDAPCEFilterExt according to the supported VO(s) TO CERTIFY

Bug #78484 : [ YAIM_WMS ] Multiple parameter configuration added in condor_config.local TO CERTIFY

 

Preliminary tests

Changed:
<
<

Problems and solutions

>
>
Pre Certification tests
 
Deleted:
<
<
  1.  /opt/glite/yaim/functions/config_glite_lb: line 99: /opt/glite/etc/glite-lb-dbsetup.sql: No such file or directory
     ERROR 1146 (42S02) at line 1: Table 'lbserver20.short_fields' doesn't exist
     ERROR 1146 (42S02) at line 1: Table 'lbserver20.long_fields' doesn't exist
     ERROR 1146 (42S02) at line 1: Table 'lbserver20.states' doesn't exist
     ERROR 1146 (42S02) at line 1: Table 'lbserver20.events' doesn't exist
     /opt/glite/yaim/functions/config_glite_lb: line 190: /opt/glite/etc/init.d/glite-lb-bkserverd: No such file or directory
     /opt/glite/yaim/functions/config_glite_lb: line 200: /opt/glite/etc/init.d/glite-lb-bkserverd: No such file or directory
        ABORT: Service glite-lb-bkserverd failed to start!
        ERROR: Error during the execution of function: config_glite_lb
        ERROR: Error during the configuration.Exiting.                              [FAILED]
        ERROR: One of the functions returned with error without specifying it's nature ! 
    We need to install LB FIXED
  2.  DEBUG: Skipping function: config_glite_lb_setenv because it is not defined
     DEBUG: Skipping function: config_glite_lb because it is not defined
     ERROR: Error during the configuration.Exiting.                              [FAILED]
    install glite-yaim-lb FIXED
  3. ERROR: Unable to execute /etc/init.d/globus-gridftp.
     ERROR: Error during the execution of function: config_globus_gridftp 
    install glite-initscript-globus-gridftp-1.0.2-1.noarch.rpm FIXED
  4.  Syntax error on line 242 of /opt/glite/etc/glite_wms_wmproxy_httpd.conf:
     FastCgiConfig: invalid option: -intial-env 
    fix the template /opt/glite/etc/glite_wms_wmproxy_httpd.conf.template FIXED
  5. The ICE start/stop script doesn't work
    fix committed in CVS FIXED
  6. glite-yaim-wms ame:-ame:
    The version of yaim-wms is written directly into the Makefile during the CVS checkout (see keyword substitution). Since HEAD is used for the development of this component, the keyword substitution is not correct. When a tag is available the version will be correctly defined FIXED
  7. Starting program: /opt/glite/bin/glite_wms_wmproxy_server
    (no debugging symbols found)
    [Thread debugging using libthread_db enabled]
    [New Thread 0x2b227bb94260 (LWP 25595)]
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004e94a8 in __static_initialization_and_destruction_0 ()
    (gdb) bt
    #0  0x00000000004e94a8 in __static_initialization_and_destruction_0 ()
    #1  0x0000000000565c26 in __do_global_ctors_aux ()
    #2  0x000000000045665b in _init ()
    #3  0x00002b2274847010 in __CTOR_LIST__ () from /opt/glite/lib64/libglite_lb_clientpp_gcc64dbg.so.4
    #4  0x0000000000565ba7 in __libc_csu_init ()
    #5  0x0000003835e1d92e in __libc_start_main () from /lib64/libc.so.6
    #6  0x0000000000457fc9 in _start ()
    (gdb)
    FIXED
  8.  Warning - Unable to register the job to the service: https://cream-44.pd.infn.it:7443/glite_wms_wmproxy_server
    Unable to create job local directory
    (please contact server administrator)
    
    
    [root@cream-03 ~]# ls -l /opt/glite/bin/glite_wms_wmproxy_dirmanager 
          -r-sr-xr-x 1 nobody nobody 46363 Jul  8 11:00 /opt/glite/bin/glite_wms_wmproxy_dirmanager
    FIXED . Implemented postun in ETICS.
  9. Bug: #72573
    [root@cream-03 ~]# service gLite status
    [...]
    *** glite-lb-locallogger:
    glite-lb-logd not running 
    [...]
    FIXED Using new LB tag (patch 4423)
  10. lbproxy not started
    GLITE_LB_TYPE=proxy is the default behaviour FIXED
  11. [Fri Jul 09 18:05:21 2010] [error] Certificate Verification: Error (24): invalid CA certificate
    [Fri Jul 09 18:05:21 2010] [error] Certificate Verification: Error (26): unsupported certificate purpose
    Trying a simple list-match. Is it a warning?
  12. In /opt/glite/etc/lcmaps/lcmaps.db substitute path = ${moddir} with path = /opt/glite/lib64/modules
    FIXED
  13. Remove SDJRequirements and WmsRequirements from section WorkloadManagerProxy of /opt/glite/etc/glite_wms.conf
    FIXED
  14. Put on section WorkloadManager of /opt/glite/etc/glite_wms.conf:
    WmsRequirements  = ( (ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) && 
    (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs);
    FIXED
  15. Correct /opt/glite/etc/glite_wms_wmproxy.gacl: insert a / before the "vo" and the rule /"vo"/Role=NULL/Capability=NULL
    FIXED
  16. Cleared event is not logged:
    27 Jul, 17:41:22 -E- PID: 25792 - "wmputils::doPurge": [Error] remove_path(purger.cpp:256): LB event logging failed LB server (bkserver,lbproxy) store protocol error (1417) - edg_wll_LogEvent(): 
    LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR: 
    LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent(): edg_wll_log_connect error
    Transport endpoint is not connected;; edg_wll_gss_connect();; System Error: Connection refused
    27 Jul, 17:41:22 -S- PID: 25792 - "wmpcoreoperations::jobpurge": Unable to complete job purge
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": ------------------------------- Fault Description --------------------------------
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Method: jobPurge
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Code: 1202
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Description: The Operation is not allowed: Unable to complete job purge
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Stack: 
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": JobOperationException: The Operation is not allowed: Unable to complete job purge
       at jobpurge()[wmpcoreoperations.cpp:2640]
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge":    at jobpurge()[wmpcoreoperations.cpp:2546]
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge":    at jobPurge()[wmpcoreoperations.cpp:2667]
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": ----------------------------------------------------------------------------------
    27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": jobPurge operation completed
    FIXED for normal jobs
  17. Submission to LCG CE doesn't work: Got a job held event, reason: Failed to initialize GAHP
    Possible solutions are:
    • install condor-lcg-1.1.0-1
    • set GT2_GAHP = /opt/condor-7.4.1/sbin/gahp_server and GRID_MONITOR = /opt/condor-7.4.1/libexec/glite/grid_monitor.sh in /opt/condor-c/local.<$HOSTNAME>/condor_config.local
      Waiting for Marteen's investigation... FIXED Decided to use the second option. Changes done in yaim-wms.
  18. BUG #73192 Submission failed:
    [ale@egee-rb-03 UI]$ glite-wms-job-submit -a --config ~/UI/etc/wmp_cream-03.conf jdl/env.jdl 
    Connecting to the service https://cream-03.pd.infn.it:7443/glite_wms_wmproxy_server
    
    Warning - Unable to register the job to the service: https://cream-03.pd.infn.it:7443/glite_wms_wmproxy_server
    LB: :2652
    Set logging job failed
    edg_wll_SetLoggingJob
    LB[Proxy] Error: GSSAPI Error
    (failed to load GSI credentials: GSS Major Status: General failure
     (GSS Minor Status Error Chain:
    globus_gsi_gssapi: Error with GSI credential
    globus_gsi_gssapi: Error with gss credential handle
    globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
    Valid credentials could not be found in any of the possible locations specified by the credential search order.
    Possible (ugly) workarounds for this problem are:
    • chown root.glite /etc/grid-security/host*.pem
    • cp /etc/grid-security/hostcert.pem /home/glite/.globus/usercert.pem and cp /etc/grid-security/hostkey.pem /home/glite/.globus/userkey.pem
      FIXED using new LB client (patch 4423)
  19. WM requires a huge amount of memory:
    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND  
    15007 glite     25   0 2657m 2.8g 8320 S  0.0 77.0   2:34.79 glite-wms-workl 
    Monitoring.. Google malloc is used by default and correctly set by yaim
  20. *** glite-lb-bkserverd:
    Starting glite-lb-bkserver ...Warning: MySQL library version mismatch (compiled '50045', runtime '50077')
     done
    Salvet said: Installed MySQL library is slightly newer than library used for glite-lb-bkserver build. This is normal. No Problem
  21. Bug #73206 Collection doesn't work. The only suspicious messages are in /var/log/messages:
     Sep 10 15:50:48 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:48Z : event=wms.wmpserver_setJobFileSystem() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
    Sep 10 15:50:48 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:48Z : event=wms.wmpserver_setSubjobFileSystem() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
    Sep 10 15:50:50 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:50Z : event=wms.wmpserver_submit() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
    Sep 10 15:50:50 cream-44 kernel: glite_wms_wmpro[6179] general protection rip:3e86c797e0 rsp:7fff60c65148 error:0
    FIXED
  22. Bug #72970 I'm running on the WMS an lbserver in "both" mode:
     9197 ?        S      0:00 /opt/glite/bin/glite-lb-bkserverd --notif-il-sock=/tmp/glite-lb-notif.sock --notif-il-fprefix=/var/tmp/glite-lb-notif -c /home/glite/.certs/hostcert.pem -k /home/glite/.certs/hostkey.pem -i /var/glite/glite-lb-bkserverd.pid --dump-prefix /var/glite/dump --purge-prefix /var/glite/purge -B --proxy-il-sock /tmp/glite-lbproxy-ilog.sock --proxy-il-fprefix /tmp/glite-lbproxy-ilog_events --policy /opt/glite/etc/glite-lb/glite-lb-authz.conf
    Then I started locallogger:
    [root@devel08 glite]# /opt/glite/etc/init.d/glite-lb-locallogger start
    Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
     done
    Starting glite-lb-interlogd ... done
    but:
    [root@devel08 glite]# /opt/glite/etc/init.d/glite-lb-locallogger status
    glite-lb-logd not running
    instead:
    [root@devel08 ~]# ps ax | grep lb-log
    10049 ?        Ss     0:00 /opt/glite/bin/glite-lb-logd -i /var/glite/glite-lb-logd.pid -c /home/glite/.certs/hostcert.pem -k /home/glite/.certs/hostkey.pem
    12135 pts/0    S+     0:00 grep lb-log
    FIXED Using new LB tag (patch 4423)
  23. Problem during installation:
    Error: Missing Dependency: org.glite.build.common-cpp >= 3.2.1 is needed by package glite-security-lcmaps-without-gsi-1.4.8-5.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412)
    Error: Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-3.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412)
    Error: Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-WMS-3.3.0-0.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412)
    two packages missing... FIXED
  24. Resubmission for jobs submitted with -r option is not done. Jobs are aborted with "hit job shallow retry count (0)" even if ShallowRetryCount was set in the JDL FIXED .
  25. Problems with DAG. DAG job reported as failed:
    Current Status:     Done (Exit Code !=0)
    Exit code:          1
    Status Reason:      Warning: job exit code != 0
    Destination:        dagman
    
    and nodes stuck in "Submitted". FIXED .
  26. BUG #75223 FIXED When a job fails, the logged reason is wrong:
    Event: Done
    - Arrived                    =    Thu Nov 11 14:06:54 2010 CET
    - Exit code                  =    1
    - Host                       =    cream-46.pd.infn.it
    - Reason                     =    LM_log_done_beginThu Nov 11 14:03:17 CET 2010: prologue failed with error 1
    - Source                     =    LogMonitor
    - Src instance               =    unique
    - Status code                =    FAILED
    - Timestamp                  =    Thu Nov 11 14:06:54 2010 CET
    Another type of error:
    Event: Done
    - Arrived                    =    Fri Nov 26 12:15:30 2010 CET
    - Exit code                  =    0
    - Host                       =    pamelawn23.na.infn.it
    - Reason                     =    Fri Nov 26 12:13:45 CET 2010: Cannot
    - Source                     =    LRMS
    - Status code                =    OK
    - Timestamp                  =    Fri Nov 26 12:13:46 2010 CET
    - User                       =    /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle
  27. Problem installing new build (07/11/2010) FIXED .
      --> Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-4.sl5.x86_64 (ETICS-name-patch_2876_1)
  28. Problem starting bdii:
    Starting BDII slapd: Traceback (most recent call last):
      File "/usr/sbin/bdii-update", line 821, in ?
        create_daemon(config['BDII_LOG_FILE'])
      File "/usr/sbin/bdii-update", line 168, in create_daemom
        e = os.open(log_file, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0644)
    OSError: [Errno 13] Permission denied: '/var/log/bdii/bdii-update.log'
                                                               [  OK  ]
    BDII update process failed to startStarting BDII update pro[FAILED]
    Be sure that the installed version comes from the ETICS repo: bdii-5.0.9-1
  29. BUG #75099 FIXED .
    2010-11-09 13:01:07,649 DEBUG - iceCommandEventQuery::execute() -  TID=[203357600] Database  ID=[1265375986000]
    2010-11-09 13:01:07,650 DEBUG - iceCommandEventQuery::execute() -  TID=[203357600] Exec time ID=[6]
    2010-11-09 13:01:07,651 INFO - scoped_timer iceCommandEventQuery::processEvents() - TID=[203357600] All Events Proc Time 1289304067.650547 1289304067.651501 0.000954
    terminate called after throwing an instance of 'std::out_of_range'
      what():  basic_string::at
    Aborted
    It happens when shallow resubmission is set to -1.
  30. Submission failed Probably FIXED using a new LB server (2.1)
    Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    HTTP Error
    <html><head>
    <title>500 Internal Server Error</title>
    </head><body>
    <h1>Internal Server Error</h1>
    <p>The server encountered an internal error or
    misconfiguration and was unable to complete
    your request.</p>
    <p>Please contact the server administrator,
     [no address given] and inform them of the time the error occurred,
    and anything you might have done that may have
    caused the error.</p>
    <p>More information about this error may be available
    in the server error log.</p>
    </body></html>
    
    Error code: SOAP-ENV:Server
    Looking into the /var/log/messages file of the WMS you should read:
    Nov 10 17:10:44 cream-46 kernel: glite_wms_wmpro[15571]: segfault at 0000000000000000 rip 00002b6b3ca32548 rsp 00007fffa962f320 error 4
    Instead the wmproxy log says:
    10 Nov, 17:10:35 -S- PID: 15571 - "WMPEventlogger::registerJob": Register job failed
    edg_wll_RegisterJobProxy
    Exit code: 22
    LB[Proxy] Error: Invalid argument
    (edg_wll_RegisterJobMaster(): unable to register job
    Resource temporarily unavailable;; Logging library ERROR: 
    Resource temporarily unavailable;; edg_wll_DoLogEventServer(): edg_wll_log_direct_read error
    LB server (bkserver,lbproxy) store protocol error;; edg_wll_log_proto_client_direct(): error reading answer from L&B direct server
    LB server (bkserver,lbproxy) store protocol error;; get_reply_gss(): error reading reply
    LB server (bkserver,lbproxy) store protocol error;; gss_reader(): error reading message
    Transport endpoint is not connected;; edg_wll_gss_read_full;; GSS Error: EOF occured;)
    The httpd-wmproxy-errors.log says:
    [Wed Nov 10 17:10:44 2010] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
  31. Cleared event not logged FIXED .
    [ale@cream-03 ~]$ glite-wms-job-output https://devel07.cnaf.infn.it:9000/IvaPn8c4ezVSiHbQP8FJOg
    
    Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    
    Warning - JobPurging not allowed
     (The Operation is not allowed: Unable to complete job purge)
    You need to set the WMS DN in both ACTION "READ_ALL" and "LOG_WMS_EVENTS" of glite-lb-authz.conf file of the LB Server
  32. Cleared event not logged FIXED (2011/01/19).
    [ale@cream-03 ~]$ glite-wms-job-output https://devel07.cnaf.infn.it:9000/Hn0VS-oMpYlT7bdAIXNB7g
    
    Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    
    Warning - JobPurging not allowed
     (Proxy exception: The delegated Proxy has expired)
    This happens when the job proxy has expired.
  33. Bug #75368 FIXED A "DONE OK" job is marked as "ABORTED". The problem is that a failed job should not be aborted (resubmission is possible only for Done Failed jobs). Another case:
     Event: Abort
    - Arrived                    =    Wed Nov 17 13:31:54 2010 CET
    - Host                       =    cream-46.pd.infn.it
    - Reason                     =    BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Connection timed out-qsub: cannot connect to server gridba3.ba.infn.it (errno=110) Connection timed out-TERM environment variable not set.-) N/A (jobId = CREAM809848230)
    - Source                     =    LogMonitor
    - Timestamp                  =    Wed Nov 17 13:31:53 2010 CET
    - User                       =    /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
       ---
    Event: Resubmission
    - Arrived                    =    Wed Nov 17 13:31:54 2010 CET
    - Host                       =    cream-46.pd.infn.it
    - Reason                     =    Job resubmitted by ICE
    - Result                     =    WILLRESUB
    - Source                     =    LogMonitor
    - Tag                        =    unavailable
    - Timestamp                  =    Wed Nov 17 13:31:54 2010 CET
    - User                       =    /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
  34. St9exception FIXED (2011/02/01). After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you try a get-output for a node at this point you get (see also 53.):
     [ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw
    
    Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    
    Error - getOutputFileList Error
     (St9exception)
    Instead the right message should be found in the wmproxy log file:
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": JobId Exception: The Operation is not allowed: The job has not been registered from this Workload Manager Proxy server (https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server) or it has been purged
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ------------------------------- Fault Description --------------------------------
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Method: getOutputFileList
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Code: 1202
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Description: St9exception
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
  35. The state of the nodes doesn't change after the output retrieval of the parent of a collection. Bug #77876 FIXED (2011/02/14)
  36. Dag job doesn't work FIXED . Probably the name of the token is wrongly set.
  37. Bug #73715 FIXED the Really Running event is not logged from JC/LM (instead it works for ICE).
  38. The "type" attribute is case sensitive FIXED installing the new jdl's rpm on the UI
    Error - AdSyntaxException
    The following parsing error(s) have been found:
    'node_type' must be "dag"
  39. Bug #75402 FIXED To verify with the new tag Synchronization loss between real validity of proxy and exp. time saved in ICE's database; this can happen when the copy of the new proxy fails
    2010-11-15 10:57:41,869 INFO - DNProxyManager::setUserProxyIfLonger_Legacy() - Setting user proxy to [ /var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] copied to /var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] because the old one is less long-lived.
    2010-11-15 10:57:42,019 ERROR - DNProxyManager::setUserProxyIfLonger_Legacy() - Error copying proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] to [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy].
    2010-11-15 10:57:42,019 DEBUG - DNProxyManager::setUserProxyIfLonger_Legacy() - New proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] has been copied into [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] - New Expiration Time is [Tue Nov 16 10:56:34 2010]
  40. Problem with expired job proxies in ICE FIXED To verify with the new tag . When the proxy expires, the jobs in the ICE queue are not correctly removed
    2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
  41. ICE logs Cancel events with wrong sequence code FIXED (2011-01-19)
  42. ICE aborted TO Monitoring It can happen that ICE aborts with these messages (see also 49.):
    t1169246528:p13952: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_destroy() failed
    
    [Thread System] mutex is locked (EBUSY)t1295124800:p13952: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed
    or
    t1313102144:p12551: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed
    
    [Thread System] invalid value passed to thread interface (EINVAL)Aborted (core dumped)
  43. glite-wms-ice-db-rm, various errors FIXED (2011/01/19)
    [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm -h
    /opt/glite/bin/glite-wms-ice-db-rm: invalid option -- h
    Type /opt/glite/bin/glite-wms-ice-db-rm -h for help
    
    [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm
    Must specify at least one of the options --from-file <pathfile> or <gridjobid>
    
    [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm --from-file a
    /opt/glite/bin/glite-wms-ice-db-rm: unrecognized option `--from-file' 
  44. feedback doesn't work FIXED The "old" token is not removed!
    [root@devel09 ~]# ls -l  /var/glite/SandboxDir/B4/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fB4ni_5fIMNzn6cY6W8llClSA/
    total 20
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 input
    -rw-r--r-- 1 dteam035 dteam  133 Nov 30 10:41 Maradona.output
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 output
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 peek
    -rw-r--r-- 1 glite    glite    0 Nov 30 10:19 token.txt_0_
    -rw-r--r-- 1 glite    glite    0 Nov 30 10:17 token.txt_1
  45. Misleading messages in Maradona file: FIXED (2011/02/02)
    LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0
    LM_log_done_end
    jw exit status = 1
    Then the job fails with this message:
            - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
    Thu Dec  2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0LM_log_done_end
  46. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM FIXED (2011/02/10) The proposed fix is that yaim sets:
    WmsRequirements  = ((ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && ((EnableWmsFeedback =?= TRUE) ? (RegExp("cream", other.GlueCEImplementationName, "i")) : true)
  47. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
  48. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
  49. Under high load both WM and ICE give: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention. (see also 42.)
  50. repo 110114 Yaim fails FIXED (2011/01/31):
    cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
    chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
  51. repo 110114 Submission failed FIXED Problem was with fast-cgi in gsoap (2011/01/27) :
    Status: 500 Internal Server Error
    Server: gSOAP/2.7
    Content-Type: text/xml; charset=utf-8
    Content-Length: 831
    Connection: close
    
    <?xml version="1.0" encoding="UTF-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><faultcode>SOAP-ENV:Client</faultcode><faultstring>End of file or no input: 'Invalid argument'</faultstring></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
  52. repo 110114 Cannot take shallow resubmission token FIXED (2011/01/31). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
  53. repo 110114 Misleading output message from UI FIXED (2011/02/01) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
     
    [ale@cream-03 UI]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/cxyj4EodCZx7qY2vretM6g
    
    Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    
    Error - getOutputFileList Error
     (St9exception)
    The wmproxy log file, instead, shows the real reason:
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": JobId Exception: User not authorized to perform this operation
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ------------------------------- Fault Description --------------------------------
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Method: getOutputFileList
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Code: 1202
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Description: St9exception
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
  54. repo 110114 Deep resubmission doesn't work FIXED (2011/02/07). In fact, when the old token is grabbed, WM doesn't create the new one:
    31 Jan, 16:37:21 -I: [Info] operator()(dispatcher_utils.cpp:228): new jobresubmit for https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng
    31 Jan, 16:37:21 -E: [Error] operator()(submit_request.cpp:536): cannot rename temporary token /var/glite/SandboxDir/Ha/https_3a_2f_2fcream-44.pd.infn.it_3a9000_2fHam9zx-h36iNR0FVtjT9Ng/token.txt_0. (error 2)
    31 Jan, 16:37:21 -I: [Info] postpone(submit_request.cpp:227): postponing https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng (cannot create token for shallow resubmission)
  55. repo 110114 Bug: #77366: Submission sometimes fails with an LB error. Hopefully FIXED (2011/02/10):
    Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    Register COLLECTIONfailed to LB server:devel17.cnaf.infn.it:9000
    edg_wll_RegisterJobProxy/Sync
    Exit code: 22
    LB[Proxy] Error: Invalid argument
    (edg_wll_RegisterJobMaster(): unable to register job
    Resource temporarily unavailable;; Logging library ERROR: 
    Resource temporarily unavailable;; edg_wll_DoLogEventServer(): edg_wll_log_direct_read error
    LB server (bkserver,lbproxy) store protocol error;; edg_wll_log_proto_client_direct(): error reading answer from L&B direct server
    LB server (bkserver,lbproxy) store protocol error;; get_reply_gss(): error reading reply
    LB server (bkserver,lbproxy) store protocol error;; gss_reader(): error reading message
    Transport endpoint is not connected;; edg_wll_gss_read_full;; GSS Error: EOF occured;)
    
    Method: jobRegister
    
    
    Error - Operation failed
    Unable to find any endpoint where to perform service request
  56. repo 110114 Bug #77055 FIXED (2011/01/31) "MyProxyServer: wrong type caught for attribute" for parametric jobs
  57. repo 110114 Feedback doesn't work FIXED (2011/02/07)
    20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
    20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...] 
  58. repo 110114 Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. Hopefully FIXED (2011/02/10)
  59. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory FIXED (2011/02/07) The glite-info-generic package is probably missing. After configuration we have:
    [root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
    -rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf
  60. repo 110204 FIXED (2011/02/14) Cron Purger doesn't work:
    [root@cream-46 ~]#   /opt/glite/sbin/glite-wms-purgeStorage --help
    glite-wms-purgeStorage: glite-wms-purgeStorage.cpp:87: bool<unnamed>::lb_proxy(): Assertion `f_conf' failed.
    Aborted
  61. repo 110204 NOT FIXED (2011/02/11) All the queries done for replanning time out (open bug #78047):
    10 Feb, 10:35:20 -D: [Debug] get_scheduled_jobs(lb_utils.cpp:153): error (110) while querying for scheduled jobs
    10 Feb, 10:35:20 -D: [Debug] operator()(replanner.cpp:382): no jobs in scheduled state for more than 1800 seconds for replanning 
  62. repo 110204 NOT FIXED (2011/02/11) If a collection is aborted for "request expired", the nodes don't change status. The problem is due to the LB query (probably tied to point 61.):
    10 Feb, 07:01:48 -E: [Error] unrecoverable_collection(submit_request.cpp:108): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A: unable to retrieve children information from jobstatus
    10 Feb, 07:01:48 -E: [Error] unrecoverable(submit_request.cpp:126): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A failed (request expired) 
  63. repo 110204 NOT FIXED (2011/02/11) Submitting a collection with 192 nodes produces this error (see bug #70061):
    
    Status info for the Job : https://cream-44.pd.infn.it:9000/Tck2cFcOKuOjM4gfH9Ermg
    Current Status:     Waiting
    Status Reason:      jobid: unable to complete the operation: the attribute has not been initialised yet
    Submitted:          Fri Feb 11 17:06:23 2011 CET 
  64. repo 110214 NOT FIXED (2011/02/14) The glite-WMS metapackage is wrong (too many dependencies, and google-perftools is missing).
  65. repo 110214 NOT FIXED (2011/02/14) There is a mistake in /opt/glite/yaim/functions/config_info_service_wms (a "\" is missing at line 86); yaim reports this message:
    sed: -e expression #3, char 84: unknown option to `s'
  66. repo 110214 NOT FIXED (2011/02/14) When you cancel a collection, some nodes (the ones which were in state Running or DONE OK) are put in state "Cleared".
  67. repo 110214 NOT FIXED (2011/02/14) WM dies with these messages:
    14 Feb, 16:05:46 -E: [Error] unrecoverable(submit_request.cpp:126): https://devel09.cnaf.infn.it:9000/3hTRNHN65Dv64kYDLKjZkQ failed (hit job shallow retry count (2))
    14 Feb, 16:05:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:107): MM for job: https://devel09.cnaf.infn.it:9000/dsX5dYARJ2rAnk--uyYcJg (19/2976 [0] )
    14 Feb, 16:05:46 -E: [Error] handle_synch_signal(signal_handling.cpp:77): Got a synchronous signal (11), stack trace:
    /opt/glite/bin/glite-wms-workload_manager
    /lib64/libpthread.so.0
    classad::EvalState::SetRootScope()
    classad::ClassAd::EvaluateAttr(std::string const&, classad::Value&) const
    glite::wmsutils::classads::evaluate_attribute(classad::ClassAd const&, std::string const&)
    /opt/glite/lib64/libglite_wms_matchmaking.so.0(_ZN5glite3wms11matchmaking17matchmakerISMImpl16checkRequirementERN7classad7Class
    glite::wms::broker::RBSimpleISMImpl::findSuitableCEs(classad::ClassAd const*)
    glite::wms::broker::ResourceBroker::findSuitableCEs(classad::ClassAd const*)
    /opt/glite/lib64/libglite_wms_helper_broker_ism.so
    glite::wms::helper::broker::Helper::resolve(classad::ClassAd const*, boost::shared_ptr<std::string>) const
    glite::wms::helper::Helper::resolve(classad::ClassAd const*, boost::shared_ptr<std::string>) const
    glite::wms::helper::RequestStateMachine::next_step(classad::ClassAd const*, boost::shared_ptr<std::string>)
    glite::wms::helper::Request::Impl::resolve()
    glite::wms::manager::server::Plan(classad::ClassAd const&, boost::shared_ptr<std::string>)
    glite::wms::manager::server::WMReal::submit(classad::ClassAd const&, boost::shared_ptr<_edg_wll_Context>, boost::shared_ptr<std::string>, bool)
    glite::wms::manager::server::SubmitProcessor::operator()()
    boost::function0<void, std::allocator<void> >::operator()() const
    glite::wms::manager::server::Events::run()
    boost::function0<void, std::allocator<boost::function_base> >::operator()() const
    /usr/lib64/libboost_thread.so.2
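
The WmsRequirements expression proposed under item 46 can be read as three conditions joined by a logical AND. The following is only a minimal Python sketch of that logic, not the matchmaker's implementation: the function name and the plain dictionaries standing in for the job and CE ClassAds are hypothetical, every CE attribute is assumed to be defined, and ClassAd `RegExp` is assumed to behave like an unanchored search.

```python
import re


def wms_requirements(job, ce):
    """Approximate the proposed WmsRequirements expression in plain Python.

    `job` and `ce` are hypothetical dicts standing in for ClassAds.
    """
    # (ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", id) : !RegExp(".*sdj$", id)
    is_sdj_queue = re.search(r".*sdj$", ce["GlueCEUniqueID"]) is not None
    if job.get("ShortDeadlineJob") is True:
        sdj_ok = is_sdj_queue
    else:
        sdj_ok = not is_sdj_queue

    # MaxTotalJobs == 0 means "no limit"; otherwise the CE must not be full.
    max_total = ce["GlueCEPolicyMaxTotalJobs"]
    capacity_ok = max_total == 0 or ce["GlueCEStateTotalJobs"] < max_total

    # With feedback enabled only CREAM CEs match, which excludes LCG-CE.
    if job.get("EnableWmsFeedback") is True:
        feedback_ok = re.search(
            "cream", ce["GlueCEImplementationName"], re.IGNORECASE
        ) is not None
    else:
        feedback_ok = True

    return sdj_ok and capacity_ok and feedback_ok
```

Using `is True` mirrors the `=?=` operator, which treats an undefined attribute as "not TRUE" instead of propagating an undefined value, so a job that never sets EnableWmsFeedback still matches LCG-CE resources.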
 

-- AlessioGianelle - 2010-09-08

Deleted:
<
<
META FILEATTACHMENT attachment="status.txt" attr="" comment="Problem 33: status" date="1289471718" name="status.txt" path="status.txt" size="2301" stream="status.txt" tmpFilename="/usr/tmp/CGItemp16567" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="loginfo.txt" attr="" comment="Problem 33: loginfo" date="1289471741" name="loginfo.txt" path="loginfo.txt" size="42700" stream="loginfo.txt" tmpFilename="/usr/tmp/CGItemp13678" user="AlessioGianelle" version="1"

Revision 73 2011-02-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 456 to 456
    [root@cream-46 ~]# /opt/glite/sbin/glite-wms-purgeStorage --help
    glite-wms-purgeStorage: glite-wms-purgeStorage.cpp:87: bool<unnamed>::lb_proxy(): Assertion `f_conf' failed.
    Aborted
Changed:
<
<
  1. repo 110204 FIXED (2011/02/11) All the queries done for replanning time out (open bug #78047):
>
>
  1. repo 110204 NOT FIXED (2011/02/11) All the queries done for replanning time out (open bug #78047):
    10 Feb, 10:35:20 -D: [Debug] get_scheduled_jobs(lb_utils.cpp:153): error (110) while querying for scheduled jobs
    10 Feb, 10:35:20 -D: [Debug] operator()(replanner.cpp:382): no jobs in scheduled state for more than 1800 seconds for replanning
Changed:
<
<
  1. repo 110204 FIXED (2011/02/11) If a collection is aborted for "request expired", the nodes don't change status. The problem is due to the LB query (probably tied to point 61.):
>
>
  1. repo 110204 NOT FIXED (2011/02/11) If a collection is aborted for "request expired", the nodes don't change status. The problem is due to the LB query (probably tied to point 61.):
    10 Feb, 07:01:48 -E: [Error] unrecoverable_collection(submit_request.cpp:108): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A: unable to retrieve children information from jobstatus
    10 Feb, 07:01:48 -E: [Error] unrecoverable(submit_request.cpp:126): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A failed (request expired)
Changed:
<
<
  1. repo 110204 FIXED (2011/02/11) Submitting a collection with 192 nodes produces this error (see bug #70061):
>
>
  1. repo 110204 NOT FIXED (2011/02/11) Submitting a collection with 192 nodes produces this error (see bug #70061):
    Status info for the Job : https://cream-44.pd.infn.it:9000/Tck2cFcOKuOjM4gfH9Ermg
    Current Status:     Waiting
Line: 472 to 472
 
  1. repo 110214 NOT FIXED (2011/02/14) There is a mistake in /opt/glite/yaim/functions/config_info_service_wms (a "\" is missing at line 86); yaim reports this message:
    sed: -e expression #3, char 84: unknown option to `s'
  2. repo 110214 NOT FIXED (2011/02/14) When you cancel a collection, some nodes (the ones which were in state Running or DONE OK) are put in state "Cleared".
Added:
>
>
  1. repo 110214 NOT FIXED (2011/02/14) WM dies with these messages:
    14 Feb, 16:05:46 -E: [Error] unrecoverable(submit_request.cpp:126): https://devel09.cnaf.infn.it:9000/3hTRNHN65Dv64kYDLKjZkQ failed (hit job shallow retry count (2))
    14 Feb, 16:05:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:107): MM for job: https://devel09.cnaf.infn.it:9000/dsX5dYARJ2rAnk--uyYcJg (19/2976 [0] )
    14 Feb, 16:05:46 -E: [Error] handle_synch_signal(signal_handling.cpp:77): Got a synchronous signal (11), stack trace:
    /opt/glite/bin/glite-wms-workload_manager
    /lib64/libpthread.so.0
    classad::EvalState::SetRootScope()
    classad::ClassAd::EvaluateAttr(std::string const&, classad::Value&) const
    glite::wmsutils::classads::evaluate_attribute(classad::ClassAd const&, std::string const&)
    /opt/glite/lib64/libglite_wms_matchmaking.so.0(_ZN5glite3wms11matchmaking17matchmakerISMImpl16checkRequirementERN7classad7Class
    glite::wms::broker::RBSimpleISMImpl::findSuitableCEs(classad::ClassAd const*)
    glite::wms::broker::ResourceBroker::findSuitableCEs(classad::ClassAd const*)
    /opt/glite/lib64/libglite_wms_helper_broker_ism.so
    glite::wms::helper::broker::Helper::resolve(classad::ClassAd const*, boost::shared_ptr<std::string>) const
    glite::wms::helper::Helper::resolve(classad::ClassAd const*, boost::shared_ptr<std::string>) const
    glite::wms::helper::RequestStateMachine::next_step(classad::ClassAd const*, boost::shared_ptr<std::string>)
    glite::wms::helper::Request::Impl::resolve()
    glite::wms::manager::server::Plan(classad::ClassAd const&, boost::shared_ptr<std::string>)
    glite::wms::manager::server::WMReal::submit(classad::ClassAd const&, boost::shared_ptr<_edg_wll_Context>, boost::shared_ptr<std::string>, bool)
    glite::wms::manager::server::SubmitProcessor::operator()()
    boost::function0<void, std::allocator<void> >::operator()() const
    glite::wms::manager::server::Events::run()
    boost::function0<void, std::allocator<boost::function_base> >::operator()() const
    /usr/lib64/libboost_thread.so.2
 

-- AlessioGianelle - 2010-09-08

Revision 72 2011-02-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 471 to 471
 
  1. repo 110214 NOT FIXED (2011/02/14) The glite-WMS metapackage is wrong (too many dependencies, and google-perftools is missing).
  2. repo 110214 NOT FIXED (2011/02/14) There is a mistake in /opt/glite/yaim/functions/config_info_service_wms (a "\" is missing at line 86); yaim reports this message:
    sed: -e expression #3, char 84: unknown option to `s'
Added:
>
>
  1. repo 110214 NOT FIXED (2011/02/14) When you cancel a collection, some nodes (the ones which were in state Running or DONE OK) are put in state "Cleared".
 

-- AlessioGianelle - 2010-09-08

Revision 71 2011-02-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 339 to 339
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
Changed:
<
<
  1. The state of the nodes doesn't change after the output retrieval of the parent of a collection. Bug #77876 NOT FIXED (2011/02/01)
>
>
  1. The state of the nodes doesn't change after the output retrieval of the parent of a collection. Bug #77876 FIXED (2011/02/14)
 
  1. DAG job doesn't work FIXED. The name of the token is probably set wrongly.
  2. Bug #73715 FIXED The Really Running event is not logged by JC/LM (it works for ICE).
  3. The "type" attribute is case sensitive FIXED by installing the new JDL rpm on the UI

Revision 70 2011-02-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 452 to 452
 
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory FIXED (2011/02/07) The glite-info-generic package is probably missing. After configuration we have:
    [root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
    -rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf
Changed:
<
<
  1. repo 110204 NOT FIXED (2011/02/07) Cron Purger doesn't work:
>
>
  1. repo 110204 FIXED (2011/02/14) Cron Purger doesn't work:
    [root@cream-46 ~]# /opt/glite/sbin/glite-wms-purgeStorage --help
    glite-wms-purgeStorage: glite-wms-purgeStorage.cpp:87: bool<unnamed>::lb_proxy(): Assertion `f_conf' failed.
    Aborted
Added:
>
>
  1. repo 110204 FIXED (2011/02/11) All the queries done for replanning time out (open bug #78047):
    10 Feb, 10:35:20 -D: [Debug] get_scheduled_jobs(lb_utils.cpp:153): error (110) while querying for scheduled jobs
    10 Feb, 10:35:20 -D: [Debug] operator()(replanner.cpp:382): no jobs in scheduled state for more than 1800 seconds for replanning 
  2. repo 110204 FIXED (2011/02/11) If a collection is aborted for "request expired", the nodes don't change status. The problem is due to the LB query (probably tied to point 61.):
    10 Feb, 07:01:48 -E: [Error] unrecoverable_collection(submit_request.cpp:108): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A: unable to retrieve children information from jobstatus
    10 Feb, 07:01:48 -E: [Error] unrecoverable(submit_request.cpp:126): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A failed (request expired) 
  3. repo 110204 FIXED (2011/02/11) Submitting a collection with 192 nodes produces this error (see bug #70061):
    
    Status info for the Job : https://cream-44.pd.infn.it:9000/Tck2cFcOKuOjM4gfH9Ermg
    Current Status:     Waiting
    Status Reason:      jobid: unable to complete the operation: the attribute has not been initialised yet
    Submitted:          Fri Feb 11 17:06:23 2011 CET 
  4. repo 110214 NOT FIXED (2011/02/14) The glite-WMS metapackage is wrong (too many dependencies, and google-perftools is missing).
  5. repo 110214 NOT FIXED (2011/02/14) There is a mistake in /opt/glite/yaim/functions/config_info_service_wms (a "\" is missing at line 86); yaim reports this message:
    sed: -e expression #3, char 84: unknown option to `s'
  -- AlessioGianelle - 2010-09-08

Revision 69 2011-02-10 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 387 to 387
  Then the job fails with this message:
        - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
Thu Dec  2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0LM_log_done_end
Changed:
<
<
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20) The proposed fix is that yaim sets:
>
>
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM FIXED (2011/02/10) The proposed fix is that yaim sets:
    WmsRequirements  = ((ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && ((EnableWmsFeedback =?= TRUE) ? (RegExp("cream", other.GlueCEImplementationName, "i")) : true)
  1. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
Line: 425 to 425
    31 Jan, 16:37:21 -I: [Info] operator()(dispatcher_utils.cpp:228): new jobresubmit for https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng
    31 Jan, 16:37:21 -E: [Error] operator()(submit_request.cpp:536): cannot rename temporary token /var/glite/SandboxDir/Ha/https_3a_2f_2fcream-44.pd.infn.it_3a9000_2fHam9zx-h36iNR0FVtjT9Ng/token.txt_0. (error 2)
    31 Jan, 16:37:21 -I: [Info] postpone(submit_request.cpp:227): postponing https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng (cannot create token for shallow resubmission)
Changed:
<
<
  1. repo 110114 Bug: #77366: Submission sometimes fails with an LB error. NOT FIXED (2011/01/19):
>
>
  1. repo 110114 Bug: #77366: Submission sometimes fails with an LB error. Hopefully FIXED (2011/02/10):
    Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    Register COLLECTIONfailed to LB server:devel17.cnaf.infn.it:9000
    edg_wll_RegisterJobProxy/Sync
Line: 448 to 448
 
  1. repo 110114 Feedback doesn't work FIXED (2011/02/07)
    20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
    20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...] 
Changed:
<
<
  1. repo 110114 Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. NOT FIXED (2011/01/21)
>
>
  1. repo 110114 Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. Hopefully FIXED (2011/02/10)
 
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory FIXED (2011/02/07) The glite-info-generic package is probably missing. After configuration we have:
    [root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
    -rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf
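
The rename failure logged earlier in this revision involves a token file called `token.txt_0.`, with a trailing dot, while the tokens that work elsewhere in this report look like `token.txt_1`. A minimal sketch for spotting such names in a sandbox directory listing follows. The naming rule (`token.txt_` followed only by a retry counter) is an assumption inferred from those log lines, not taken from the WM sources, and the helper name is hypothetical.

```python
import re

# Assumed naming rule: "token.txt_" followed by a retry counter and nothing else.
TOKEN_NAME = re.compile(r"token\.txt_\d+")


def malformed_tokens(filenames):
    """Return the names that look like token files but break the assumed rule."""
    return [
        name
        for name in filenames
        if name.startswith("token.txt") and not TOKEN_NAME.fullmatch(name)
    ]
```

Passing the output of `os.listdir()` on a job's sandbox directory to `malformed_tokens` would flag a name such as `token.txt_0.` while leaving `token.txt_1` and unrelated entries alone.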

Revision 68 2011-02-07 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 339 to 339
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
Changed:
<
<
  1. The state of the nodes doesn't change after the output retrieval of the parent of a collection. NOT FIXED (2011/02/01)
>
>
  1. The state of the nodes doesn't change after the output retrieval of the parent of a collection. Bug #77876 NOT FIXED (2011/02/01)
 
  1. DAG job doesn't work FIXED. The name of the token is probably set wrongly.
  2. Bug #73715 FIXED The Really Running event is not logged by JC/LM (it works for ICE).
  3. The "type" attribute is case sensitive FIXED by installing the new JDL rpm on the UI

Revision 67 2011-02-07 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 452 to 452
 
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory FIXED (2011/02/07) The glite-info-generic package is probably missing. After configuration we have:
    [root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
    -rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf
Added:
>
>
  1. repo 110204 NOT FIXED (2011/02/07) Cron Purger doesn't work:
    [root@cream-46 ~]#   /opt/glite/sbin/glite-wms-purgeStorage --help
    glite-wms-purgeStorage: glite-wms-purgeStorage.cpp:87: bool<unnamed>::lb_proxy(): Assertion `f_conf' failed.
    Aborted
  -- AlessioGianelle - 2010-09-08

Revision 66 2011-02-07 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 353 to 353
 
  1. Problem with expired job proxies in ICE FIXED (to verify with the new tag). When the proxy expires, the jobs in the ICE queue are not correctly removed:
    2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
  2. ICE logs Cancel events with a wrong sequence code FIXED (2011-01-19)
Changed:
<
<
  1. ICE aborted NOT FIXED It happens that ICE aborts with these messages (see also 49.):
>
>
  1. ICE aborted TO Monitoring It happens that ICE aborts with these messages (see also 49.):
 t1169246528:p13952: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_destroy() failed

[Thread System] mutex is locked (EBUSY)t1295124800:p13952: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed

Line: 371 to 371
    [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm --from-file a
    /opt/glite/bin/glite-wms-ice-db-rm: unrecognized option `--from-file'
Changed:
<
<
  1. feedback doesn't work FIXED to check The "old" token is not removed!
>
>
  1. feedback doesn't work FIXED The "old" token is not removed!
    [root@devel09 ~]# ls -l /var/glite/SandboxDir/B4/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fB4ni_5fIMNzn6cY6W8llClSA/
    total 20
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 input
Line: 387 to 387
  Then the job fails with this message:
        - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
Thu Dec  2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0LM_log_done_end
Changed:
<
<
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20) The proposed fix is that yiaim sets:
>
>
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20) The proposed fix is that yaim sets:
    WmsRequirements  = ((ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && ((EnableWmsFeedback =?= TRUE) ? (RegExp("cream", other.GlueCEImplementationName, "i")) : true)
  1. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
Changed:
<
<
  1. Under high load, both WM and ICE fail with: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention. (see also 42.)
>
>
  1. Under high load, both WM and ICE fail with: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention. (see also 42.)
 
  1. repo 110114 Yaim fails FIXED (2011/01/31):
    cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
    chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
Line: 421 to 421
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
    19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
Changed:
<
<
  1. repo 110114 Deep resubmission doesn't work NOT FIXED (2011/01/31). In fact, when the old token is grabbed, WM doesn't create the new one:
>
>
  1. repo 110114 Deep resubmission doesn't work FIXED (2011/02/07). In fact, when the old token is grabbed, WM doesn't create the new one:
    31 Jan, 16:37:21 -I: [Info] operator()(dispatcher_utils.cpp:228): new jobresubmit for https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng
    31 Jan, 16:37:21 -E: [Error] operator()(submit_request.cpp:536): cannot rename temporary token /var/glite/SandboxDir/Ha/https_3a_2f_2fcream-44.pd.infn.it_3a9000_2fHam9zx-h36iNR0FVtjT9Ng/token.txt_0. (error 2)
    31 Jan, 16:37:21 -I: [Info] postpone(submit_request.cpp:227): postponing https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng (cannot create token for shallow resubmission)
Line: 445 to 445
    Error - Operation failed
    Unable to find any endpoint where to perform service request
  1. repo 110114 Bug #77055 FIXED (2011/01/31) "MyProxyServer: wrong type caught for attribute" for parametric jobs
Changed:
<
<
  1. repo 110114 Feedback doesn't work NOT FIXED (2011/01/20)
>
>
  1. repo 110114 Feedback doesn't work FIXED (2011/02/07)
    20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
    20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...]
  1. repo 110114 Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. NOT FIXED (2011/01/21)
Changed:
<
<
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory NOT FIXED (2011/01/31) The glite-info-generic package is probably missing. After configuration we have:
>
>
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory FIXED (2011/02/07) The glite-info-generic package is probably missing. After configuration we have:
    [root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
    -rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf

Revision 65 2011-02-02 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 380 to 380
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 peek
    -rw-r--r-- 1 glite glite 0 Nov 30 10:19 token.txt_0_
    -rw-r--r-- 1 glite glite 0 Nov 30 10:17 token.txt_1
Changed:
<
<
  1. Misleading messages in Maradona file: NOT FIXED
>
>
  1. Misleading messages in Maradona file: FIXED (2011/02/02)
 LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0 LM_log_done_end jw exit status = 1 Then the job fails with this message:
        - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
Thu Dec  2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0LM_log_done_end
Changed:
<
<
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20)
>
>
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20) Proposed fix is that yiaim set:
    WmsRequirements  = ((ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && ((EnableWmsFeedback =?= TRUE) ? (RegExp("cream", other.GlueCEImplementationName, "i")) : true)
 
  1. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
  3. Under high load, both WM and ICE fail with: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention. (see also 42.)

Revision 64 2011-02-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 339 to 339
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
    19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
Changed:
<
<
  1. The state of the nodes changes neither after the output retrieval of the parent nor after the cancellation of the parent NOT FIXED (2011/01/19)
>
>
  1. The state of the nodes doesn't change after the output retrieval of the parent of a collection. NOT FIXED (2011/02/01)
 
  1. DAG job doesn't work FIXED. The name of the token is probably set wrongly.
  2. Bug #73715 FIXED The Really Running event is not logged by JC/LM (it works for ICE).
  3. The "type" attribute is case sensitive FIXED by installing the new JDL rpm on the UI

Revision 63 2011-02-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 323 to 323
    - Tag = unavailable
    - Timestamp = Wed Nov 17 13:31:54 2010 CET
    - User = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
Changed:
<
<
  1. St9exception NOT FIXED (2011/01/19). After the "get-output" of a DAG, the nodes are in "DONE-OK" state. If you then try a get-output for a node, you get (see also 53.):
>
>
  1. St9exception FIXED (2011/02/01). After the "get-output" of a DAG, the nodes are in "DONE-OK" state. If you then try a get-output for a node, you get (see also 53.):
  [ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw

Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server

Line: 405 to 405
  <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">SOAP-ENV:ClientEnd of file or no input: 'Invalid argument'</SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
  1. repo 110114 Cannot take shallow resubmission token FIXED (2011/01/31). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
Changed:
<
<
  1. repo 110114 Misleading output message from UI NOT FIXED (2011/01/19) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
     
>
>
  1. repo 110114 Misleading output message from UI FIXED (2011/02/01) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
     
 [ale@cream-03 UI]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/cxyj4EodCZx7qY2vretM6g

Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server

Revision 62 2011-02-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 449 to 449
 20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
 20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...]
  1. repo 110114 Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. NOT FIXED (2011/01/21)
Changed:
<
<
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory NOT FIXED (2011/01/31) After configuration we have:
>
>
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory NOT FIXED (2011/01/31) Probably the glite-info-generic package is missing. After configuration we have:
 [root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
 -rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf

Revision 61 2011-01-31 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 392 to 392
 
  1. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
  3. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention. (see also 42.)
Changed:
<
<
  1. repo 110114 Yaim fails NOT FIXED (2011/01/14):
>
>
  1. repo 110114 Yaim fails FIXED (2011/01/31):
 cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
 chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
  1. repo 110114 Submission failed FIXED Problem was with fast-cgi in gsoap (2011/01/27) :
Line: 404 to 404
  <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">SOAP-ENV:ClientEnd of file or no input: 'Invalid argument'</SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
Changed:
<
<
  1. repo 110114 Cannot take shallow resubmission token NOT FIXED (2011/01/19). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
>
>
  1. repo 110114 Cannot take shallow resubmission token FIXED (2011/01/31). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
 
  1. repo 110114 Misleading output message from UI NOT FIXED (2011/01/19) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
     
    [ale@cream-03 UI]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/cxyj4EodCZx7qY2vretM6g
    
Line: 421 to 421
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Stack:
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
Changed:
<
<
  1. repo 110114 deep resubmission doesn't work NOT FIXED (2011/01/19) In fact when the old token is grabbed, wm doesn't create the new one
>
>
  1. repo 110114 deep resubmission doesn't work NOT FIXED (2011/01/31) In fact when the old token is grabbed, wm doesn't create the new one
     31 Jan, 16:37:21 -I: [Info] operator()(dispatcher_utils.cpp:228): new jobresubmit for https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng
    31 Jan, 16:37:21 -E: [Error] operator()(submit_request.cpp:536): cannot rename temporary token /var/glite/SandboxDir/Ha/https_3a_2f_2fcream-44.pd.infn.it_3a9000_2fHam9zx-h36iNR0FVtjT9Ng/token.txt_0. (error 2)
    31 Jan, 16:37:21 -I: [Info] postpone(submit_request.cpp:227): postponing https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng (cannot create token for shallow resubmission)
 
  1. repo 110114 Bug: #77366: Sometimes submission failed for LB error NOT FIXED (2011/01/19) :
    Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    Register COLLECTIONfailed to LB server:devel17.cnaf.infn.it:9000
Line: 441 to 444
  Error - Operation failed Unable to find any endpoint where to perform service request
Changed:
<
<
  1. repo 110114 Bug #77055 NOT FIXED (2011/01/20) "MyProxyServer: wrong type caught for attribute" for parametric jobs
>
>
  1. repo 110114 Bug #77055 FIXED (2011/01/31) "MyProxyServer: wrong type caught for attribute" for parametric jobs
 
  1. repo 110114 Feedback doesn't work NOT FIXED (2011/01/20)
    20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
    20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...] 
  2. repo 110114 Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. NOT FIXED (2011/01/21)
Changed:
<
<
>
>
  1. WMS repo 110131 & LB patch #4623 Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory NOT FIXED (2011/01/31) After configuration we have:
    [root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
    -rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf
  -- AlessioGianelle - 2010-09-08
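
The log lines above pair each job ID with a directory under /var/glite/SandboxDir. The sketch below reproduces that naming as it can be inferred from those lines only (it is not taken from the WMS source code): ':' and '/' in the job ID appear as `_3a` and `_2f`, and the parent directory is the first two characters of the unique part of the ID. The function name `sandbox_dir` is hypothetical, and how other characters are encoded is unknown here.

```python
from urllib.parse import urlparse

# Root directory as it appears in the log lines; an assumption for other setups.
SANDBOX_ROOT = "/var/glite/SandboxDir"


def sandbox_dir(job_id: str, root: str = SANDBOX_ROOT) -> str:
    """Build the sandbox directory for a job ID, following the pattern seen in the logs."""
    # The unique part of the job ID is the path component of the URL.
    unique = urlparse(job_id).path.lstrip("/")
    # ':' and '/' show up in the logs as '_3a' and '_2f' (their hex codes).
    encoded = job_id.replace(":", "_3a").replace("/", "_2f")
    # The first two characters of the unique part name the parent directory.
    return f"{root}/{unique[:2]}/{encoded}"


if __name__ == "__main__":
    print(sandbox_dir("https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng"))
```

This can help when looking for the token file of a given job by hand on the WMS node.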

Revision 60 2011-01-27 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 323 to 323
 - Tag = unavailable - Timestamp = Wed Nov 17 13:31:54 2010 CET - User = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
Changed:
<
<
  1. St9exception NOT FIXED (2011/01/19). After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you try a get-output at this point for a node you have:
>
>
  1. St9exception NOT FIXED (2011/01/19). After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you try a get-output at this point for a node you have (see also 53.):
  [ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw

Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server

Line: 354 to 354
 
  1. Problem with job's proxy expired in ice FIXED To verify with the new tag . When proxy expired the jobs in ice queue are not correctly removed
    2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
  2. ICE logs Cancel events with the wrong sequence code FIXED (2011-01-19)
Changed:
<
<
  1. Ice Aborted NOT FIXED It happens that ICE aborts with these messages:
>
>
  1. Ice Aborted NOT FIXED It happens that ICE aborts with these messages (see also 49.):
 t1169246528:p13952: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_destroy() failed

[Thread System] mutex is locked (EBUSY)t1295124800:p13952: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed

Line: 391 to 391
 
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20)
  2. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
  3. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
Changed:
<
<
  1. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
>
>
  1. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention. (see also 42.)
 
  1. repo 110114 Yaim fails NOT FIXED (2011/01/14):
    cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
    chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
Changed:
<
<
  1. repo 110114 Submission failed Understood... (2011/01/17) :
>
>
  1. repo 110114 Submission failed FIXED Problem was with fast-cgi in gsoap (2011/01/27) :
 Status: 500 Internal Server Error
 Server: gSOAP/2.7
 Content-Type: text/xml; charset=utf-8
Line: 422 to 422
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
  1. repo 110114 deep resubmission doesn't work NOT FIXED (2011/01/19) In fact when the old token is grabbed, wm doesn't create the new one
Changed:
<
<
  1. repo 110114 Sometimes submission failed for LB error NOT FIXED (2011/01/19) :
>
>
  1. repo 110114 Bug: #77366: Sometimes submission failed for LB error NOT FIXED (2011/01/19) :
 Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
 Register COLLECTIONfailed to LB server:devel17.cnaf.infn.it:9000
 edg_wll_RegisterJobProxy/Sync

Revision 59 2011-01-25 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 392 to 392
 
  1. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
  3. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
Changed:
<
<
  1. Yaim fails (builds 20110114) NOT FIXED (2011/01/14):
>
>
  1. repo 110114 Yaim fails NOT FIXED (2011/01/14):
 cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
 chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
Changed:
<
<
  1. Submission failed (builds 20110114) Understood... (2011/01/17) :
>
>
  1. repo 110114 Submission failed Understood... (2011/01/17) :
 Status: 500 Internal Server Error
 Server: gSOAP/2.7
 Content-Type: text/xml; charset=utf-8
Line: 404 to 404
  <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">SOAP-ENV:ClientEnd of file or no input: 'Invalid argument'</SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
Changed:
<
<
  1. Cannot take shallow resubmission token NOT FIXED (2011/01/19). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
  2. Misleading output message from UI NOT FIXED (2011/01/19) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
     
>
>
  1. repo 110114 Cannot take shallow resubmission token NOT FIXED (2011/01/19). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
  2. repo 110114 Misleading output message from UI NOT FIXED (2011/01/19) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
     
 [ale@cream-03 UI]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/cxyj4EodCZx7qY2vretM6g

Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server

Line: 421 to 421
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Stack:
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
Changed:
<
<
  1. deep resubmission doesn't work NOT FIXED (2011/01/19) In fact when the old token is grabbed, wm doesn't create the new one
  2. Sometimes submission failed for LB error NOT FIXED (2011/01/19) :
>
>
  1. repo 110114 deep resubmission doesn't work NOT FIXED (2011/01/19) In fact when the old token is grabbed, wm doesn't create the new one
  2. repo 110114 Sometimes submission failed for LB error NOT FIXED (2011/01/19) :
 Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
 Register COLLECTIONfailed to LB server:devel17.cnaf.infn.it:9000
 edg_wll_RegisterJobProxy/Sync
Line: 441 to 441
  Error - Operation failed Unable to find any endpoint where to perform service request
Changed:
<
<
  1. Bug #77055 NOT FIXED (2011/01/20) "MyProxyServer: wrong type caught for attribute" for parametric jobs
  2. Feedback doesn't work NOT FIXED (2011/01/20)
>
>
  1. repo 110114 Bug #77055 NOT FIXED (2011/01/20) "MyProxyServer: wrong type caught for attribute" for parametric jobs
  2. repo 110114 Feedback doesn't work NOT FIXED (2011/01/20)
 20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
 20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...]
Changed:
<
<
  1. Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. NOT FIXED (2011/01/21)
>
>
  1. repo 110114 Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. NOT FIXED (2011/01/21)
 

-- AlessioGianelle - 2010-09-08

Revision 58 2011-01-25 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 389 to 389
  - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
 Thu Dec 2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0LM_log_done_end
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20)
Changed:
<
<
  1. WM defines a rescheduled request as a "Submission" NOT FIXED
>
>
  1. WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
 
  1. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
  2. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
  3. Yaim fails (builds 20110114) NOT FIXED (2011/01/14):

Revision 57 2011-01-21 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 445 to 445
 
  1. Feedback doesn't work NOT FIXED (2011/01/20)
    20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
    20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...] 
Added:
>
>
  1. Before resubmission, ICE must make sure that the job's proxy is valid for at least "n" minutes. NOT FIXED (2011/01/21)
  -- AlessioGianelle - 2010-09-08
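
The requirement that a proxy be valid for at least "n" minutes before resubmission can be illustrated with a small check. This is only a sketch of the condition, not ICE's implementation; the function name `proxy_valid_for` and its parameters are hypothetical, and the expiration time in the usage example is the one printed in the DNProxyManager log line on this page.

```python
from datetime import datetime, timedelta


def proxy_valid_for(expiry: datetime, now: datetime, min_minutes: int) -> bool:
    """Return True if the proxy still has at least `min_minutes` of lifetime left at `now`."""
    return expiry - now >= timedelta(minutes=min_minutes)


if __name__ == "__main__":
    # Expiration time as logged: Tue Nov 16 10:56:34 2010
    expiry = datetime(2010, 11, 16, 10, 56, 34)
    now = datetime(2010, 11, 16, 10, 30, 0)
    print(proxy_valid_for(expiry, now, 20))
    print(proxy_valid_for(expiry, now, 30))
```

With a check of this kind, a job whose proxy is about to expire would be rejected before resubmission instead of failing later on the CE.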

Revision 56 2011-01-21 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 392 to 392
 
  1. WM defines a rescheduled request as a "Submission" NOT FIXED
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
  3. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
Changed:
<
<
  1. Yaim fails (builds 20110114) NOT FIXED :
>
>
  1. Yaim fails (builds 20110114) NOT FIXED (2011/01/14):
 cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
 chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
Changed:
<
<
  1. Submission failed (builds 20110114) Understood... :
>
>
  1. Submission failed (builds 20110114) Understood... (2011/01/17) :
 Status: 500 Internal Server Error
 Server: gSOAP/2.7
 Content-Type: text/xml; charset=utf-8

Revision 55 2011-01-20 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 388 to 388
  Then the job fails with this message:
        - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
Thu Dec  2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0LM_log_done_end
Changed:
<
<
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
>
>
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED (2011/01/20)
 
  1. WM defines a rescheduled request as a "Submission" NOT FIXED
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
  3. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
Line: 442 to 442
 Error - Operation failed Unable to find any endpoint where to perform service request
  1. Bug #77055 NOT FIXED (2011/01/20) "MyProxyServer: wrong type caught for attribute" for parametric jobs
Changed:
<
<
>
>
  1. Feedback doesn't work NOT FIXED (2011/01/20)
    20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
    20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...] 
  -- AlessioGianelle - 2010-09-08

Revision 54 2011-01-20 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 441 to 441
  Error - Operation failed Unable to find any endpoint where to perform service request
Added:
>
>
  1. Bug #77055 NOT FIXED (2011/01/20) "MyProxyServer: wrong type caught for attribute" for parametric jobs
 

-- AlessioGianelle - 2010-09-08

Revision 53 2011-01-19 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 297 to 297
 Warning - JobPurging not allowed (The Operation is not allowed: Unable to complete job purge) You need to set the WMS DN in both ACTION "READ_ALL" and "LOG_WMS_EVENTS" of glite-lb-authz.conf file of the LB Server
Changed:
<
<
  1. Cleared event not logged NOT FIXED .
>
>
  1. Cleared event not logged FIXED (2011/01/19).
 [ale@cream-03 ~]$ glite-wms-job-output https://devel07.cnaf.infn.it:9000/Hn0VS-oMpYlT7bdAIXNB7g

Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server

Line: 353 to 353
 2010-11-15 10:57:42,019 DEBUG - DNProxyManager::setUserProxyIfLonger_Legacy() - New proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] has been copied into [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] - New Expiration Time is [Tue Nov 16 10:56:34 2010]
  1. Problem with job's proxy expired in ice FIXED To verify with the new tag . When proxy expired the jobs in ice queue are not correctly removed
    2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
Changed:
<
<
  1. ICE logs Cancel events with the wrong sequence code NOT FIXED
>
>
  1. ICE logs Cancel events with the wrong sequence code FIXED (2011-01-19)
 
  1. Ice Aborted NOT FIXED It happens that ICE aborts with these messages:
    t1169246528:p13952: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_destroy() failed
    
Line: 362 to 362
 t1313102144:p12551: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed

[Thread System] invalid value passed to thread interface (EINVAL)Aborted (core dumped)

Changed:
<
<
  1. glite-wms-ice-db-rm, various errors NOT FIXED
>
>
  1. glite-wms-ice-db-rm, various errors FIXED (2011/01/19)
 [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm -h
 /opt/glite/bin/glite-wms-ice-db-rm: invalid option -- h
 Type /opt/glite/bin/glite-wms-ice-db-rm -h for help
Line: 421 to 421
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Stack:
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
 19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
Added:
>
>
  1. deep resubmission doesn't work NOT FIXED (2011/01/19) In fact when the old token is grabbed, wm doesn't create the new one
  2. Sometimes submission failed for LB error NOT FIXED (2011/01/19) :
    Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    Register COLLECTIONfailed to LB server:devel17.cnaf.infn.it:9000
    edg_wll_RegisterJobProxy/Sync
    Exit code: 22
    LB[Proxy] Error: Invalid argument
    (edg_wll_RegisterJobMaster(): unable to register job
    Resource temporarily unavailable;; Logging library ERROR: 
    Resource temporarily unavailable;; edg_wll_DoLogEventServer(): edg_wll_log_direct_read error
    LB server (bkserver,lbproxy) store protocol error;; edg_wll_log_proto_client_direct(): error reading answer from L&B direct server
    LB server (bkserver,lbproxy) store protocol error;; get_reply_gss(): error reading reply
    LB server (bkserver,lbproxy) store protocol error;; gss_reader(): error reading message
    Transport endpoint is not connected;; edg_wll_gss_read_full;; GSS Error: EOF occured;)
    
    Method: jobRegister
 
Added:
>
>
Error - Operation failed Unable to find any endpoint where to perform service request
 

-- AlessioGianelle - 2010-09-08

Revision 52 2011-01-19 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 214 to 214
  and nodes stuck in "Submitted". FIXED .
Changed:
<
<
  1. BUG #75223 NOT FIXED When a job fails, the logged reason is wrong
>
>
  1. BUG #75223 FIXED When a job fails, the logged reason is wrong
 Event: Done - Arrived = Thu Nov 11 14:06:54 2010 CET - Exit code = 1
Line: 323 to 323
 - Tag = unavailable - Timestamp = Wed Nov 17 13:31:54 2010 CET - User = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
Changed:
<
<
  1. St9exception NOT FIXED . After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you try a get-output at this point for a node you have:
>
>
  1. St9exception NOT FIXED (2011/01/19). After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you try a get-output at this point for a node you have:
  [ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw

Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server

Error - getOutputFileList Error (St9exception)

Changed:
<
<
  1. The state of the nodes changes neither after the output retrieval of the parent nor after the cancellation of the parent NOT FIXED
>
>
Instead the right message should be found in the wmproxy log file:
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": JobId Exception: The Operation is not allowed: The job has not been registered from this Workload Manager Proxy server (https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server) or it has been purged
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ------------------------------- Fault Description --------------------------------
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Method: getOutputFileList
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Code: 1202
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Description: St9exception
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
  1. The state of the nodes changes neither after the output retrieval of the parent nor after the cancellation of the parent NOT FIXED (2011/01/19)
 
  1. Dag job doesn't work FIXED . Probably the name of the token is wrongly set.
  2. Bug #73715 FIXED the Really Running event is not logged from JC/LM (instead it works for ICE).
Line: 381 to 390
 Thu Dec 2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0LM_log_done_end
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
  2. WM defines a rescheduled request as a "Submission" NOT FIXED
Changed:
<
<
  1. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED To verify
>
>
  1. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
 
  1. Under high load, both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
  2. Yaim fails (builds 20110114) NOT FIXED :
    cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
Line: 395 to 404
  <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">SOAP-ENV:ClientEnd of file or no input: 'Invalid argument'</SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
Changed:
<
<
  1. Cannot take shallow resubmission token NOT FIXED . In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
>
>
  1. Cannot take shallow resubmission token NOT FIXED (2011/01/19). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
  2. Misleading output message from UI NOT FIXED (2011/01/19) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
     
    [ale@cream-03 UI]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/cxyj4EodCZx7qY2vretM6g
    
    Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
    
    Error - getOutputFileList Error
     (St9exception)
Instead in the wmproxy log file you should find the right reason:
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": JobId Exception: User not authorized to perform this operation
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ------------------------------- Fault Description --------------------------------
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Method: getOutputFileList
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Code: 1202
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Description: St9exception
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Stack: 
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
 

Revision 512011-01-19 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 395 to 395
  <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">SOAP-ENV:ClientEnd of file or no input: 'Invalid argument'</SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
Added:
>
>
  1. Cannot take shallow resubmission token NOT FIXED . In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the dot)
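
A minimal sketch of why a stray trailing character makes the token impossible to take. It assumes (this is not confirmed by the report) that the expected file name is `token.txt_<N>` with `N` a non-negative integer; `token_index` is a hypothetical helper, not part of the WMS code.

```python
import re

# Assumed naming scheme: "token.txt_<N>", N being a non-negative integer.
TOKEN_NAME = re.compile(r"token\.txt_(\d+)")

def token_index(filename):
    """Return the token index, or None if the name does not match exactly."""
    match = TOKEN_NAME.fullmatch(filename)
    return int(match.group(1)) if match else None
```

Under this assumption, `token.txt_0` is accepted, while `token.txt_0.` (trailing dot) and `token.txt_0_` (trailing underscore) are rejected.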
 

Revision 502011-01-18 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 386 to 386
 
  1. Yaim fails (builds 20110114) NOT FIXED :
    cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
    chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
Changed:
<
<
  1. Submission failed (builds 20110114) NOT FIXED :
>
>
  1. Submission failed (builds 20110114) Understood... :
 Status: 500 Internal Server Error Server: gSOAP/2.7 Content-Type: text/xml; charset=utf-8

Revision 492011-01-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 383 to 383
 
  1. WM defines a rescheduled request as a "Submission" NOT FIXED
  2. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED To verify
  3. Both WM and ICE give on high load: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
Changed:
<
<
  1. Yaim fails (buils 20110114) NOT FIXED :
     
>
>
  1. Yaim fails (builds 20110114) NOT FIXED :
 cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
Added:
>
>
  1. Submission failed (builds 20110114) NOT FIXED :
    Status: 500 Internal Server Error
    Server: gSOAP/2.7
    Content-Type: text/xml; charset=utf-8
    Content-Length: 831
    Connection: close
    
    <?xml version="1.0" encoding="UTF-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><faultcode>SOAP-ENV:Client</faultcode><faultstring>End of file or no input: 'Invalid argument'</faultstring></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
 

Revision 482011-01-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 382 to 382
 
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
  2. WM defines a rescheduled request as a "Submission" NOT FIXED
  3. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED To verify
Changed:
<
<
  1. Both WM and ICE give on high load: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
>
>
  1. Both WM and ICE give on high load: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
  2. Yaim fails (buils 20110114) NOT FIXED :
     
    cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
    chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
 

-- AlessioGianelle - 2010-09-08

Revision 472010-12-14 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 382 to 382
 
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
  2. WM defines a rescheduled request as a "Submission" NOT FIXED
  3. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED To verify
Changed:
<
<
>
>
  1. Both WM and ICE give on high load: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention.
 

-- AlessioGianelle - 2010-09-08

Revision 462010-12-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 201 to 201
 Error: Missing Dependency: org.glite.build.common-cpp >= 3.2.1 is needed by package glite-security-lcmaps-without-gsi-1.4.8-5.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412) Error: Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-3.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412) Error: Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-WMS-3.3.0-0.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412)
Changed:
<
<
two packages missing... NOT FIXED
>
>
two packages missing... FIXED
 
  1. Resubmission for jobs submitted with -r option is not done. Jobs are aborted with "hit job shallow retry count (0)" even if ShallowRetryCount was set in the JDL FIXED .
Line: 234 to 234
 - Status code = OK - Timestamp = Fri Nov 26 12:13:46 2010 CET - User = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle
Changed:
<
<
  1. Problem installing new build (07/11/2010) NOT FIXED .
>
>
  1. Problem installing new build (07/11/2010) FIXED .
  --> Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-4.sl5.x86_64 (ETICS-name-patch_2876_1)
  1. Problem starting bdii:
    Starting BDII slapd: Traceback (most recent call last):
Line: 334 to 334
 
  1. Dag job doesn't work FIXED . Probably the name of the token is wrongly set.
  2. Bug #73715 FIXED the Really Running event is not logged from JC/LM (instead it works for ICE).
Changed:
<
<
  1. The "type" attribute is case sensitive NOT FIXED
>
>
  1. The "type" attribute is case sensitive FIXED by installing the new jdl rpm on the UI
 Error - AdSyntaxException The following parsing error(s) have been found: 'node_type' must be "dag"

Revision 452010-12-07 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 381 to 381
 Thu Dec 2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0LM_log_done_end
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
  2. WM defines a rescheduled request as a "Submission" NOT FIXED
Changed:
<
<
  1. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged NOT FIXED
>
>
  1. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED To verify
 

Revision 442010-12-06 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 381 to 381
 Thu Dec 2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0LM_log_done_end
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
  2. WM defines a rescheduled request as a "Submission" NOT FIXED
Changed:
<
<
  1. During the first mm of a node the "UserTag CEInfoHostName" is not logged NOT FIXED
>
>
  1. Bug: #76097 During the first mm of a node the "UserTag CEInfoHostName" is not logged NOT FIXED
 

Revision 432010-12-06 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 381 to 381
 Thu Dec 2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0LM_log_done_end
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
  2. WM defines a rescheduled request as a "Submission" NOT FIXED
Added:
>
>
  1. During the first mm of a node the "UserTag CEInfoHostName" is not logged NOT FIXED
 

Revision 422010-12-06 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 380 to 380
  - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona. Thu Dec 2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0LM_log_done_end
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
Added:
>
>
  1. WM defines a rescheduled request as a "Submission" NOT FIXED
 

Revision 412010-12-02 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 379 to 379
  Then the job fails with this message:
        - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
Thu Dec  2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0LM_log_done_end
Added:
>
>
  1. Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM NOT FIXED
 

Revision 402010-12-02 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 363 to 363
  [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm --from-file a /opt/glite/bin/glite-wms-ice-db-rm: unrecognized option `--from-file'
Changed:
<
<
  1. feedback doesn't work NOT FIXED The "old" token is not removed!
>
>
  1. feedback doesn't work FIXED to check The "old" token is not removed!
 [root@devel09 ~]# ls -l /var/glite/SandboxDir/B4/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fB4ni_5fIMNzn6cY6W8llClSA/ total 20 drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 input
Line: 372 to 372
 drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 peek -rw-r--r-- 1 glite glite 0 Nov 30 10:19 token.txt_0_ -rw-r--r-- 1 glite glite 0 Nov 30 10:17 token.txt_1
Added:
>
>
  1. Misleading messages in Maradona file: NOT FIXED
    LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0
    LM_log_done_end
    jw exit status = 1
    Then the job fails with this message:
            - Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
    Thu Dec  2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec  2 12:08:53 CET 2010: 0LM_log_done_end
 

-- AlessioGianelle - 2010-09-08

Revision 392010-11-30 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 363 to 363
  [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm --from-file a /opt/glite/bin/glite-wms-ice-db-rm: unrecognized option `--from-file'
Added:
>
>
  1. feedback doesn't work NOT FIXED The "old" token is not removed!
    [root@devel09 ~]# ls -l  /var/glite/SandboxDir/B4/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fB4ni_5fIMNzn6cY6W8llClSA/
    total 20
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 input
    -rw-r--r-- 1 dteam035 dteam  133 Nov 30 10:41 Maradona.output
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 output
    drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 peek
    -rw-r--r-- 1 glite    glite    0 Nov 30 10:19 token.txt_0_
    -rw-r--r-- 1 glite    glite    0 Nov 30 10:17 token.txt_1
 

-- AlessioGianelle - 2010-09-08

Revision 382010-11-26 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 132 to 132
 27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": at jobPurge()[wmpcoreoperations.cpp:2667] 27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": ---------------------------------------------------------------------------------- 27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": jobPurge operation completed
Changed:
<
<
TO CHECK
>
>
FIXED for normal jobs
 
  1. Submission to LCG CE doesn't work: Got a job held event, reason: Failed to initialize GAHP
Line: 166 to 166
 
  1. WM required huge amount of memory:
    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND  
    15007 glite     25   0 2657m 2.8g 8320 S  0.0 77.0   2:34.79 glite-wms-workl 
Changed:
<
<
NOT FIXED Seems ok now if using google malloc. Checking if it is possible to have it in the WMS MP and repo
>
>
Monitoring.. Google malloc is used by default and correctly set by yaim
 
  1. *** glite-lb-bkserverd:
Line: 224 to 224
 - Src instance = unique - Status code = FAILED - Timestamp = Thu Nov 11 14:06:54 2010 CET
Added:
>
>
Another type of error:
Event: Done
- Arrived                    =    Fri Nov 26 12:15:30 2010 CET
- Exit code                  =    0
- Host                       =    pamelawn23.na.infn.it
- Reason                     =    Fri Nov 26 12:13:45 CET 2010: Cannot
- Source                     =    LRMS
- Status code                =    OK
- Timestamp                  =    Fri Nov 26 12:13:46 2010 CET
- User                       =    /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle
 
  1. Problem installing new build (07/11/2010) NOT FIXED .
      --> Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-4.sl5.x86_64 (ETICS-name-patch_2876_1)
  2. Problem starting bdii:
Line: 236 to 246
  [ OK ] BDII update process failed to startStarting BDII update pro[FAILED] Be sure that the installed version comes from the etics repo: bdii-5.0.9-1
Changed:
<
<
  1. BUG #75099 FIXED using the new ice rpm coming from head .
>
>
  1. BUG #75099 FIXED .
 2010-11-09 13:01:07,649 DEBUG - iceCommandEventQuery::execute() - TID=[203357600] Database ID=[1265375986000] 2010-11-09 13:01:07,650 DEBUG - iceCommandEventQuery::execute() - TID=[203357600] Exec time ID=[6] 2010-11-09 13:01:07,651 INFO - scoped_timer iceCommandEventQuery::processEvents() - TID=[203357600] All Events Proc Time 1289304067.650547 1289304067.651501 0.000954
Line: 244 to 254
  what(): basic_string::at Aborted It happens when shallow resubmission is set to -1.
Changed:
<
<
  1. Submission failed NOT FIXED The problem does not disappear when using a new LB server
>
>
  1. Submission failed Probably FIXED using a new LB server (2.1)
 Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server HTTP Error
Line: 295 to 305
 Warning - JobPurging not allowed (Proxy exception: The delegated Proxy has expired) This happens when the job proxy has expired.
Changed:
<
<
  1. Bug #75368 NOT FIXED A "DONE OK" job is marked as "ABORTED". The problem is that a failed job should not be aborted (resubmission is possible only for Done Failed jobs). Another case:
>
>
  1. Bug #75368 FIXED A "DONE OK" job is marked as "ABORTED". The problem is that a failed job should not be aborted (resubmission is possible only for Done Failed jobs). Another case:
  Event: Abort - Arrived = Wed Nov 17 13:31:54 2010 CET - Host = cream-46.pd.infn.it
Line: 322 to 332
  (St9exception)
  1. The state of the nodes changes neither after the output retrieval of the parent nor after the cancellation of the parent NOT FIXED
Changed:
<
<
  1. Dag job doesn't work FIXED ... to check in the next tag . Probably the name of the token is wrongly set.
  2. Bug #73715 NOT FIXED the Really Running event is not logged from JC/LM (instead it works for ICE).
>
>
  1. Dag job doesn't work FIXED . Probably the name of the token is wrongly set.
  2. Bug #73715 FIXED the Really Running event is not logged from JC/LM (instead it works for ICE).
 
  1. The "type" attribute is case sensitive NOT FIXED
    Error - AdSyntaxException
    The following parsing error(s) have been found:

Revision 372010-11-18 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 343 to 343
 t1313102144:p12551: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed

[Thread System] invalid value passed to thread interface (EINVAL)Aborted (core dumped)

Added:
>
>
  1. glite-wms-ice-db-rm, various errors NOT FIXED
    [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm -h
    /opt/glite/bin/glite-wms-ice-db-rm: invalid option -- h
    Type /opt/glite/bin/glite-wms-ice-db-rm -h for help
    
    [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm
    Must specify at least one of the options --from-file <pathfile> or <gridjobid>
    
    [root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm --from-file a
    /opt/glite/bin/glite-wms-ice-db-rm: unrecognized option `--from-file' 
 

-- AlessioGianelle - 2010-09-08

Revision 362010-11-18 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 335 to 335
 
  1. Problem with expired job proxies in ICE FIXED To verify with the new tag . When the proxy expires, the jobs in the ICE queue are not correctly removed
    2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
  2. ICE logs Cancel events with a wrong sequence code NOT FIXED
Added:
>
>
  1. ICE aborted NOT FIXED ICE sometimes aborts with these messages:
    t1169246528:p13952: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_destroy() failed
    
    [Thread System] mutex is locked (EBUSY)t1295124800:p13952: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed
    or
    t1313102144:p12551: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed
    
    [Thread System] invalid value passed to thread interface (EINVAL)Aborted (core dumped)
 

-- AlessioGianelle - 2010-09-08

Revision 352010-11-17 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 334 to 334
 2010-11-15 10:57:42,019 DEBUG - DNProxyManager::setUserProxyIfLonger_Legacy() - New proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] has been copied into [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] - New Expiration Time is [Tue Nov 16 10:56:34 2010]
  1. Problem with expired job proxies in ICE FIXED To verify with the new tag . When the proxy expires, the jobs in the ICE queue are not correctly removed
    2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
Changed:
<
<
  1. NOT FIXED If the request for a node expires, an abort event is also logged for the parent:
    ======================= glite-wms-job-status Success =====================
    BOOKKEEPING INFORMATION:
    
    Status info for the Job : https://devel17.cnaf.infn.it:9000/xLtssR8L6VPnOW4oCdRsNQ
    Current Status:     Running 
    Status Reason:      request expired
    Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
    
    - Nodes information for: 
        Status info for the Job : https://devel17.cnaf.infn.it:9000/UpxeviPCgDgJ9augCANhuw
        Current Status:     Running 
        Status Reason:      Job successfully submitted to Globus
        Destination:        ce104.cern.ch:2119/jobmanager-lcglsf-grid_dteam
        Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
        
        Status info for the Job : https://devel17.cnaf.infn.it:9000/m-UxeYg2ZlJX7ZPD-hv9pQ
        Current Status:     Running 
        Status Reason:      Job successfully submitted to Globus
        Destination:        atlas-ce-02.roma1.infn.it:2119/jobmanager-lcglsf-atlasgcert
        Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
        
        Status info for the Job : https://devel17.cnaf.infn.it:9000/nLtXmSxRIapqZERWBtYP6Q
        Current Status:     Aborted 
        Status Reason:      request expired
        Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
    in workload_manager log file:
    17 Nov, 13:38:57 -E: [Error] unrecoverable_collection(submit_request.cpp:108): https://devel17.cnaf.infn.it:9000/xLtssR8L6VPnOW4oCdRsNQ: unable to retrieve children information from jobstatus
    17 Nov, 13:38:57 -E: [Error] unrecoverable(submit_request.cpp:126): https://devel17.cnaf.infn.it:9000/xLtssR8L6VPnOW4oCdRsNQ failed (request expired)
    17 Nov, 13:38:58 -E: [Error] unrecoverable(submit_request.cpp:126): https://devel17.cnaf.infn.it:9000/nLtXmSxRIapqZERWBtYP6Q failed (request expired)
>
>
  1. ICE logs Cancel events with a wrong sequence code NOT FIXED
 

-- AlessioGianelle - 2010-09-08

Revision 342010-11-17 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 295 to 295
 Warning - JobPurging not allowed (Proxy exception: The delegated Proxy has expired) This happens when the job proxy has expired.
Changed:
<
<
  1. Bug #75368 FIXED ... to check in the next tag A "DONE OK" job is marked as "ABORTED". The problem is that a failed job should not be aborted (resubmission is possible only for Done Failed jobs)
>
>
  1. Bug #75368 NOT FIXED A "DONE OK" job is marked as "ABORTED". The problem is that a failed job should not be aborted (resubmission is possible only for Done Failed jobs). Another case:
     Event: Abort
    - Arrived                    =    Wed Nov 17 13:31:54 2010 CET
    - Host                       =    cream-46.pd.infn.it
    - Reason                     =    BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Connection timed out-qsub: cannot connect to server gridba3.ba.infn.it (errno=110) Connection timed out-TERM environment variable not set.-) N/A (jobId = CREAM809848230)
    - Source                     =    LogMonitor
    - Timestamp                  =    Wed Nov 17 13:31:53 2010 CET
    - User                       =    /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
       ---
    Event: Resubmission
    - Arrived                    =    Wed Nov 17 13:31:54 2010 CET
    - Host                       =    cream-46.pd.infn.it
    - Reason                     =    Job resubmitted by ICE
    - Result                     =    WILLRESUB
    - Source                     =    LogMonitor
    - Tag                        =    unavailable
    - Timestamp                  =    Wed Nov 17 13:31:54 2010 CET
    - User                       =    /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
 
  1. St9exception NOT FIXED . After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you then try a get-output for a node, you get:
     [ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw
    
Line: 312 to 328
 Error - AdSyntaxException The following parsing error(s) have been found: 'node_type' must be "dag"
Changed:
<
<
  1. Bug #75402 NOT FIXED Synchronization loss between real validity of proxy and exp. time saved in ICE's database; this can happen when the copy of the new proxy fails
>
>
  1. Bug #75402 FIXED To verify with the new tag Synchronization loss between real validity of proxy and exp. time saved in ICE's database; this can happen when the copy of the new proxy fails
 2010-11-15 10:57:41,869 INFO - DNProxyManager::setUserProxyIfLonger_Legacy() - Setting user proxy to [ /var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] copied to /var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] because the old one is less long-lived. 2010-11-15 10:57:42,019 ERROR - DNProxyManager::setUserProxyIfLonger_Legacy() - Error copying proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] to [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy]. 2010-11-15 10:57:42,019 DEBUG - DNProxyManager::setUserProxyIfLonger_Legacy() - New proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] has been copied into [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] - New Expiration Time is [Tue Nov 16 10:56:34 2010]
Added:
>
>
  1. Problem with expired job proxies in ICE FIXED To verify with the new tag . When the proxy expires, the jobs in the ICE queue are not correctly removed
    2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
  2. NOT FIXED If the request for a node expires, an abort event is also logged for the parent:
    ======================= glite-wms-job-status Success =====================
    BOOKKEEPING INFORMATION:
    
    Status info for the Job : https://devel17.cnaf.infn.it:9000/xLtssR8L6VPnOW4oCdRsNQ
    Current Status:     Running 
    Status Reason:      request expired
    Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
    
    - Nodes information for: 
        Status info for the Job : https://devel17.cnaf.infn.it:9000/UpxeviPCgDgJ9augCANhuw
        Current Status:     Running 
        Status Reason:      Job successfully submitted to Globus
        Destination:        ce104.cern.ch:2119/jobmanager-lcglsf-grid_dteam
        Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
        
        Status info for the Job : https://devel17.cnaf.infn.it:9000/m-UxeYg2ZlJX7ZPD-hv9pQ
        Current Status:     Running 
        Status Reason:      Job successfully submitted to Globus
        Destination:        atlas-ce-02.roma1.infn.it:2119/jobmanager-lcglsf-atlasgcert
        Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
        
        Status info for the Job : https://devel17.cnaf.infn.it:9000/nLtXmSxRIapqZERWBtYP6Q
        Current Status:     Aborted 
        Status Reason:      request expired
        Submitted:          Wed Nov 17 13:28:54 2010 CET
    ==========================================================================
    in workload_manager log file:
    17 Nov, 13:38:57 -E: [Error] unrecoverable_collection(submit_request.cpp:108): https://devel17.cnaf.infn.it:9000/xLtssR8L6VPnOW4oCdRsNQ: unable to retrieve children information from jobstatus
    17 Nov, 13:38:57 -E: [Error] unrecoverable(submit_request.cpp:126): https://devel17.cnaf.infn.it:9000/xLtssR8L6VPnOW4oCdRsNQ failed (request expired)
    17 Nov, 13:38:58 -E: [Error] unrecoverable(submit_request.cpp:126): https://devel17.cnaf.infn.it:9000/nLtXmSxRIapqZERWBtYP6Q failed (request expired)
 

Revision 332010-11-16 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 312 to 312
 Error - AdSyntaxException The following parsing error(s) have been found: 'node_type' must be "dag"
Added:
>
>
  1. Bug #75402 NOT FIXED Synchronization loss between real validity of proxy and exp. time saved in ICE's database; this can happen when the copy of the new proxy fails
    2010-11-15 10:57:41,869 INFO - DNProxyManager::setUserProxyIfLonger_Legacy() - Setting user proxy to [ /var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] copied to /var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] because the old one is less long-lived.
    2010-11-15 10:57:42,019 ERROR - DNProxyManager::setUserProxyIfLonger_Legacy() - Error copying proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] to [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy].
    2010-11-15 10:57:42,019 DEBUG - DNProxyManager::setUserProxyIfLonger_Legacy() - New proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] has been copied into [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] - New Expiration Time is [Tue Nov 16 10:56:34 2010]
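
The log above shows the expiration time being updated even though the copy reported an error. A minimal sketch of one way to keep the stored proxy and its recorded expiration time in step: update the record only after the copy has succeeded. All names here are hypothetical; the `db` dict merely stands in for ICE's database, and this is not the actual ICE implementation.

```python
import os
import shutil

def replace_proxy_if_longer(new_proxy, stored_proxy, new_expiry, db):
    """Copy new_proxy over stored_proxy and record new_expiry in db.

    db is a plain dict (stored path -> expiration time), a stand-in for
    the real database. The record changes only if the copy succeeds.
    """
    if new_expiry <= db.get(stored_proxy, 0):
        return False  # the stored proxy is at least as long-lived
    tmp = stored_proxy + ".tmp"
    try:
        shutil.copyfile(new_proxy, tmp)
        os.replace(tmp, stored_proxy)  # swap the file in only when complete
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return False  # copy failed: leave the recorded expiry untouched
    db[stored_proxy] = new_expiry
    return True
```

Copying to a temporary file first means a failed copy never leaves a half-written proxy at the stored path.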
 

Revision 322010-11-16 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 308 to 308
 
  1. Dag job doesn't work FIXED ... to check in the next tag . Probably the name of the token is wrongly set.
  2. Bug #73715 NOT FIXED the Really Running event is not logged from JC/LM (instead it works for ICE).
Added:
>
>
  1. The "type" attribute is case sensitive NOT FIXED
    Error - AdSyntaxException
    The following parsing error(s) have been found:
    'node_type' must be "dag"
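
A minimal sketch of a case-insensitive check for the attribute value. It assumes the job description has already been parsed into a dict; `is_dag` and the `"type"` key are illustrative and do not reflect the real JDL library API.

```python
def is_dag(ad):
    """Return True if the ad's "type" value is "dag", ignoring case."""
    value = ad.get("type", "")
    return isinstance(value, str) and value.strip().lower() == "dag"
```

With this normalization, `"dag"`, `"DAG"` and `"Dag"` are all accepted.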
 

Revision 312010-11-16 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 295 to 295
 Warning - JobPurging not allowed (Proxy exception: The delegated Proxy has expired) This happens when the job proxy has expired.
Changed:
<
<
  1. A "DONE OK" job is marked as "ABORTED" NOT FIXED
>
>
  1. Bug #75368 FIXED ... to check in the next tag A "DONE OK" job is marked as "ABORTED". The problem is that a failed job should not be aborted (resubmission is possible only for Done Failed jobs)
 
  1. St9exception NOT FIXED . After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you then try a get-output for a node, you get:
     [ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw
Line: 304 to 304
  Error - getOutputFileList Error (St9exception)
Changed:
<
<
  1. The state of the nodes don't change after the output retrieval of the parent NOT FIXED
>
>
  1. The state of the nodes changes neither after the output retrieval of the parent nor after the cancellation of the parent NOT FIXED
 
Changed:
<
<
  1. Dag job doesn't work NOT FIXED . Probably the name of the token is wrongly set.
>
>
  1. Dag job doesn't work FIXED ... to check in the next tag . Probably the name of the token is wrongly set.
  2. Bug #73715 NOT FIXED the Really Running event is not logged from JC/LM (instead it works for ICE).
 

Revision 302010-11-12 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 212 to 212
 Status Reason: Warning: job exit code = 0 Destination: dagman
Changed:
<
<
and nodes stuck in "Submitted". NOT FIXED . Fixed (to be checked in next tag)
>
>
and nodes stuck in "Submitted". FIXED .
 
  1. BUG #75223 NOT FIXED When a job fails, the logged reason is wrong
    Event: Done
Line: 296 to 296
  (Proxy exception: The delegated Proxy has expired) This happens when the job proxy has expired.
  1. A "DONE OK" job is marked as "ABORTED" NOT FIXED
Added:
>
>

  1. St9exception NOT FIXED . After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you then try a get-output for a node, you get:
     [ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw
 
Added:
>
>
Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
 
Added:
>
>
Error - getOutputFileList Error (St9exception)
  1. The state of the nodes doesn't change after the output retrieval of the parent NOT FIXED
  2. Dag job doesn't work NOT FIXED . Probably the name of the token is wrongly set.
 

Revision 29 2010-11-11 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 204 to 204
  two packages missing... NOT FIXED
  1. Resubmission for jobs submitted with -r option is not done. Jobs are aborted with "hit job shallow retry count (0)" even if ShallowRetryCount was set in the JDL
Changed:
<
<
NOT FIXED . Retested after the commit done on branch 3.3: seems working. To be double checked in next tag
>
>
FIXED .
 
  1. Problems with DAG. DAG job reported as failed:
    Current Status:     Done (Exit Code !=0)

Revision 28 2010-11-11 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 203 to 203
 Error: Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-WMS-3.3.0-0.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412) two packages missing... NOT FIXED
Changed:
<
<
  1. Resubmission for jobs submitted with -r option is not done. Jobs are aborted with "hit job shallow retry count (0)". ShallowRetryCount was not set in the JDL
>
>
  1. Resubmission for jobs submitted with -r option is not done. Jobs are aborted with "hit job shallow retry count (0)" even if ShallowRetryCount was set in the JDL
  NOT FIXED . Retested after the commit done on branch 3.3: seems working. To be double checked in next tag
  1. Problems with DAG. DAG job reported as failed:
Line: 214 to 214
  and nodes stuck in "Submitted". NOT FIXED . Fixed (to be checked in next tag)
Changed:
<
<
  1. Wrong "tag" in Logged reason NOT FIXED .
    Logged Reason(s): LM_log_done_beginThu Oct 21 15:12:25 WEST 2010: prologue failed with error 1
>
>
  1. BUG #75223 NOT FIXED When a job fails, the logged reason is wrong
    Event: Done
    - Arrived                    =    Thu Nov 11 14:06:54 2010 CET
    - Exit code                  =    1
    - Host                       =    cream-46.pd.infn.it
    - Reason                     =    LM_log_done_beginThu Nov 11 14:03:17 CET 2010: prologue failed with error 1
    - Source                     =    LogMonitor
    - Src instance               =    unique
    - Status code                =    FAILED
    - Timestamp                  =    Thu Nov 11 14:06:54 2010 CET
 
  1. Problem installing new build (07/11/2010) NOT FIXED .
      --> Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-4.sl5.x86_64 (ETICS-name-patch_2876_1)
  2. Problem starting bdii:

Revision 27 2010-11-11 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 140 to 140
 
    • install condor-lcg-1.1.0-1
    • set GT2_GAHP = /opt/condor-7.4.1/sbin/gahp_server and GRID_MONITOR = /opt/condor-7.4.1/libexec/glite/grid_monitor.sh on /opt/condor-c/local.<$HOSTNAME>/condor_config.local
Changed:
<
<
Waiting for Marteen investigation.... NOT FIXED Decided to use the second option. Changes done in yaim-wms. To be double checked with next tag
>
>
Waiting for Marteen investigation.... FIXED Decided to use the second option. Changes done in yaim-wms.
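The second option listed above comes down to two assignments in condor_config.local. A minimal sketch, using only the values quoted in this report; the file is the one under /opt/condor-c/local.<$HOSTNAME>/, and since the change was made in yaim-wms, editing it by hand is shown here only as an illustration:

```
# Fragment for /opt/condor-c/local.<$HOSTNAME>/condor_config.local
# (values copied from this report; paths assume condor 7.4.1 under /opt)
GT2_GAHP = /opt/condor-7.4.1/sbin/gahp_server
GRID_MONITOR = /opt/condor-7.4.1/libexec/glite/grid_monitor.sh
```

If yaim is re-run, it is expected to rewrite this file, so any manual edit should be checked again after reconfiguration.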
 
  1. BUG #73192 Submission failed:
    [ale@egee-rb-03 UI]$ glite-wms-job-submit -a --config ~/UI/etc/wmp_cream-03.conf jdl/env.jdl 
Line: 174 to 174
  done Salvet said: Installed MySQL library is slightly newer than library used for glite-lb-bkserver build. This is normal. No Problem
Changed:
<
<
  1. Bug #73206 Collection doesn't work. The only suspicious messages are in /var/log/messages:
>
>
  1. Bug #73206 Collection doesn't work. The only suspicious messages are in /var/log/messages:
  Sep 10 15:50:48 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:48Z : event=wms.wmpserver_setJobFileSystem() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
  Sep 10 15:50:48 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:48Z : event=wms.wmpserver_setSubjobFileSystem() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
  Sep 10 15:50:50 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:50Z : event=wms.wmpserver_submit() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
  Sep 10 15:50:50 cream-44 kernel: glite_wms_wmpro[6179] general protection rip:3e86c797e0 rsp:7fff60c65148 error:0
Changed:
<
<
Is it a LB's problem? NOT FIXED Fixed (to be double checked in next tag)
>
>
FIXED
 
  1. Bug #72970 I'm running on the WMS an lbserver in "both" mode:
     9197 ?        S      0:00 /opt/glite/bin/glite-lb-bkserverd --notif-il-sock=/tmp/glite-lb-notif.sock --notif-il-fprefix=/var/tmp/glite-lb-notif -c /home/glite/.certs/hostcert.pem -k /home/glite/.certs/hostkey.pem -i /var/glite/glite-lb-bkserverd.pid --dump-prefix /var/glite/dump --purge-prefix /var/glite/purge -B --proxy-il-sock /tmp/glite-lbproxy-ilog.sock --proxy-il-fprefix /tmp/glite-lbproxy-ilog_events --policy /opt/glite/etc/glite-lb/glite-lb-authz.conf
Line: 271 to 271
 Transport endpoint is not connected;; edg_wll_gss_read_full;; GSS Error: EOF occured;) The httpd-wmproxy-errors.log says:
[Wed Nov 10 17:10:44 2010] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
Changed:
<
<
  1. Cleared event not logged NOT FIXED .
>
>
  1. Cleared event not logged FIXED .
 [ale@cream-03 ~]$ glite-wms-job-output https://devel07.cnaf.infn.it:9000/IvaPn8c4ezVSiHbQP8FJOg

Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server

Warning - JobPurging not allowed (The Operation is not allowed: Unable to complete job purge)

Changed:
<
<
There is a problem in the LB server configuration. Under investigation...
>
>
You need to set the WMS DN in both the "READ_ALL" and "LOG_WMS_EVENTS" actions in the glite-lb-authz.conf file of the LB server.
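As an illustration only, the two entries might look like the fragment below. The block syntax (resource / action / rule permit / subject) is an assumption about the glite-lb-authz.conf policy format and should be checked against the LB administrator documentation for the installed version. The DN is a hypothetical placeholder, not the DN of any host named in this report; other actions already present in the file are left out.

```
resource "LB" {
    action "READ_ALL" {
        rule permit {
            subject = "/C=XX/O=Example/CN=host/wms.example.org"
        }
    }
    action "LOG_WMS_EVENTS" {
        rule permit {
            subject = "/C=XX/O=Example/CN=host/wms.example.org"
        }
    }
}
```

The subject would be replaced by the DN of the WMS host certificate.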
 
  1. Cleared event not logged NOT FIXED .
    [ale@cream-03 ~]$ glite-wms-job-output https://devel07.cnaf.infn.it:9000/Hn0VS-oMpYlT7bdAIXNB7g
    

Revision 26 2010-11-11 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Line: 236 to 236
  what(): basic_string::at Aborted It happens when shallow resubmission is set to -1.
Changed:
<
<
  1. Submission failed NOT FIXED Problem disappears when using a new LB server
>
>
  1. Submission failed NOT FIXED Problem does not disappear when using a new LB server
 Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server HTTP Error
Line: 271 to 271
 Transport endpoint is not connected;; edg_wll_gss_read_full;; GSS Error: EOF occured;) The httpd-wmproxy-errors.log says:
[Wed Nov 10 17:10:44 2010] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
Added:
>
>
  1. Cleared event not logged NOT FIXED .
    [ale@cream-03 ~]$ glite-wms-job-output https://devel0