Certification report patch 3621
Author(s): Elisabetta Molinari & Alessio Gianelle
Outcome:
in certification...
Clean installation
Upgrade from production
- Starting from a production WMS, we updated it.
Test Report
The test report has been produced following the guidelines from here
List Match
List match without data
- without data:
- tried with the following JDL:
cat myjob-toICE.jdl
[
Type = "Job";
JobType = "normal";
InputSandbox = { "file:///home/emolinari/test.sh"};
VirtualOrganisation = "dteam";
Executable="test.sh";
Arguments="Hello ";
Requirements = ( RegExp("/cream-",other.GlueCEUniqueID));
Rank = 0;
fuzzyrank = true;
StdOutput="message.txt";
StdError="err.log";
OutputSandbox={"message.txt","err.log",".BrokerInfo"};
usertags = [ jdl = "normal job to ICE" ];
RetryCount = 0;
ShallowRetryCount = 3;
]
glite-wms-job-list-match --config glite_wms_devel20.conf -a myjob-toICE.jdl
Connecting to the service https://devel20.cnaf.infn.it:7443/glite_wms_wmproxy_server
==========================================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- atlas-creamce-01.roma1.infn.it:8443/cream-lsf-atlasgcert
- bocecream.bo.infn.it:8443/cream-pbs-cert
- bocecream.bo.infn.it:8443/cream-pbs-certSL5
- cccreamceli01.in2p3.fr:8443/cream-bqs-medium
- cccreamceli01.in2p3.fr:8443/cream-bqs-short
- ce01-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam
- ce07-lcg.cr.cnaf.infn.it:8443/cream-lsf-dteam
- ce201.cern.ch:8443/cream-lsf-grid_2nh_dteam
- ce201.cern.ch:8443/cream-lsf-grid_dteam
- ce202.cern.ch:8443/cream-lsf-grid_2nh_dteam
- ce202.cern.ch:8443/cream-lsf-grid_dteam
- cert-15.pd.infn.it:8443/cream-lsf-cert
- cream-38.pd.infn.it:8443/cream-pbs-creamtest1
- cream-38.pd.infn.it:8443/cream-pbs-creamtest2
- cream-ce.ct.infn.it:8443/cream-lsf-cert
- cream-ce.pr.infn.it:8443/cream-pbs-cert
- cream-ce.research-infrastructures.eu:8443/cream-pbs-cert
- devce.cnaf.infn.it:8443/cream-pbs-cert
- gridce0.pi.infn.it:8443/cream-lsf-cert
- prod-ce-01.pd.infn.it:8443/cream-lsf-cert
- t2-ce-01.to.infn.it:8443/cream-pbs-cert
- t2-ce-01.to.infn.it:8443/cream-pbs-short
- t2-ce-05.lnl.infn.it:8443/cream-lsf-cert1
==========================================================================
- tried replacing the Requirements expression with its negation (saved as myjob-toLcg.jdl):
Requirements = ( !RegExp("/cream-",other.GlueCEUniqueID));
glite-wms-job-list-match --config glite_wms_devel20.conf -a myjob-toLcg.jdl
Connecting to the service https://devel20.cnaf.infn.it:7443/glite_wms_wmproxy_server
==========================================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- argoce01.na.infn.it:2119/jobmanager-lcgpbs-cert
- atlas-ce-01.roma1.infn.it:2119/jobmanager-lcglsf-atlasgcert
- atlas-ce-02.roma1.infn.it:2119/jobmanager-lcglsf-atlasgcert
- atlasce01.na.infn.it:2119/jobmanager-lcgpbs-cert
- boalice3.bo.infn.it:2119/jobmanager-lcgpbs-cert
- boalice3.bo.infn.it:2119/jobmanager-lcgpbs-certSL5
- cclcgceli01.in2p3.fr:2119/jobmanager-bqs-long
- cclcgceli01.in2p3.fr:2119/jobmanager-bqs-medium
- cclcgceli01.in2p3.fr:2119/jobmanager-bqs-short
- cclcgceli02.in2p3.fr:2119/jobmanager-bqs-long
- cclcgceli02.in2p3.fr:2119/jobmanager-bqs-medium
- cclcgceli02.in2p3.fr:2119/jobmanager-bqs-short
- cclcgceli03.in2p3.fr:2119/jobmanager-bqs-long
- cclcgceli03.in2p3.fr:2119/jobmanager-bqs-medium
- cclcgceli03.in2p3.fr:2119/jobmanager-bqs-short
- cclcgceli04.in2p3.fr:2119/jobmanager-bqs-long
- cclcgceli04.in2p3.fr:2119/jobmanager-bqs-medium
- cclcgceli04.in2p3.fr:2119/jobmanager-bqs-short
- cclcgceli07.in2p3.fr:2119/jobmanager-bqs-long
- cclcgceli07.in2p3.fr:2119/jobmanager-bqs-medium
- cclcgceli07.in2p3.fr:2119/jobmanager-bqs-short
- cclcgceli08.in2p3.fr:2119/jobmanager-bqs-long
- cclcgceli08.in2p3.fr:2119/jobmanager-bqs-medium
- cclcgceli08.in2p3.fr:2119/jobmanager-bqs-short
- ce-01.grid.sissa.it:2119/jobmanager-lcgpbs-cert
- ce-01.roma3.infn.it:2119/jobmanager-lcgpbs-cert
- ce01-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2
- ce02-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2
- ce04-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-dteam
- ce05-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-dteam
- ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-dteam
- ce103.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce103.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce104.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce104.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce105.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce105.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce106.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce107.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce107.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce112.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce112.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce113.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce113.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce114.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce114.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce124.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce124.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce125.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce125.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce126.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce126.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce127.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce127.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce128.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce128.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce129.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce129.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce130.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce130.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce131.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce131.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce132.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce132.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- ce133.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam
- ce133.cern.ch:2119/jobmanager-lcglsf-grid_dteam
- cmsce01.na.infn.it:2119/jobmanager-lcgpbs-cert
- grid-ce-01.ba.infn.it:2119/jobmanager-lcgpbs-cert
- grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-cert
- grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-infinite
- grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-long
- grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-short
- grid-ce2.pr.infn.it:2119/jobmanager-pbs-cert
- grid-eo-engine04.esrin.esa.int:2119/jobmanager-lcgpbs-cert
- grid0.fe.infn.it:2119/jobmanager-lcgpbs-cert
- grid001.ts.infn.it:2119/jobmanager-lcglsf-cert
- grid002.ca.infn.it:2119/jobmanager-lcglsf-cert
- grid01.ge.infn.it:2119/jobmanager-lcglsf-cert
- grid012.ct.infn.it:2119/jobmanager-lcglsf-cert
- gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-cert
- gridce.pg.infn.it:2119/jobmanager-lcgpbs-cert
- gridce.sns.it:2119/jobmanager-lcgpbs-cert
- gridce1.pi.infn.it:2119/jobmanager-lcglsf-cert
- gridce2.pi.infn.it:2119/jobmanager-lcglsf-cert
- gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
- griditce01.na.infn.it:2119/jobmanager-lcgpbs-cert
- lcg-ce.research-infrastructures.eu:2119/jobmanager-lcgpbs-cert
- linucs-ce-01.cs.infn.it:2119/jobmanager-lcgpbs-atlasgcert
- pamelace01.na.infn.it:2119/jobmanager-lcgpbs-cert
- pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-cert
- prod-ce-02.pd.infn.it:2119/jobmanager-lcglsf-cert
- t2-ce-01.lnl.infn.it:2119/jobmanager-lcglsf-cert1
- t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-cert
- t2-ce-02.lnl.infn.it:2119/jobmanager-lcglsf-cert1
- t2-ce-02.mi.infn.it:2119/jobmanager-lcgcondor-cert
- t2-ce-02.to.infn.it:2119/jobmanager-lcgpbs-cert
- t2-ce-02.to.infn.it:2119/jobmanager-lcgpbs-short
- t2-ce-03.lnl.infn.it:2119/jobmanager-lcglsf-cert1
- t2-ce-04.lnl.infn.it:2119/jobmanager-lcglsf-cert1
- t2-ce-06.lnl.infn.it:2119/jobmanager-lcglsf-cert1
- test7200a.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
- test7200a.cnaf.infn.it:2119/jobmanager-lcgpbs-parallel
- virgo-ce.roma1.infn.it:2119/jobmanager-lcgpbs-cert
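The two Requirements expressions above split the matched CEs according to whether the CE unique ID contains "/cream-". As a rough offline illustration only (this is not how the WMS matchmaker is implemented), the same split can be reproduced in Python on a few CE IDs copied from the listings. The function name split_by_cream is hypothetical, and the sketch assumes the ClassAd RegExp performs an unanchored match.

```python
import re


def split_by_cream(ce_ids):
    """Partition CE IDs the way RegExp("/cream-", other.GlueCEUniqueID) would."""
    cream, other = [], []
    for ce_id in ce_ids:
        # re.search is unanchored, which is the assumed RegExp behaviour.
        if re.search(r"/cream-", ce_id):
            cream.append(ce_id)
        else:
            other.append(ce_id)
    return cream, other


# A few CE IDs taken from the two listings above.
sample = [
    "ce201.cern.ch:8443/cream-lsf-grid_dteam",
    "ce103.cern.ch:2119/jobmanager-lcglsf-grid_dteam",
    "cream-ce.pr.infn.it:8443/cream-pbs-cert",
    "grid-ce2.pr.infn.it:2119/jobmanager-pbs-cert",
]
cream_ces, lcg_ces = split_by_cream(sample)
print(cream_ces)
print(lcg_ces)
```

The first list corresponds to the jobs that would be routed through ICE, the second to those routed through JC.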
List match with data
- with data:
Submission/GetOutput
Normal Jobs
- Normal jobs through:
- ICE work:
- JC work:
DAG jobs
- DAG jobs through:
- JC work:
Collection jobs
- Collection jobs through:
- ICE work:
- JC work:
- also job-output for collections works even though only the parent node is set to 'Cleared'
Parametric jobs
- Parametric jobs through:
- ICE work:
- JC work:
- Bulk jobs sent through both ICE and JC, with RetryCount = 0:
- Submit a bulk of 3 jobs -> success 100%, both to ICE and JC
- Submit a bulk of 50 jobs -> success 100%, both to ICE and JC
- Submit a bulk of 100 jobs -> success 100%, both to ICE and JC
- Submit a bulk of 500 jobs -> success 99.9%, both to ICE and JC
- Submit a bulk of 1000 jobs -> success 99.9%, both to ICE and JC
- bulk test report to JC here
- bulk test report to ICE here
Perusal jobs
- Perusal jobs through:
- JC work:
- ICE work:
- MPICH jobs:
Cancel
- Normal jobs
- ICE:
- JC:
- Dag:
-
- Note that child nodes in status 'submitted' do not get cancelled
- Collection
- ICE:
- JC:
- Node of a collection:
- Note: collections stay in status 'waiting' when all the nodes are Done (Success) except for one that is 'Cancelled'
Others
- BrokerInfo
- ICE creation: test report here
- JC creation: test report here
- Resubmission
- Shallow:
- Deep:
- Job Recovery
- Tested with a few collections, restarting the WM while some node jobs were still in 'submitted' or 'waiting' status
- Prologue and Epilogue jobs
- ICE:
- JC:
Check bugs
- Bug #42288: Problem in forwarding cerequirements to a CREAM CE FIXED
- description of the problem --> "The parameters to be forwarded specified in the Requirements attribute of the .jdl classad are NOT considered and ICE does not send them to the CE, therefore the classad passed to BLAH does not contain them"
- submitted the following .jdl via WMS:
cat myjob_forwardReq.jdl
[
Type = "Job";
JobType = "normal";
InputSandbox = { "file:///home/emolinari/test.sh"};
VirtualOrganisation = "dteam";
Executable="test.sh";
Arguments="Hello ";
requirements = (other.GlueCEUniqueID == "cream-19.pd.infn.it:8443/cream-lsf-testbedB_1") && (other.GlueHostMainMemoryRAMSize >= 0) ;
Rank = 0;
myproxy = myproxy.cnaf.infn.it;
fuzzyrank = true;
StdOutput="message.txt";
StdError="err.log";
OutputSandbox={"message.txt","err.log",".BrokerInfo"};
RetryCount = 0;
ShallowRetryCount = 3;
]
- checked in the ICE log file on the WMS, /var/log/glite/ice.log, that the CeRequirements attribute of the JDL gets populated as follows:
CeRequirements = "true && ( true && ( true && ( other.GlueHostMainMemoryRAMSize >= 0 ) ) )";
- checked on the CE that blah generates the correct classad with the requirements to be forwarded, as in the following:
cat /tmp/subfile
#!/bin/bash
# LSF job wrapper generated by lsf_submit.sh
# on Wed Apr 14 19:13:10 CEST 2010
#
# LSF directives:
#BSUB -L /bin/bash
#BSUB -J cre19_725998524
#BSUB -q testbedB_1
#BSUB -R "select[mem>=0]"
......
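The check on /tmp/subfile above amounts to confirming that the forwarded memory requirement shows up as a "#BSUB -R" directive. A minimal Python sketch of that check follows; the helper name bsub_directives is hypothetical, and the wrapper text is a shortened copy of the excerpt above rather than a complete submit file.

```python
def bsub_directives(wrapper_text):
    """Return (flag, value) pairs for every '#BSUB' line in an LSF wrapper."""
    prefix = "#BSUB "
    directives = []
    for line in wrapper_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Split '-R "select[mem>=0]"' into the flag and its argument.
            flag, _, value = line[len(prefix):].partition(" ")
            directives.append((flag, value))
    return directives


# Shortened copy of the wrapper shown above.
wrapper = """#!/bin/bash
# LSF job wrapper generated by lsf_submit.sh
#BSUB -L /bin/bash
#BSUB -J cre19_725998524
#BSUB -q testbedB_1
#BSUB -R "select[mem>=0]"
"""
found = dict(bsub_directives(wrapper))
print(found["-R"])
```

If the requirement had not been forwarded, the "-R" key would be missing from the result.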
- Bug #48910: Failure starting LM if its output jobdir doesn't exist; unprotected chown in WM/LM/JC startup scripts FIXED
- Stopped gLite services and deleted the jobdir under '/var/glite/workload_manager'
[root@wms007 jobdir]# service gLite stop
[...]
[root@wms007 workload_manager]# pwd
/var/glite/workload_manager
[root@wms007 workload_manager]# ls
ismdump.fl jobdir
[root@wms007 workload_manager]# rm -rf jobdir
[root@wms007 workload_manager]# ls
ismdump.fl
- Stopped gLite services and deleted the jobdir under '/var/glite/jobcontrol'
[root@wms007 jobcontrol]# pwd
/var/glite/jobcontrol
[root@wms007 jobcontrol]# rm -rf jobdir
[root@wms007 jobcontrol]# ls
condorio submit
- Stopped gLite services and deleted the jobdir under '/var/glite/ice'
[root@wms007 ice]# pwd
/var/glite/ice
[root@wms007 ice]# ls
jobdir persist_dir
[root@wms007 ice]# rm -rf jobdir/
[root@wms007 ice]# ls
persist_dir
- Stopped gLite services and deleted all the jobdirs
[root@wms007 glite]# ls workload_manager/ jobcontrol/ ice/
ice/:
persist_dir
jobcontrol/:
condorio submit
workload_manager/:
ismdump.fl
- Comment out the Input/InputType parameters in the WMS configuration file (sections: ICE, WorkloadManager and JobController).
- Try to start JobController:
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-jc start JobController
Starting JobController daemon(s)
Please set Input parameter in glite_wms.conf - JC section [FAILED]
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-jc status JobController
JobController stopped.
- Try to start LogMonitor:
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-lm start
Starting LogMonitor...Please set Input parameter in glite_wms.conf - WM section
[FAILED]
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-lm status
LogMonitor stopped.
- Try to start ICE:
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-ice start
starting ICE... failure
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-ice status
/opt/glite/bin/glite-wms-ice-safe is not running
- Try to start WorkloadManager:
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-wm start
starting workload manager... Please set Input parameter in - WM section
Please set DispatcherType parameter in - WM section
Please set Input parameter in - JC section
Please set InputType parameter in - JC section
Please set Input parameter in - ICE section
Please set InputType parameter in - ICE section
failure
[root@wms007 workload_manager]# /opt/glite/etc/init.d/glite-wms-wm status
/opt/glite/bin/glite-wms-workload_manager is not running
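When repeating this test it can help to confirm which jobdir directories are actually missing before starting the daemons. The sketch below is only an illustration of that precondition check: the helper missing_jobdirs is hypothetical, and it runs on a temporary directory tree rather than on the real /var/glite.

```python
import os
import tempfile

# Sub-directories of /var/glite whose 'jobdir' was removed in the test above.
COMPONENTS = ("workload_manager", "jobcontrol", "ice")


def missing_jobdirs(base):
    """List the components under `base` that have no 'jobdir' directory."""
    return [
        component
        for component in COMPONENTS
        if not os.path.isdir(os.path.join(base, component, "jobdir"))
    ]


# Demonstration on a throw-away tree instead of the real /var/glite.
with tempfile.TemporaryDirectory() as base:
    for component in COMPONENTS:
        os.makedirs(os.path.join(base, component))
    os.makedirs(os.path.join(base, "ice", "jobdir"))
    result = missing_jobdirs(base)
print(result)
```

On the WMS itself the call would be missing_jobdirs("/var/glite"), which should list all three components after the deletions above.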
- Bug #52934: [ICE] Delegation in ICE doesn't refer to the myproxy server FIXED
- GridJobID: https://devel17.cnaf.infn.it:9000/dj8r_iFRd8tnWH4bThPNeg
- Deleg Proxy ID = [12692524052E32526wms0072Ecnaf2Einfn2Eit]
- Destination: cream-30.pd.infn.it:8443/cream-pbs-cream_B
- Owner = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle
- MyProxyServer = "myproxy.cern.ch";
- GridJobID: https://devel17.cnaf.infn.it:9000/UNB2dHJNn7euaDP3FvJ3og
- Deleg Proxy ID = [12692523642E948823wms0072Ecnaf2Einfn2Eit]
- Destination: cream-30.pd.infn.it:8443/cream-pbs-cream_B
- Owner = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle
- MyProxyServer = "myproxy.cnaf.infn.it";
- Bug #52937: ICE uses the wrong DN to log to LB TO VERIFY
- Bug #53297: [ yaim-wms ] glite_wms.conf hardcoded parameters FIXED
- tested by setting the parameter 'WMS_CONF_FILE_OVERWRITE' in the ~/siteinfo/services/glite-wms file
- set the parameter 'WMS_CONF_FILE_OVERWRITE' to true: a backup copy of the glite_wms.conf file is created as /opt/glite/etc/glite_wms.conf.bkp_20100608_101305 and the glite_wms.conf file is overwritten
- set the parameter 'WMS_CONF_FILE_OVERWRITE' to false: a new copy of the glite_wms.conf file is created as /opt/glite/etc/glite_wms.conf.yaimnew_20100608_101633
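The two outcomes above can be summarised as a small decision rule. The following Python sketch mimics that rule on temporary files; it is not the yaim implementation. The function install_conf is hypothetical, the timestamp format is inferred from the file names above, and leaving the live file untouched in the "false" case is an assumption.

```python
import os
import shutil
import tempfile
import time


def install_conf(path, new_text, overwrite, stamp=None):
    """Write new_text following the overwrite/backup behaviour seen above.

    Returns the path of the extra file created (backup or '.yaimnew' copy).
    """
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    if overwrite:
        extra = f"{path}.bkp_{stamp}"
        shutil.copy2(path, extra)  # keep the old configuration as a backup
        target = path              # then replace the live file
    else:
        # Assumption: the live file is not modified in this case.
        extra = target = f"{path}.yaimnew_{stamp}"
    with open(target, "w") as handle:
        handle.write(new_text)
    return extra


def read(path):
    with open(path) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "glite_wms.conf")
    with open(conf, "w") as handle:
        handle.write("old")
    backup = install_conf(conf, "new", overwrite=True, stamp="20100608_101305")
    live_after_overwrite = read(conf)
    backup_text = read(backup)
    side = install_conf(conf, "newer", overwrite=False, stamp="20100608_101633")
    live_after_keep = read(conf)
    side_name = os.path.basename(side)
print(side_name)
```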
- Bug #53460: [ICE] Detection of job status changes for CREAM jobs should be improved FIXED
- With a new CE (1.6), the ICE log shows that the event query is used:
2010-03-22 16:47:50,496 INFO - scoped_timer iceCommandEventQuery::execute() - SOAP Connection for QueryEvent - TID=[150673032] 1269272870.288498 1269272870.496129 0.207631
2010-03-22 16:47:50,496 DEBUG - iceCommandEventQuery::execute() - TID=[150673032] There're [2] event(s) for the couple DN [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] CEUrl [https://cream-30.pd.infn.it:8443/ce-cream/services/CREAM2]
2010-03-22 16:47:50,496 DEBUG - iceCommandEventQuery::execute() - TID=[150673032] Database ID=[1261041182000]
2010-03-22 16:47:50,496 DEBUG - iceCommandEventQuery::execute() - TID=[150673032] Exec time ID=[3]
2010-03-22 16:47:50,496 DEBUG - iceCommandEventQuery::processEventsForJob() - TID=[150673032] Processing [2] event(s) for Job [gridJobID="https://devel17.cnaf.infn.it:9000/uKbQNcbh7kIohBz6bDMNZQ" CREAMJobID="https://cream-30.pd.infn.it:8443/CREAM396193798"] userdn [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] and ce url [https://cream-30.pd.infn.it:8443/ce-cream/services/CREAM2].
2010-03-22 16:47:50,496 DEBUG - iceCommandEventQuery::processEventsForJob() - TID=[150673032] EventID [685143] timestsamp [1269272804]
2010-03-22 16:47:50,496 INFO - scoped_timer iceCommandEventQuery::processSingleEvent - TID=[150673032] InsertStat 1269272870.496682 1269272870.496864 0.000182
- With an "old" CE, the "poller" method is used instead:
2010-03-22 16:55:55,397 INFO - scoped_timer iceCommandEventQuery::execute() - SOAP Connection for QueryEvent - TID=[150673032] 1269273355.242918 1269273355.397806 0.154888
2010-03-22 16:55:55,397 ERROR - iceCommandEventQuery::execute() - TID=[150673032] Cannot query events for UserDN [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] CEUrl [https://cream-34.pd.infn.it:8443/ce-cream/services/CREAM2]. Exception Internal ex is [Received NULL fault; the error is due to another cause: FaultString=[No such operation 'QueryEventRequest'] - FaultCode=["http://xml.apache.org/axis/":Client] - FaultSubCode=["http://xml.apache.org/axis/":Client] - FaultDetail=[<ns2:hostname>cream-34.pd.infn.it</ns2:hostname>]]
2010-03-22 16:55:55,398 WARN - iceCommandEventQuery::execute() - TID=[150673032] Not present QueryEvent on CE [https://cream-34.pd.infn.it:8443/ce-cream/services/CREAM2]. Falling back to old-style StatusPoller.
2010-03-22 16:55:55,398 INFO - iceCommandStatusPoller::execute() - Getting [100] jobs to poll for user [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] creamurl [https://cream-34.pd.infn.it:8443/ce-cream/services/CREAM2]
2010-03-22 16:55:55,398 DEBUG - iceCommandStatusPoller::get_jobs_to_poll() - Collecting jobs to poll for userdn=[/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] creamurl=[https://cream-34.pd.infn.it:8443/ce-cream/services/CREAM2]. LIMIT set to [100]...
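The two log excerpts above show a try-then-fall-back pattern: ICE first asks the CE for events and, when the CE does not offer that operation, reverts to polling. The Python sketch below illustrates that control flow only. Every name in it (OperationNotSupported, fetch_status_changes and the stand-in callables) is hypothetical and does not correspond to the ICE source code.

```python
class OperationNotSupported(Exception):
    """Hypothetical stand-in for the 'No such operation' SOAP fault."""


def fetch_status_changes(ce_url, query_events, poll_jobs):
    """Prefer the event query; fall back to polling on CEs that lack it."""
    try:
        return "events", query_events(ce_url)
    except OperationNotSupported:
        # Old CE: behave like the 'old-style StatusPoller' in the log.
        return "poller", poll_jobs(ce_url)


# Stand-ins for a CE that supports QueryEvent and one that does not.
def new_ce_query(ce_url):
    return ["event-1", "event-2"]


def old_ce_query(ce_url):
    raise OperationNotSupported("No such operation 'QueryEventRequest'")


def poll(ce_url):
    return ["job-status"]


new_result = fetch_status_changes(
    "https://cream-30.pd.infn.it:8443/ce-cream/services/CREAM2", new_ce_query, poll
)
old_result = fetch_status_changes(
    "https://cream-34.pd.infn.it:8443/ce-cream/services/CREAM2", old_ce_query, poll
)
print(new_result)
print(old_result)
```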
- Bug #55103: [ICE] ICE port 7010 not cleaned up properly FIXED
- Bug #55452: CMS production struck by waves of "Globus error 10: data transfer to the server failed" FIXED NOT CERTIFIED
- Bug #56636: [ICE] statistics counters for monitoring FIXED
- Bug #57295: [ICE] queryDb tool may create empty DB as root FIXED
- Verify:
[root@wms007 ~]# ll /var/glite/ice/persist_dir/ice.db
-rw-r--r-- 1 glite glite 1280000 Mar 22 17:05 /var/glite/ice/persist_dir/ice.db
[root@wms007 ~]# /opt/glite/bin/queryDb -c glite_wms.conf -s RUNNING,REALLY_RUNNING
0 item(s) found
[root@wms007 ~]# ll /var/glite/ice/persist_dir/ice.db
-rw-r--r-- 1 glite glite 1280000 Mar 22 17:05 /var/glite/ice/persist_dir/ice.db
- Bug #57579: [ICE] Occasionally the ICE's start/stop script doesn't kill the ICE process HOPEFULLY FIXED
- Bug #57596: [ICE] non resubmission if job failed for proxy expiration FIXED
- Verify:
2010-03-23 10:20:37,696 INFO - iceLBLogger::logEvent() - Job Done Failed Event, ExitCode=[0], FailureReason=[Proxy is expired; /opt/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent(): LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR: LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent(): edg_wll_log_connect error Transport endpoint is not connected;; edg_wll_gss_connect();; System Error: Connection refused) /opt/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent(): LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR: LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent(): edg_wll_log_connect error Transport endpoint is not connected;; edg_wll_gss_connect();; System Error: Connection refused) Proxy expired: job killed Terminated Master process killed] - [gridJobID="https://devel17.cnaf.infn.it:9000/jw2aeAy1skHY3mRJHCF8YA" CREAMJobID="https://ce202.cern.ch:8443/CREAM030114428"]
2010-03-23 10:20:37,817 DEBUG - iceLBContext::testCode() - L&B call succeeded.
2010-03-23 10:20:37,828 ERROR - Ice::resubmit_job() - Will NOT resubmit job [gridJobID="https://devel17.cnaf.infn.it:9000/jw2aeAy1skHY3mRJHCF8YA" CREAMJobID="https://ce202.cern.ch:8443/CREAM030114428"] because it's Input Sandbox proxy file is not valid: The proxy is EXPIRED!
2010-03-23 10:20:37,828 INFO - iceLBContext::setLoggingJob - Setting log job to jobid=[https://devel17.cnaf.infn.it:9000/jw2aeAy1skHY3mRJHCF8YA] LB server=[devel17.cnaf.infn.it:9000] (port is not used, actually...)
2010-03-23 10:20:37,828 INFO - iceLBLogger::logEvent() - Job Aborted Event, reason=[Input sandbox's proxy is missing. Cannot resubmit job] - [gridJobID="https://devel17.cnaf.infn.it:9000/jw2aeAy1skHY3mRJHCF8YA" CREAMJobID="https://ce202.cern.ch:8443/CREAM030114428"]
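When many jobs are involved, the relevant ERROR lines can be pulled out of ice.log mechanically. The sketch below is one possible way to do it in Python: the regular expression is written against the single log line shown above, so it is an assumption that other ICE versions use the same wording, and the function name jobs_not_resubmitted is hypothetical.

```python
import re

# Matches the ERROR line emitted by Ice::resubmit_job() in the excerpt above.
NOT_RESUBMITTED = re.compile(
    r'Will NOT resubmit job \[gridJobID="(?P<grid>[^"]+)" '
    r'CREAMJobID="(?P<cream>[^"]+)"\]'
)


def jobs_not_resubmitted(log_lines):
    """Return (gridJobID, CREAMJobID) pairs that ICE refused to resubmit."""
    pairs = []
    for line in log_lines:
        match = NOT_RESUBMITTED.search(line)
        if match:
            pairs.append((match.group("grid"), match.group("cream")))
    return pairs


# Two lines copied from the log excerpt above.
log = [
    "2010-03-23 10:20:37,817 DEBUG - iceLBContext::testCode() - L&B call succeeded.",
    "2010-03-23 10:20:37,828 ERROR - Ice::resubmit_job() - Will NOT resubmit job "
    '[gridJobID="https://devel17.cnaf.infn.it:9000/jw2aeAy1skHY3mRJHCF8YA" '
    'CREAMJobID="https://ce202.cern.ch:8443/CREAM030114428"] because it\'s Input '
    "Sandbox proxy file is not valid: The proxy is EXPIRED!",
]
pairs = jobs_not_resubmitted(log)
print(pairs)
```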
- Bug #58099: WMS purger forces purge of jobs if LB cannot be reached FIXED
- Stop the LBServer and then run the cron purger:
07 Apr, 16:09:13 -E: [Error] query_job_status(purger.cpp:125): https://devel17.cnaf.infn.it:9000/yeoXs2eB1kvOaPp0Mtjthg:: edg_wll_JobStat [111] Connection refused(edg_wll_gss_connect())
[glite@wms007 ~]$
- Verify that the SandBox dir has not been removed:
[glite@wms007 ~]$ ls -l /var/glite/SandboxDir/ye/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fyeoXs2eB1kvOaPp0Mtjthg/
total 16
drwxrwx--- 2 dteam008 glite 4096 Apr 6 14:34 input
drwxrwx--- 2 dteam008 glite 4096 Apr 6 14:46 output
drwxrwx--- 2 dteam008 glite 4096 Apr 6 14:34 peek
lrwxrwxrwx 1 glite glite 102 Apr 6 14:34 user.proxy -> /var/glite/SandboxDir/Uo/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fUow8XY0NGbyumU3PPGMSng/user.proxy
- Restart the LBServer and verify that the sandbox directory (SBD) of the job is now purged:
[glite@wms007 ~]$ /opt/glite/sbin/glite-wms-purgeStorage.sh -p /var/glite/SandboxDir/ye -t 10000
07 Apr, 16:18:07 -I: [Info] operator()(purger.cpp:449): https://devel17.cnaf.infn.it:9000/yeoXs2eB1kvOaPp0Mtjthg: removed DONE job
[glite@wms007 ~]$ ls -l /var/glite/SandboxDir/ye/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fyeoXs2eB1kvOaPp0Mtjthg/
ls: /var/glite/SandboxDir/ye/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fyeoXs2eB1kvOaPp0Mtjthg/: No such file or directory
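The sandbox paths in the listings above follow a visible pattern: a sub-directory named after the first two characters of the job's unique string, then the job ID with ':' written as '_3a' and '/' as '_2f'. The Python sketch below reproduces that pattern so the directory for a given job ID can be located quickly. It is inferred from these listings only: how other characters are escaped is not shown here, so leaving them unchanged is an assumption, and the function name sandbox_dir is hypothetical.

```python
def sandbox_dir(job_id, root="/var/glite/SandboxDir"):
    """Build the sandbox path for a job ID, as observed in the listing above."""
    unique = job_id.rsplit("/", 1)[1]  # e.g. yeoXs2eB1kvOaPp0Mtjthg
    # Only ':' and '/' appear escaped in the listing; leaving every other
    # character untouched is an assumption.
    encoded = job_id.replace(":", "_3a").replace("/", "_2f")
    return f"{root}/{unique[:2]}/{encoded}"


path = sandbox_dir("https://devel17.cnaf.infn.it:9000/yeoXs2eB1kvOaPp0Mtjthg")
print(path)
```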
- Bug #58387: [ICE] should log a job aborted when it cannot resubmit the job for missing user proxy FIXED
- Verify:
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://devel17.cnaf.infn.it:9000/jw2aeAy1skHY3mRJHCF8YA
Current Status: Aborted
Logged Reason(s):
- Proxy is expired; /opt/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent(): LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR: LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent(): edg_wll_log_connect error Transport endpoint is not connected;; edg_wll_gss_connect();; System Error: Connection refused) /opt/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent(): LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR: LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent(): edg_wll_log_connect error Transport endpoint is not connected;; edg_wll_gss_connect();; System Error: Connection refused) Proxy expired: job killed Terminated Master process killed
Status Reason: Input sandbox's proxy is missing. Cannot resubmit job
Destination: ce202.cern.ch:8443/cream-lsf-grid_dteam
Submitted: Tue Mar 23 09:49:42 2010 CET
*************************************************************
- Bug #58977: [ICE] Wrong database column name in ICE SQL query FIXED
- Submit some jobs to a ce (e.g. cream-25.pd.infn.it:8443/cream-lsf-testbedB_2):
[root@wms007 20100409]# queryDb -v -C -G
[https://cream-25.pd.infn.it:8443/CREAM881525184] [https://devel17.cnaf.infn.it:9000/V2Lj0_-XWkrjRKMaf3f6ng]
[https://cream-25.pd.infn.it:8443/CREAM425827870] [https://devel17.cnaf.infn.it:9000/4TKQu1U_daCMMb2mRDR2cA]
[https://cream-25.pd.infn.it:8443/CREAM543141647] [https://devel17.cnaf.infn.it:9000/4wqozVNHEVUXWx5UzUaXqA]
[https://cream-25.pd.infn.it:8443/CREAM769586568] [https://devel17.cnaf.infn.it:9000/XLcaGE3kR3h8-oj8cMWE_A]
[https://cream-25.pd.infn.it:8443/CREAM192029588] [https://devel17.cnaf.infn.it:9000/PuOGoOxMf-pbfFu-wSkACw]
[https://cream-25.pd.infn.it:8443/CREAM378177464] [https://devel17.cnaf.infn.it:9000/T8ZKSu5zZPZY-Ee1gLRX5A]
[https://cream-25.pd.infn.it:8443/CREAM299069473] [https://devel17.cnaf.infn.it:9000/Xh1AMEor9hWOx4picngYkA]
[https://cream-25.pd.infn.it:8443/CREAM012571708] [https://devel17.cnaf.infn.it:9000/YjpCU6dfrLsDBs6wU_D3Hg]
[https://cream-25.pd.infn.it:8443/CREAM561236418] [https://devel17.cnaf.infn.it:9000/00Qc7RnutRORYVOd0ShIKg]
[https://cream-25.pd.infn.it:8443/CREAM972351884] [https://devel17.cnaf.infn.it:9000/ksz80OflJnDE_ynWHmKTwQ]
[https://cream-25.pd.infn.it:8443/CREAM827240561] [https://devel17.cnaf.infn.it:9000/WwfftKdV6_5lSgihPOUsaA]
[https://cream-25.pd.infn.it:8443/CREAM573497695] [https://devel17.cnaf.infn.it:9000/S5zbkyK72hv2LXwUD1vAFw]
[https://cream-25.pd.infn.it:8443/CREAM735112819] [https://devel17.cnaf.infn.it:9000/0J9nTy1tJxkcuRJ9oTZACw]
[https://cream-25.pd.infn.it:8443/CREAM526570551] [https://devel17.cnaf.infn.it:9000/Rcl0TypyUTXMwtLk86R3yA]
[https://cream-25.pd.infn.it:8443/CREAM992848449] [https://devel17.cnaf.infn.it:9000/xfH1fkIroQwNvlVBdn8N5A]
[https://cream-25.pd.infn.it:8443/CREAM944698480] [https://devel17.cnaf.infn.it:9000/xjiyHJo3rkUsXXHHe0s6yg]
[https://cream-25.pd.infn.it:8443/CREAM729677007] [https://devel17.cnaf.infn.it:9000/FIZA1Mjb4moUNel1N7UXvw]
[https://cream-25.pd.infn.it:8443/CREAM589660323] [https://devel17.cnaf.infn.it:9000/5DJlLG7M0v3C_-WMDKSdXQ]
[https://cream-25.pd.infn.it:8443/CREAM994745139] [https://devel17.cnaf.infn.it:9000/T_UdwnjC55dIVrPxJOVvmg]
[https://cream-25.pd.infn.it:8443/CREAM228224655] [https://devel17.cnaf.infn.it:9000/URw39mrv7jj-buJ3KDza8w]
[https://cream-25.pd.infn.it:8443/CREAM397635733] [https://devel17.cnaf.infn.it:9000/f3FOGwNoWpHyWkxO_87AIg]
[https://cream-25.pd.infn.it:8443/CREAM510341828] [https://devel17.cnaf.infn.it:9000/vEfH5j5_5R_7jFNrntEsog]
[https://cream-25.pd.infn.it:8443/CREAM788890890] [https://devel17.cnaf.infn.it:9000/y0IVYbdR_UWbTmrXY5O8fA]
------------------------------------------------
23 item(s) found
- Also check the db_id registered in ICE's database:
[root@wms007 20100409]# sqlite3 /var/glite/ice/persist_dir/ice.db "SELECT db_id, ceurl from ce_dbid;"
1270820425000|https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2
- Stop the cream CE. Drop its database. Create a new empty one. Restart the CE.
- Check what happens in the ICE log file:
2010-04-09 16:16:53,953 WARN - iceCommandEventQuery::execute() - TID=[150150560] *** CREAM HAS PROBABLY BEEN SCRATCHED. GOING TO ERASE ALL JOBS RELATED TO OLD DB_ID [1270820425000] ***
- Check whether any jobs remain in ICE's database:
[root@wms007 persist_dir]# queryDb -v -C -G
------------------------------------------------
0 item(s) found
- and if the db_id has been changed:
[root@wms007 persist_dir]# sqlite3 /var/glite/ice/persist_dir/ice.db "SELECT db_id, ceurl from ce_dbid;"
1270822483000|https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2
- Look at the status of a job that has been removed:
Status info for the Job : https://devel17.cnaf.infn.it:9000/xjiyHJo3rkUsXXHHe0s6yg
Current Status: Aborted
Logged Reason(s):
- job completed
Status Reason: CREAM'S database has been scratched and all its jobs have been lost
Destination: cream-25.pd.infn.it:8443/cream-lsf-testbedB_2
Submitted: Fri Apr 9 16:11:34 2010 CEST
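The behaviour verified above is that ICE compares the db_id it has stored for a CE with the one the CE currently reports, and discards its jobs for that CE when the two differ. The Python sketch below models that comparison with an in-memory SQLite database. It illustrates the logic only and is not the ICE code. The ce_dbid table and its db_id and ceurl columns come from the query above, while the jobs table, its columns and the function handle_db_id are hypothetical.

```python
import sqlite3


def handle_db_id(conn, ce_url, reported_db_id):
    """Drop jobs of a CE whose database ID changed; return the jobs removed."""
    row = conn.execute(
        "SELECT db_id FROM ce_dbid WHERE ceurl = ?", (ce_url,)
    ).fetchone()
    if row is not None and row[0] == reported_db_id:
        return 0  # same CREAM database, keep the jobs
    removed = 0
    if row is not None:
        # The CE was probably scratched: forget every job tied to the old db_id.
        removed = conn.execute(
            "DELETE FROM jobs WHERE ceurl = ?", (ce_url,)
        ).rowcount
        conn.execute("DELETE FROM ce_dbid WHERE ceurl = ?", (ce_url,))
    conn.execute(
        "INSERT INTO ce_dbid (db_id, ceurl) VALUES (?, ?)",
        (reported_db_id, ce_url),
    )
    return removed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ce_dbid (db_id INTEGER, ceurl TEXT)")
conn.execute("CREATE TABLE jobs (gridjobid TEXT, ceurl TEXT)")  # hypothetical
ce = "https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2"
conn.execute("INSERT INTO ce_dbid VALUES (1270820425000, ?)", (ce,))
conn.executemany("INSERT INTO jobs VALUES (?, ?)", [("job-a", ce), ("job-b", ce)])

unchanged = handle_db_id(conn, ce, 1270820425000)  # same db_id: nothing removed
removed = handle_db_id(conn, ce, 1270822483000)    # new db_id: jobs erased
current = conn.execute("SELECT db_id FROM ce_dbid").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(unchanged, removed, current, remaining)
```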
- Bug #59240: [ICE] abort reasons not always printed in its logfile FIXED NOT CERTIFIED
- Bug #59399: [ICE] doesn't correctly handle request in jobdir/old when it is restarted FIXED
- Verify by submitting a large collection to CREAM CEs and then restarting ICE in the middle of the submission process:
2010-03-23 15:55:43,604 DEBUG - iceCommandSubmit::try_to_submit() - TID=[168434952] Going to START CreamJobID [https://cream-32.pd.infn.it:8443/CREAM036926381] related to GridJobID [https://devel17.cnaf.infn.it:9000/iM8C3YV12fwhvIG5mNip5Q]...
- restarting ice...
2010-03-23 15:55:45,760 DEBUG - ICE VersionID is [Fri Mar 19 13:53:17 CET 2010] ProcessID=[23579]
2010-03-23 15:55:45,760 INFO - glite-wms-ice::main() - Host certificate is [/home/glite/.certs/hostcert.pem]
2010-03-23 15:55:45,817 DEBUG - iceThreadPool::iceThreadPool(ICE Submission Pool) - Creating 10 worker threads
2010-03-23 15:55:45,819 DEBUG - iceThreadPool::iceThreadPool(ICE Poller Pool) - Creating 5 worker threads
[...]
2010-03-23 15:55:48,967 INFO - iceCommandSubmit::execute() - TID=[144321160] This request is a Submission...
2010-03-23 15:55:48,968 INFO - iceCommandSubmit::try_to_submit() - TID=[144321160] GridJobID [https://devel17.cnaf.infn.it:9000/iM8C3YV12fwhvIG5mNip5Q] has already been REGISTERED. Will only START it...
2010-03-23 15:55:48,968 DEBUG - iceCommandSubmit::try_to_submit() - TID=[144321160] Going to START CreamJobID [https://cream-32.pd.infn.it:8443/CREAM036926381] related to GridJobID [https://devel17.cnaf.infn.it:9000/iM8C3YV12fwhvIG5mNip5Q]...
2010-03-23 15:55:49,154 INFO - iceLBContext::setLoggingJob - Setting log job to jobid=[https://devel17.cnaf.infn.it:9000/iM8C3YV12fwhvIG5mNip5Q] LB server=[devel17.cnaf.infn.it:9000] (port is not used, actually...)
2010-03-23 15:55:49,155 INFO - iceLBLogger::logEvent() - Cream Transfer OK Event - [gridJobID="https://devel17.cnaf.infn.it:9000/iM8C3YV12fwhvIG5mNip5Q" CREAMJobID="https://cream-32.pd.infn.it:8443/CREAM036926381"]
- Bug #59453: [ICE] polling needs to be improved FIXED NOT CERTIFIED
- Bug #60668: [ICE] does not respect LB server/proxy selection through the LBproxy attribute FIXED
- Set LBProxy = false; in glite_wms.conf (section Common), restart ice and submit...
mysql> select * from events where jobid="YFyqjw3FF-BO-0U5BxCOtA";
+------------------------+-------+------+-----------------+---------------------+---------------------+----------------------------------+--------+-------+---------------------+
| jobid | event | code | prog | host | time_stamp | userid | usec | level | arrived |
+------------------------+-------+------+-----------------+---------------------+---------------------+----------------------------------+--------+-------+---------------------+
| YFyqjw3FF-BO-0U5BxCOtA | 0 | 5 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:04:39 | bdd27610035bb0ec9287e2ecaa3da2eb | 394848 | 8 | 2010-03-24 12:04:39 |
| YFyqjw3FF-BO-0U5BxCOtA | 1 | 15 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:04:39 | bdd27610035bb0ec9287e2ecaa3da2eb | 548652 | 8 | 2010-03-24 12:04:39 |
| YFyqjw3FF-BO-0U5BxCOtA | 2 | 4 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:04:39 | bdd27610035bb0ec9287e2ecaa3da2eb | 608084 | 8 | 2010-03-24 12:04:39 |
| YFyqjw3FF-BO-0U5BxCOtA | 3 | 4 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:04:39 | bdd27610035bb0ec9287e2ecaa3da2eb | 657231 | 8 | 2010-03-24 12:04:39 |
+------------------------+-------+------+-----------------+---------------------+---------------------+----------------------------------+--------+-------+---------------------+
4 rows in set (0.00 sec)
- Set LBProxy = true; in glite_wms.conf (section Common), restart ice and submit...
mysql> select * from events where jobid="SlKOGSnaW0oKO3TJqw9tbA";
+------------------------+-------+------+-----------------+---------------------+---------------------+----------------------------------+--------+-------+---------------------+
| jobid                  | event | code | prog            | host                | time_stamp          | userid                           | usec   | level | arrived             |
+------------------------+-------+------+-----------------+---------------------+---------------------+----------------------------------+--------+-------+---------------------+
| SlKOGSnaW0oKO3TJqw9tbA |     0 |   17 | NetworkServer   | wms007.cnaf.infn.it | 2010-03-24 12:09:53 | 3f82b966e8a77413044be1a9144a4af4 | 342720 |     8 | 2010-03-24 12:09:53 |
| SlKOGSnaW0oKO3TJqw9tbA |     1 |   21 | NetworkServer   | wms007.cnaf.infn.it | 2010-03-24 12:09:53 | 3f82b966e8a77413044be1a9144a4af4 | 470416 |     8 | 2010-03-24 12:09:53 |
| SlKOGSnaW0oKO3TJqw9tbA |     2 |   21 | NetworkServer   | wms007.cnaf.infn.it | 2010-03-24 12:09:53 | 3f82b966e8a77413044be1a9144a4af4 | 526402 |     8 | 2010-03-24 12:09:53 |
| SlKOGSnaW0oKO3TJqw9tbA |     3 |    2 | NetworkServer   | wms007.cnaf.infn.it | 2010-03-24 12:09:54 | 3f82b966e8a77413044be1a9144a4af4 | 606511 |     8 | 2010-03-24 12:09:54 |
| SlKOGSnaW0oKO3TJqw9tbA |     4 |    4 | NetworkServer   | wms007.cnaf.infn.it | 2010-03-24 12:09:54 | 3f82b966e8a77413044be1a9144a4af4 | 712100 |     8 | 2010-03-24 12:09:54 |
| SlKOGSnaW0oKO3TJqw9tbA |     5 |    4 | NetworkServer   | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | 3f82b966e8a77413044be1a9144a4af4 |  43631 |     8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA |     6 |    5 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 167414 |     8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA |     7 |   15 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 297333 |     8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA |     8 |    4 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 369636 |     8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA |     9 |    4 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 431565 |     8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA |    10 |    5 | JobController   | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 745052 |     8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA |    11 |    1 | LogMonitor      | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 846002 |     8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA |    12 |    1 | LogMonitor      | wms007.cnaf.infn.it | 2010-03-24 12:10:04 | bdd27610035bb0ec9287e2ecaa3da2eb | 869424 |     8 | 2010-03-24 12:10:04 |
| SlKOGSnaW0oKO3TJqw9tbA |    13 |    8 | LogMonitor      | wms007.cnaf.infn.it | 2010-03-24 12:11:39 | bdd27610035bb0ec9287e2ecaa3da2eb |  94855 |     8 | 2010-03-24 12:11:39 |
| SlKOGSnaW0oKO3TJqw9tbA |    14 |   25 | LogMonitor      | wms007.cnaf.infn.it | 2010-03-24 12:11:39 | bdd27610035bb0ec9287e2ecaa3da2eb | 181448 |     8 | 2010-03-24 12:11:39 |
| SlKOGSnaW0oKO3TJqw9tbA |    15 |   10 | LogMonitor      | wms007.cnaf.infn.it | 2010-03-24 12:11:39 | bdd27610035bb0ec9287e2ecaa3da2eb | 250291 |     8 | 2010-03-24 12:11:39 |
+------------------------+-------+------+-----------------+---------------------+---------------------+----------------------------------+--------+-------+---------------------+
16 rows in set (0.00 sec)
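When comparing the two runs, it can help to tally the events by logging component (the prog column). The following is a minimal Python sketch, assuming the mysql client output has been saved as text; the function name events_per_prog and the three sample rows (copied from the table above) are illustrative and not part of any WMS tool.

```python
from collections import Counter

def events_per_prog(mysql_output: str) -> Counter:
    """Count data rows per 'prog' value in a pasted mysql result table."""
    counts = Counter()
    for line in mysql_output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders and the "N rows in set" footer
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 10 or cells[0] == "jobid":
            continue  # skip the header row and anything malformed
        counts[cells[3]] += 1  # column 4 of the events table is 'prog'
    return counts

sample = """\
| jobid | event | code | prog | host | time_stamp | userid | usec | level | arrived |
| SlKOGSnaW0oKO3TJqw9tbA | 5 | 4 | NetworkServer | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | 3f82b966e8a77413044be1a9144a4af4 | 43631 | 8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA | 6 | 5 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 167414 | 8 | 2010-03-24 12:09:55 |
| SlKOGSnaW0oKO3TJqw9tbA | 7 | 15 | WorkloadManager | wms007.cnaf.infn.it | 2010-03-24 12:09:55 | bdd27610035bb0ec9287e2ecaa3da2eb | 297333 | 8 | 2010-03-24 12:09:55 |
16 rows in set (0.00 sec)
"""

counts = events_per_prog(sample)
print(dict(counts))
```

Applied to the full second table, the same tally would be 6 NetworkServer, 4 WorkloadManager, 1 JobController and 5 LogMonitor events, whereas the first table contains WorkloadManager events only.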
- Bug #61405: [ICE] Missing proxy validity evaluation in ICE FIXED
- Bug #61413: [ICE] should not call EventQuery for a userDN if he/she doesn't have active jobs FIXED
- Submit a job to a CreamCE and wait until it has finished.
- Submit another job to a different CreamCE; ICE should not send any query to the previously used CreamCE.
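One way to check this is to collect the CE endpoints that ICE polls after the second submission. The sketch below assumes the "Adding EventQuery command for couple (DN, URL)" line format that appears in the ICE log excerpts of this report; the function name and the old-ce.example.org endpoint are hypothetical placeholders.

```python
import re

# Matches the (user DN, CREAM URL) couple logged by eventStatusPoller.
EVENT_QUERY_RE = re.compile(
    r"Adding EventQuery command for couple \((?P<dn>.*), (?P<url>https?://[^)\s]+)\)"
)

def polled_endpoints(log_lines):
    """Return the set of CREAM URLs for which an EventQuery was scheduled."""
    urls = set()
    for line in log_lines:
        match = EVENT_QUERY_RE.search(line)
        if match:
            urls.add(match.group("url"))
    return urls

log = [
    "2010-03-24 16:05:28,952 DEBUG - eventStatusPoller::body() - Adding EventQuery "
    "command for couple (/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio "
    "Gianelle-/dteam/Role=NULL/Capability=NULL, "
    "https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2) to the thread pool...",
]

# Hypothetical endpoint of the CE used by the first (already finished) job.
previous_ce = "https://old-ce.example.org:8443/ce-cream/services/CREAM2"

endpoints = polled_endpoints(log)
print("previous CE still polled:", previous_ce in endpoints)
```

Only the log lines written after the first job has finished should be passed in, otherwise legitimate queries to the first CE would be counted.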
- Bug #61748: [ICE] EventQuery/Polling must be done also to blacklisted CE FIXED
- Submit some jobs to a CreamCE
- Trigger a socket timeout so that ICE blacklists the CreamCE:
2010-03-24 15:58:40,753 ERROR - CreamProxyMethod::execute() - Connection timed out to CREAM: "EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred." on try 3/3. Blacklisting endpoint and giving up.
2010-03-24 15:58:40,753 DEBUG - CEBlackList::blacklist_endpoint() - Blacklisting CE https://cream-25.pd.infn.it:8443/ce-cream/services/gridsite-delegation until Wed Mar 24 16:08:40 2010
- Verify that the EventQuery command is still issued while the CE is blacklisted:
2010-03-24 16:05:28,952 DEBUG - eventStatusPoller::body() - Adding EventQuery command for couple (/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL, https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2) to the thread pool...
- A new submission, instead, fails because the endpoint is blacklisted:
2010-03-24 15:58:43,265 DEBUG - Delegation_manager::delegate() - Creating new delegation with delegation id [12694427232E265116wms0072Ecnaf2Einfn2Eit] CREAM URL [https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2] Delegation URL [https://cream-25.pd.infn.it:8443/ce-cream/services/gridsite-delegation] user DN [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] proxy hash [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] MyProxy Server [myproxy.cern.ch] Expiring on [Thu Mar 25 12:54:02 2010]
2010-03-24 15:58:43,265 DEBUG - CEBlackList::is_blacklisted() - CE https://cream-25.pd.infn.it:8443/ce-cream/services/gridsite-delegation is blacklisted until Wed Mar 24 16:08:40 2010
2010-03-24 15:58:43,265 ERROR - Delegation_manager::delegate() - FAILED Creation of a new delegation with delegation id [12694427232E265116wms0072Ecnaf2Einfn2Eit] CREAM URL [https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2] Delegation URL [https://cream-25.pd.infn.it:8443/ce-cream/services/gridsite-delegation] user DN [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] proxy hash [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle-/dteam/Role=NULL/Capability=NULL] MyProxy Server [myproxy.cern.ch] - ERROR is: [The endpoint is blacklisted]
2010-03-24 15:58:43,265 ERROR - iceCommandSubmit::execute() - TID=[159308760] Error during submission of jdl= Fatal Exception is:Failed to create a delegation id for job https://devel17.cnaf.infn.it:9000/UoVsvjIj1CPluHb81xM_pQ: reason is The endpoint is blacklisted
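The key point of this check is that the EventQuery was scheduled inside the blacklist window (15:58:40 to 16:08:40) and for the same host. A minimal Python sketch of that comparison follows, using the two log lines quoted above. The helper names are illustrative, and the timestamp formats are assumed to be exactly those shown in the excerpts.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

def log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of an ICE log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")

def blacklist_window(line):
    """Return (host, start, end) from a 'Blacklisting CE <url> until <date>' line."""
    match = re.search(r"Blacklisting CE (\S+) until (.+)$", line)
    end = datetime.strptime(match.group(2).strip(), "%a %b %d %H:%M:%S %Y")
    return urlsplit(match.group(1)).hostname, log_time(line), end

def polled_while_blacklisted(blacklist_line, poll_line):
    """True if the EventQuery targets the blacklisted host inside the window."""
    host, start, end = blacklist_window(blacklist_line)
    url = re.search(r", (https?://[^)\s]+)\)", poll_line).group(1)
    return urlsplit(url).hostname == host and start <= log_time(poll_line) < end

blacklist_line = (
    "2010-03-24 15:58:40,753 DEBUG - CEBlackList::blacklist_endpoint() - Blacklisting CE "
    "https://cream-25.pd.infn.it:8443/ce-cream/services/gridsite-delegation "
    "until Wed Mar 24 16:08:40 2010"
)
poll_line = (
    "2010-03-24 16:05:28,952 DEBUG - eventStatusPoller::body() - Adding EventQuery "
    "command for couple (/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio "
    "Gianelle-/dteam/Role=NULL/Capability=NULL, "
    "https://cream-25.pd.infn.it:8443/ce-cream/services/CREAM2) to the thread pool..."
)

result = polled_while_blacklisted(blacklist_line, poll_line)
print(result)
```

The comparison is made on the host name rather than on the full URL, because the blacklisted endpoint is the gridsite-delegation service while the EventQuery goes to the CREAM2 service of the same CE.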
- Bug #63989: [ICE] doesn't handle exception raised by jobDir::new_entries() FIXED
- Change the permissions of the new/ directory in the ICE jobdir so that it can no longer be listed:
[root@wms007 jobdir]# chmod 111 new/
[root@wms007 jobdir]# ls -l
total 48
d--x--x--x 2 glite glite 40960 Mar 24 16:13 new
drwxr-xr-x 2 glite glite 4096 Mar 24 16:13 old
drwxr-xr-x 2 glite glite 4096 Mar 24 16:13 tmp
- Look in ICE's log:
2010-03-25 09:45:39,545 ERROR - Request_source_jobdir::get_requests() - Error returned by method jobDir::new_entries(): boost::filesystem::directory_iterator constructor: "/var/glite/ice/jobdir/new": Permission denied
2010-03-25 09:45:40,546 ERROR - Request_source_jobdir::get_requests() - Error returned by method jobDir::new_entries(): boost::filesystem::directory_iterator constructor: "/var/glite/ice/jobdir/new": Permission denied
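The repeated log entries suggest that ICE now catches the error, logs it and tries again on the next cycle instead of terminating. ICE itself is C++ code built on boost::filesystem, so the following Python sketch only illustrates that pattern and is not the actual fix. The function names mirror the log messages, and a missing directory stands in for the unreadable one so that the example also behaves the same when run as root.

```python
import os
import tempfile

def new_entries(jobdir):
    """List request files in <jobdir>/new; raises OSError if it cannot be read."""
    return sorted(os.listdir(os.path.join(jobdir, "new")))

def get_requests(jobdir):
    """Return pending requests, or an empty list if the directory is unreadable."""
    try:
        return new_entries(jobdir)
    except OSError as exc:
        # Log and carry on, so the caller can retry on its next polling cycle.
        print(f"ERROR - get_requests() - Error returned by new_entries(): {exc}")
        return []

with tempfile.TemporaryDirectory() as jobdir:
    missing = get_requests(jobdir)  # "new" does not exist yet: the OSError is caught
    os.mkdir(os.path.join(jobdir, "new"))
    open(os.path.join(jobdir, "new", "req_0001"), "w").close()
    found = get_requests(jobdir)    # directory readable again: the request is returned

print(missing, found)
```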
- Bug #64698: jobwrapper max osb limit should be considered only if the gridftp server is the wms FIXED only for LCG-CE
- Set MaxOutputSandboxSize = 10000000; in section WorkloadManager of file glite_wms.conf
- Submit a JDL whose OutputSandbox lists a file larger than 10 MB, and also set the corresponding OutputSandboxDestURI parameter
- Check the output directory on the SE:
[root@devel18 tmp]# ls -lh
-rw-r--r-- 1 dteam044 dteam 50M Apr 7 16:23 bigfile
-rw-r--r-- 1 dteam044 dteam 646 Apr 7 16:23 ls.out
- If you don't set OutputSandboxDestURI in the JDL, then the sandbox directory on the WMS should contain a .tail file smaller than 10 MB:
[root@wms007 persist_dir]# ls -lh /var/glite/SandboxDir/A1/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fA1cdkhzepvrCjiU_5fTaKbpg/output/
total 9.6M
-rw-r--r-- 1 dteam008 dteam 9.6M Apr 7 16:26 bigfile.tail
-rw-r--r-- 1 dteam008 dteam 637 Apr 7 16:26 ls.out
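The 9.6M reported by ls is consistent with a 10000000-byte limit, since ls -lh works in units of 1024*1024 bytes. A short arithmetic sketch in Python follows, assuming that GNU ls rounds human-readable sizes up to one decimal place; the helper name is illustrative.

```python
import math

MAX_OSB_SIZE = 10_000_000  # MaxOutputSandboxSize from glite_wms.conf, in bytes

def ls_human_mib(size_bytes):
    """Approximate `ls -lh` output for sizes between 1 and 10 MiB (rounded up)."""
    mib = size_bytes / 2**20
    return f"{math.ceil(mib * 10) / 10:.1f}M"

limit_mib = MAX_OSB_SIZE / 2**20
shown = ls_human_mib(MAX_OSB_SIZE)
print(f"{limit_mib:.2f} MiB is displayed as {shown}")
```

So a bigfile.tail truncated at, or slightly below, the configured limit is listed as 9.6M, while the untruncated file on the SE is listed as 50M.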
- Bug #66721: Ineffective and never removed Job cancels FIXED
- Submit a job with the option '--register-only', as in the following:
glite-wms-job-submit --config ../glite_wms_devel14.conf --register-only -a ../myjob.jdl
Connecting to the service https://devel14.cnaf.infn.it:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully registered to the WMProxy
Your job identifier is:
https://devel15.cnaf.infn.it:9000/mS8cXbg9szmcXsUETm8g2g
==========================================================================
To complete the operation, the following file containing the InputSandbox of the job needs to be transferred:
==========================================================================================================
ISB ZIP file : /tmp/ISBfiles_eCXKdK2egtqk7jZRlfFQpw_0.tar.gz
Destination : gsiftp://devel14.cnaf.infn.it:2811/var/glite/SandboxDir/mS/https_3a_2f_2fdevel15.cnaf.infn.it_3a9000_2fmS8cXbg9szmcXsUETm8g2g/input/ISBfiles_eCXKdK2egtqk7jZRlfFQpw_0.tar.gz
-----------------------------------------------------------------------------
then start the job by issuing a submissiong with the option:
--start https://devel15.cnaf.infn.it:9000/mS8cXbg9szmcXsUETm8g2g
- Cancel the previously registered job, as in the following:
glite-wms-job-cancel https://devel15.cnaf.infn.it:9000/mS8cXbg9szmcXsUETm8g2g
Are you sure you want to remove specified job(s) [y/n]y : y
Connecting to the service https://devel14.cnaf.infn.it:7443/glite_wms_wmproxy_server
============================= glite-wms-job-cancel Success =============================
The cancellation request has been successfully submitted for the following job(s):
- https://devel15.cnaf.infn.it:9000/mS8cXbg9szmcXsUETm8g2g
========================================================================================
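When scripting this test, the job identifier has to be taken from the output of the register-only submission and passed to the cancel command. The sketch below assumes the output layout shown above, with the identifier on the first non-empty line after "Your job identifier is:". The function name is illustrative, and the command list is only built here, not executed.

```python
def extract_job_id(submit_output):
    """Return the job identifier printed by glite-wms-job-submit, or None."""
    lines = submit_output.splitlines()
    for index, line in enumerate(lines):
        if "Your job identifier is:" in line:
            for candidate in lines[index + 1:]:
                if candidate.strip():
                    return candidate.strip()
    return None

submit_output = """\
====================== glite-wms-job-submit Success ======================
The job has been successfully registered to the WMProxy
Your job identifier is:

https://devel15.cnaf.infn.it:9000/mS8cXbg9szmcXsUETm8g2g
==========================================================================
"""

job_id = extract_job_id(submit_output)
# Same command as in the manual step; it still asks for confirmation interactively.
cancel_cmd = ["glite-wms-job-cancel", job_id]
print(" ".join(cancel_cmd))
```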
- Bug #66986: ICE must be able to print out on file the stack trace trapping SIGSEGV, SIGILL, SIGABRT etc. TO VERIFY
- Bug #67097: [yaim-wms] Removed lcg-condor-extra usage FIXED
--
AlessioGianelle - 2010-02-05