TESTs (WMS: devel18)
- without data: OK
- with data: OK
Submission/GetOutput
- Dag jobs through:
- JC work: OK
- tested with the following
[
  Type = "dag";
  VirtualOrganisation = "dteam";
  Max_nodes_running = 10;
  InputSandbox = "test.sh";
  FuzzyRank = true;
  Nodes = [
    nodeA = [
      file = "test_dag.jdl";
    ];
    nodeB = [
      file = "test_dag.jdl";
    ];
    nodeC = [
      file = "test_dag.jdl";
    ];
    nodeD = [
      file = "test_dag.jdl";
    ];
    nodeE = [
      file = "test_dag.jdl";
    ];
    nodeF = [
      file = "test_dag.jdl";
    ];
  ];
  Dependencies = {
    {{nodeA, nodeB}, nodeC},{nodeD,nodeE.nodeF}
  }
]
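A DAG description of this shape can be produced for any number of nodes with a few lines of scripting. The sketch below is only an illustration: `build_dag_jdl` and its parameters are hypothetical names, not part of any gLite tool, the dependency expression is inserted verbatim rather than validated, and the `InputSandbox` and `FuzzyRank` attributes are left out for brevity.

```python
def build_dag_jdl(node_names, node_file, dependencies,
                  vo="dteam", max_nodes_running=10):
    """Return a DAG JDL string (hypothetical helper, for illustration only).

    `dependencies` is copied verbatim into the Dependencies attribute.
    """
    lines = [
        "[",
        'Type = "dag";',
        f'VirtualOrganisation = "{vo}";',
        f"Max_nodes_running = {max_nodes_running};",
        "Nodes = [",
    ]
    # Every node points to the same JDL file, as in the tested description.
    for name in node_names:
        lines += [f"{name} = [", f'file = "{node_file}";', "];"]
    lines += ["];", "Dependencies = {", dependencies, "}", "]"]
    return "\n".join(lines)


jdl = build_dag_jdl(
    [f"node{c}" for c in "ABCDEF"],
    "test_dag.jdl",
    "{{nodeA, nodeB}, nodeC}",  # example dependency: A and B before C
)
print(jdl)
```

The generated text still has to be submitted and checked by hand; the helper only assembles the string.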
- Bulk jobs sent through both ICE and JC, with RetryCount = 0:
- Submit a bulk of 3 jobs -> success 100 % OK
- Submit a bulk of 50 jobs -> success 99.99 % OK
- Submit a bulk of 100 jobs -> success 99.99 % OK
- Submit a bulk of 500 jobs -> success 99.99 % OK
- Submit a bulk of 1000 jobs -> success 99.99 % OK
Cancel
- Normal jobs
- Dag: OK
- Collection: OK
- Node of a collection: OK
Others
- BrokerInfo
- ICE creation: OK
- JC creation: OK
- Verify all the glite-brokerinfo functions with the generated file
Check bugs:
- BUG #53106: Inefficient ICE's database access HOPEFULLY FIXED
- BUG #53223: Proxy renewal of ICE should be enhanced FIXED
- BUG #53502: Using sqlite database transaction instead of "old" ICE's mutex. HOPEFULLY FIXED
- BUG #53714: WMS PURGER SHOULD NOT directly FORCE PURGE OF jobs when its DN is not authorized on LB server FIXED
- BUG #55237: WMS job wrapper first customization point should be moved HOPEFULLY FIXED
- Check the job wrapper created
- BUG #55290: ICE's delegation renewal needs several enhancements. HOPEFULLY FIXED
- BUG #55329: BAD delegation ID generation in ICE HOPEFULLY FIXED
- BUG #55606: glite-wms-job-listmatch is sometimes slow HOPEFULLY FIXED
- The cron job glite-wms-wmproxy-purge-proxycache.cron removes proxy files from '/var/glite/proxycache/' every six hours
- BUG #55709: problems with glite-wms-wm restart in WMS 3.2 FIXED
TESTs only on ICE
4) Test starts on Wed Sep 16 at 14:08:40 CEST 2009 (WMS: devel20)
Description:
- 7200 collections each of 40 jobs
- One collection every 60 seconds
- Four users
- max_ice_threads = 10
- Use all the CEs of testbedB
- Use automatic-delegation
- Use 2 proxy renewal servers ("myproxy.cern.ch" and "myproxy.cnaf.infn.it")
- The job is a "sleep 4242"
- Resubmission has been enabled after 2 days of test
- Lease mechanism is not used
- Interventions in the testbed:
- PBS CEs
- qmgr -c "set server node_pack = False"
- restart of pbs and maui services
- PBS WNs
Submissions finish on Mon Sep 21 at 14:03:44 CEST 2009
- 7059 collections submitted in 157326 seconds: 4/22.3/145 (min/avg/max)
- 141 submissions (1.96%) failed
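The submission figures above can be cross-checked with a short calculation. This is a sketch under two assumptions that the report does not state explicitly: that the 157326 seconds are the sum of the individual submission times, and that the failure percentage is taken over the 7200 planned collections.

```python
# Figures reported for test 4 (copied from the report).
planned_collections = 7200
submitted_collections = 7059
failed_submissions = 141
total_submit_seconds = 157326

# Mean time per successful submission, in seconds
# (assumes the total is the sum of the per-collection times).
avg_submit_seconds = total_submit_seconds / submitted_collections

# Share of the planned submissions that failed, in percent
# (assumes the percentage is relative to the planned collections).
failure_pct = 100 * failed_submissions / planned_collections

print(f"avg submit time: {avg_submit_seconds:.1f} s")
print(f"failed submissions: {failure_pct:.2f} %")
```

Under these assumptions the results agree with the reported average of 22.3 and the reported failure rate of 1.96%.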
Final results taken on Thu Sep 24 at 17:36:18 CEST 2009
- Collections correctly submitted: 7059 (282360 jobs)
- DONE OK: 277253 (98.19%)
- NOT TERMINATED: 3238 (1.15%) ****
- ABORTED+CANCELLED: 1819+49 (0.66%)
- Resubmitted: 3441 (1.22%)
- Errors found (3959)
- Transfer to CREAM failed due to exception:
- FaultCause=[Batch System lsf not supported!] (1010 times) *
- FaultSubCode=[SOAP-ENV:Client] (2 times)
- CREAM Register raised std::exception Connection to service [https://cream-23.pd.infn.it:8443/ce-cream/services/CREAM2] failed (3 times)
- Authentication error: Unable to open the file [/var/glite/SandboxDir/<xx>/<jobid>/user.proxy] : No such file or directory (423 times) **
- Cannot move ISB (1774 times)
- blah error: send command timeout (509 times)
- pbs_reason=127 (206 times) ***
- lsf_reason=306 (1 time)
- lsf_reason=11 (3 times)
- Cannot take token; /opt/edg/libexec/edg-gridftp-base-rm: error globus_ftp_client: the server responded with an error 421 Service not available, closing control connection Cannot take token (28 times)
Note:
* The blparser on cream-23 was not running. It was restarted on Fri Sep 18 at 11:22:06.
** The proxy renewal service renewed the job's proxy too late
*** Jobs stuck in the PBS queue (error = 15020)
**** The "NOT TERMINATED" jobs are distributed as follows:
- 73 collections (i.e. 2920 jobs) failed to be submitted (by WM) with reason "request expired"
- 6 collections (i.e. 240 jobs) failed to be submitted (by wmproxy) with reason "Register DAG subjobs failed Exit code: 1416 LB[Proxy] Error: LB server (bkserver,lbproxy) store protocol error"
- 77 jobs are running
- 1 job is scheduled
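The error tally and the NOT TERMINATED breakdown of this test can be verified against the reported totals. The sketch below only re-adds the numbers listed above; the dictionary keys are shortened labels chosen here for readability, not exact log messages.

```python
jobs_per_collection = 40

# Error counts listed under "Errors found (3959)"; keys are shortened labels.
error_counts = {
    "Batch System lsf not supported": 1010,
    "FaultSubCode SOAP-ENV:Client": 2,
    "CREAM Register connection failed": 3,
    "user.proxy not found": 423,
    "Cannot move ISB": 1774,
    "blah send command timeout": 509,
    "pbs_reason=127": 206,
    "lsf_reason=306": 1,
    "lsf_reason=11": 3,
    "Cannot take token": 28,
}

# Breakdown of the NOT TERMINATED jobs given in the last note.
not_terminated = {
    "request expired (73 collections)": 73 * jobs_per_collection,
    "Register DAG subjobs failed (6 collections)": 6 * jobs_per_collection,
    "running": 77,
    "scheduled": 1,
}

total_errors = sum(error_counts.values())
total_not_terminated = sum(not_terminated.values())
print(total_errors, total_not_terminated)
```

Both sums match the report: 3959 errors and 3238 jobs not terminated.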
3) Test starts on Fri Sep 11 at 16:28:30 CEST 2009 (WMS: devel20)
Description:
- 3600 collections each of 40 jobs
- One collection every 60 seconds
- Four users
- max_ice_threads = 10
- Use all the CEs of testbedB
- Use automatic-delegation
- Use 2 proxy renewal servers ("myproxy.cern.ch" and "myproxy.cnaf.infn.it")
- The job is a "sleep 4242"
- Resubmission is NOT enabled
- Lease mechanism is not used
Submissions finish on Mon Sep 14 at 04:22:55 CEST 2009
- 3600 collections submitted in 25710 seconds: 3/7.14/54 (min/avg/max)
Final results taken on Wed Sep 16 at 09:11:18 CEST 2009
- Collections correctly submitted: 3600 (144000 jobs)
- DONE OK: 143485 (99.64%)
- NOT TERMINATED: 0 (0%)
- ABORTED+CANCELLED: 2+515 (0.36%)
- Resubmitted: - (-%)
- Errors found (2)
- Transfer to CREAM failed due to exception: (2 times)
- CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke 2009-09-14T01:18:56.638Z 0 cannot write the authN proxy to file: null cannot write the authN proxy to file: null org.glite.ce.faults.AuthenticationFault cream-29.pd.infn.it]
- CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke 2009-09-13T19:31:01.591Z 0 USER_VO_LABEL not defined in msgContext USER_VO_LABEL not defined in msgContext org.glite.ce.faults.AuthenticationFault cream-34.pd.infn.it]
2) Test starts on Thu Sep 10 at 16:00:00 CEST 2009 (WMS: devel20)
Description:
- 720 collections each of 40 jobs
- One collection every 60 seconds
- Four users
- max_ice_threads = 10
- Used all the CEs of testbedB
- Used automatic-delegation
- Use two proxy renewal servers ("myproxy.cern.ch" and "myproxy.cnaf.infn.it") and also submit 33% of jobs without setting MyproxyServer
- The job is a "sleep 2424"
- Resubmission is NOT enabled
- Lease mechanism is not used
- Changes in the software wrt previous test:
- CREAM
- Use old blparser for all the CEs
Submissions finish on Fri Sep 11 at 03:51:54 CEST 2009
- 718 collections submitted in 4540 seconds: 3/6.3/37 (min/avg/max)
Final results taken on Fri Sep 11 at 15:20:33 CEST 2009
- Collections correctly submitted: 718 (28720 jobs)
- DONE OK: 28319 (98.6%)
- NOT TERMINATED: 0 (0%)
- ABORTED+CANCELLED: 5+396 (1.4%)
- Resubmitted: - (-%)
- Errors found (5)
- Cannot move ISB [...] : (5 times)
Note:
- Some failures are due to bug #54949
1) Test starts on Wed Sep 9 at 16:08:00 CEST 2009 (WMS: devel20)
Description:
- 800 collections each of 30 jobs
- One collection every 60 seconds
- Four users
- max_ice_threads = 10
- Used all the CEs of testbedB
- Used automatic-delegation and 2 proxy renewal servers ("myproxy.cern.ch" and "myproxy.cnaf.infn.it")
- The job is a "sleep 4242"
- Resubmission is NOT enabled
- Lease mechanism is not used
- Changes in the software wrt previous test:
- Use a WMS updated with patches #3156 and #3183
- ICE
- Use new delegation renewal mechanism
- CREAM
- Use new blparser for pbs CEs
Submissions finish on Thu Sep 10 at 05:20:33 CEST 2009
- 799 collections submitted in 4499 seconds: 3/5.6/48 (min/avg/max)
Final results taken on Fri Sep 11 at 09:20:33 CEST 2009
- Collections correctly submitted: 799 (23970 jobs)
- DONE OK: 23752 (99.09%)
- NOT TERMINATED: 36 (0.15%)
- ABORTED+CANCELLED: 95+87 (0.76%)
- Resubmitted: - (-%)
- Errors found (95)
- Cannot move ISB [...] : (1 time)
- reason=999: (1 time)
- Proxy is expired; (93 times)
Note:
- Some failures are due to the use of the new blparser
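The percentages reported in the final results of the four ICE tests can be recomputed from the raw counts. This is a minimal sketch that assumes every percentage is relative to the number of jobs in the correctly submitted collections; the tuple layout and the `pct` helper are choices made here for illustration.

```python
# (test number, jobs submitted, DONE OK, NOT TERMINATED, ABORTED, CANCELLED)
results = [
    (4, 282360, 277253, 3238, 1819, 49),
    (3, 144000, 143485, 0, 2, 515),
    (2, 28720, 28319, 0, 5, 396),
    (1, 23970, 23752, 36, 95, 87),
]


def pct(part, total):
    """Percentage of `part` over `total`, rounded to two decimals."""
    return round(100 * part / total, 2)


summary = {}
for test, jobs, done_ok, not_term, aborted, cancelled in results:
    summary[test] = (
        pct(done_ok, jobs),
        pct(not_term, jobs),
        pct(aborted + cancelled, jobs),
    )
    print(test, *summary[test])
```

The recomputed values agree with the reported ones. Note that the three categories add up to 282359 of 282360 jobs in test 4 and to 144002 of 144000 jobs in test 3, while they match the job total exactly in tests 2 and 1.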
--
AlessioGianelle - 10 Sep 2009