
Tests on ICE

13) Test starts on Tue Jun 16 at 17:20:06 CEST 2009 (WMS: devel20)

Description:
  • 2880 collections each of 50 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-07.cnaf)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 8 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Notification mechanism is not used: start_listener = false;
  • Use "jobdir" input mechanism

  • Changes in the software wrt previous test:
    • ICE
      • fixed a wrong sql query
      • Transaction to database is tried indefinitely (no more DbLockedException can be thrown)
      • new SQL commands to get the terminated jobs from the database
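
The "tried indefinitely" behaviour listed above can be pictured with a small sketch. This is not ICE code (ICE is not written in Python): it is a minimal illustration that assumes an SQLite-style "database is locked" error. The names `run_with_retry` and `flaky_update` are hypothetical.

```python
import sqlite3
import time


def run_with_retry(operation, delay_seconds=0.1):
    """Call operation() until it stops failing with a lock error.

    Returns the operation result together with the number of attempts.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation(), attempts
        except sqlite3.OperationalError as exc:
            # Only a lock conflict is retried; any other error propagates.
            if "locked" not in str(exc):
                raise
            time.sleep(delay_seconds)


# Hypothetical operation: fails twice with a lock error, then succeeds.
calls = {"n": 0}


def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "committed"


result, attempts = run_with_retry(flaky_update, delay_seconds=0.0)
print(result, attempts)
```

An unbounded retry means no lock exception reaches the caller, but a lock that is never released would block the caller forever.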

12) Test starts on Fri Jun 12 at 15:59:59 CEST 2009 (WMS: devel20)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-07.cnaf)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 8 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Notification mechanism is not used: start_listener = false;
  • Use "jobdir" input mechanism

  • Changes in the software wrt previous test:
    • ICE
      • fixed bad handling of decrement of the job counter for the better proxy
      • removing delegation if the better proxy is not there anymore
    • CE
      • Update BLAH rpm
      • Add some more logs to JobWrapper stdout for debugging purpose

Submissions finish on Mon Jun 15 at 15:57:55 CEST 2009

  • 4219 collections submitted in 39736 seconds: 4/9/97 (min/avg/max)
    • 101 submissions fail due to load limiter
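
The figures above can be cross-checked with a few lines of Python. The sketch assumes that the reported average is the total submission time divided by the number of submitted collections, truncated to whole seconds; that assumption reproduces the value 9 reported here. The function name `submission_summary` is hypothetical.

```python
def submission_summary(planned, submitted, elapsed_seconds):
    """Return (rejected collections, average seconds per submitted collection)."""
    rejected = planned - submitted
    # Integer division: the average is assumed to be truncated, not rounded.
    average = elapsed_seconds // submitted
    return rejected, average


# Test 12: 4320 planned collections, 4219 submitted in 39736 seconds.
rejected, avg_seconds = submission_summary(4320, 4219, 39736)
print(rejected, avg_seconds)
```

The difference between planned and submitted collections matches the 101 submissions rejected by the load limiter.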

Final results

  • Collections correctly submitted: 4219 (168760 jobs)
    • DONE OK: 168748 (99.99%)
    • NOT TERMINATED: 12 (0.01%)
    • ABORTED: 0 (0%)
    • Resubmitted: 2414 (1.43%)
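
As a sanity check, the percentages above follow from the job counts. The sketch below assumes each percentage is computed against the total number of jobs in the correctly submitted collections and rounded to two decimals; the helper name `percentages` is hypothetical.

```python
def percentages(counts, total):
    """Map each status to its share of total, in percent, rounded to 2 decimals."""
    return {status: round(100.0 * n / total, 2) for status, n in counts.items()}


total_jobs = 4219 * 40  # collections correctly submitted x jobs per collection
counts = {
    "DONE OK": 168748,
    "NOT TERMINATED": 12,
    "ABORTED": 0,
    "Resubmitted": 2414,
}
shares = percentages(counts, total_jobs)

# The three final states should account for every job.
final_states = counts["DONE OK"] + counts["NOT TERMINATED"] + counts["ABORTED"]
print(total_jobs, final_states, shares)
```

"Resubmitted" is not a final state, so it is excluded from the sum that must equal the total.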

  • Errors found (2446)
    • Cannot move ISB: (18 times)
    • Transfer to CREAM fails: (1948 times)
      • CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke 2009-06-13T06:23:42.151Z 0 cannot write the authN proxy to file: null cannot write the authN proxy to file: null org.glite.ce.faults.AuthenticationFault cream-34.pd.infn.it]
      • CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke 2009-06-14T07:00:11.781Z 0 USER_VO_LABEL not defined in msgContext USER_VO_LABEL not defined in msgContext org.glite.ce.faults.AuthenticationFault cream-32.pd.infn.it]
      • Failed to create a delegation id for job https://devel15.cnaf.infn.it:9000/00-d12pjYkJYAIz_eHUiCg: reason is Connection to service [https://cream-21.pd.infn.it:8443/ce-cream/services/gridsite-delegation] failed: (1946 times)
    • Cannot take token (3 times)
    • lsf_reason=65280
    • pbs_reason=271 (3 times)
    • Proxy is expired (473 times)

icemem12.png

11) Test starts on Fri May 29 at 18:20:13 CEST 2009 (WMS: devel20)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-07.cnaf)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Notification mechanism is not used: start_listener = false;
  • Use "jobdir" input mechanism

Submissions finish on Mon Jun 1 at 18:17:14 CEST 2009

  • 2788 collections submitted in 32565 seconds: 5/11/77 (min/avg/max)
    • 1532 submission(s) fail(s) due to load limiter

Final results taken on Wed Jun 3 at 16:23:46 CEST 2009

  • Collections correctly submitted: 2788 (111520 jobs)
    • DONE OK: 110757 (99.318%)
    • NOT TERMINATED: 2 (0.002%)
    • ABORTED: 761* (0.68%)
    • Resubmitted: 5131 (4.6%)

(Note: 760 of the aborted jobs belong to exactly 19 collections of 40 jobs each, which stayed in the ICE queue too long, so their proxies expired.)

  • Errors found (7620)
    • BLAH error: (1030 times)
      • blah error: send command timeout: (41 times)
      • submission command failed (exit code = 1) (stdout:) (stderr:pps: Queue has been closed. Job not submitted.-TERM environment variable not set.-) N/A (887 times)
      • BLAH error: submission command failed (exit code = -15) (stdout:) (stderr: exe_getouterr: 200 seconds timeout expired, killing child process.- killed by signal 15.-) N/A (102 times)
    • Cannot move ISB: (1550 times)
    • Transfer to CREAM fails: (4432 times)
      • Authentication error: Unable to open the file [/var/glite/SandboxDir/zZ/https_3a_2f_2fdevel15.cnaf.infn.it_3a9000_2fzZq20TYizStcltyqfblxwA/user.proxy] : No such file or directory (3358 times)
      • CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Mon 01 Jun 2009 01:18:50] ErrorCode=[0] Description=[system error] FaultCause=[cannot create the job's working directory! The problem seems to be related to glexec [glexec reported = "glexec policy violation: see glexec log for more details"]]" (27 times)
      • Authentication error: The proxy is EXPIRED! (1041 times)
    • Proxy expired: (527 times)
    • Cannot take token: (78 times)
      • /opt/edg/libexec/edg-gridftp-base-rm: error globus_ftp_client: the server responded with an error 421 Service not available, closing control connection Cannot take token (30 times)
      • /opt/edg/libexec/edg-gridftp-base-rm: error globus_ftp_client: the server responded with an error 500 500-Command failed : System error in unlink: No such file or directory 500-A system call failed: No such file or directory 500 End. Cannot take token (3 times)
    • pbs_reason: (3 times)

icemem11.png

10) Test starts on Tue May 26 at 15:56:53 CEST 2009 (WMS: devel20)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-07.cnaf)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Use "jobdir" input mechanism

Submissions finish on Fri May 29 at 15:53:26 CEST 2009

  • 3872 collections submitted in 33435 seconds: 4/8/54 (min/avg/max)
    • 448 submissions fail due to load limiter

Final results taken on Thu May 28 at 12:04:48 CEST 2009

  • Collections correctly submitted: 3872 (154880 jobs)
    • DONE OK: 154873 (99.995%)
    • NOT TERMINATED: 7 (0.005%)
    • ABORTED: 0 (0%)
    • Resubmitted: 1165 (0.75%)

  • Errors found (1165)
    • BLAH error: 185
    • Cannot move ISB: 86
    • Transfer to CREAM fails: 6
    • Proxy expired: 884
    • pbs_reason: 4
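
The error breakdown above can be checked against its stated total. The short sketch below sums the categories and computes the share of the dominant one. The dictionary simply restates the counts listed for this test; nothing else is assumed.

```python
errors = {
    "BLAH error": 185,
    "Cannot move ISB": 86,
    "Transfer to CREAM fails": 6,
    "Proxy expired": 884,
    "pbs_reason": 4,
}

total_errors = sum(errors.values())
# Category with the highest count and its share of all errors, in percent.
top_category = max(errors, key=errors.get)
top_share = 100.0 * errors[top_category] / total_errors
print(total_errors, top_category, round(top_share, 2))
```

In this test the error total equals the number of resubmitted jobs (1165), and "Proxy expired" accounts for roughly three quarters of the errors.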

icemem10.png

Open Bugs:

  • #50875: CREAM: reason for cancelled jobs should be reported
  • #50876: CREAM reports that the proxy expired even when the problem is in detecting the lifetime of the proxy
  • #51046: CREAM: DelegProxyInfo info sometimes is wrong

9) Test starts on Wed May 22 at 18:01:14 CEST 2009 (WMS: devel20)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • Used all the CEs of testbedB
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Use "jobdir" input mechanism

  • Changes in the software wrt previous test:
    • ICE
      • Limiting SQL SELECT queries to 100 records (for the poller)
      • Setting the database cache size to a low value in order to limit heap consumption
    • CE
      • Use version 1_11
  • Use a WMS 3.2 (patch 2597)
  • Redesign of the PBS section of the testbedB (1 pbs server per CE)
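
To illustrate the idea of capping each poller SELECT at 100 records, here is a minimal Python sketch using an in-memory SQLite database. It is not the ICE implementation: the `jobs` table layout, the `poll_batches` helper and the keyset-style paging on `id` are all assumptions made for the example.

```python
import sqlite3

BATCH_SIZE = 100  # record limit taken from the change list above

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO jobs (status) VALUES (?)", [("RUNNING",)] * 250)


def poll_batches(connection, batch_size=BATCH_SIZE):
    """Yield jobs in batches so that no single query returns more than batch_size rows."""
    last_id = 0
    while True:
        rows = connection.execute(
            "SELECT id, status FROM jobs WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last row already seen


batch_sizes = [len(batch) for batch in poll_batches(conn)]
print(batch_sizes)  # [100, 100, 50]
```

Bounding the result set keeps the memory needed per query roughly constant regardless of how many jobs are in the database.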

Submissions finish on Mon May 25 at 17:58:16 CEST 2009

  • 2969 collections submitted in 18907 seconds: 3/6/66 (min/avg/max)
    • 1351 submissions fail due to load limiter

Final results taken on Tue May 26 at 10:58:16 CEST 2009

  • Collections correctly submitted: 2969 (118760 jobs)
    • DONE OK: 115175 (96.98%)
    • NOT TERMINATED: 0 (0%)
    • ABORTED: 3584 (3.02%)
    • Resubmitted: 22693 (19.11%)

  • Errors found (42042)
    • BLAH error: 26
    • Cannot move ISB: 1186
    • Transfer to CREAM failed due to exception: Authentication error: 20354
    • Proxy expired: 20327
    • Cannot take token: 103
    • lsf_reason: 4
    • pbs_reason: 42

icemem9.png

NOTE:

  • All the errors (and therefore the aborted jobs) are due to a "proxy expired" problem, caused by a misconfiguration of the proxy renewal service daemon.
  • On the last day we used myproxy.cnaf.infn.it as MyProxyServer

8) Test starts on Mon May 11 at 11:44:15 CEST 2009 (WMS: devel14)

Description:
  • 4320 collections each of 20 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • Used all the lsf CEs of testbedB
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Poller mechanism is not used (start_poller = false;)

Submissions finish on Thu May 14 at 11:41:46 CEST 2009

  • 4320 collections submitted in 35444 seconds: 4/8/44 (min/avg/max)

Final results taken on Tue May 19 at 10:11:16 CEST 2009

  • Collections correctly submitted: 4320 (86400 jobs)
    • DONE OK: 72242 (83.61%)
    • NOT TERMINATED: 13846 (16.03%)
    • ABORTED: 312 (0.36%)
    • Resubmitted: 1389 (1.6%)

  • Errors found (3146)
    • BLAH error: (3134 times 99.63%)
      • submission command failed (exit code = 1) (stdout:) (stderr:Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Failed in an LSF library call: Failed in sending/receiving a message: Connection reset by peer. Job not submitted.-TERM environment variable not set.-) N/A (jobId = [ ... ] ) (283 times)
      • submission command failed (exit code = 1) (stdout:) (stderr:Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot create job info file. Job not submitted.-TERM environment variable not set.-) N/A (jobId = [ ... ] ) (1858 times)
      • no jobId in submission script's output (stdout:) (stderr: exe_getouterr: 200 seconds timeout expired, killing child process.-) N/A (jobId = [ ... ]) (8 times)
      • submission command failed (exit code = -15) (stdout:) (stderr:Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-Cannot connect to LSF. Please wait ...-LSF is down. Please wait ...-LSF is down. Please wait ...-LSF is down. Please wait ...- exe_getouterr: 200 seconds timeout expired, killing child process.- killed by signal 15.-) N/A (jobId = [ ... ]) (8 times)
      • send command timeout (977 times)
    • lsf reason (6 times 0.19%)
      • lsf_reason=-1 (5 times)
      • lsf_reason=36608 (1 time)
    • Cannot take token (1 time 0.03%)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (3 times 0.09%)
    • The job cannot be submitted because the blparser service is not alive (2 times 0.06%)

NOTE:

  • For the jobs in "not terminated" status, the notification of the "Done Ok" event has been lost.
  • Most of the "BLAH errors" are probably due to a malfunction in the LSF batch system at CNAF

icemem8.png

7) Test starts on Mon May 4 at 15:47:37 CEST 2009 (WMS: devel18)

Description:
  • 4320 collections each of 20 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • Used all the lsf CEs of testbedB
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • ICE
      • Improvement in the memory management of the SQL queries

Submissions finish on Thu May 7 at 15:45:40 CEST 2009

  • 4317 collections submitted in 48130 seconds: 4/11/62 (min/avg/max)
    • 3 submission(s) fail(s) due to load limiter

Results:

  • Collections correctly submitted: 4317 (86340 jobs)
    • DONE OK: 64334 (74.51%)
    • NOT TERMINATED: 21997 (25.48%)
    • ABORTED: 9 (0.01%)
    • Resubmitted: 250 (0.29%)

  • Errors found (336)
    • BLAH error: (1 time 0.3%)
      • submission command failed (exit code = 120) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = CREAM539587168)
    • lsf reason (138 times 41.07%)
      • lsf_reason=1603 (2 times)
      • lsf_reason=-1 (136 times)
    • The job cannot be submitted because the blparser service is not alive (3 times 0.9%)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Connection to service [ ... ] failed: (194 times 57.73%)

NOTE:

  • The poller's call did not work due to a mistake in the code

6) Test starts on Thu Apr 23 13:23:35 CEST 2009 (WMS: devel14)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • max_ice_mem = 1048000;
  • Used all the CEs of testbedB (except cert-06.cnaf and cream-28.pd)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • ICE
      • SQL queries fixed

Results taken on Wed Apr 30 12:42:00 CEST 2009

  • Collections correctly submitted: 5787 (231480 jobs)
    • DONE OK: 86978 (37.58%)
    • CANCELLED: 83568 (36.10%)
    • NOT TERMINATED: 59604 (25.75%)
    • ABORTED: 1330 (0.57%)
    • Resubmitted: 1908 (0.82%)

  • Errors found (12814)
    • BLAH error: (3943 times)
      • submission command failed (exit code = 1) (3839 times)
      • submission command failed (exit code = 106) (67 times)
      • submission command failed (exit code = 120) (2 times)
      • submission command failed (exit code = -15) (9 times)
      • submission command failed (exit code = 107) (17 times)
      • no jobId in submission script's output (3 times)
      • send command timeout (6 times)
    • The job cannot be submitted because the blparser service is not alive (25 times)
    • lsf_reason=-1 (4 times)
    • Cannot move ISB (171 times)
    • Transfer to CREAM failed due to exception (8613 times)
      • Authentication error: The proxy is EXPIRED! (1700 times)
      • Authentication error: Unable to open the file [ ... ] : No such file or directory (6736 times)
      • CREAM Register raised std::exception Cannot set credentials in the gsoap-plugin context (45 times)
      • CREAM Register raised std::exception Connection to service [https://cream-28.pd.infn.it:8443/ce-cream/services/CREAM2] failed: (11 times)
      • CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (11 times)
      • CREAM Register raised std::exception The endpoint is blacklisted (59 times)
      • CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Mon 27 Apr 2009 13:18:26] ErrorCode=[0] Description=[system error] FaultCause=[cannot create the job's working directory! The problem seems to be related to glexec]" (46 times)
      • CREAM Start raised exception The endpoint is blacklisted (4 times)
      • CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[java.lang.NullPointerException] - FaultCode=[SOAP-ENV:Server.userException] - FaultSubCode=[SOAP-ENV:Server.userException] - FaultDetail=[cream-28.pd.infn.it] (1 time)
    • Proxy is expired (58 times)

ice6.png

Problems

  • ProxyRenewal service daemon:
    Trying to renew proxy in f50d5ddd407f12c2cc55f102b7eb1f18.1411
    Error contacting MyProxy server for proxy f50d5ddd407f12c2cc55f102b7eb1f18.1411: ERROR from myproxy-server (myproxy.cern.ch):
    certificate chain verification failed
    X509_verify_cert() failed: certificate has expired
    authentication failed
  • Maui:
    04/27 09:59:45 WARNING:  job buffer overflow (cannot add job '1545422')
    04/27 09:59:45 ERROR:    job buffer is full  (ignoring job '1545422.cream-28.pd.infn.it')

5) Test starts on Tue Apr 15 at 11:40:01 CEST 2009 (WMS: devel14)

Description:
  • 3400 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • max_ice_mem = 2800000;
  • Used all the CEs of testbedB (except cert-06.cnaf and cream-22.pd)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • ICE:
      • Improvements in the memory management in database's queries
      • Fixed a bug in the management of notifications

Test interrupted on 2009-04-16 at 15:10:40

  • ICE has exited with this message:
    terminate called after throwing an instance of 'glite::wms::ice::db::DbOperationException'
      what():  Query [UPDATE jobs SET failure_reason='Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Thu 16 Apr 2009 13:32:10] ErrorCode=[0] Description=[system error] FaultCause=[cannot create the job's working directory! The problem seems to be related to glexec]"' WHERE gridjobid='https://devel15.cnaf.infn.it:9000/UV2KNfU-ypvg-tbr6GnrNQ';] failed due to error [near "s": syntax error]
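
The `near "s": syntax error` in the message above is consistent with the apostrophe in "job's" terminating the SQL string literal early. The Python sketch below reproduces that class of failure against an in-memory SQLite database and shows bound parameters as one common remedy. It is an illustration only: the two-column `jobs` table and the `job-1` identifier are hypothetical, and it is not claimed to be how ICE was actually fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column table, for illustration only.
conn.execute("CREATE TABLE jobs (gridjobid TEXT PRIMARY KEY, failure_reason TEXT)")
conn.execute("INSERT INTO jobs (gridjobid) VALUES ('job-1')")

reason = "cannot create the job's working directory!"

# Naive string building: the apostrophe in "job's" ends the SQL literal early.
naive_sql = "UPDATE jobs SET failure_reason='%s' WHERE gridjobid='job-1';" % reason
try:
    conn.execute(naive_sql)
    naive_error = None
except sqlite3.OperationalError as exc:
    naive_error = str(exc)

# Bound parameters: the value is passed separately from the SQL text,
# so quotes inside it need no escaping.
conn.execute(
    "UPDATE jobs SET failure_reason = ? WHERE gridjobid = ?",
    (reason, "job-1"),
)
stored = conn.execute(
    "SELECT failure_reason FROM jobs WHERE gridjobid = 'job-1'"
).fetchone()[0]

print(naive_error)
print(stored)
```

Any free-text failure reason coming from a remote service can contain quotes, so it should never be spliced directly into the statement text.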

Partial results:

  • Collections correctly submitted: 1535 (61400 jobs)
    • DONE OK: 32000 (52.12%)
    • NOT TERMINATED: 29400 (47.88%)
    • Resubmitted: 7 (0.01%)

  • Errors found (7):
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:pbs_iff: cannot read reply from pbs_server-No Permission.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15007)-TERM environment variable not set.-) N/A (jobId = CREAM524685673) (7 times)

4) Test starts on Fri Apr 10 at 17:10:12 CEST 2009 (WMS: devel14)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 4242"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Mon Apr 13 at 20:35:45 CEST 2009

  • 3608 collections submitted in 118319 seconds: 5/32/213 (min/avg/max)
    • 712 submission(s) fail(s) due to load limiter

Test interrupted on 2009-04-13 at 07:55:40

  • ICE stopped working on the WMS:
    terminate called after throwing an instance of 'std::bad_alloc'
      what():  St9bad_alloc
    Aborted (core dumped)
  • MAUI stopped working on the torque-CEs:
    04/12 05:36:08 ERROR:  job hash table is FULL.  cannot add MJob[107] '1385803'
    04/12 05:36:08 ERROR:    job buffer is full  (ignoring job '1385803.cream-28.pd.infn.it')
    04/12 05:36:08 ERROR:  job hash table is FULL.  cannot add MJob[107] '1385804'
    04/12 05:36:08 ERROR:    job buffer is full  (ignoring job '1385804.cream-28.pd.infn.it')
    04/12 05:36:08 INFO:     35507 PBS jobs detected on RM base
    04/12 05:36:08 INFO:     jobs detected: 35507
    and
    04/12/2009 05:36:09;0080;PBS_Server;Req;dis_request_read;req header bad, dis error 7 (Premature end of message), type=Connect
    04/12/2009 05:36:09;0080;PBS_Server;Req;req_reject;Reject reply code=15056(Bad DIS based Request Protocol MSG=cannot decode message), aux=0, type=Connect, from @
    04/12/2009 05:36:09;0002;PBS_Server;Req;dis_reply_write;DIS reply failure, -1

Results:

  • Collections correctly submitted: 3608 (144320 jobs)
    • DONE OK: 31687 (21.96%)
    • NOT TERMINATED: 112633 (78.04%)
    • Resubmitted: 92 (0.06%)

ice4.png

3) Test starts on Fri Apr 10 at 12:11:20 CEST 2009 (WMS: devel14)

Description:
  • 150 collections each of 15 jobs
  • Three collections every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Fri Apr 10 at 12:59:04 CEST 2009

  • 150 collections correctly submitted

Final results

  • Collections correctly submitted: 150 (2250 jobs)
    • DONE OK: 2250 (100%)

2) Test starts on Thu Apr 9 at 15:57:41 CEST 2009 (WMS: devel14)

Description:
  • 250 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • fixed a bug in ICE about the decrement counter associated with the "super" proxy

Submissions finish on Thu Apr 9 20:04:05 CEST 2009

  • 250 collections submitted in 2213 seconds: 4/8/30 (min/avg/max)

Final results

  • Collections correctly submitted: 250 (10000 jobs)
    • DONE OK: 10000 (100%)
    • Resubmitted: 4 (0.04%)

ice2.png

1) Test starts on Wed Apr 8 at 16:28:22 CEST 2009 (WMS: devel14)

Description:
  • 250 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 616"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Wed Apr 8 at 20:34:44 CEST 2009

  • 250 collections submitted in 2030 seconds: 5/8/29 (min/avg/max)

Final results

  • Collections correctly submitted: 250 (10000 jobs)
    • DONE OK: 10000 (100%)
    • Resubmitted: 555 (5.55%)

  • Errors found (561):
    • Cannot move ISB (232 times 41.36%)
    • Cannot move OSB (79 times 14.08%)
    • Proxy is expired (216 times 38.5%)
    • lsf_reason=1603 (34 times 6.06%)

ice1.png

-- AlessioGianelle - 05 Mar 2009

Topic attachments
  • ice1.png (5.1 K, 2009-04-09 08:56, AlessioGianelle): Submission rate. Test 1
  • ice2.png (4.7 K, 2009-04-10 07:57, AlessioGianelle): Submission rate. Test 2
  • ice4.png (5.1 K, 2009-04-14 10:20, AlessioGianelle): Submission rate. Test 4
  • ice6.png (8.6 K, 2009-04-30 12:18, AlessioGianelle): Ice graph. Test 6
  • icemem8.png (6.4 K, 2009-05-14 15:24, AlessioGianelle): Ice graph. Test 8
  • icemem9.png (6.3 K, 2009-05-26 14:11, AlessioGianelle): Ice graph. Test 9
  • icemem10.png (6.1 K, 2009-05-29 15:52, AlessioGianelle): Ice graph. Test 10
  • icemem11.png (6.3 K, 2009-06-03 10:08, AlessioGianelle): Ice graph. Test 11
  • icemem12.png (6.3 K, 2009-06-15 14:45, AlessioGianelle): Ice graph. Test 12
Topic revision: r39 - 2009-06-16 - AlessioGianelle