
Tests on ICE (Query Event)

13) Test starts on Thu Feb 11 at 15:48:14 CET 2010 (WMS: devel20)

Description:
  • 6000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • Use only one hard disk.
    • Changes in the my.cnf file:
      set-variable = innodb_flush_log_at_trx_commit=2
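In test 13, innodb_flush_log_at_trx_commit goes from 0 (test 12) to 2. According to the MySQL documentation, 1 (the default) writes and flushes the InnoDB redo log at every commit, 2 writes it at every commit but flushes it to disk about once per second, and 0 does both about once per second. The toy model below is a hypothetical illustration, not InnoDB code. It counts disk flushes for a list of commit timestamps under each setting and records what each setting risks losing in a crash.

```python
import math

def count_flushes(commit_times: list[float], mode: int) -> int:
    """Toy model (hypothetical): redo-log disk flushes for the given commit
    timestamps (seconds) under innodb_flush_log_at_trx_commit = mode."""
    if mode not in (0, 1, 2):
        raise ValueError("innodb_flush_log_at_trx_commit must be 0, 1 or 2")
    if not commit_times:
        return 0
    if mode == 1:
        # Mode 1: flushed to disk at every commit.
        return len(commit_times)
    # Modes 0 and 2: one background flush per second that saw commits.
    return len({math.floor(t) for t in commit_times})

# Crash behaviour of each mode, as described in the MySQL documentation.
DURABILITY = {
    0: "a mysqld crash can lose about the last second of commits",
    1: "no committed transaction is lost",
    2: "survives a mysqld crash; an OS crash or power loss can lose ~1 s",
}

commits = [0.1, 0.2, 0.3, 1.5]
for mode in (0, 1, 2):
    print(mode, count_flushes(commits, mode), DURABILITY[mode])
```

This trade-off explains the choice. Setting 2 keeps most of the I/O saving of 0 while still protecting against a crash of the MySQL server itself.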

12) Test starts on Mon Feb 8 at 17:26:42 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • Use two hard disks; the second one holds the ICE "persist directory" (i.e. its internal database) and the MySQL data directory (i.e. /var/lib/mysql)
    • Changes in the my.cnf file:
      set-variable = innodb_buffer_pool_size=1800M
      set-variable = innodb_additional_mem_pool_size=200M
      set-variable = innodb_flush_log_at_trx_commit=0
      set-variable = innodb_log_file_size=100M
      set-variable = innodb_log_group_home_dir=/var/lib/mysql_logfiles
    • The LBProxy and LBServer databases have been wiped
    • All sandbox directories have been removed
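The MySQL settings above use the legacy, deprecated `set-variable = name=value` syntax. The Python sketch below assumes one setting per line. It parses such lines and converts the K/M/G size suffixes, which MySQL interprets as powers of 1024, into bytes so that the values can be compared between tests.

```python
import re

# Size suffixes accepted in MySQL option values (powers of 1024).
_SUFFIX = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_LINE = re.compile(r"\s*set-variable\s*=\s*(\w+)\s*=\s*(\S+)\s*$")
_SIZE = re.compile(r"(\d+)([KMG]?)", re.IGNORECASE)

def parse_set_variables(text: str) -> dict[str, object]:
    """Parse 'set-variable = name=value' lines into a dict.

    Numeric values (optionally with a K/M/G suffix) become ints in bytes;
    anything else, e.g. a directory path, is kept as a string.
    """
    settings: dict[str, object] = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # skip comments, section headers and other options
        name, raw = m.groups()
        num = _SIZE.fullmatch(raw)
        if num:
            digits, suffix = num.groups()
            settings[name] = int(digits) * _SUFFIX.get(suffix.upper(), 1)
        else:
            settings[name] = raw
    return settings

test12 = """
      set-variable = innodb_buffer_pool_size=1800M
      set-variable = innodb_additional_mem_pool_size=200M
      set-variable = innodb_flush_log_at_trx_commit=0
      set-variable = innodb_log_file_size=100M
      set-variable = innodb_log_group_home_dir=/var/lib/mysql_logfiles
"""
print(parse_set_variables(test12))
```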

Submissions finish on Tue Feb 9 at 17:25:27 CET 2010

  • 2876 collections submitted in 12475 seconds: 3/4/37 (min/avg/max)
    • 4 submissions failed
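Each submission summary line gives a total time in seconds, followed by min/avg/max values. Across all the tests, the figures are consistent with two assumptions. The total is the sum of the per-collection submission times, and avg is that sum divided by the number of collections, rounded down. For example, 12475 s / 2876 = 4.34, which is reported as 4. The sketch below reproduces the line under those assumptions from a hypothetical list of per-collection durations.

```python
def submission_summary(durations: list[int]) -> str:
    """Format per-collection submission times (seconds) like the report.

    Assumes avg is the floor of total / count, which matches the
    published figures.
    """
    if not durations:
        raise ValueError("no submissions")
    total = sum(durations)
    avg = total // len(durations)
    return (f"{len(durations)} collections submitted in {total} seconds: "
            f"{min(durations)}/{avg}/{max(durations)} (min/avg/max)")

print(submission_summary([3, 4, 5, 10]))
```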

Final results

  • Collections correctly submitted: 2876 (57520 jobs)
    • DONE OK: 56971 (99.05 %)
    • NOTDONE: 0 (0 %)
    • ABORTED: 0 (0 %)
    • CANCELLED: 549 (0.95 %) (stuck in PBS queues)
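Each percentage is that state's share of the jobs in the correctly submitted collections (2876 collections × 20 jobs = 57520), rounded to two decimals. The states must add up to that total. A quick check for this test:

```python
def state_percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Each job state's share of `total` as a percentage (2 decimals)."""
    if sum(counts.values()) != total:
        raise ValueError("job states do not add up to the number of jobs")
    return {state: round(100 * n / total, 2) for state, n in counts.items()}

jobs = 2876 * 20  # jobs in the correctly submitted collections
pct = state_percentages(
    {"DONE OK": 56971, "NOTDONE": 0, "ABORTED": 0, "CANCELLED": 549}, jobs)
print(pct)
```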

  • Errors found (363)
    • BLAH error (9 times)
    • proxy expired (341 times)
    • Cannot take token (7 times)
    • reason=1; /opt/edg/libexec/edg-gridftp-base-rm: timeout exceeded Cannot take token (5 times)
    • Transfer to CREAM failed due to exception (1 time)
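The "Errors found" tallies group identical failures. Some messages embed per-job identifiers, such as the CREAMnnnnnnnnn wrapper names in the reason=127 entries of other tests, and these messages only group together once the identifiers are masked. The sketch below is a hypothetical way to build such a tally. The masking rule is an assumption and not necessarily how this page was produced.

```python
import re
from collections import Counter

# Hypothetical masking rule: CREAM job ids such as CREAM500950657 differ per
# job, so they are replaced by a placeholder before counting. Short names
# like "CREAM2" or "CREAM Register" are left untouched.
_CREAM_ID = re.compile(r"\bCREAM\d{6,}")

def normalise(message: str) -> str:
    return _CREAM_ID.sub("CREAM<id>", message).strip()

def tally_errors(messages: list[str]) -> Counter:
    """Count error messages after masking per-job identifiers."""
    return Counter(normalise(m) for m in messages)

msgs = [
    "reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM500950657_jobWrapper.sh: No such file or directory",
    "reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM927980874_jobWrapper.sh: No such file or directory",
    "The endpoint is blacklisted",
]
for message, n in tally_errors(msgs).most_common():
    print(f"{message} ({n} times)")
```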

ice12.png

11) Test starts on Thu Feb 4 at 11:16:00 CET 2010 (WMS: devel20)

Description:
  • 1600 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(3600)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
  • Logging to LB from ICE is disabled
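The following is a rough sizing of the steady-state load. It assumes that "sleep random(3600)" draws a sleep time uniformly between 0 and 3600 s (mean 1800 s), and it ignores queueing and middleware latency. By Little's law (L = λW), the average number of jobs in the system equals the arrival rate (here 20 jobs every 30 s) times the mean time each job spends there.

```python
def littles_law(jobs_per_collection: int, interval_s: float,
                mean_runtime_s: float) -> float:
    """Average number of jobs in the system, L = lambda * W, where
    lambda = jobs_per_collection / interval_s and W = mean_runtime_s."""
    return jobs_per_collection * mean_runtime_s / interval_s

# Test 11: 20 jobs every 30 s, mean sleep 3600 / 2 s (uniform draw assumed).
print(littles_law(20, 30, 3600 / 2))
```

With the same assumptions, the "sleep random(7200)" jobs of tests 12 and 13 would double the mean runtime, and therefore the estimate. Any time spent queued in the batch systems only increases W, so real occupancy should be at least this high.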

Submissions finish on Fri Feb 5 at 00:41:33 CET 2010

  • 1600 collections submitted in 16233 seconds: 4/10/74 (min/avg/max)

Final results

  • Collections correctly submitted: 1600 (32000 jobs)
    • DONE OK: 31328 (97.9 %)
    • NOTDONE: 0 (0 %)
    • ABORTED: 7 (0.02 %)
    • CANCELLED: 665 (2.08 %) (test interrupted)

  • Errors found (198)
    • reason=999 (22 times)
    • reason=1 [...] proxy expired (176 times)

ice11.png

10) Test starts on Tue Feb 2 at 12:38:25 CET 2010 (WMS: devel20)

Description:
  • 2400 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(3600)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Wed Feb 3 at 08:59:20 CET 2010

  • 2397 collections submitted in 39661 seconds: 4/16/99 (min/avg/max)
    • 3 submissions failed (due to the "FTP connections" limiter)

Final results

  • Collections correctly submitted: 2397 (47940 jobs)
    • DONE OK: 47728 (99.56 %)
    • NOTDONE: 20* (0.04 %)
    • ABORTED: 17** (0.04 %)
    • CANCELLED: 175*** (0.36 %) (jobs held in the Torque system)
    • Resubmitted: 210 (0.44 %)

  • Errors found (213****)
    • reason=999*** (163 times)
    • reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM500950657_jobWrapper.sh: No such file or directory (1 time)
    • reason=255 (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (3 times)
    • The endpoint is blacklisted (43 times)
    • Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Wed 03 Feb 2010 01:27:19] ErrorCode=[0] Description=[cannot store the delegation proxy locally] FaultCause=[Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory]" (2 times)

ice10.png

Note:

* The "NOT TERMINATED" jobs are distributed in this way:

  • 1 collection (i.e. 20 jobs) stuck on wmproxy (midnight problem)

** All these jobs were aborted with "Input sandbox's proxy is missing. Cannot resubmit job". Probably the proxy renewal daemon arrived too late to renew the collection's proxy.

*** The cancelled jobs were blocked in the PBS queues (in these cases the error reported after the sysadmin's qdel is "reason=999")

**** Almost all errors occurred on the same CE: cream-34.pd.infn.it (a CREAM 1.11 CE).
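A per-CE breakdown of the errors shows this kind of concentration. The sketch below assumes that each error record carries the CE hostname. The sample records are made up for illustration and are not taken from the test logs.

```python
from collections import Counter

def errors_per_ce(records: list[tuple[str, str]]) -> list[tuple[str, int, float]]:
    """(CE, error count, share in %) from (ce_host, reason) pairs, busiest CE first."""
    counts = Counter(ce for ce, _reason in records)
    total = sum(counts.values())
    return [(ce, n, round(100 * n / total, 1)) for ce, n in counts.most_common()]

# Hypothetical records for illustration only.
sample = ([("cream-34.pd.infn.it", "reason=999")] * 8
          + [("other-ce.example.org", "reason=255")] * 2)
for ce, n, share in errors_per_ce(sample):
    print(f"{ce}: {n} errors ({share} %)")
```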

9) Test starts on Wed Jan 27 at 07:59:34 CET 2010 (WMS: devel20)

Description:
  • 2400 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
    • 12 CEs SL4 with cream version 1.12 (6 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(3600)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Thu Jan 28 at 04:24:31 CET 2010

  • 2397 collections submitted in 43027 seconds: 4/17/102 (min/avg/max)
    • 3 submissions failed

Final results

  • Collections correctly submitted: 2397 (47940 jobs)
    • DONE OK: 47592 (99.27 %)
    • NOTDONE: 85* (0.18 %)
    • ABORTED: 0 (0 %)
    • CANCELLED: 263** (0.55 %) (jobs held in the Torque system)
    • Resubmitted: 369 (0.77 %)

  • Errors found (379)
    • BLAH error ... (20 times)
    • Cannot move ISB ... (3 times)
    • Cannot take token (13 times)
    • reason=127 ... (3 times)
    • Problem to detect the lifetime of the proxy ... (5 times)
    • reason=1 (1 time)
    • reason=255 (1 time)
    • reason=999** (279 times)
    • SOCKET TIMEOUT occurred ... (3 times)
    • The endpoint is blacklisted ... (51 times)

ice09.png

qe09.png

Note:

* The "NOT TERMINATED" jobs are distributed as follows:

  • 2 collections (i.e. 40 jobs) stuck on wmproxy (midnight problem)
  • 43 jobs stuck in the PBS queue (see bug #62070)

** The cancelled jobs were blocked in the PBS queues (in these cases the error reported after the sysadmin's qdel is "reason=999")

8) Test starts on Tue Nov 17 at 11:56:53 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA
  • Use automatic-delegation
  • The job is a "sleep random(900)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Wed Nov 18 at 21:23:06 CEST 2009

  • 3997 collections submitted in 35909 seconds: 2/8/70 (min/avg/max)
    • 3 submissions failed

Final results taken on Thu Nov 19 at 12:09:23 CEST 2009

  • Collections correctly submitted: 3997 (79940 jobs)
    • DONE OK: 79798 (99.82 %)
    • NOTDONE: 0 (0 %)
    • ABORTED: 0 (0 %)
    • CANCELLED: 142 (0.18 %) (jobs held in the Torque system)
    • Resubmitted: 13415 (16.78 %)

  • Errors found (13230)
    • Cannot take token (22 times)
    • reason=1 (14 times)
    • reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM927980874_jobWrapper.sh: No such file or directory (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (527 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-11-17T20:59:46.997Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-04.pd.infn.it] (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (12602 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (63 times)

ice08.png

7) Test starts on Fri Nov 13 at 16:13:38 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA
  • Use automatic-delegation
  • The job is a "sleep random(300)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • CEs
      • Updated to the new release candidate, version 1.12

Submissions finish on Sun Nov 15 at 01:33:23 CEST 2009

  • 3443 collections submitted in 31829 seconds: 2/9/44 (min/avg/max)
    • 557 submissions failed

Final results taken on Mon Nov 16 at 10:09:23 CEST 2009

  • Collections correctly submitted: 3443 (68860 jobs)
    • DONE OK: 60522 (87.89 %)
    • NOTDONE: 6727 (9.77 %)
    • ABORTED: 1611 (2.34 %)
    • Resubmitted: 26726 (38.81 %)

  • Errors found (44031)
    • Cannot take token (6 times)
    • reason=1 (7 times)
    • reason=127 (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred (185 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[jobRegister] ErrorCode=[0] Description=[The CREAM service cannot accept jobs at the moment] FaultCause=[Threshold for Load Average(15 min): 20 => Detected value for Load Average(15 min): (289 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (43432 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (108 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-11-14T14:00:38.733Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-04.pd.infn.it] (1 time)
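The 289 "cannot accept jobs at the moment" rejections above come from CREAM refusing new jobs because of its 15-minute load-average threshold (20 here). The following is a hypothetical Python illustration of such a check and is not CREAM's code. Whether CREAM refuses at "above" or "at or above" the threshold is an assumption.

```python
import os

# Threshold quoted in the CREAM fault message above.
LOAD15_THRESHOLD = 20.0

def accepts_jobs(load15: float, threshold: float = LOAD15_THRESHOLD) -> bool:
    """Hypothetical limiter: accept new jobs unless the 15-minute load
    average is above the threshold (the exact comparison is assumed)."""
    return load15 <= threshold

if hasattr(os, "getloadavg"):  # Unix only
    load15 = os.getloadavg()[2]  # (1-, 5-, 15-minute) load averages
    print("15-min load:", load15, "accepting:", accepts_jobs(load15))
```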

ice07.png

6) Test starts on Fri Oct 30 at 15:23:47 CEST 2009 (WMS: devel20)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedB (i.e. production CEs running CREAM 1.11, where query event is not implemented)
  • Use automatic-delegation
  • Use proxy renewal service (myproxy.cern.ch)
  • The job is a "sleep random(2447)"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Wed Nov 4 at 15:56:11 CEST 2009

  • 7091 collections submitted in 164788 seconds: 4/23/117 (min/avg/max)
    • 109 submissions failed

Final results taken on Fri Nov 06 at 12:08:43 CEST 2009

  • Collections correctly submitted: 7091 (283640 jobs)
    • DONE OK: 275956 (97.29 %)
    • NOTDONE: 4823 (1.7 %) *
    • ABORTED: 8 (~0%)
    • CANCELLED: 2853 (1.01 %) **
    • Resubmitted: 2933 (1.03 %)

  • Errors found (3972)
    • blah error: send command timeout (50 times)
    • BLAH error: submission command failed (exit code = -15) (stdout:) (stderr: exe_getouterr: 200 seconds timeout expired, killing child process.- killed by signal 15.-) N/A (jobId = CREAM110305536) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Bad host name, host group name or cluster name. Job not submitted.-TERM environment variable not set.- execute_cmd: poll() got an unknown event (stdout 0x0010 - stderr: 0x0000).-) N/A (jobId = CREAM198982235) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Cannot connect to default server host 'cream-32.pd.infn.it' - check pbs_server daemon.-qsub: cannot connect to server cream-32.pd.infn.it (errno=111)-TERM environment variable not set.-) N/A (jobId = CREAM946959077) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Failed in an LSF library call: Failed in sending/receiving a message: Connection reset by peer. Job not submitted.-TERM environment variable not set.-) N/A (jobId = CREAM166499182) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Master batch daemon internal error. Job not submitted.-TERM environment variable not set.-) N/A (jobId = CREAM105778508) (7 times)
    • Cannot move ISB (1820 times)
    • Cannot take token (190 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (6 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (40 times)
    • lsf_reason=32512; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM391495093_jobWrapper.sh: No such file or directory (12 times)
    • lsf_reason=-1 (5 times)
    • lsf_reason=2 (1 time)
    • pbs_reason=-1 (1616 times)
    • pbs_reason=1 (8 times)
    • reason=1; /opt/edg/libexec/edg-gridftp-base-rm: error globus_ftp_client: the server responded with an error 500 500-Command failed : System error in unlink: No such file or directory 500-A system call failed: No such file or directory 500 End. Cannot take token (15 times)
    • reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM077961558_jobWrapper.sh: No such file or directory (1 time)
    • reason=999 (194 times)

Note:

* The "NOT TERMINATED" jobs are distributed as follows:

  • 100 collections (i.e. 4000 jobs) could not be submitted by the WM (reason: request expired)
  • 361 jobs are running
  • 462 Done (FAILED)

** The jobs were cancelled from the PBS queue because Maui marked them as "Blocked Jobs"

ice06.png

5) Test starts on Thu Oct 22 at 12:51:04 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep random(2447)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Fri Oct 23 at 05:30:36 CEST 2009

  • 1455 collections submitted in 16993 seconds: 4/11/48 (min/avg/max)
    • 545 submissions failed

Final results taken on Fri Oct 23 at 16:08:43 CEST 2009

  • Collections correctly submitted: 1455 (29100 jobs)
    • DONE OK: 26714 (91.8 %)
    • NOTDONE: 168 (0.58 %)
    • ABORTED: 2218 (7.62 %)
    • Resubmitted: 4101 (14.09 %)

  • Errors found (1758)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxx) (27 times)
    • blah error: send command timeout (21 times)
    • Cannot move ISB (${globus_transfer_cmd} gsiftp://devel20.cnaf.infn.it:2811...... ): proxy expired (1 time)
    • Cannot take token (39 times)
    • reason=999 (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/CREAM2] failed: (852 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred (54 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot set permissions to the store proxy certificate] FaultCause=[Authorization error: Cannot set permissions to the store proxy certificate] Timestamp=[Fri 23 Oct 2009 04:31:45] (21 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot store proxy certificate] FaultCause=[Authorization error: Cannot store proxy certificate] Timestamp=[Fri 23 Oct 2009 04:32:16] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (733 times)
    • Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Fri 23 Oct 2009 04:44:45] ErrorCode=[0] Description=[system error] FaultCause=[The problem seems to be related to glexec]" (1 time)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (5 times)

ice05.png

4) Test starts on Mon Oct 19 at 12:00:05 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep random(4242)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Test interrupted on Tue Oct 20 at 10:29:16 CEST 2009

  • Problems with the blacklisted CEs:
    • cream-04.pd.infn.it:
      java.net.SocketException
      MESSAGE: Too many open files
      20 Oct 2009 12:55:42,323 org.glite.voms.PKIStore - Cannot refresh store: null
    • cream-12.pd.infn.it and devel03.cnaf.infn.it probably have problems with the new BLParser

  • Restarted cream-04.pd.infn.it at 12:49 on Wed Oct 21
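The "Too many open files" SocketException on cream-04 indicates that the CREAM service process ran out of file descriptors. The Linux-only sketch below is a generic diagnostic and not part of CREAM. It reports how close the current Python process is to its soft RLIMIT_NOFILE. For the CE, the same figures would come from the Tomcat process, via /proc/<pid>/fd and /proc/<pid>/limits.

```python
import os
import resource

def fd_headroom(open_fds: int, soft_limit: int) -> int:
    """Descriptors left before open() or socket() fails with EMFILE
    ("Too many open files")."""
    return soft_limit - open_fds

def current_process_usage() -> tuple[int, int]:
    """(open descriptors, soft RLIMIT_NOFILE) of this process; Linux only."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft

if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    used, limit = current_process_usage()
    if limit == resource.RLIM_INFINITY:
        print(f"{used} descriptors open, no limit set")
    else:
        print(f"{used} open, soft limit {limit}, headroom {fd_headroom(used, limit)}")
```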

Partial results taken on Thu Oct 22 at 09:41:43 CEST 2009

  • Collections correctly submitted: 1832 (36640 jobs)
    • DONE OK: 30808 (- %)
    • CANCELLED: 4905 (- %)
    • ABORTED: 927 (- %)
    • Resubmitted: 7140 (- %)

  • Errors found (2970)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = ...) (13 times)
    • blah error: send command timeout (22 times)
    • Cannot move ISB (${globus_transfer_cmd} gsiftp://devel20.cnaf.infn.it:2811/var/glite/SandboxDir/9d/https_3a_2f_2fdevel15.cnaf.infn.it_3a9000_2f9dVthtBkKwyOaSHnSLeXSQ/input/pippo file:///home/dteam028/home_cream_638539945/CREAM638539945/pippo): Problem to detect the lifetime of the proxy (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/CREAM2] failed: (603 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (66 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot store proxy certificate] FaultCause=[Authorization error: Cannot store proxy certificate] Timestamp=[Tue 20 Oct 2009 05:33:28] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-10-19T16:08:21.143Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-12.pd.infn.it] (2 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (1740 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (9 times)
    • Transfer to CREAM failed due to exception: Failed to create a delegation id for job https://devel15.cnaf.infn.it:9000/01bDqoEMYLtCgBJAwkGVBQ: reason is Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/gridsite-delegation] failed: (511 times)

ice04.png

3) Test starts on Fri Oct 16 at 13:50:05 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Sat Oct 17 at 06:29:16 CEST 2009

  • 1746 collections correctly submitted in 15578 seconds: 4/8/25 (min/avg/max)
    • 254 submissions failed due to the load limiter

Final results taken on Mon Oct 19 at 09:41:43 CEST 2009

  • Collections correctly submitted: 1746 (43650 jobs)
    • DONE OK: 43642 (99.98 %)
    • CANCELLED: 8 (0.02%)
    • Resubmitted: 961 (2.2%)

  • Errors found (1097)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxxxx) (91 times)
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.-) N/A (jobId = xxxxx) (1 time)
    • Cannot move ISB: proxy expired (1 time)
    • Cannot take token (105 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (42 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke</MethodName.2009-10-16T15:38:04.085Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-12.pd.infn.it] (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (856 times)

ice03.png

2) Test starts on Thu Oct 15 at 12:21:53 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • WMS

Submissions finish on Fri Oct 16 at 05:22:55 CEST 2009

  • 1834 collections correctly submitted
    • 166 submissions failed

Final results taken on Fri Oct 16 at 10:26:21 CEST 2009

  • Collections correctly submitted: 1834 (45850 jobs)
    • DONE OK: 42868 (-%)
    • ABORTED: 2976 (-%) *
    • Resubmitted: 2976+8 (-%)

  • Errors found
    • Cannot take token (19 times)
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.- execute_cmd: poll() got an unknown event (stdout 0x0010 - stderr: 0x0000).-) N/A (jobId = CREAM603493778) (https://devel15.cnaf.infn.it:9000/hzShPwSsvd6S_1kQ2XsbvA)
    • Proxy is expired

ice02.png

Note:

* All the aborted jobs failed with "proxy expired" because I forgot to activate the proxy renewal service.

1) Test starts on Wed Oct 14 at 15:04:19 CEST 2009 (WMS: devel20)

Description:
  • 400 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Wed Oct 14 at 18:22:55 CEST 2009

  • 400 collections submitted in 2371 seconds: 3/5/15 (min/avg/max)

Final results taken on Thu Oct 15 at 10:26:21 CEST 2009

  • Collections correctly submitted: 400 (10000 jobs)
    • DONE OK: 10000 (100%)
    • Resubmitted: 3 (0.03%)

  • Errors found (3)
    • Cannot take token (3 times)

ice01.png

-- AlessioGianelle - 13 Oct 2009

Topic attachments
  • DoneFailed.jobid (25.7 K, 2009-11-06 15:37, AlessioGianelle): DoneFailed_06
  • Running06.jobid (38.5 K, 2009-11-06 15:41, AlessioGianelle): Running_06
  • ice01.png (6.1 K, 2009-10-15 08:34, AlessioGianelle): Ice graph. Test 01
  • ice02.png (5.9 K, 2009-10-16 11:04, AlessioGianelle): Ice graph. Test 02
  • ice03.png (5.7 K, 2009-10-19 08:28, AlessioGianelle): Ice graph. Test 03
  • ice04.png (6.3 K, 2009-10-21 14:07, AlessioGianelle): Ice graph. Test 04
  • ice05.png (8.0 K, 2009-10-23 15:22, AlessioGianelle): Ice graph. Test 05
  • ice06.png (7.8 K, 2009-11-06 12:15, AlessioGianelle): Ice graph. Test 06
  • ice07.png (5.5 K, 2009-11-16 15:50, AlessioGianelle): Ice graph. Test 07
  • ice08.png (18.8 K, 2009-11-19 17:14, AlessioGianelle): Ice graph. Test 08
  • ice09.png (8.1 K, 2010-02-02 11:19, AlessioGianelle): Ice graph. Test 09
  • ice10.png (10.7 K, 2010-02-04 10:55, AlessioGianelle): Ice graph. Test 10
  • ice11.png (5.4 K, 2010-02-05 11:47, AlessioGianelle): Ice graph. Test 11
  • ice12.png (12.8 K, 2010-02-11 09:51, AlessioGianelle): Ice graph. Test 12
  • qe09.png (8.7 K, 2010-02-01 13:13, AlessioGianelle): Query events test 09
Topic revision: r37 - 2010-02-12 - AlessioGianelle