Difference: WmsTestsICE4 (1 vs. 48)

Revision 48, 2011-02-24 - AlessioGianelle

Line: 1 to 1
Changed:
<
<
META TOPICPARENT name="TestWokPlan"
>
>
META TOPICPARENT name="TestPage"
 

TESTs on ICE (Query Event)

16) Test starts on Wed Mar 10 at 16:13:15 CET 2010 (WMS: devel20)

Revision 47, 2010-03-15 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 29 to 29
 
    • 14 submissions fail
Added:
>
>

Final results

  • Collections correctly submitted: 3551 (142040 jobs)
    • DONE OK: 142039 (99.99%)
    • NOTDONE: 0 (0 %)
    • ABORTED: 1 ( - %)
    • CANCELLED: 0 (0 %)
    • Resubmitted: 5 ( - %)

Note:

  • 755 collections failed to be submitted by the workload manager due to:
    11 Mar, 13:51:43 -E: [Error] unrecoverable_collection(submit_request.cpp:93): https://devel15.cnaf.infn.it:9000/otaMq9ix-WGKwJLEcXeoAQ: unable to retrieve children information from jobstatus
    11 Mar, 13:51:43 -E: [Error] unrecoverable(submit_request.cpp:111): https://devel15.cnaf.infn.it:9000/otaMq9ix-WGKwJLEcXeoAQ failed (request expired)
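
The two workload-manager log lines above share a regular layout (timestamp, severity, function, source location, job identifier, message). A minimal Python sketch for splitting them into fields follows. The pattern is inferred from these two lines only, and the names `WM_LINE` and `parse_wm_line` are hypothetical, not part of any gLite tool.

```python
import re

# Pattern inferred from the two sample lines only; other log lines may differ.
WM_LINE = re.compile(
    r"^\s*(?P<stamp>\d{1,2} \w{3}, \d{2}:\d{2}:\d{2}) "
    r"-(?P<severity>\w): \[(?P<level>\w+)\] "
    r"(?P<function>\w+)\((?P<source>[^:()]+):(?P<lineno>\d+)\): "
    r"(?P<job_id>https://\S+?):? (?P<message>.*)$"
)


def parse_wm_line(line: str) -> dict:
    """Split one workload-manager error line into named fields."""
    match = WM_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    fields = match.groupdict()
    fields["lineno"] = int(fields["lineno"])
    return fields


sample = (
    "11 Mar, 13:51:43 -E: [Error] unrecoverable(submit_request.cpp:111): "
    "https://devel15.cnaf.infn.it:9000/otaMq9ix-WGKwJLEcXeoAQ failed (request expired)"
)
record = parse_wm_line(sample)
print(record["function"], record["lineno"], record["message"])
```

Grouping parsed records by `job_id` would then show which collections hit both the "unable to retrieve children information" and the "request expired" errors.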

ice16.png

QE16.png

 
Line: 722 to 748
 
META FILEATTACHMENT attachment="qe.png" attr="" comment="Query events test 14" date="1267187905" name="qe.png" path="qe.png" size="13944" stream="qe.png" tmpFilename="/usr/tmp/CGItemp7174" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice15.png" attr="" comment="Ice graph. Test 15" date="1268064515" name="ice15.png" path="ice15.png" size="6219" stream="ice15.png" tmpFilename="/usr/tmp/CGItemp9753" user="AlessioGianelle" version="2"
META FILEATTACHMENT attachment="qe15.png" attr="" comment="Query events test 15" date="1268062026" name="qe15.png" path="qe15.png" size="12248" stream="qe15.png" tmpFilename="/usr/tmp/CGItemp7058" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice16.png" attr="" comment="Ice graph. Test 16" date="1268657615" name="ice16.png" path="ice16.png" size="5473" stream="ice16.png" tmpFilename="/usr/tmp/CGItemp7178" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="QE16.png" attr="" comment="Query events test 16" date="1268657979" name="QE16.png" path="QE16.png" size="19120" stream="QE16.png" tmpFilename="/usr/tmp/CGItemp9555" user="AlessioGianelle" version="2"
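
The attachment records above are flat lists of `key="value"` pairs. A small Python sketch for reading one of them is given here, assuming the `date` attribute is a count of seconds since the Unix epoch (the value is consistent with the revision date, but that interpretation is an assumption). The helper name `parse_attachment` is illustrative.

```python
import re
from datetime import datetime, timezone

ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_attachment(meta_line: str) -> dict:
    """Return the key="value" pairs of one META FILEATTACHMENT line."""
    attrs = dict(ATTR.findall(meta_line))
    # Assumption: "date" holds seconds since the Unix epoch.
    attrs["date"] = datetime.fromtimestamp(int(attrs["date"]), tz=timezone.utc)
    attrs["size"] = int(attrs["size"])
    return attrs


line = (
    'META FILEATTACHMENT attachment="ice16.png" attr="" comment="Ice graph. Test 16" '
    'date="1268657615" name="ice16.png" path="ice16.png" size="5473" '
    'stream="ice16.png" tmpFilename="/usr/tmp/CGItemp7178" user="AlessioGianelle" version="1"'
)
info = parse_attachment(line)
print(info["name"], info["size"], info["date"].date().isoformat())
```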

Revision 46, 2010-03-15 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 24 to 24
 
Added:
>
>

Submissions finish on Sun Mar 14 at 15:01:38 CET 2010

  • 4306 collections submitted in 268194 seconds: 5/62/529 (min/avg/max)
    • 14 submissions fail
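
The average in the min/avg/max triple can be cross-checked against the totals on the same line. A short Python sketch, assuming the reported seconds are the sum of the individual per-collection submission times (the page does not state this explicitly); `mean_submit_time` is an illustrative name:

```python
def mean_submit_time(total_seconds: int, collections: int) -> float:
    """Average submission time per collection, in seconds."""
    if collections <= 0:
        raise ValueError("collections must be positive")
    return total_seconds / collections


# Figures reported for test 16.
avg = mean_submit_time(268194, 4306)
print(f"{avg:.1f} s per collection")
```

The result truncates to the 62 seconds quoted in the triple, which supports the assumption above.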

 

15) Test starts on Fri Feb 26 at 15:23:29 CET 2010 (WMS: devel20)

Description:
  • 14400 collections each of 20 jobs

Revision 45, 2010-03-11 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

16) Test starts on Wed Mar 10 at 16:13:15 CET 2010 (WMS: devel20)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs located at Padua and CNAF:
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • Update QueryEvents mechanism introducing parallelism
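
The size of the planned load follows directly from the description above. A minimal Python sketch that captures the three numeric parameters and derives the job count and the length of the submission window; the class `LoadTest` and its field names are hypothetical and only serve this illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTest:
    collections: int          # number of collections to submit
    jobs_per_collection: int  # jobs in each collection
    interval_s: int           # seconds between two submissions

    @property
    def total_jobs(self) -> int:
        return self.collections * self.jobs_per_collection

    @property
    def submission_window_s(self) -> int:
        return self.collections * self.interval_s


# Parameters taken from the description of test 16.
test16 = LoadTest(collections=4320, jobs_per_collection=40, interval_s=60)
print(test16.total_jobs, test16.submission_window_s / 86400)
```

With these parameters the plan amounts to 172800 jobs submitted over a nominal window of three days.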

 

15) Test starts on Fri Feb 26 at 15:23:29 CET 2010 (WMS: devel20)

Description:
  • 14400 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
Changed:
<
<
  • We use these CEs located at Padua:
>
>
  • We use these CEs located at Padua and CNAF:
 
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)

Revision 44, 2010-03-08 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 24 to 24
 
    • 100 submissions fail
Added:
>
>

Final results

  • Collections correctly submitted: 13994 (279880 jobs)
    • DONE OK: 278744 (99.594%)
    • NOTDONE: 0 (0 %)
    • ABORTED: 10 (0.004 %)
    • CANCELLED: 1126 (0.402 %) (Stuck in torque queues)
    • Resubmitted: 832 (0.3 %)
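
The percentages in this list can be recomputed from the raw counts. A minimal Python sketch, assuming each percentage is the count divided by the number of jobs in correctly submitted collections; the function name `share` is illustrative and not part of the test tooling:

```python
def share(count: int, total: int, digits: int = 3) -> float:
    """Return count as a percentage of total, rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, digits)


# Counts reported for test 15: 13994 collections of 20 jobs each.
total_jobs = 13994 * 20
for label, count in [("DONE OK", 278744), ("ABORTED", 10), ("CANCELLED", 1126)]:
    print(label, share(count, total_jobs))
```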

  • Errors found (921)
    • BLAH error: (358 times)
      • submission command failed (exit code = 1) (stdout:) (stderr:Connection timed out-qsub: cannot connect to server devel03.cnaf.infn.it (errno=110) Connection timed out-) N/A
      • no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-)
      • submission command failed (exit code = 1) (stdout:) (stderr:Failed in an LSF library call: Error 0. Job not submitted.-)
      • submission command failed (exit code = 1) (stdout:) (stderr:pbs_iff: cannot read reply from pbs_server-No Permission.-qsub: cannot connect to server devel03.cnaf.infn.it (errno=15007) Unauthorized Request -)
    • Cannot move OSB ... proxy expired (93 times)
    • Cannot take token (6 times)
    • reason=1 ... proxy expired (9 times)
    • reason=999 (453 times)
    • Transfer to CREAM failed due to exception.. (2 times)
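
The per-category counts above add up to the quoted total of 921. A short Python sketch that tallies and ranks them, using the category labels as written in the list (the dictionary itself is just an illustration, not output of any tool):

```python
# Error categories and counts as listed for test 15.
errors = {
    "BLAH error": 358,
    "Cannot move OSB ... proxy expired": 93,
    "Cannot take token": 6,
    "reason=1 ... proxy expired": 9,
    "reason=999": 453,
    "Transfer to CREAM failed due to exception": 2,
}

total = sum(errors.values())
ranked = sorted(errors.items(), key=lambda item: item[1], reverse=True)

print("total:", total)
for label, count in ranked:
    print(f"{count:4d}  {label}")
```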

ice15.png

qe15.png

Note:

  • 306 collections (i.e. 6120 jobs) stuck on wmproxy
  • All aborted jobs failed with "Input sandbox's proxy is missing. Cannot resubmit job". Probably the proxyrenewal daemon was too late in renewing the collection's proxy.
 

14) Test starts on Wed Feb 24 at 17:35:29 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs
Line: 661 to 690
 
META FILEATTACHMENT attachment="ice13.png" attr="" comment="Ice graph. Test 13" date="1266231953" name="ice13.png" path="ice13.png" size="5550" stream="ice13.png" tmpFilename="/usr/tmp/CGItemp7124" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice14.png" attr="" comment="Ice graph. Test 14" date="1267187773" name="ice14.png" path="ice14.png" size="5879" stream="ice14.png" tmpFilename="/usr/tmp/CGItemp6836" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="qe.png" attr="" comment="Query events test 14" date="1267187905" name="qe.png" path="qe.png" size="13944" stream="qe.png" tmpFilename="/usr/tmp/CGItemp7174" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice15.png" attr="" comment="Ice graph. Test 15" date="1268064515" name="ice15.png" path="ice15.png" size="6219" stream="ice15.png" tmpFilename="/usr/tmp/CGItemp9753" user="AlessioGianelle" version="2"
META FILEATTACHMENT attachment="qe15.png" attr="" comment="Query events test 15" date="1268062026" name="qe15.png" path="qe15.png" size="12248" stream="qe15.png" tmpFilename="/usr/tmp/CGItemp7058" user="AlessioGianelle" version="1"

Revision 43, 2010-03-08 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 11 to 11
 
  • We use these CEs located at Padua:
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
Changed:
<
<
    • 110 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
>
>
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
 
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled

Revision 42, 2010-03-05 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Changed:
<
<

15) Test starts on Fri Feb 26 at 15:30:29 CET 2010 (WMS: devel20)

>
>

15) Test starts on Fri Feb 26 at 15:23:29 CET 2010 (WMS: devel20)

Description:
  • 14400 collections each of 20 jobs
  • One collection every 30 seconds
Line: 19 to 19
 
  • Lease mechanism is not used
Added:
>
>

Submissions finish on Thu Mar 4 at 23:49:18 CET 2010

  • 14300 collections submitted in 492083 seconds: 4/34/228 (min/avg/max)
    • 100 submissions fail
 

14) Test starts on Wed Feb 24 at 17:35:29 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs

Revision 41, 2010-03-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

15) Test starts on Fri Feb 26 at 15:30:29 CET 2010 (WMS: devel20)

Description:
  • 14400 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs located at Padua:
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 110 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
 

14) Test starts on Wed Feb 24 at 17:35:29 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs

Revision 40, 2010-02-26 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

14) Test starts on Wed Feb 24 at 17:35:29 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs located at Padua:
    • 5 CEs SL5/64b with cream version 1.12 (2 lsf + 3 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 10 CEs SL4 with cream version 1.12 (5 lsf + 5 torque)
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • Update QueryEvents mechanism (only useful events are sent by the CEs)

Submissions finish on Thu Feb 25 at 17:34:45 CET 2010

  • 2878 collections submitted in 18140 seconds: 3/6/42 (min/avg/max)
    • 2 submissions fail
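
The start and finish timestamps of test 14 can be compared with the planned submission window (2880 collections, one every 30 seconds). A minimal Python sketch, treating both timestamps as naive local times in the same zone (CET, as printed on the page):

```python
from datetime import datetime

# Timestamps as reported for test 14 (both CET, so naive arithmetic is enough).
start = datetime(2010, 2, 24, 17, 35, 29)
finish = datetime(2010, 2, 25, 17, 34, 45)

elapsed_s = int((finish - start).total_seconds())
planned_s = 2880 * 30  # one collection every 30 seconds

print(elapsed_s, planned_s, planned_s - elapsed_s)
```

The elapsed time comes out 44 seconds short of the nominal 24-hour window.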

Final results

  • Collections correctly submitted: 2878 (57560 jobs)
    • DONE OK: 57297 (99.54%)
    • NOTDONE: 0 (0 %)
    • ABORTED: 9 (0.02 %)
    • CANCELLED: 254 (0.44 %) (Stuck in pbs queues)
    • Resubmitted: 79 (0.14 %)

  • Errors found (90)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Failed in an LSF library call: Error 0. Job not submitted.-TERM environment variable not set.-) N/A (3 times)
    • Cannot move ISB ... proxy expired (33 times)
    • Cannot move OSB ... proxy expired (40 times)
    • reason=1 (4 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2010-02-24T18:14:38.863Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-26.pd.infn.it] (1 time)
    • Input sandbox's proxy is missing. Cannot resubmit job (9 times) This is the reason for the aborted jobs (See bugs #52710 and #43577 )

ice14.png

qe.png

 

13) Test starts on Thu Feb 11 at 15:48:14 CET 2010 (WMS: devel20)

Description:
  • 6000 collections each of 20 jobs
Line: 589 to 637
 
META FILEATTACHMENT attachment="ice11.png" attr="" comment="Ice graph. Test 11" date="1265370447" name="ice11.png" path="ice11.png" size="5502" stream="ice11.png" tmpFilename="/usr/tmp/CGItemp4827" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice12.png" attr="" comment="Ice graph. Test 12" date="1265881911" name="ice12.png" path="ice12.png" size="13089" stream="ice12.png" tmpFilename="/usr/tmp/CGItemp16702" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice13.png" attr="" comment="Ice graph. Test 13" date="1266231953" name="ice13.png" path="ice13.png" size="5550" stream="ice13.png" tmpFilename="/usr/tmp/CGItemp7124" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice14.png" attr="" comment="Ice graph. Test 14" date="1267187773" name="ice14.png" path="ice14.png" size="5879" stream="ice14.png" tmpFilename="/usr/tmp/CGItemp6836" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="qe.png" attr="" comment="Query events test 14" date="1267187905" name="qe.png" path="qe.png" size="13944" stream="qe.png" tmpFilename="/usr/tmp/CGItemp7174" user="AlessioGianelle" version="1"

Revision 39, 2010-02-15 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 31 to 31
 
    • 3 submissions fail
Changed:
<
<
>
>

Final results

  • Collections correctly submitted: 5997 (119940 jobs)
    • DONE OK: 117898 (98.3%)
    • NOTDONE: 0 (0 %)
    • ABORTED: 66 (0.05 %)
    • CANCELLED: 1976 (1.65 %) (Stuck in pbs queues)
    • Resubmitted: 645 (0.54 %)

  • Errors found (806)
    • Input sandbox's proxy is missing. Cannot resubmit job (66 times) This is the reason for the ABORTED jobs
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Failed in an LSF library call: Error 0. Job not submitted.-TERM environment variable not set.-) (10 times)
    • Proxy expired: (687 times)
      • Cannot move ISB
      • Cannot move OSB
      • reason=1
      • reason=271
    • Transfer to CREAM failed due to exception (4 times)
    • Cannot take token (25 times)
    • /opt/edg/libexec/edg-gridftp-base-rm: timeout exceeded (14 times)
 

ice13.png

Revision 38, 2010-02-15 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 25 to 25
 set-variable = innodb_flush_log_at_trx_commit=2
Added:
>
>

Submissions finish on Sat Feb 13 at 17:48:25 CET 2010

  • 5997 collections submitted in 33256 seconds: 3/5/56 (min/avg/max)
    • 3 submissions fail

ice13.png

 

12) Test starts on Mon Feb 8 at 17:26:42 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs
Line: 559 to 572
 
META FILEATTACHMENT attachment="ice10.png" attr="" comment="Ice graph. Test 10" date="1265280924" name="ice10.png" path="ice10.png" size="10942" stream="ice10.png" tmpFilename="/usr/tmp/CGItemp4996" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice11.png" attr="" comment="Ice graph. Test 11" date="1265370447" name="ice11.png" path="ice11.png" size="5502" stream="ice11.png" tmpFilename="/usr/tmp/CGItemp4827" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice12.png" attr="" comment="Ice graph. Test 12" date="1265881911" name="ice12.png" path="ice12.png" size="13089" stream="ice12.png" tmpFilename="/usr/tmp/CGItemp16702" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice13.png" attr="" comment="Ice graph. Test 13" date="1266231953" name="ice13.png" path="ice13.png" size="5550" stream="ice13.png" tmpFilename="/usr/tmp/CGItemp7124" user="AlessioGianelle" version="1"

Revision 37, 2010-02-12 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

13) Test starts on Thu Feb 11 at 15:48:14 CET 2010 (WMS: devel20)

Description:
  • 6000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • Use only one HD.
    • Changes in my.conf file:
      set-variable = innodb_flush_log_at_trx_commit=2
 

12) Test starts on Mon Feb 8 at 17:26:42 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs
Line: 149 to 172
 
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
Deleted:
<
<
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
 
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
Changed:
<
<
    • 12 CEs SL4 with cream version 1.12 (6 lsf + 6 torque)
>
>
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
    • 12 CEs SL4 with cream version 1.12 (6 lsf + 6 torque)
 
  • Use automatic-delegation
  • The job is a "sleep random(3600)"
  • Resubmission is enabled

Revision 36, 2010-02-11 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 18 to 18
 
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
Added:
>
>
  • Changes in the software wrt previous test:
    • Use two HDs; on the second one we put the "persist directory" of ice (i.e. internal database), and the mysql directory (i.e. /var/lib/mysql)
    • Changes in my.conf file:
      set-variable = innodb_buffer_pool_size=1800M
      set-variable = innodb_additional_mem_pool_size=200M
      set-variable = innodb_flush_log_at_trx_commit=0
      set-variable = innodb_log_file_size=100M
      set-variable = innodb_log_group_home_dir=/var/lib/mysql_logfiles
    • LBProxy and LBServer databases have been scratched
    • All SandBox directories have been removed
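
The `set-variable` lines listed above follow a simple `set-variable = name=value` shape. A small Python sketch for collecting them into a dictionary, for example to compare the settings of two test runs; the function name `parse_set_variables` is hypothetical, and the parser only handles the line shape shown on this page:

```python
def parse_set_variables(text: str) -> dict:
    """Collect 'set-variable = name=value' lines into a {name: value} dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("set-variable"):
            continue  # ignore anything that is not a set-variable line
        _, _, assignment = line.partition("=")
        name, _, value = assignment.strip().partition("=")
        settings[name.strip()] = value.strip()
    return settings


snippet = """
set-variable = innodb_buffer_pool_size=1800M
set-variable = innodb_additional_mem_pool_size=200M
set-variable = innodb_flush_log_at_trx_commit=0
set-variable = innodb_log_file_size=100M
set-variable = innodb_log_group_home_dir=/var/lib/mysql_logfiles
"""
settings = parse_set_variables(snippet)
print(settings["innodb_flush_log_at_trx_commit"])
```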

 

Submissions finish on Tue Feb 9 at 17:25:27 CET 2010

  • 2876 collections submitted in 12475 seconds: 3/4/37 (min/avg/max)
Line: 29 to 44
 
    • DONE OK: 56971 (99.05 %)
    • NOTDONE: 0 (0 %)
    • ABORTED: 0 (0 %)
Changed:
<
<
    • CANCELLED: 549 (0.95 %)
>
>
    • CANCELLED: 549 (0.95 %) (Stuck in pbs queues)
 
  • Errors found (363)
    • BLAH error (9 times)
Changed:
<
<
    • proxy expired (340 times)
>
>
    • proxy expired (341 times)
 
    • Cannot take token (7 times)
    • reason=1; /opt/edg/libexec/edg-gridftp-base-rm: timeout exceeded Cannot take token (5 times)
Added:
>
>
    • Transfer to CREAM failed due to exception (1 time)

ice12.png

 

11) Test starts on Thu Feb 4 at 11:16:00 CET 2010 (WMS: devel20)

Line: 517 to 535
 
META FILEATTACHMENT attachment="qe09.png" attr="" comment="Query events test 09" date="1265030035" name="qe09.png" path="qe09.png" size="8893" stream="qe09.png" tmpFilename="/usr/tmp/CGItemp7106" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice10.png" attr="" comment="Ice graph. Test 10" date="1265280924" name="ice10.png" path="ice10.png" size="10942" stream="ice10.png" tmpFilename="/usr/tmp/CGItemp4996" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice11.png" attr="" comment="Ice graph. Test 11" date="1265370447" name="ice11.png" path="ice11.png" size="5502" stream="ice11.png" tmpFilename="/usr/tmp/CGItemp4827" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice12.png" attr="" comment="Ice graph. Test 12" date="1265881911" name="ice12.png" path="ice12.png" size="13089" stream="ice12.png" tmpFilename="/usr/tmp/CGItemp16702" user="AlessioGianelle" version="1"

Revision 35, 2010-02-10 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 23 to 23
 
  • 2876 collections submitted in 12475 seconds: 3/4/37 (min/avg/max)
    • 4 submissions fail
Added:
>
>

Final results

  • Collections correctly submitted: 2876 (57520 jobs)
    • DONE OK: 56971 (99.05 %)
    • NOTDONE: 0 (0 %)
    • ABORTED: 0 (0 %)
    • CANCELLED: 549 (0.95 %)

  • Errors found (363)
    • BLAH error (9 times)
    • proxy expired (340 times)
    • Cannot take token (7 times)
    • reason=1; /opt/edg/libexec/edg-gridftp-base-rm: timeout exceeded Cannot take token (5 times)
 

11) Test starts on Thu Feb 4 at 11:16:00 CET 2010 (WMS: devel20)

Description:
  • 1600 collections each of 20 jobs

Revision 34, 2010-02-09 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

12) Test starts on Mon Feb 8 at 17:26:42 CET 2010 (WMS: devel20)

Description:
  • 2880 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 6 CEs SL5/64b with cream version 1.12 (2 lsf + 4 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(7200)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Tue Feb 9 at 17:25:27 CET 2010

  • 2876 collections submitted in 12475 seconds: 3/4/37 (min/avg/max)
    • 4 submissions fail
 

11) Test starts on Thu Feb 4 at 11:16:00 CET 2010 (WMS: devel20)

Description:
  • 1600 collections each of 20 jobs

Revision 33, 2010-02-05 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

11) Test starts on Thu Feb 4 at 11:16:00 CET 2010 (WMS: devel20)

Description:
  • 1600 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(3600)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
  • Logging to LB from ICE is disabled

Submissions finish on Fri Feb 5 at 00:41:33 CET 2010

  • 1600 collections submitted in 16233 seconds: 4/10/74 (min/avg/max)

Final results

  • Collections correctly submitted: 1600 (32000 jobs)
    • DONE OK: 31328 (97.9 %)
    • NOTDONE: 0 (0 %)
    • ABORTED: 7 (0.02 %)
    • CANCELLED: 665 (2.08 %) (test interrupted)

  • Errors found (198)
    • reason=999 (22 times)
    • reason=1 [...] proxy expired (176 times)

ice11.png

 

10) Test starts on Tue Feb 2 at 12:38:25 CET 2010 (WMS: devel20)

Description:
Line: 445 to 480
 
META FILEATTACHMENT attachment="ice09.png" attr="" comment="Ice graph. Test 09" date="1265109592" name="ice09.png" path="ice09.png" size="8322" stream="ice09.png" tmpFilename="/usr/tmp/CGItemp7663" user="AlessioGianelle" version="3"
META FILEATTACHMENT attachment="qe09.png" attr="" comment="Query events test 09" date="1265030035" name="qe09.png" path="qe09.png" size="8893" stream="qe09.png" tmpFilename="/usr/tmp/CGItemp7106" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice10.png" attr="" comment="Ice graph. Test 10" date="1265280924" name="ice10.png" path="ice10.png" size="10942" stream="ice10.png" tmpFilename="/usr/tmp/CGItemp4996" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice11.png" attr="" comment="Ice graph. Test 11" date="1265370447" name="ice11.png" path="ice11.png" size="5502" stream="ice11.png" tmpFilename="/usr/tmp/CGItemp4827" user="AlessioGianelle" version="1"

Revision 32, 2010-02-04 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 23 to 23
 
  • 2397 collections submitted in 39661 seconds: 4/16/99 (min/avg/max)
    • 3 submissions fail (due to limiter "FTP connections")
Added:
>
>

Final results

  • Collections correctly submitted: 2397 (47940 jobs)
    • DONE OK: 47728 (99.56 %)
    • NOTDONE: 20* (0.04 %)
    • ABORTED: 17** (0.04 %)
    • CANCELLED: 175*** (0.36 %) (jobs held in the torque system)
    • Resubmitted: 210 (0.44 %)

  • Errors found (213****)
    • reason=999*** (163 times)
    • reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM500950657_jobWrapper.sh: No such file or directory (1 time)
    • reason=255 (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (3 times)
    • The endpoint is blacklisted (43 times)
    • Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Wed 03 Feb 2010 01:27:19] ErrorCode=[0] Description=[cannot store the delegation proxy locally] FaultCause=[Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory]" (2 times)

ice10.png

Note:

* The "NOT TERMINATED" jobs are distributed in this way:

  • 1 collection (i.e. 20 jobs) stuck on wmproxy (midnight problem)

** All aborted jobs failed with "Input sandbox's proxy is missing. Cannot resubmit job". Probably the proxyrenewal daemon was too late in renewing the collection's proxy.

*** The cancelled jobs were blocked in the pbs queues (in these cases the error reported after a qdel by the sysadmin is "reason=999")

**** Almost all errors occur on the same CE: cream-34.pd.infn.it (a 1.11 cream ce).

 

9) Test starts on Wed Jan 27 at 07:59:34 CET 2010 (WMS: devel20)

Description:
  • 2400 collections each of 20 jobs
Line: 71 to 101
 

Note:

* The "NOT TERMINATED" jobs are distributed in this way:

Changed:
<
<
  • 2 collections (i.e. 40 jobs) stuck on wmproxy
>
>
  • 2 collections (i.e. 40 jobs) stuck on wmproxy (midnight problem)
 
  • 43 jobs are stuck in the pbs queue (see bug #62070)

** The cancelled jobs were blocked in the pbs queues (in these cases the error reported after a qdel by the sysadmin is "reason=999")

Line: 414 to 444
 
META FILEATTACHMENT attachment="ice08.png" attr="" comment="Ice graph. Test 08" date="1258650880" name="ice08.png" path="ice08.png" size="19239" stream="ice08.png" tmpFilename="/usr/tmp/CGItemp10450" user="AlessioGianelle" version="5"
META FILEATTACHMENT attachment="ice09.png" attr="" comment="Ice graph. Test 09" date="1265109592" name="ice09.png" path="ice09.png" size="8322" stream="ice09.png" tmpFilename="/usr/tmp/CGItemp7663" user="AlessioGianelle" version="3"
META FILEATTACHMENT attachment="qe09.png" attr="" comment="Query events test 09" date="1265030035" name="qe09.png" path="qe09.png" size="8893" stream="qe09.png" tmpFilename="/usr/tmp/CGItemp7106" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice10.png" attr="" comment="Ice graph. Test 10" date="1265280924" name="ice10.png" path="ice10.png" size="10942" stream="ice10.png" tmpFilename="/usr/tmp/CGItemp4996" user="AlessioGianelle" version="1"

Revision 31, 2010-02-03 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

10) Test starts on Tue Feb 2 at 12:38:25 CET 2010 (WMS: devel20)

Description:
  • 2400 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 11 CEs SL4 with cream version 1.12 (5 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(3600)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Wed Feb 3 at 08:59:20 CET 2010

  • 2397 collections submitted in 39661 seconds: 4/16/99 (min/avg/max)
    • 3 submissions fail (due to limiter "FTP connections")
 

9) Test starts on Wed Jan 27 at 07:59:34 CET 2010 (WMS: devel20)

Description:
  • 2400 collections each of 20 jobs

Revision 30, 2010-02-02 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 391 to 391
 
META FILEATTACHMENT attachment="Running06.jobid" attr="" comment="Running_06" date="1257522093" name="Running06.jobid" path="Running06.jobid" size="39380" stream="Running06.jobid" tmpFilename="/usr/tmp/CGItemp7643" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice07.png" attr="" comment="Ice graph. Test 07" date="1258386654" name="ice07.png" path="ice07.png" size="5660" stream="ice07.png" tmpFilename="/usr/tmp/CGItemp7524" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice08.png" attr="" comment="Ice graph. Test 08" date="1258650880" name="ice08.png" path="ice08.png" size="19239" stream="ice08.png" tmpFilename="/usr/tmp/CGItemp10450" user="AlessioGianelle" version="5"
Changed:
<
<
META FILEATTACHMENT attachment="ice09.png" attr="" comment="Ice graph. Test 09" date="1265030123" name="ice09.png" path="ice09.png" size="7534" stream="ice09.png" tmpFilename="/usr/tmp/CGItemp9630" user="AlessioGianelle" version="2"
>
>
META FILEATTACHMENT attachment="ice09.png" attr="" comment="Ice graph. Test 09" date="1265109592" name="ice09.png" path="ice09.png" size="8322" stream="ice09.png" tmpFilename="/usr/tmp/CGItemp7663" user="AlessioGianelle" version="3"
 
META FILEATTACHMENT attachment="qe09.png" attr="" comment="Query events test 09" date="1265030035" name="qe09.png" path="qe09.png" size="8893" stream="qe09.png" tmpFilename="/usr/tmp/CGItemp7106" user="AlessioGianelle" version="1"

Revision 29, 2010-02-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 25 to 25
 

Final results

  • Collections correctly submitted: 2397 (47940 jobs)
    • DONE OK: 47592 (99.27 %)
Changed:
<
<
    • NOTDONE: 85 (0.18 %)
>
>
    • NOTDONE: 85* (0.18 %)
 
    • ABORTED: 0 (0 %)
Changed:
<
<
    • CANCELLED: 263 (0.55 %) (jobs held in the torque system)
>
>
    • CANCELLED: 263** (0.55 %) (jobs held in the torque system)
 
    • Resubmitted: 369 (0.77 %)

  • Errors found (379)
Changed:
<
<
>
>
    • BLAH error ... (20 times)
    • Cannot move ISB ... (3 times)
    • Cannot take token (13 times)
    • reason=127 ... (3 times)
    • Problem to detect the lifetime of the proxy ... (5 times)
    • reason=1 (1 time)
    • reason=255 (1 time)
    • reason=999** (279 times)
    • SOCKET TIMEOUT occurred ... (3 times)
    • The endpoint is blacklisted ... (51 times)
  ice09.png

qe09.png

Added:
>
>

Note:

* The "NOT TERMINATED" jobs are distributed in this way:

  • 2 collections (i.e. 40 jobs) stuck on wmproxy
  • 43 jobs are stuck in the pbs queue (see bug #62070)

** The cancelled jobs were blocked in the pbs queues (in these cases the error reported after a qdel by the sysadmin is "reason=999")

 

8) Test starts on Tue Nov 17 at 11:56:53 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs

Revision 28, 2010-02-01 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

9) Test starts on Wed Jan 27 at 07:59:34 CET 2010 (WMS: devel20)

Description:
  • 2400 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • We use these CEs distributed between Padua and Bologna:
    • 3 CEs SL5/64b with cream version 1.12 (2 lsf + 1 torque)
    • 4 CEs SL4 with cream version 1.11 (2 lsf + 2 torque)
    • 12 CEs SL4 with cream version 1.12 (6 lsf + 6 torque)
  • Use automatic-delegation
  • The job is a "sleep random(3600)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Thu Jan 28 at 04:24:31 CET 2010

  • 2397 collections submitted in 43027 seconds: 4/17/102 (min/avg/max)
    • 3 submissions fail

Final results

  • Collections correctly submitted: 2397 (47940 jobs)
    • DONE OK: 47592 (99.27 %)
    • NOTDONE: 85 (0.18 %)
    • ABORTED: 0 (0 %)
    • CANCELLED: 263 (0.55 %) (jobs held in the torque system)
    • Resubmitted: 369 (0.77 %)

  • Errors found (379)

ice09.png

qe09.png

 

8) Test starts on Tue Nov 17 at 11:56:53 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
Line: 337 to 373
 
META FILEATTACHMENT attachment="Running06.jobid" attr="" comment="Running_06" date="1257522093" name="Running06.jobid" path="Running06.jobid" size="39380" stream="Running06.jobid" tmpFilename="/usr/tmp/CGItemp7643" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice07.png" attr="" comment="Ice graph. Test 07" date="1258386654" name="ice07.png" path="ice07.png" size="5660" stream="ice07.png" tmpFilename="/usr/tmp/CGItemp7524" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice08.png" attr="" comment="Ice graph. Test 08" date="1258650880" name="ice08.png" path="ice08.png" size="19239" stream="ice08.png" tmpFilename="/usr/tmp/CGItemp10450" user="AlessioGianelle" version="5"
Added:
>
>
META FILEATTACHMENT attachment="ice09.png" attr="" comment="Ice graph. Test 09" date="1265030123" name="ice09.png" path="ice09.png" size="7534" stream="ice09.png" tmpFilename="/usr/tmp/CGItemp9630" user="AlessioGianelle" version="2"
META FILEATTACHMENT attachment="qe09.png" attr="" comment="Query events test 09" date="1265030035" name="qe09.png" path="qe09.png" size="8893" stream="qe09.png" tmpFilename="/usr/tmp/CGItemp7106" user="AlessioGianelle" version="1"

Revision 27 2009-11-19 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 336 to 336
 
META FILEATTACHMENT attachment="DoneFailed.jobid" attr="" comment="DoneFailed_06" date="1257521852" name="DoneFailed.jobid" path="DoneFailed.jobid" size="26334" stream="DoneFailed.jobid" tmpFilename="/usr/tmp/CGItemp7668" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="Running06.jobid" attr="" comment="Running_06" date="1257522093" name="Running06.jobid" path="Running06.jobid" size="39380" stream="Running06.jobid" tmpFilename="/usr/tmp/CGItemp7643" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="ice07.png" attr="" comment="Ice graph. Test 07" date="1258386654" name="ice07.png" path="ice07.png" size="5660" stream="ice07.png" tmpFilename="/usr/tmp/CGItemp7524" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice08.png" attr="" comment="Ice graph. Test 08" date="1258650880" name="ice08.png" path="ice08.png" size="19239" stream="ice08.png" tmpFilename="/usr/tmp/CGItemp10450" user="AlessioGianelle" version="5"

Revision 26 2009-11-19 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

8) Test starts on Tue Nov 17 at 11:56:53 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA
  • Use automatic-delegation
  • The job is a "sleep random(900)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
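
The description above implies a nominal duration for the submission phase. As a rough sketch, assuming the submission loop keeps an exact 30-second cadence (an assumption; the real loop also spends time in each submit call), the expected end time can be estimated from the start time and compared with the reported finish time. The function name `nominal_finish` is hypothetical.

```python
from datetime import datetime, timedelta


def nominal_finish(start, collections, interval_s):
    """Estimate when the last collection is submitted at a fixed cadence."""
    return start + timedelta(seconds=collections * interval_s)


# Values taken from the description of test 8.
start = datetime(2009, 11, 17, 11, 56, 53)
finish = nominal_finish(start, collections=4000, interval_s=30)

# Finish time reported on the page for test 8.
reported_finish = datetime(2009, 11, 18, 21, 23, 6)
drift = reported_finish - finish

print("nominal finish:", finish)
print("drift versus reported finish:", drift)
```

A small positive drift is what one would expect if each submission adds a little overhead on top of the fixed interval.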

Submissions finish on Wed Nov 18 at 21:23:06 CEST 2009

  • 3997 collections submitted in 35909 seconds: 2/8/70 (min/avg/max)
    • 3 submission(s) fail(s)
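
The min/avg/max triple is not explained on the page. One reading that fits every reported value is that the total is the cumulative time spent in submission and that the average is that total divided by the number of submitted collections, truncated to an integer. The sketch below only checks that reading against figures quoted on this page; the truncation rule is an assumption, not a documented behaviour of the test scripts.

```python
# (collections submitted, total seconds, reported avg), copied from the page.
runs = {
    "test 5": (1455, 16993, 11),
    "test 6": (7091, 164788, 23),
    "test 7": (3443, 31829, 9),
    "test 8": (3997, 35909, 8),
    "test 9": (2397, 43027, 17),
}


def truncated_average(total_seconds, submitted):
    """Average seconds per collection, truncated (assumed rounding rule)."""
    return total_seconds // submitted


mismatches = []
for name, (submitted, seconds, reported_avg) in runs.items():
    computed = truncated_average(seconds, submitted)
    if computed != reported_avg:
        mismatches.append(name)
    print(f"{name}: computed avg {computed} s, reported avg {reported_avg} s")
```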

Final results taken on Thu Nov 19 at 12:09:23 CEST 2009

  • Collections correctly submitted: 3997 (79940 jobs)
    • DONE OK: 79798 (99.82 %)
    • NOTDONE: 0 (0 %)
    • ABORTED: 0 (0 %)
    • CANCELLED: 142 (0.18 %) (jobs held in the torque system)
    • Resubmitted: 13415 (16.78 %)

  • Errors found (13230)
    • Cannot take token (22 times)
    • reason=1 (14 times)
    • reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM927980874_jobWrapper.sh: No such file or directory (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (527 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-11-17T20:59:46.997Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-04.pd.infn.it] (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (12602 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (63 times)
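
The error list above can be tallied mechanically to confirm that the individual counts add up to the reported total. This is an illustrative sketch only: the regular expression and the `tally` helper are hypothetical, and the long messages are abbreviated here for readability (the counts are the ones listed for test 8).

```python
import re

# Matches the trailing "(N time)" or "(N times)" suffix of an error bullet.
COUNT_RE = re.compile(r"\((\d+) times?\)\s*$")


def tally(lines):
    """Map each error message to its occurrence count."""
    counts = {}
    for line in lines:
        match = COUNT_RE.search(line)
        if match is None:
            continue  # not an error bullet with a count
        message = line[:match.start()].strip(" •\t")
        counts[message] = counts.get(message, 0) + int(match.group(1))
    return counts


# Messages abbreviated for this example; counts copied from test 8.
errors = [
    "• Cannot take token (22 times)",
    "• reason=1 (14 times)",
    "• reason=127 (1 time)",
    "• CREAM Register: EOF detected during communication (527 times)",
    "• CREAM Register: Received NULL fault (1 time)",
    "• CREAM Register: The endpoint is blacklisted (12602 times)",
    "• CREAM Start: The endpoint is blacklisted (63 times)",
]

counts = tally(errors)
total = sum(counts.values())
blacklisted = sum(n for msg, n in counts.items() if "blacklisted" in msg)

print("total errors:", total)
print("blacklisted endpoint errors:", blacklisted)
```

Grouping by a keyword such as "blacklisted" shows how much of the total is produced by a single failure mode.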

ice08.png

 

7) Test starts on Fri Nov 13 16:13:38 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs

Revision 25 2009-11-16 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

7) Test starts on Fri Nov 13 16:13:38 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA
  • Use automatic-delegation
  • The job is a "sleep random(300)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • CEs
      • Update to new release candidate version 1.12

Submissions finish on Sun Nov 15 at 01:33:23 CEST 2009

  • 3443 collections submitted in 31829 seconds: 2/9/44 (min/avg/max)
    • 557 submission(s) fail(s)

Final results taken on Mon Nov 16 at 10:09:23 CEST 2009

  • Collections correctly submitted: 3443 (68860 jobs)
    • DONE OK: 60522 (87.89 %)
    • NOTDONE: 6727 (9.77 %)
    • ABORTED: 1611 (2.34 %)
    • Resubmitted: 26726 (38.81 %)

  • Errors found (44031)
    • Cannot take token (6 times)
    • reason=1 (7 times)
    • reason=127 (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred (185 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[jobRegister] ErrorCode=[0] Description=[The CREAM service cannot accept jobs at the moment] FaultCause=[Threshold for Load Average(15 min): 20 => Detected value for Load Average(15 min): (289 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (43432 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (108 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-11-14T14:00:38.733Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-04.pd.infn.it] (1 time)

ice07.png

 

6) Test starts on Fri Oct 30 at 15:23:47 CEST 2009 (WMS: devel20)

Description:
  • 7200 collections each of 40 jobs
Line: 256 to 298
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attachment="DoneFailed.jobid" attr="" comment="DoneFailed_06" date="1257521852" name="DoneFailed.jobid" path="DoneFailed.jobid" size="26334" stream="DoneFailed.jobid" tmpFilename="/usr/tmp/CGItemp7668" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="Running06.jobid" attr="" comment="Running_06" date="1257522093" name="Running06.jobid" path="Running06.jobid" size="39380" stream="Running06.jobid" tmpFilename="/usr/tmp/CGItemp7643" user="AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice07.png" attr="" comment="Ice graph. Test 07" date="1258386654" name="ice07.png" path="ice07.png" size="5660" stream="ice07.png" tmpFilename="/usr/tmp/CGItemp7524" user="AlessioGianelle" version="1"

Revision 24 2009-11-06 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 21 to 21
 

Final results taken on Fri Nov 06 at 12:08:43 CEST 2009

  • Collections correctly submitted: 7091 (283640 jobs)
Changed:
<
<
    • DONE OK: 276116 (97.35 %)
    • NOTDONE: 4663 (1.64 %)
>
>
    • DONE OK: 275956 (97.29 %)
    • NOTDONE: 4823 (1.7 %) *
 
    • ABORTED: 8 (~0%)
Changed:
<
<
    • CANCELLED: 2853 (1.01 %)
>
>
    • CANCELLED: 2853 (1.01 %) **
 
    • Resubmitted: 2933 (1.03 %)

  • Errors found (3972)
Line: 48 to 48
 
    • reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM077961558_jobWrapper.sh: No such file or directory (1 time)
    • reason=999 (194 times)
Added:
>
>

Note:

* The "NOT TERMINATED" (NOTDONE) jobs are distributed as follows:

  • 100 Collections (i.e. 4000 jobs) failed to be submitted (by WM) with reason "request expired"
  • 361 jobs are running
  • 462 Done (FAILED)

** Jobs have been cancelled from the pbs queue because maui marked them as "Blocked Jobs"
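
The breakdown in the note can be checked against the revised totals of test 6. The sketch below is a simple consistency check using only job counts quoted on this page; the dictionary keys are shortened labels chosen for the example.

```python
# Job counts from the note on NOTDONE jobs (test 6).
notdone_breakdown = {
    "failed to be submitted by WM (request expired)": 4000,
    "still running": 361,
    "Done (FAILED)": 462,
}

# Revised final results of test 6.
total_jobs = 7091 * 40  # collections correctly submitted x jobs per collection
final_states = {"DONE OK": 275956, "NOTDONE": 4823, "ABORTED": 8, "CANCELLED": 2853}

breakdown_total = sum(notdone_breakdown.values())
states_total = sum(final_states.values())
notdone_share = round(100.0 * final_states["NOTDONE"] / total_jobs, 2)

print("NOTDONE from breakdown:", breakdown_total)
print("all final states:", states_total, "of", total_jobs)
print("NOTDONE share:", notdone_share, "%")
```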

 ice06.png

5) Test starts on Thu Oct 22 at 12:51:04 CEST 2009 (WMS: devel20)

Line: 244 to 254
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 05" date="1256311329" name="ice05.png" path="ice05.png" size="8163" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="DoneFailed.jobid" attr="" comment="DoneFailed_06" date="1257521852" name="DoneFailed.jobid" path="DoneFailed.jobid" size="26334" stream="DoneFailed.jobid" tmpFilename="/usr/tmp/CGItemp7668" user="AlessioGianelle" version="1"
META FILEATTACHMENT attachment="Running06.jobid" attr="" comment="Running_06" date="1257522093" name="Running06.jobid" path="Running06.jobid" size="39380" stream="Running06.jobid" tmpFilename="/usr/tmp/CGItemp7643" user="AlessioGianelle" version="1"

Revision 23 2009-11-06 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 8 to 8
 
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 10
Changed:
<
<
  • Use all the CEs of testbedB (i.e. Production CEs 1.11, query event is not implemented)
>
>
  • Use all the CEs of testbedB (i.e. Production CEs 1.11, query event is not implemented)
 
  • Use automatic-delegation
  • Use proxy renewal service (myproxy.cern.ch)
  • The job is a "sleep random(2447)"
  • Resubmission is enabled
  • Lease mechanism is not used
Changed:
<
<

Partial results taken on Tue Nov 03 at 15:08:43 CEST 2009

  • Collections correctly submitted: 5298 (211920 jobs)
    • DONE OK: 146550 (- %)
    • NOTDONE: 65370 (- %)
    • Resubmitted: 539 (- %)

  • Errors found (1027)
    • blah error: send command timeout (39 times)
    • Cannot move ISB (974 times)
    • Cannot take token (10 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (2 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (1 time)
    • lsf_reason=32512 (1 time)
>
>

Submissions finish on Wed Nov 4 at 15:56:11 CEST 2009

  • 7091 collections submitted in 164788 seconds: 4/23/117 (min/avg/max)
    • 109 submission(s) fail(s)

Final results taken on Fri Nov 06 at 12:08:43 CEST 2009

  • Collections correctly submitted: 7091 (283640 jobs)
    • DONE OK: 276116 (97.35 %)
    • NOTDONE: 4663 (1.64 %)
    • ABORTED: 8 (~0%)
    • CANCELLED: 2853 (1.01 %)
    • Resubmitted: 2933 (1.03 %)

  • Errors found (3972)
    • blah error: send command timeout (50 times)
    • BLAH error: submission command failed (exit code = -15) (stdout:) (stderr: exe_getouterr: 200 seconds timeout expired, killing child process.- killed by signal 15.-) N/A (jobId = CREAM110305536) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Bad host name, host group name or cluster name. Job not submitted.-TERM environment variable not set.- execute_cmd: poll() got an unknown event (stdout 0x0010 - stderr: 0x0000).-) N/A (jobId = CREAM198982235) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Cannot connect to default server host 'cream-32.pd.infn.it' - check pbs_server daemon.-qsub: cannot connect to server cream-32.pd.infn.it (errno=111)-TERM environment variable not set.-) N/A (jobId = CREAM946959077) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Failed in an LSF library call: Failed in sending/receiving a message: Connection reset by peer. Job not submitted.-TERM environment variable not set.-) N/A (jobId = CREAM166499182) (1 time)
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Master batch daemon internal error. Job not submitted.-TERM environment variable not set.-) N/A (jobId = CREAM105778508) (7 times)
    • Cannot move ISB (1820 times)
    • Cannot take token (190 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (6 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (40 times)
    • lsf_reason=32512; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM391495093_jobWrapper.sh: No such file or directory (12 times)
    • lsf_reason=-1 (5 times)
    • lsf_reason=2 (1 time)
    • pbs_reason=-1 (1616 times)
    • pbs_reason=1 (8 times)
    • reason=1; /opt/edg/libexec/edg-gridftp-base-rm: error globus_ftp_client: the server responded with an error 500 500-Command failed : System error in unlink: No such file or directory 500-A system call failed: No such file or directory 500 End. Cannot take token (15 times)
    • reason=127; /opt/lcg/libexec/jobwrapper: line 42: ./CREAM077961558_jobWrapper.sh: No such file or directory (1 time)
    • reason=999 (194 times)
  ice06.png
Line: 220 to 239
 -- AlessioGianelle - 13 Oct 2009

META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 04" date="1256134051" name="ice04.png" path="ice04.png" size="6444" user="Main.AlessioGianelle" version="2"
Changed:
<
<
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 06" date="1257168492" name="ice06.png" path="ice06.png" size="6463" user="Main.AlessioGianelle" version="1"
>
>
META FILEATTACHMENT attachment="ice06.png" attr="" comment="Ice graph. Test 06" date="1257509711" name="ice06.png" path="ice06.png" size="7952" stream="ice06.png" tmpFilename="/usr/tmp/CGItemp7638" user="AlessioGianelle" version="2"
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 02" date="1255691072" name="ice02.png" path="ice02.png" size="5993" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 05" date="1256311329" name="ice05.png" path="ice05.png" size="8163" user="Main.AlessioGianelle" version="1"

Revision 22 2009-11-03 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 15 to 15
 
  • Resubmission is enabled
  • Lease mechanism is not used
Changed:
<
<

Partial results taken on Tue Nov 03 at 10:08:43 CEST 2009

  • Collections correctly submitted: 3929 (157160 jobs)
    • DONE OK: 138545 (- %)
    • NOTDONE: 18368 (- %)
>
>

Partial results taken on Tue Nov 03 at 15:08:43 CEST 2009

  • Collections correctly submitted: 5298 (211920 jobs)
    • DONE OK: 146550 (- %)
    • NOTDONE: 65370 (- %)
 
    • Resubmitted: 539 (- %)

  • Errors found (1027)

Revision 21 2009-11-03 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 15 to 15
 
  • Resubmission is enabled
  • Lease mechanism is not used
Changed:
<
<

Partial results taken on Mon Nov 02 at 17:08:43 CEST 2009

>
>

Partial results taken on Tue Nov 03 at 10:08:43 CEST 2009

 
  • Collections correctly submitted: 3929 (157160 jobs)
Changed:
<
<
    • DONE OK: 124268 (- %)
    • NOTDONE: 32892 (- %)
>
>
    • DONE OK: 138545 (- %)
    • NOTDONE: 18368 (- %)
 
    • Resubmitted: 539 (- %)

  • Errors found (1027)

Revision 20 2009-11-02 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 15 to 15
 
  • Resubmission is enabled
  • Lease mechanism is not used
Changed:
<
<

Partial results taken on Mon Nov 02 at 16:08:43 CEST 2009

>
>

Partial results taken on Mon Nov 02 at 17:08:43 CEST 2009

 
  • Collections correctly submitted: 3929 (157160 jobs)
Changed:
<
<
    • DONE OK: 118822 (- %)
    • NOTDONE: 38338 (- %)
    • Resubmitted: 537 (- %)
>
>
    • DONE OK: 124268 (- %)
    • NOTDONE: 32892 (- %)
    • Resubmitted: 539 (- %)
 
  • Errors found (1027)
    • blah error: send command timeout (39 times)

Revision 19 2009-11-02 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 15 to 15
 
  • Resubmission is enabled
  • Lease mechanism is not used
Added:
>
>

Partial results taken on Mon Nov 02 at 16:08:43 CEST 2009

  • Collections correctly submitted: 3929 (157160 jobs)
    • DONE OK: 118822 (- %)
    • NOTDONE: 38338 (- %)
    • Resubmitted: 537 (- %)

  • Errors found (1027)
    • blah error: send command timeout (39 times)
    • Cannot move ISB (974 times)
    • Cannot take token (10 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (2 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (1 time)
    • lsf_reason=32512 (1 time)
 ice06.png

5) Test starts on Thu Oct 22 at 12:51:04 CEST 2009 (WMS: devel20)

Line: 35 to 49
 
  • 1455 collections submitted in 16993 seconds: 4/11/48 (min/avg/max)
    • 545 submission(s) fail(s)
Changed:
<
<

FInal results taken on Thu Oct 23 at 16:08:43 CEST 2009

>
>

Final results taken on Thu Oct 23 at 16:08:43 CEST 2009

 
  • Collections correctly submitted: 1455 (29100 jobs)
    • DONE OK: 26714 (91.8 %)
    • NOTDONE: 168 (0.58 %)
Line: 206 to 220
 -- AlessioGianelle - 13 Oct 2009

META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 04" date="1256134051" name="ice04.png" path="ice04.png" size="6444" user="Main.AlessioGianelle" version="2"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 06" date="1257168492" name="ice06.png" path="ice06.png" size="6463" user="Main.AlessioGianelle" version="1"
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 02" date="1255691072" name="ice02.png" path="ice02.png" size="5993" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 05" date="1256311329" name="ice05.png" path="ice05.png" size="8163" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"
Deleted:
<
<
META FILEATTACHMENT attachment="ice06.png" attr="" comment="Ice graph. Test 06" date="1257168492" name="ice06.png" path="ice06.png" size="6463" stream="ice06.png" user="Main.AlessioGianelle" version="1"

Revision 18 2009-11-02 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

6) Test starts on Fri Oct 30 at 15:23:47 CEST 2009 (WMS: devel20)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedB (i.e. Production CEs 1.11, query event is not implemented)
  • Use automatic-delegation
  • Use proxy renewal service (myproxy.cern.ch)
  • The job is a "sleep random(2447)"
  • Resubmission is enabled
  • Lease mechanism is not used

ice06.png

 

5) Test starts on Thu Oct 22 at 12:51:04 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 20 jobs
Line: 195 to 210
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 05" date="1256311329" name="ice05.png" path="ice05.png" size="8163" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice06.png" attr="" comment="Ice graph. Test 06" date="1257168492" name="ice06.png" path="ice06.png" size="6463" stream="ice06.png" user="Main.AlessioGianelle" version="1"

Revision 17 2009-10-26 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Revision 16 2009-10-23 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 40 to 40
 
    • Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Fri 23 Oct 2009 04:44:45] ErrorCode=[0] Description=[system error] FaultCause=[The problem seems to be related to glexec]" (1 time)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (5 times)
Added:
>
>
ice05.png
 

4) Test starts on Fri Oct 19 at 12:00:05 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
Line: 190 to 192
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 04" date="1256134051" name="ice04.png" path="ice04.png" size="6444" user="Main.AlessioGianelle" version="2"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 02" date="1255691072" name="ice02.png" path="ice02.png" size="5993" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 05" date="1256311329" name="ice05.png" path="ice05.png" size="8163" user="Main.AlessioGianelle" version="1"
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"

Revision 15 2009-10-23 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 19 to 19
 
  • 1455 collections submitted in 16993 seconds: 4/11/48 (min/avg/max)
    • 545 submission(s) fail(s)
Changed:
<
<

Partial results taken on Thu Oct 23 at 10:41:43 CEST 2009

>
>

FInal results taken on Thu Oct 23 at 16:08:43 CEST 2009

 
  • Collections correctly submitted: 1455 (29100 jobs)
Changed:
<
<
    • DONE OK: 25646 (- %)
    • CANCELLED: 2782 (- %)
    • ABORTED: 672 (- %)
    • Resubmitted: 2164 (- %)
>
>
    • DONE OK: 26714 (91.8 %)
    • NOTDONE: 168 (0.58 %)
    • ABORTED: 2218 (7.62 %)
    • Resubmitted: 4101 (14.09 %)
 
  • Errors found (1758)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxx) (27 times)

Revision 14 2009-10-23 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 21 to 21
 

Partial results taken on Thu Oct 23 at 10:41:43 CEST 2009

  • Collections correctly submitted: 1455 (29100 jobs)
Changed:
<
<
    • DONE OK: 22675 (- %)
    • CANCELLED: 5942 (- %)
    • ABORTED: 483 (- %)
    • Resubmitted: 1313 (- %)
>
>
    • DONE OK: 25646 (- %)
    • CANCELLED: 2782 (- %)
    • ABORTED: 672 (- %)
    • Resubmitted: 2164 (- %)
 
  • Errors found (1758)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxx) (27 times)

Revision 13 2009-10-23 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 15 to 15
 
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
Changed:
<
<
>
>

Submissions finish on Sat Oct 23 at 05:30:36 CEST 2009

  • 1455 collections submitted in 16993 seconds: 4/11/48 (min/avg/max)
    • 545 submission(s) fail(s)

Partial results taken on Thu Oct 23 at 10:41:43 CEST 2009

  • Collections correctly submitted: 1455 (29100 jobs)
    • DONE OK: 22675 (- %)
    • CANCELLED: 5942 (- %)
    • ABORTED: 483 (- %)
    • Resubmitted: 1313 (- %)

  • Errors found (1758)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxx) (27 times)
    • blah error: send command timeout (21 times)
    • Cannot move ISB (${globus_transfer_cmd} gsiftp://devel20.cnaf.infn.it:2811...... ): proxy expired (1 time)
    • Cannot take token (39 times)
    • reason=999 (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/CREAM2] failed: (852 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred (54 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot set permissions to the store proxy certificate] FaultCause=[Authorization error: Cannot set permissions to the store proxy certificate] Timestamp=[Fri 23 Oct 2009 04:31:45] (21 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot store proxy certificate] FaultCause=[Authorization error: Cannot store proxy certificate] Timestamp=[Fri 23 Oct 2009 04:32:16] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (733 times)
    • Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Fri 23 Oct 2009 04:44:45] ErrorCode=[0] Description=[system error] FaultCause=[The problem seems to be related to glexec]" (1 time)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (5 times)
 

4) Test starts on Fri Oct 19 at 12:00:05 CEST 2009 (WMS: devel20)

Description:

Revision 12 2009-10-22 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

5) Test starts on Thu Oct 22 at 12:51:04 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep random(2447)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

 

4) Test starts on Fri Oct 19 at 12:00:05 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
Line: 27 to 42
 

Partial results taken on Thu Oct 22 at 09:41:43 CEST 2009

  • Collections correctly submitted: 1832 (36640 jobs)
    • DONE OK: 30808 (- %)
Changed:
<
<
    • NOT TERMINATED: 4905 (- %)
>
>
    • CANCELLED: 4905 (- %)
 
    • ABORTED: 927 (- %)
    • Resubmitted: 7140 (- %)

Revision 11 2009-10-22 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 24 to 24
 
  • Restarted cream-04.pd.infn.it at 12:49 on Wed Oct 21
Changed:
<
<

Partial results taken on Wed Oct 21 at 15:41:43 CEST 2009

>
>

Partial results taken on Thu Oct 22 at 09:41:43 CEST 2009

 
  • Collections correctly submitted: 1832 (36640 jobs)
Changed:
<
<
    • DONE OK: 27645 (- %)
    • NOT TERMINATED: 8158 (- %)
    • ABORTED: 837 (- %)
    • Resubmitted: 5967 (- %)
>
>
    • DONE OK: 30808 (- %)
    • NOT TERMINATED: 4905 (- %)
    • ABORTED: 927 (- %)
    • Resubmitted: 7140 (- %)
 
  • Errors found (2970)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = ...) (13 times)

Revision 10 2009-10-21 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 22 to 22
 20 Oct 2009 12:55:42,323 org.glite.voms.PKIStore - Cannot refresh store: null
    • cream-12.pd.infn.it and devel03.cnaf.infn.it probably have problems with the new BLParser
Changed:
<
<

Partial results taken on Wed Oct 21 at 10:41:43 CEST 2009

>
>
  • Restarted cream-04.pd.infn.it at 12:49 on Wed Oct 21

Partial results taken on Wed Oct 21 at 15:41:43 CEST 2009

 
  • Collections correctly submitted: 1832 (36640 jobs)
Changed:
<
<
    • DONE OK: 23124 (- %)
    • NOT TERMINATED: 12783 (- %)
    • ABORTED: 733 (- %)
    • Resubmitted: 4035 (- %)
>
>
    • DONE OK: 27645 (- %)
    • NOT TERMINATED: 8158 (- %)
    • ABORTED: 837 (- %)
    • Resubmitted: 5967 (- %)
 
  • Errors found (2970)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = ...) (13 times)
Line: 147 to 149
  -- AlessioGianelle - 13 Oct 2009
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 04" date="1256134051" name="ice04.png" path="ice04.png" size="6444" user="Main.AlessioGianelle" version="2"
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 02" date="1255691072" name="ice02.png" path="ice02.png" size="5993" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"
Deleted:
<
<
META FILEATTACHMENT attachment="ice04.png" attr="" comment="Ice graph. Test 04" date="1256118031" name="ice04.png" path="ice04.png" size="6087" stream="ice04.png" user="Main.AlessioGianelle" version="1"

Revision 9 2009-10-21 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 22 to 22
 20 Oct 2009 12:55:42,323 org.glite.voms.PKIStore - Cannot refresh store: null
    • cream-12.pd.infn.it and devel03.cnaf.infn.it probably have problems with the new BLParser
Added:
>
>

Partial results taken on Wed Oct 21 at 10:41:43 CEST 2009

  • Collections correctly submitted: 1832 (36640 jobs)
    • DONE OK: 23124 (- %)
    • NOT TERMINATED: 12783 (- %)
    • ABORTED: 733 (- %)
    • Resubmitted: 4035 (- %)

  • Errors found (2970)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = ...) (13 times)
    • blah error: send command timeout (22 times)
    • Cannot move ISB (${globus_transfer_cmd} gsiftp://devel20.cnaf.infn.it:2811/var/glite/SandboxDir/9d/https_3a_2f_2fdevel15.cnaf.infn.it_3a9000_2f9dVthtBkKwyOaSHnSLeXSQ/input/pippo file:///home/dteam028/home_cream_638539945/CREAM638539945/pippo): Problem to detect the lifetime of the proxy (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/CREAM2] failed: (603 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (66 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot store proxy certificate] FaultCause=[Authorization error: Cannot store proxy certificate] Timestamp=[Tue 20 Oct 2009 05:33:28] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-10-19T16:08:21.143Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-12.pd.infn.it] (2 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (1740 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (9 times)
    • Transfer to CREAM failed due to exception: Failed to create a delegation id for job https://devel15.cnaf.infn.it:9000/01bDqoEMYLtCgBJAwkGVBQ: reason is Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/gridsite-delegation] failed: (511 times)
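
The per-error counts above can be cross-checked against the "Errors found (2970)" header. The following is a minimal sketch, not part of the original test tooling: the messages are abbreviated by hand, the counts are copied from the list, and `tally` is a hypothetical helper name.

```python
import re

# Counts copied from the error list above; messages are abbreviated (assumption:
# only the trailing "(N times)" marker matters for the tally).
ERROR_LINES = [
    "BLAH error: no jobId in submission script's output (13 times)",
    "blah error: send command timeout (22 times)",
    "Cannot move ISB: Problem to detect the lifetime of the proxy (1 time)",
    "CREAM Register: Connection to service failed (603 times)",
    "CREAM Register: EOF detected during communication (66 times)",
    "CREAM Register: Cannot store proxy certificate (3 times)",
    "CREAM Register: Received NULL fault (2 times)",
    "CREAM Register: The endpoint is blacklisted (1740 times)",
    "CREAM Start: The endpoint is blacklisted (9 times)",
    "Failed to create a delegation id (511 times)",
]

# Matches the trailing "(N time)" or "(N times)" marker of each error line.
COUNT_RE = re.compile(r"\((\d+) times?\)\s*$")


def tally(lines):
    """Sum the occurrence counts found at the end of each line."""
    total = 0
    for line in lines:
        match = COUNT_RE.search(line)
        if match:
            total += int(match.group(1))
    return total


total_errors = tally(ERROR_LINES)
blacklisted = tally(line for line in ERROR_LINES if "blacklisted" in line)
print(total_errors, blacklisted)
```

Under these assumptions the individual counts add up to the 2970 reported in the header, and the two "endpoint is blacklisted" entries account for 1749 of them.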

ice04.png

 

3) Test starts on Fri Oct 16 at 13:50:05 CEST 2009 (WMS: devel20)

Description:
Line: 47 to 69
 
    • Resubmitted: 961 (2.2%)

  • Errors found (1097)
Changed:
<
<
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = ) (91 times)
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.-) N/A (jobId = ) (1 time)
>
>
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxxxx) (91 times)
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.-) N/A (jobId = xxxxx) (1 time)
 
    • Cannot move ISB: proxy expired (1 time)
    • Cannot take token (105 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (42 times)
Line: 128 to 150
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 02" date="1255691072" name="ice02.png" path="ice02.png" size="5993" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ice04.png" attr="" comment="Ice graph. Test 04" date="1256118031" name="ice04.png" path="ice04.png" size="6087" stream="ice04.png" user="Main.AlessioGianelle" version="1"

Revision 82009-10-20 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 15 to 15
 
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
Added:
>
>

Test interrupted on Tue Oct 20 at 10:29:16 CEST 2009

  • Problems with the CEs that are blacklisted
    • cream-04.pd.infn.it:
      java.net.SocketException
      MESSAGE: Too many open files
      20 Oct 2009 12:55:42,323 org.glite.voms.PKIStore - Cannot refresh store: null
    • cream-12.pd.infn.it and devel03.cnaf.infn.it probably have problems with the new BLParser
 

3) Test starts on Fri Oct 16 at 13:50:05 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 25 jobs

Revision 72009-10-19 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

4) Test starts on Fri Oct 19 at 12:00:05 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep random(4242)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
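
As a rough sizing aid, the planned load implied by the description above can be computed directly. This is only a sketch: the variable names are illustrative, and the submission window assumes exactly one collection per 30-second slot with no failures.

```python
from datetime import timedelta

# Parameters taken from the test description above.
collections = 4000
jobs_per_collection = 20
interval_s = 30  # one collection every 30 seconds

# Total jobs if every collection is accepted.
planned_jobs = collections * jobs_per_collection

# Approximate submission window, assuming one 30-second slot per collection.
planned_window = timedelta(seconds=collections * interval_s)

print(planned_jobs, planned_window)
```

With these figures the plan amounts to 80000 jobs submitted over roughly 33 hours and 20 minutes.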
 

3) Test starts on Fri Oct 16 at 13:50:05 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 25 jobs
Line: 15 to 28
 
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
Added:
>
>

Submissions finish on Sat Oct 17 at 06:29:16 CEST 2009

  • 1746 collections correctly submitted in 15578 seconds: 4/8/25 (min/avg/max)
    • 254 submission failures due to the load limiter

Final results taken on Mon Oct 19 at 09:41:43 CEST 2009

  • Collections correctly submitted: 1746 (43650 jobs)
    • DONE OK: 43642 (99.98 %)
    • CANCELLED: 8 (0.02%)
    • Resubmitted: 961 (2.2%)

  • Errors found (1097)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = ) (91 times)
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.-) N/A (jobId = ) (1 time)
    • Cannot move ISB: proxy expired (1 time)
    • Cannot take token (105 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (42 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-10-16T15:38:04.085Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-12.pd.infn.it] (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (856 times)
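
The percentages and totals in the final results of test 3 can be reproduced from the raw counts. The sketch below uses only numbers quoted above; `pct` is a hypothetical helper, and rounding to the displayed precision is an assumption about how the figures were produced.

```python
def pct(part, total, digits=2):
    """Return part/total as a percentage rounded to the given digits."""
    return round(100.0 * part / total, digits)


# Figures copied from the final results of test 3.
collections = 1746
jobs_per_collection = 25
total_jobs = collections * jobs_per_collection

done_ok = 43642
cancelled = 8
resubmitted = 961

# Per-error counts in the order they are listed under "Errors found (1097)".
error_counts = [91, 1, 1, 105, 42, 1, 856]

print(total_jobs)                      # jobs in the submitted collections
print(pct(done_ok, total_jobs))        # DONE OK share
print(pct(cancelled, total_jobs))      # CANCELLED share
print(pct(resubmitted, total_jobs, 1)) # Resubmitted share
print(sum(error_counts))               # total errors
```

The computed values agree with the reported ones: 43650 jobs, 99.98 % DONE OK, 0.02 % CANCELLED, 2.2 % resubmitted and 1097 errors in total.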

 
Added:
>
>
ice03.png
 

2) Test starts on Wed Oct 15 at 12:21:53 CEST 2009 (WMS: devel20)

Description:
Line: 85 to 118
 -- AlessioGianelle - 13 Oct 2009

META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 02" date="1255691072" name="ice02.png" path="ice02.png" size="5993" user="Main.AlessioGianelle" version="1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 03" date="1255940933" name="ice03.png" path="ice03.png" size="5824" user="Main.AlessioGianelle" version="1"
 
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"

Revision 62009-10-16 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

3) Test starts on Fri Oct 16 at 13:50:05 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used
 

2) Test starts on Wed Oct 15 at 12:21:53 CEST 2009 (WMS: devel20)

Line: 34 to 47
 
  • Errors found
    • Cannot take token (19 times)
Added:
>
>
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.- execute_cmd: poll() got an unknown event (stdout 0x0010 - stderr: 0x0000).-) N/A (jobId = CREAM603493778) (https://devel15.cnaf.infn.it:9000/hzShPwSsvd6S_1kQ2XsbvA)
 
    • Proxy is expired

Revision 52009-10-16 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Line: 22 to 22
 
Added:
>
>

Submissions finish on Wed Oct 16 at 05:22:55 CEST 2009

  • 1834 collections correctly submitted
    • 166 submission failures

Final results taken on Thu Oct 16 10:26:21 CEST 2009

  • Collections correctly submitted: 1834 (45850 jobs)
    • DONE OK: 42868 (-%)
    • ABORTED: 2976 (-%) *
    • Resubmitted: 2976+8 (-%)

  • Errors found
    • Cannot take token (19 times)
    • Proxy is expired

ice02.png

Note:

* All the aborted jobs failed with the "proxy expired" reason, because I forgot to activate the proxy renewal service.
 

1) Test starts on Wed Oct 14 at 15:04:19 CEST 2009 (WMS: devel20)

Description:
  • 400 collections each of 25 jobs
Line: 49 to 70
  -- AlessioGianelle - 13 Oct 2009
Changed:
<
<
META FILEATTACHMENT attachment="ice01.png" attr="" comment="Ice graph. Test 01" date="1255595672" name="ice01.png" path="ice01.png" size="6236" stream="ice01.png" user="Main.AlessioGianelle" version="1"
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 02" date="1255691072" name="ice02.png" path="ice02.png" size="5993" user="Main.AlessioGianelle" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="Ice graph. Test 01" date="1255595673" name="ice01.png" path="ice01.png" size="6236" user="Main.AlessioGianelle" version="1"

Revision 42009-10-15 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Changed:
<
<

2) Test starts on Wed Oct 15 at 15:04:19 CEST 2009 (WMS: devel20)

>
>

2) Test starts on Wed Oct 15 at 12:21:53 CEST 2009 (WMS: devel20)

  Description:
Changed:
<
<
  • 400 collections each of 25 jobs
>
>
  • 2000 collections each of 25 jobs
 
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10

Revision 32009-10-15 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

Added:
>
>

2) Test starts on Wed Oct 15 at 15:04:19 CEST 2009 (WMS: devel20)

Description:
  • 400 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software wrt previous test:
    • WMS
 

1) Test starts on Wed Oct 14 at 15:04:19 CEST 2009 (WMS: devel20)

Description:
  • 400 collections each of 25 jobs
Line: 14 to 34
 
  • Resubmission is enabled
  • Lease mechanism is not used
Added:
>
>

Submissions finish on Wed Oct 14 at 18:22:55 CEST 2009

  • 400 collections submitted in 2371 seconds: 3/5/15 (min/avg/max)
 
Added:
>
>

Final results taken on Thu Oct 15 10:26:21 CEST 2009

  • Collections correctly submitted: 400 (10000 jobs)
    • DONE OK: 10000 (100%)
    • Resubmitted: 3 (0.03%)
 
Added:
>
>
  • Errors found (3)
    • Cannot take token (3 times)

ice01.png

  -- AlessioGianelle - 13 Oct 2009 \ No newline at end of file
Added:
>
>
META FILEATTACHMENT attachment="ice01.png" attr="" comment="Ice graph. Test 01" date="1255595672" name="ice01.png" path="ice01.png" size="6236" stream="ice01.png" user="Main.AlessioGianelle" version="1"

Revision 22009-10-14 - AlessioGianelle

Line: 1 to 1
 
META TOPICPARENT name="TestWokPlan"
Deleted:
<
<
 

TESTs on ICE (Query Event)

Added:
>
>

1) Test starts on Wed Oct 14 at 15:04:19 CEST 2009 (WMS: devel20)

Description:
  • 400 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

  -- AlessioGianelle - 13 Oct 2009 \ No newline at end of file

Revision 12009-10-13 - AlessioGianelle

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="TestWokPlan"

TESTs on ICE (Query Event)

-- AlessioGianelle - 13 Oct 2009

 