
Tests on ICE

6) Test starts on Thu Apr 23 at 13:23:35 CEST 2009 (WMS: devel14)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • max_ice_mem = 1048000;
  • Used all the CEs of testbedB (except cert-06.cnaf and cream-28.pd)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • ICE
      • SQL queries fixed
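
The nominal load implied by the description above can be worked out in a few lines of Python. This is only a sketch: the helper name planned_load is hypothetical, and it assumes the 60-second submission interval is kept exactly for the whole run.

```python
from datetime import timedelta

def planned_load(collections, jobs_per_collection, interval_s):
    """Return (total jobs, nominal submission duration) for a test plan."""
    total_jobs = collections * jobs_per_collection
    duration = timedelta(seconds=collections * interval_s)
    return total_jobs, duration

# Test 6: 7200 collections of 40 jobs, one collection every 60 seconds
total_jobs, duration = planned_load(7200, 40, 60)
print(total_jobs, duration)
```

With these inputs the plan amounts to 288000 jobs submitted over a nominal 5 days.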

Partial results taken on Fri Apr 24 at 15:38:25 CEST 2009

  • Collections correctly submitted: 1590 (63600 jobs)
    • DONE OK: 33303 (52.36%)
    • NOT TERMINATED: 30297 (47.64%)
    • Resubmitted: 5 (0.008%)

ice6.png

5) Test starts on Wed Apr 15 at 11:40:01 CEST 2009 (WMS: devel14)

Description:
  • 3400 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 20
  • max_ice_mem = 2800000;
  • Used all the CEs of testbedB (except cert-06.cnaf and cream-22.pd)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 2424"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • ICE:
      • Improved memory management in database queries
      • Fixed a bug in the management of notifications

Test interrupted on 2009-04-16 at 15:10:40

  • ICE has exited with this message:
    terminate called after throwing an instance of 'glite::wms::ice::db::DbOperationException'
      what():  Query [UPDATE jobs SET failure_reason='Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Thu 16 Apr 2009 13:32:10] ErrorCode=[0] Description=[system error] FaultCause=[cannot create the job's working directory! The problem seems to be related to glexec]"' WHERE gridjobid='https://devel15.cnaf.infn.it:9000/UV2KNfU-ypvg-tbr6GnrNQ';] failed due to error [near "s": syntax error]
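
The syntax error near "s" is consistent with the apostrophe in "job's" closing the SQL string literal early. The following is a minimal sketch of that failure mode and of a bound-parameter alternative. It uses Python's sqlite3 module with an in-memory database purely for illustration; the two-column table and the job identifier 'job-1' are assumptions, and this is not ICE's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table layout: only the two columns seen in the query
conn.execute("CREATE TABLE jobs (gridjobid TEXT PRIMARY KEY, failure_reason TEXT)")
conn.execute("INSERT INTO jobs (gridjobid) VALUES ('job-1')")

reason = "cannot create the job's working directory!"

# Building the statement by concatenation: the apostrophe in "job's"
# terminates the literal, so the rest of the text is parsed as SQL.
try:
    conn.execute(
        "UPDATE jobs SET failure_reason='" + reason + "' WHERE gridjobid='job-1'"
    )
    concatenation_failed = False
except sqlite3.OperationalError as exc:
    concatenation_failed = True
    print("concatenated query failed:", exc)

# Binding the values as parameters: the text is never parsed as SQL.
conn.execute(
    "UPDATE jobs SET failure_reason=? WHERE gridjobid=?", (reason, "job-1")
)
stored = conn.execute(
    "SELECT failure_reason FROM jobs WHERE gridjobid='job-1'"
).fetchone()[0]
print(stored)
```

Bound parameters (or, failing that, doubling every apostrophe inside the literal) keep arbitrary error text coming back from a CE from breaking the statement.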

Partial results:

  • Collections correctly submitted: 1535 (61400 jobs)
    • DONE OK: 32000 (52.12%)
    • NOT TERMINATED: 29400 (47.88%)
    • Resubmitted: 7 (0.01%)

  • Errors found (7):
    • BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:pbs_iff: cannot read reply from pbs_server-No Permission.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15007)-TERM environment variable not set.-) N/A (jobId = CREAM524685673) (7 times)

4) Test starts on Fri Apr 10 at 17:10:12 CEST 2009 (WMS: devel14)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 4242"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Mon Apr 13 at 20:35:45 CEST 2009

  • 3608 collections submitted in 118319 seconds: 5/32/213 (min/avg/max)
    • 712 submissions failed due to the load limiter
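
The figures above are mutually consistent if the 118319 seconds are read as the sum of the per-collection submission times rather than as wall-clock time. That reading is an assumption, not something the log states. A quick check:

```python
planned = 4320          # collections planned for test 4
submitted = 3608        # collections actually submitted
total_time_s = 118319   # reported submission time, assumed cumulative

rejected = planned - submitted      # collections not submitted
avg_s = total_time_s / submitted    # mean submission time per collection
print(rejected, round(avg_s, 1))
```

The difference matches the 712 failed submissions, and the mean falls between 32 and 33 seconds, in line with the reported average of 32.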

Test interrupted on 2009-04-13 at 07:55:40

  • ICE stopped working on the WMS:
    terminate called after throwing an instance of 'std::bad_alloc'
      what():  St9bad_alloc
    Aborted (core dumped)
  • MAUI stopped working on the torque-CEs:
    04/12 05:36:08 ERROR:  job hash table is FULL.  cannot add MJob[107] '1385803'
    04/12 05:36:08 ERROR:    job buffer is full  (ignoring job '1385803.cream-28.pd.infn.it')
    04/12 05:36:08 ERROR:  job hash table is FULL.  cannot add MJob[107] '1385804'
    04/12 05:36:08 ERROR:    job buffer is full  (ignoring job '1385804.cream-28.pd.infn.it')
    04/12 05:36:08 INFO:     35507 PBS jobs detected on RM base
    04/12 05:36:08 INFO:     jobs detected: 35507
    and
    04/12/2009 05:36:09;0080;PBS_Server;Req;dis_request_read;req header bad, dis error 7 (Premature end of message), type=Connect
    04/12/2009 05:36:09;0080;PBS_Server;Req;req_reject;Reject reply code=15056(Bad DIS based Request Protocol MSG=cannot decode message), aux=0, type=Connect, from @
    04/12/2009 05:36:09;0002;PBS_Server;Req;dis_reply_write;DIS reply failure, -1

Results:

  • Collections correctly submitted: 3608 (144320 jobs)
    • DONE OK: 31687 (21.96%)
    • NOT TERMINATED: 112633 (78.04%)
    • Resubmitted: 92 (0.06%)

ice4.png

3) Test starts on Fri Apr 10 at 12:11:20 CEST 2009 (WMS: devel14)

Description:
  • 150 collections each of 15 jobs
  • Three collections every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Fri Apr 10 at 12:59:04 CEST 2009

  • 150 collections correctly submitted

Final results

  • Collections correctly submitted: 150 (2250 jobs)
    • DONE OK: 2250 (100%)

2) Test starts on Thu Apr 9 at 15:57:41 CEST 2009 (WMS: devel14)

Description:
  • 250 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • Fixed a bug in ICE concerning the decrement of the counter associated with the "super" proxy

Submissions finish on Thu Apr 9 at 20:04:05 CEST 2009

  • 250 collections submitted in 2213 seconds: 4/8/30 (min/avg/max)

Final results

  • Collections correctly submitted: 250 (10000 jobs)
    • DONE OK: 10000 (100%)
    • Resubmitted: 4 (0.04%)

ice2.png

1) Test starts on Wed Apr 8 at 16:28:22 CEST 2009 (WMS: devel14)

Description:
  • 250 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf.infn.it)
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 616"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Wed Apr 8 at 20:34:44 CEST 2009

  • 250 collections submitted in 2030 seconds: 5/8/29 (min/avg/max)

Final results

  • Collections correctly submitted: 250 (10000 jobs)
    • DONE OK: 10000 (100%)
    • Resubmitted: 555 (5.55%)

  • Errors found (561):
    • Cannot move ISB (232 times, 41.35%)
    • Cannot move OSB (79 times, 14.08%)
    • Proxy is expired (216 times, 38.50%)
    • lsf_reason=1603 (34 times, 6.06%)
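
The error breakdown can be recomputed from the raw counts. A small sketch, with the counts copied from the list above and the variable names chosen only for illustration:

```python
errors = {
    "Cannot move ISB": 232,
    "Cannot move OSB": 79,
    "Proxy is expired": 216,
    "lsf_reason=1603": 34,
}

total = sum(errors.values())
# Share of each error type, as a percentage rounded to two decimals
shares = {name: round(100 * count / total, 2) for name, count in errors.items()}

for name, count in errors.items():
    print(f"{name}: {count} times ({shares[name]}%)")
```

The counts add up to the 561 errors reported. Because each share is rounded independently, the percentages need not sum to exactly 100.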

ice1.png

-- AlessioGianelle - 05 Mar 2009

Topic attachments
  • ice1.png (5.1 K, 2009-04-09 08:56, AlessioGianelle): Submission rate. Test 1
  • ice2.png (4.7 K, 2009-04-10 07:57, AlessioGianelle): Submission rate. Test 2
  • ice4.png (5.1 K, 2009-04-14 10:20, AlessioGianelle): Submission rate. Test 4
  • ice6.png (5.8 K, 2009-04-24 15:58, AlessioGianelle): Ice graph. Test 6