Tests on ICE (Query Event)

6) Test starts on Fri Oct 30 at 15:23:47 CEST 2009 (WMS: devel20)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedB (i.e. Production CEs 1.11, query event is not implemented)
  • Use automatic-delegation
  • Use proxy renewal service (myproxy.cern.ch)
  • The job is a "sleep random(2447)"
  • Resubmission is enabled
  • Lease mechanism is not used
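
As a rough sizing sketch (not part of the original test harness), the description above implies the following nominal load if every collection is accepted and submissions are evenly spaced. The function name `nominal_load` is hypothetical and used only for illustration.

```python
def nominal_load(collections, jobs_per_collection, interval_s):
    """Return (total jobs, nominal submission window in seconds).

    Assumes every collection is accepted and that collections are
    submitted at a fixed interval; failed submissions are ignored.
    """
    total_jobs = collections * jobs_per_collection
    window_s = collections * interval_s
    return total_jobs, window_s


# Test 6 parameters, taken from the description above.
jobs, window = nominal_load(collections=7200, jobs_per_collection=40, interval_s=60)
print(f"{jobs} jobs over {window / 86400:.1f} days")
```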

Partial results taken on Mon Nov 02 at 17:08:43 CEST 2009

  • Collections correctly submitted: 3929 (157160 jobs)
    • DONE OK: 124268 (- %)
    • NOTDONE: 32892 (- %)
    • Resubmitted: 539 (- %)

  • Errors found (1027)
    • blah error: send command timeout (39 times)
    • Cannot move ISB (974 times)
    • Cannot take token (10 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (2 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (1 time)
    • lsf_reason=32512 (1 time)
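
The percentages are left blank in the partial results above. A minimal sketch of how they could be derived from the listed job counts follows; the helper `share` is a hypothetical name, and the choice of the total number of submitted jobs as the denominator is an assumption.

```python
def share(count, total):
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


total_jobs = 3929 * 40  # collections correctly submitted x jobs per collection
counts = {"DONE OK": 124268, "NOTDONE": 32892, "Resubmitted": 539}

for state, n in counts.items():
    print(f"{state}: {n} ({share(n, total_jobs)} %)")
```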

ice06.png

5) Test starts on Thu Oct 22 at 12:51:04 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep random(2447)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Fri Oct 23 at 05:30:36 CEST 2009

  • 1455 collections submitted in 16993 seconds: 4/11/48 (min/avg/max)
    • 545 submissions failed
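
One reading of the timing line, assumed here rather than stated in the source, is that the total is the cumulative time spent in the submission commands and that the reported average is truncated to whole seconds. A minimal check under that assumption, using the figures reported for Tests 5, 3 and 1 (`truncated_average` is a hypothetical helper):

```python
def truncated_average(total_seconds, submissions):
    """Average seconds per submission, truncated to a whole number."""
    return total_seconds // submissions


# (total seconds, collections submitted, reported average) per test.
reported = {
    "Test 5": (16993, 1455, 11),
    "Test 3": (15578, 1746, 8),
    "Test 1": (2371, 400, 5),
}

for name, (total, n, avg) in reported.items():
    print(name, truncated_average(total, n), "reported:", avg)
```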

Final results taken on Fri Oct 23 at 16:08:43 CEST 2009

  • Collections correctly submitted: 1455 (29100 jobs)
    • DONE OK: 26714 (91.8 %)
    • NOTDONE: 168 (0.58 %)
    • ABORTED: 2218 (7.62 %)
    • Resubmitted: 4101 (14.09 %)

  • Errors found (1758)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxx) (27 times)
    • blah error: send command timeout (21 times)
    • Cannot move ISB (${globus_transfer_cmd} gsiftp://devel20.cnaf.infn.it:2811...... ): proxy expired (1 time)
    • Cannot take token (39 times)
    • reason=999 (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/CREAM2] failed: (852 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred (54 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot set permissions to the store proxy certificate] FaultCause=[Authorization error: Cannot set permissions to the store proxy certificate] Timestamp=[Fri 23 Oct 2009 04:31:45] (21 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot store proxy certificate] FaultCause=[Authorization error: Cannot store proxy certificate] Timestamp=[Fri 23 Oct 2009 04:32:16] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (733 times)
    • Transfer to CREAM failed due to exception: CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Fri 23 Oct 2009 04:44:45] ErrorCode=[0] Description=[system error] FaultCause=[The problem seems to be related to glexec]" (1 time)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (5 times)
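
The per-error counts above can be cross-checked against the reported total. The sketch below uses abbreviated labels (shorthand introduced here, not the full messages) and a hypothetical helper `tally` that sums the trailing "(N times)" counts and, separately, those of the "Transfer to CREAM failed" entries.

```python
import re

# Abbreviated labels (illustrative shorthand) with the counts listed above.
error_lines = [
    "BLAH error: no jobId in submission script's output (27 times)",
    "blah error: send command timeout (21 times)",
    "Cannot move ISB: proxy expired (1 time)",
    "Cannot take token (39 times)",
    "reason=999 (1 time)",
    "Transfer to CREAM failed: Register, connection to service failed (852 times)",
    "Transfer to CREAM failed: Register, EOF detected during communication (54 times)",
    "Transfer to CREAM failed: Register, cannot set permissions to the store proxy certificate (21 times)",
    "Transfer to CREAM failed: Register, cannot store proxy certificate (3 times)",
    "Transfer to CREAM failed: Register, endpoint is blacklisted (733 times)",
    "Transfer to CREAM failed: Register, problem related to glexec (1 time)",
    "Transfer to CREAM failed: Start, endpoint is blacklisted (5 times)",
]

# Matches a trailing "(N time)" or "(N times)".
COUNT_RE = re.compile(r"\((\d+) times?\)\s*$")


def tally(lines):
    """Return (total errors, errors whose label starts with 'Transfer to CREAM failed')."""
    total = 0
    cream = 0
    for line in lines:
        match = COUNT_RE.search(line)
        if match is None:
            continue  # no trailing count on this line
        n = int(match.group(1))
        total += n
        if line.startswith("Transfer to CREAM failed"):
            cream += n
    return total, cream


total, cream = tally(error_lines)
print(f"{total} errors, {cream} of them transfers to CREAM")
```

Under this grouping the counts add up to the reported 1758, of which 1669 are failed transfers to CREAM.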

ice05.png

4) Test starts on Mon Oct 19 at 12:00:05 CEST 2009 (WMS: devel20)

Description:
  • 4000 collections each of 20 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep random(4242)"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Test interrupted on Tue Oct 20 at 10:29:16 CEST 2009

  • Problems with the CEs that are blacklisted
    • cream-04.pd.infn.it:
      java.net.SocketException
      MESSAGE: Too many open files
      20 Oct 2009 12:55:42,323 org.glite.voms.PKIStore - Cannot refresh store: null
    • cream-12.pd.infn.it and devel03.cnaf.infn.it probably have problems with the new BLParser

  • Restarted cream-04.pd.infn.it at 12:49 on Wed Oct 21

Partial results taken on Thu Oct 22 at 09:41:43 CEST 2009

  • Collections correctly submitted: 1832 (36640 jobs)
    • DONE OK: 30808 (- %)
    • CANCELLED: 4905 (- %)
    • ABORTED: 927 (- %)
    • Resubmitted: 7140 (- %)

  • Errors found (2970)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = ...) (13 times)
    • blah error: send command timeout (22 times)
    • Cannot move ISB (${globus_transfer_cmd} gsiftp://devel20.cnaf.infn.it:2811/var/glite/SandboxDir/9d/https_3a_2f_2fdevel15.cnaf.infn.it_3a9000_2f9dVthtBkKwyOaSHnSLeXSQ/input/pippo file:///home/dteam028/home_cream_638539945/CREAM638539945/pippo): Problem to detect the lifetime of the proxy (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/CREAM2] failed: (603 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (66 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception MethodName=[invoke] ErrorCode=[0] Description=[Authorization error: Cannot store proxy certificate] FaultCause=[Authorization error: Cannot store proxy certificate] Timestamp=[Tue 20 Oct 2009 05:33:28] (3 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-10-19T16:08:21.143Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-12.pd.infn.it] (2 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (1740 times)
    • Transfer to CREAM failed due to exception: CREAM Start raised exception The endpoint is blacklisted (9 times)
    • Transfer to CREAM failed due to exception: Failed to create a delegation id for job https://devel15.cnaf.infn.it:9000/01bDqoEMYLtCgBJAwkGVBQ: reason is Connection to service [https://cream-04.pd.infn.it:8443/ce-cream/services/gridsite-delegation] failed: (511 times)

ice04.png

3) Test starts on Fri Oct 16 at 13:50:05 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Use proxy renewal service (myproxy.cern.ch)
  • Lease mechanism is not used

Submissions finish on Sat Oct 17 at 06:29:16 CEST 2009

  • 1746 collections correctly submitted in 15578 seconds: 4/8/25 (min/avg/max)
    • 254 submission failures due to the load limiter

Final results taken on Mon Oct 19 at 09:41:43 CEST 2009

  • Collections correctly submitted: 1746 (43650 jobs)
    • DONE OK: 43642 (99.98 %)
    • CANCELLED: 8 (0.02%)
    • Resubmitted: 961 (2.2%)

  • Errors found (1097)
    • BLAH error: no jobId in submission script's output (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = xxxxx) (91 times)
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.-) N/A (jobId = xxxxx) (1 time)
    • Cannot move ISB: proxy expired (1 time)
    • Cannot take token (105 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred. (42 times)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[] - FaultCode=[SOAP-ENV:Server.generalException] - FaultSubCode=[SOAP-ENV:Server.generalException] - FaultDetail=[invoke2009-10-16T15:38:04.085Z0cannot write the authN proxy to file: nullcannot write the authN proxy to file: nullorg.glite.ce.faults.AuthenticationFaultcream-12.pd.infn.it] (1 time)
    • Transfer to CREAM failed due to exception: CREAM Register raised std::exception The endpoint is blacklisted (856 times)

ice03.png

2) Test starts on Thu Oct 15 at 12:21:53 CEST 2009 (WMS: devel20)

Description:
  • 2000 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

  • Changes in the software with respect to the previous test:
    • WMS

Submissions finish on Fri Oct 16 at 05:22:55 CEST 2009

  • 1834 collections correctly submitted
    • 166 submission failures

Final results taken on Fri Oct 16 at 10:26:21 CEST 2009

  • Collections correctly submitted: 1834 (45850 jobs)
    • DONE OK: 42868 (-%)
    • ABORTED: 2976 (-%) *
    • Resubmitted: 2976+8 (-%)

  • Errors found
    • Cannot take token (19 times)
    • BLAH error: submission command failed (exit code = 201) (stdout:) (stderr:[gLExec]: gLExec has detected an input file change during the use of the file. It's unknown if this file-jacking was accidental or intentional.- execute_cmd: poll() got an unknown event (stdout 0x0010 - stderr: 0x0000).-) N/A (jobId = CREAM603493778) (https://devel15.cnaf.infn.it:9000/hzShPwSsvd6S_1kQ2XsbvA)
    • Proxy is expired
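
A quick consistency check, sketched here with a hypothetical helper `unaccounted`, compares the per-state job counts listed above with the number of jobs in the correctly submitted collections.

```python
def unaccounted(total_jobs, state_counts):
    """Jobs not covered by any of the listed final states."""
    return total_jobs - sum(state_counts.values())


total_jobs = 1834 * 25  # collections correctly submitted x jobs per collection
states = {"DONE OK": 42868, "ABORTED": 2976}

print(unaccounted(total_jobs, states))
```

For this test the two listed states leave 6 of the 45850 jobs unaccounted for; the results do not say which state those jobs were in.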

ice02.png

Note:

* All the aborted jobs are due to the "proxy expired" reason, because I forgot to activate the proxy renewal service.

1) Test starts on Wed Oct 14 at 15:04:19 CEST 2009 (WMS: devel20)

Description:
  • 400 collections each of 25 jobs
  • One collection every 30 seconds
  • Four users
  • max_ice_threads = 10
  • Use all the CEs of testbedA (cream-12.pd, cream-04.pd and devel03.cnaf)
  • Use automatic-delegation
  • The job is a "sleep 666"
  • Resubmission is enabled
  • Lease mechanism is not used

Submissions finish on Wed Oct 14 at 18:22:55 CEST 2009

  • 400 collections submitted in 2371 seconds: 3/5/15 (min/avg/max)

Final results taken on Thu Oct 15 at 10:26:21 CEST 2009

  • Collections correctly submitted: 400 (10000 jobs)
    • DONE OK: 10000 (100%)
    • Resubmitted: 3 (0.03%)

  • Errors found (3)
    • Cannot take token (3 times)

ice01.png

-- AlessioGianelle - 13 Oct 2009

Topic attachments
  • ice01.png (6.1 K, 2009-10-15 08:34, AlessioGianelle): Ice graph. Test 01
  • ice02.png (5.9 K, 2009-10-16 11:04, AlessioGianelle): Ice graph. Test 02
  • ice03.png (5.7 K, 2009-10-19 08:28, AlessioGianelle): Ice graph. Test 03
  • ice04.png (6.3 K, 2009-10-21 14:07, AlessioGianelle): Ice graph. Test 04
  • ice05.png (8.0 K, 2009-10-23 15:22, AlessioGianelle): Ice graph. Test 05
  • ice06.png (6.3 K, 2009-11-02 13:28, AlessioGianelle): Ice graph. Test 06
Topic revision: r20 - 2009-11-02 - AlessioGianelle