Tags:
, view all tags

TESTs on ICE

2) Test starts on Tue Jan 13 at 15:38:11 CET 2009 (WMS: devel14)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Used the CEs of testbedB (PD+CNAF) plus cream-04.pd.infn.it
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 313"
  • Resubmission is able
  • Lease mechanism is not used

Partial results taken on Thu Jan 15 2009 ( Update )

  • Collections correctly submitted: 2701 (108040 jobs)
    • DONE OK: 107399 (99.4%)
    • ABORETD: 0 (0.0%)
    • Not finished: 641 (0.6%)
    • Resubmission: 795 (0.73%)

  • Errors found:
    • blparser service is not alive (528 times)
    • BLAH error (153 times)
      • no jobId in submission script's output (stdout:) (stderr:) N/A (jobId = [...]) (30 times)
      • send command timeout (2 times)
      • submission command failed (exit code = 120) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = [...]) (2 times)
      • submission command failed (exit code = -15) (stdout:) (stderr:-killed by signal 15-) N/A (jobId = [...]) (112 times)
      • submission command failed (exit code = 1) (stdout:) (stderr:qsub: Invalid credential-) N/A (jobId = [...]) (7 times)
    • Cannot take token (103 times)
    • Cannot move OSB (1 time)
    • Transfer to CREAM failed (10 times)
      • FaultCause=[The problem seems to be related to glexec which reported: java.io.IOException: Too many open files]" (10 times)

1) Test starts on Wed Jan 7 at 16:01:32 CET 2009 (WMS: devel18)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Used the CEs of testbedB (PD+CNAF) plus cream-12.pd.infn.it
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 313"
  • Resubmission is able

Test stopped on Monday Jan 12 for a serialization error on ICE

Results taken on Mon Jan 12 at 12:52:56 CET 2009

  • Collections correctly submitted: 3733 (149320 jobs)
    • DONE OK: 144004 (96.44%)
    • ABORETD: 446 (0.3%)
    • Not finished: 4870 (3.26%)

  • Errors found:
    • Transfer to CREAM failed due to exception:
      • FaultCause=[org.glite.ce.common.db.DatabaseException: Rollback executed due to: Deadlock found when trying to get lock; try restarting transaction]"
      • Authentication error: Unable to open the file [...]: No such file or directory
      • Connection to service [...] failed:
      • FaultCause=[User [...] not authorized for operation JobRegister]
      • FaultCause=[The problem seems to be related to glexec which reported: java.io.IOException: Too many open files]"
      • FaultCause=[org.glite.ce.common.db.DatabaseException: Server connection failure during transaction. Due to underlying exception: 'java.net.SocketException: Too many open files'.
      • FaultCause=[java.net.UnknownHostException: cream-31.pd.infn.it: cream-31.pd.infn.it]"
      • CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client]
      • Failed to get lease_id for job [...] Exception is Lease renew operation FAILED for lease ID [...] Exception is Connection to service [https://cream-29.pd.infn.it:8443/ce-cream/services/CREAM2] failed:
      • CREAM Start failed due to error MethodName=[JOB_START] Timestamp=[Wed 07 Jan 2009 22:10:43] ErrorCode=[2] Description=[the job has a status not compatible with the JOB_START command!] FaultCause=[N/A]
    • BLAH error:
      • submission command failed (exit code = -15) (stdout:) (stderr:/opt/glite/etc/blah.config: line 54: syntax error near unexpected token `('-/opt/glite/etc/blah.config: line 54: `//Added for test by Enrico Fattibene (07/01/2009)'--killed by signal 15-) N/A (jobId = CREAM251333253)
      • submission command failed (exit code = 120) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = CREAM550710004)
      • submission command failed (exit code = 1) (stdout:) (stderr:Cannot resolve default server host 'cream-28.pd.infn.it' - check server_name file.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15008)-) N/A (jobId = CREAM027575485)
      • submission command failed (exit code = -15) (stdout:) (stderr:-killed by signal 15-) N/A (jobId = CREAM752590056)
      • no jobId in submission script's output (stdout:) (stderr:) N/A (jobId = CREAM988027857)
    • DELEGATION_PROXY_CERT_SANDBOX_PATH not defined!
    • Cannot move ISB [...] The proxy credential [...] expired 0 minutes ago.
    • Proxy is expired; Proxy expired: job killed Terminated Master process killed
    • lsf_reason=32512
    • Lease expired
    • The job cannot be submitted because the blparser service is not alive

-- AlessioGianelle - 08 Jan 2009

Edit | Attach | PDF | History: r75 | r10 < r9 < r8 < r7 | Backlinks | Raw View | More topic actions...
Topic revision: r8 - 2009-01-15 - AlessioGianelle
 
  • Edit
  • Attach
This site is powered by the TWiki collaboration platformCopyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback