
Check bugs:

  • Bugs #39807: In some circumstances, jobs which are killed by CREAM job wrapper might remain in ICE cache forever FIXED
    • Following the instructions reported in the bug's comments, set these parameters in the ICE section of the configuration file (i.e. glite_wms.conf):
      start_listener = false;
      start_subscription_updater = false;
      poller_delay = 900;
      poller_status_threshold_time = 60;
    • Submit a long job (i.e. a job that should run for more than 15 minutes), with a short proxy (i.e. a proxy with a lifetime of about 13 minutes).
    • Submit another job with a long proxy (i.e. more than an hour).
    • After about 17-18 minutes the first job (the one with the short proxy) should be ABORTED.

  • Bugs #42018: Missing exit on very severe error HOPEFULLY FIXED
    • Changes inside the code.

  • Bugs #42081: Exception not catched in ICE HOPEFULLY FIXED
    • Changes inside the code.

  • Bugs #42141: Calling the FileList::get_size() method should be mutex protected HOPEFULLY FIXED
    • Changes inside the code.

  • Bugs #44604: A bad handling of delegations slow down dramatically the submission rate of ICE HOPEFULLY FIXED
    • See the tests below.

  • Bugs #46116: MaxOutputSandboxSize value not sent to CREAM by ICE FIXED
    • Set the parameter MaxOutputSandboxSize in the WorkloadManager section of the configuration file /opt/glite/etc/glite_wms.conf on the WMS to 100 and restart the workload manager.
    • Submit a JDL like this to a CREAM CE:
      [
      Type = "Job";
      Executable = "27215_exe.sh";
      Arguments = "70";
      StdOutput = "test.out";
      StdError = "test.err";
      InputSandbox = {"27215_exe.sh"};
      OutputSandbox = {"test.err","test.out","out2", "out1"};
      usertags = [ bug = "27215" ];
      ] 
      where 27215_exe.sh contains
      #!/bin/sh
      # Append one character per iteration to out1 and out2 ($1 iterations).
      MAX=$1
      i=0
      while [ "$i" -lt "$MAX" ]; do
          printf "1" >> out1
          printf "2" >> out2
          i=$((i + 1))
      done
      
    • Take the CreamJobID from the "Transfer Event" logged by the "LogMonitor" (i.e. the "Dest jobid" field).
    • Use the CE client command to inspect the JDL sent to the CE: glite-ce-job-status -L 2 <CreamJobID>; you should find this parameter: maxOutputSandboxSize = 1.000000000000000E+02;
    • Due to a bug in CREAM the output files are not truncated as expected.
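The check in the last steps can be sketched in Python. This is only an illustration: the helper name parse_max_osb_size is hypothetical, the sample text contains just the parameter line quoted above (a real status dump has many more attributes), and the limit is assumed to be expressed in bytes.

```python
import re

def parse_max_osb_size(jdl_text):
    """Return the maxOutputSandboxSize value found in a JDL dump, or None."""
    match = re.search(r"maxOutputSandboxSize\s*=\s*([0-9.eE+-]+)\s*;", jdl_text)
    return float(match.group(1)) if match else None

# Parameter line as quoted in the procedure above.
sample = "maxOutputSandboxSize = 1.000000000000000E+02;"
limit = parse_max_osb_size(sample)

# 27215_exe.sh appends one character per iteration to out1 and to out2,
# and the JDL passes "70" as the iteration count.
iterations = 70
total_size = 2 * iterations  # out1 + out2, assuming test.out and test.err stay empty

print(limit, total_size, total_size > limit)
```

With these numbers the two output files together exceed the configured limit, which is why truncation would be expected once the CREAM bug is fixed.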

  • Bugs #47389: There's a mem leak in ICE that raises in some very rare circumstances HOPEFULLY FIXED
    • Not easy to reproduce

  • Bugs #47509: ICE must be modified in order to be compliant with modification to CEMon C++ API FIXED
    • Verify that the subscription of ICE to the CE works (check the ICE log file).

TESTs on ICE

12) Test starts on Tue Feb 24 at 10:29:07 CET 2009 (WMS: wms007)

Description:
  • 120 collections each of 60 jobs
  • One collection every 60 seconds
  • Four users
  • The job is a "sleep 313"
  • Resubmission is enabled
  • We use both CREAM and LCG CEs
  • Long proxy

Test finishes on Tue Feb 24 at 12:25:54 CET 2009

  • 92 collections submitted in 712 seconds: 4/7/13 (min/avg/max)
    • 28 submissions failed due to the load limiter

Final results

  • Collections correctly submitted: 91 (5460 jobs)
    • DONE OK: 5460 (100%)
      • CREAM: 3400
      • LCG: 2060
    • ABORTED: 0 (0%)
    • Resubmitted: 163 (2.99%)

  • The submission of one collection failed due to:
      Status Reason:      LBProxy is enabled
      Unable to query LB and LBProxy
      edg_wll_QueryEvents[Proxy]
      Exit code: 1413
      LB[Proxy] Error: DNS resolver error
      (edg_wll_gss_connect(): Unknown host)

11) Test starts on Mon Feb 23 at 15:25:49 CET 2009 (WMS: wms007)

Description:
  • 120 collections each of 60 jobs
  • One collection every 60 seconds
  • Four users
  • The job is a "sleep 313"
  • Resubmission is enabled
  • We use both CREAM and LCG CEs

Test finishes on Mon Feb 23 at 17:22:32 CET 2009

  • 120 collections submitted in 1092 seconds: 5/9/17 (min/avg/max)
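The "avg" figure in the min/avg/max triples on this page is consistent with the total submission time divided by the number of collections, rounded down. The sketch below only checks that observation against the figures reported here; it is not a statement about how the test harness actually computed the values.

```python
def avg_seconds_per_collection(total_seconds, collections):
    """Average submission time per collection, rounded down to whole seconds."""
    return total_seconds // collections

# (total seconds, collections submitted, reported avg) as listed on this page
reported = {
    "test 12": (712, 92, 7),
    "test 11": (1092, 120, 9),
    "test 9": (29752, 1427, 20),
    "test 5": (7416, 224, 33),
    "test 4": (8499, 197, 43),
    "test 2": (70789, 7180, 9),
}

for name, (seconds, collections, avg) in reported.items():
    computed = avg_seconds_per_collection(seconds, collections)
    print(name, computed, computed == avg)
```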

Final results

  • Collections correctly submitted: 120 (7200 jobs)
    • DONE OK: 6950 (96.53%)
      • CREAM: 2243
      • LCG: 4707
    • ABORTED: 249 (3.46%)
      • LCG: 249
    • Not finished: 1 (0.01%)
      • LCG: 1
    • Resubmitted: 696 (9.7%)

  • All the aborted jobs failed with "proxy expired" because the proxy renewal daemon was not working.
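The percentages in the final results can be recomputed from the raw counts. A minimal sketch, using the counts of this test (the function name status_shares is illustrative):

```python
def status_shares(counts):
    """Map each status to its share of all jobs, as a percentage with 2 decimals."""
    total = sum(counts.values())
    return {status: round(100.0 * n / total, 2) for status, n in counts.items()}

# Counts reported for this test: 120 collections of 60 jobs each.
test11 = {"DONE OK": 6950, "ABORTED": 249, "Not finished": 1}
total_jobs = sum(test11.values())
shares = status_shares(test11)

# Resubmissions are counted separately from the final job status.
resubmitted_share = round(100.0 * 696 / total_jobs, 1)

print(total_jobs, shares, resubmitted_share)
```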

10) Test starts on Tue Feb 5 at 12:41:35 CET 2009 (WMS: devel14)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf) plus cream-04.pd.infn.it
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch")
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 4242"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Changes in the software wrt previous test:
    • ICE
      • Fix a problem with proxy renewal seen in the previous test
      • Removed useless check of proxy duration in subscriptionManager, which could result in performance problems

Final results taken on Thu Feb 10 at 16:20:32 CET 2009

  • Collections correctly submitted: 1568 (62720 jobs)
    • DONE OK: 58641 (93.5%)
    • ABORTED: 4079 (6.5%)
    • Resubmitted: 41601 (66.33%)

  • Errors found (82530):
    • Cannot move ISB (75625 times 91.63%)
    • Cannot move OSB (115 times 0.14%)
    • Proxy is expired (5711 times 6.92%)
    • pbs_reason (702 times 0.85%)
      • pbs_reason=1; [...] proxy expired (477 times)
      • pbs_reason=271 (225 times)
    • Transfer to CREAM failed (187 times 0.23%)
      • due to exception: CREAM Register raised std::exception Connection to service [...] failed: (187 times)
    • lsf_reason (184 times 0.23%)
      • lsf_reason=36608; Proxy expired: job killed Terminated Master process killed (138 times)
      • lsf_reason=256 (43 times)
      • lsf_reason=1603 (3 times)
    • Cannot take token (4 times 0%)
    • BLAH error (2 times 0%)
      • submission command failed (exit code = 1) (stdout:) (stderr:qsub: Invalid credential-) N/A (jobId = [...]) (2 times)
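A breakdown like the one above could be produced by grouping failure reasons by their leading text. The sketch below is an assumption about the method, not the script actually used for these tests: the category prefixes are taken from the list above, while the sample reasons and the helper names are made up for illustration.

```python
from collections import Counter

# Category prefixes as they appear in the breakdown above.
CATEGORIES = [
    "Cannot move ISB",
    "Cannot move OSB",
    "Proxy is expired",
    "pbs_reason",
    "lsf_reason",
    "Transfer to CREAM failed",
    "Cannot take token",
    "BLAH error",
]

def categorize(reason):
    """Return the first category whose prefix matches the reason, else 'other'."""
    for prefix in CATEGORIES:
        if reason.startswith(prefix):
            return prefix
    return "other"

def tally(reasons):
    """Return (category, count, percent) tuples, most frequent first."""
    counts = Counter(categorize(r) for r in reasons)
    total = sum(counts.values())
    return [(cat, n, round(100.0 * n / total, 2)) for cat, n in counts.most_common()]

# Hypothetical sample, not taken from a real log.
sample = [
    "Cannot move ISB",
    "Cannot move ISB",
    "Cannot move ISB",
    "pbs_reason=271",
    "lsf_reason=256",
]
result = tally(sample)
for row in result:
    print(row)
```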

  • Job Aborted (4079)
    • request expired (792 times 19.42%)
    • hit job shallow retry count (3) (3263 times 80%)
    • hit job retry count (2) (24 times 0.58%)

ice10.png

9) Test starts on Fri Jan 30 at 12:41:22 CET 2009 (WMS: devel14)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf) plus cream-04.pd.infn.it
  • Used automatic-delegation and proxy renewal service (MyProxyServer = "myproxy.cern.ch";)
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 4242"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Changes in the software wrt previous test:
    • CEs:
      • Fix for bug #45913 (only on cream-04.pd.infn.it)
      • Fix for bug #46283 (only on cream-04.pd.infn.it and cert-04.pd.infn.it)
    • ICE
      • Fix for bug #46405
      • Delay between two LB logging attempts reduced from 60 to 5 seconds
      • Error code is printed in the ICE log file when a log to LB fails

Test finishes on Mon Feb 2 at 12:39:09 CET 2009

  • 1427 collections submitted in 29752 seconds: 5/20/90 (min/avg/max)
    • 2893 submissions failed due to the load limiter
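The submission figures of this test are consistent with the plan (4320 collections, one every 60 seconds). A small sketch of the arithmetic, using only numbers reported for this test; the variable names are illustrative:

```python
from datetime import datetime

planned_collections = 4320
interval_s = 60
submitted = 1427
rejected = 2893  # rejected by the load limiter
jobs_per_collection = 40

start = datetime(2009, 1, 30, 12, 41, 22)
finish = datetime(2009, 2, 2, 12, 39, 9)

planned_duration_s = planned_collections * interval_s
observed_duration_s = int((finish - start).total_seconds())
accepted_share = round(100.0 * submitted / planned_collections, 2)
submitted_jobs = submitted * jobs_per_collection

print(submitted + rejected == planned_collections)
print(planned_duration_s, observed_duration_s)
print(accepted_share, submitted_jobs)
```

Every planned collection is accounted for as either submitted or rejected, and the observed duration is within a few minutes of the planned three days.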

Results taken on Thu Feb 02 at 15:24:31 CET 2009

  • Collections correctly submitted: 1427 (57080 jobs)
    • DONE OK: 28244 (49.48%)
    • ABORTED: 0 (0%)
    • Not finished: 28836 (50.52%)
    • Resubmissions: 2012 (3.52%)

  • Errors found (2233):
    • BLAH error (19 times 0.85%)
      • submission command failed (exit code = 1) (stdout:) (stderr:pbs_iff: cannot read reply from pbs_server-No Permission.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15007)-) N/A (jobId = [...]) (18 times)
      • submission command failed (exit code = 1) (stdout:) (stderr:pbs_iff: cannot read reply from pbs_server-No Permission.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15007)- exe_getouterr: poll() got an unknown event (stdout 0x0010 - stderr: 0x0000).-) N/A (jobId = [...]) (1 time)
    • Cannot move ISB (1872 times 83.84%)
    • Proxy is expired (325 times 14.55%)
    • lsf_reason (1 time 0.04%)
      • lsf_reason=36608 (1 time)
    • pbs_reason (16 times 0.72%)
      • pbs_reason=1; [...] proxy expired (15 times)
      • pbs_reason=271; Proxy expired: job killed Terminated Master process killed (1 time)

ice9.png

8) Test starts on Mon Jan 26 17:59:02 CET 2009 (WMS: devel14)

Description:
  • 4320 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf)
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 4242"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Changes in the software wrt previous test:
    • ICE:
      • Fixed problem with proxy renewal seen in previous test

Test interrupted on Thu Jan 29 12:05:12 CET 2009 because of a problem in the proxy-renewal service daemon

Results taken on Thu Jan 29 at 18:35:31 CET 2009

  • Collections correctly submitted: 1433 (57320 jobs)
    • DONE OK: 33566 (58.56%)
    • ABORTED: 8414 (14.68%)
    • Not finished: 15340 (26.76%)
    • Resubmissions: 33507 (58.46%)

  • Errors found (33957):
    • BLAH error (3 times 0.01%)
      • submission command failed (exit code = 1) (stdout:) (stderr:qsub: Invalid credential-) N/A (jobId = [...]) (1 time)
      • submission command failed (exit code = 106) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = [...]) (2 times)
    • Cannot move ISB (4513 times 13.29%)
    • Cannot move OSB (73 times 0.21%)
    • Transfer to CREAM failed (28607 times 84.24%)
      • due to exception: Authentication error: The proxy is EXPIRED! (28134 times)
      • due to exception: Authentication error: Unable to open the file [/var/glite/SandboxDir/[...]/user.proxy] : No such file or directory (432 times)
      • Failed to create a delegation id for job [...]: reason is Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (19 times)
      • Failed to create a delegation id for job [...]: reason is Failed proxy validation - it has expired. (4 times)
      • Failed to create a delegation id for job [...]: reason is CreamProxy_Delegate::execute() - Coundl't open proxyfile [...]: The proxy is EXPIRED! (1 time)
      • CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (12 times)
      • CREAM Register returned error "MethodName=[jobRegister] Timestamp=[Wed 28 Jan 2009 17:42:42] ErrorCode=[0] Description=[system error] FaultCause=[cannot create the job's working directory! The problem seems to be related to glexec]" (5 times)
    • Proxy is expired (688 times 2.03%)
    • lsf_reason (61 times 0.18%)
      • lsf_reason=65280 (32 times)
      • lsf_reason=36608 (22 times)
      • lsf_reason=1603 (1 time)
      • lsf_reason=256 (6 times)
    • pbs_reason (12 times 0.04%)
      • pbs_reason=271; Proxy expired: job killed Terminated Master process killed (12 times)

ice8.png

BUGS:

  • CREAM
    • #46405: VOMSWrapper should try more than once to open a proxy file
  • BLAH
    • #46283: Possible memory leak in strtoken function for BLParser

7) Test starts on Fri Jan 23 12:28:01 CET 2009 (WMS: devel14)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf)
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 4242"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Changes in the software wrt previous test:
    • ICE:
      • Fixed problem seen in previous test

Test interrupted on Mon Jan 26 17:05:12 CET 2009

Results taken on Mon Jan 26 at 18:20:31 CET 2009

  • Collections correctly submitted: 1741 (69640 jobs)
    • DONE OK: 51894 (74.52%)
    • ABORTED: 44 (0.06%)
    • Not finished: 16186 (23.24%)
    • CANCELLED: 1516 (2.18%)
    • Resubmissions: 18218 (26.16%)

  • Errors found (18218):
    • BLAH error (2 times 0.01%)
      • submission command failed (exit code = 1) (stdout:) (stderr:qsub: Invalid credential-) N/A (jobId = [...]) (1 time)
      • submission command failed (exit code = 1) (stdout:) (stderr:Cannot resolve default server host 'cream-28.pd.infn.it' - check server_name file.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15008)-) N/A (jobId = [...]) (1 time)
    • Cannot take token (77 times 0.42%)
    • Cannot move ISB (14222 times 78.07%)
    • Cannot move OSB (82 times 0.45%)
    • Transfer to CREAM failed (4 times 0.02%)
      • due to exception: Authentication error: Unable to open the file [/var/glite/SandboxDir/[...]/user.proxy] : No such file or directory (4 times)
    • lsf_reason (30 times 0.16%)
      • lsf_reason=36608 (17 times)
      • lsf_reason=256 (13 times)
    • Proxy is expired (3113 times 17.09%)
    • pbs_reason (688 times 3.78%)
      • pbs_reason=-1 (656 times)
      • pbs_reason=271; Proxy expired: job killed Terminated Master process killed (32 times)

ice7.png

6) Test starts on Thu Jan 22 at 17:17:38 CET 2009 (WMS: devel14)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • Five users
  • max_ice_threads = 20
  • Used all the CEs of testbedB (except cert-06.cnaf)
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 4242"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Changes in the software wrt previous test:
    • CEs:
      • Fix for bug #45718
      • Fix for bug #45983
      • Fix for bug #46024
    • ICE
      • Use of same delegationid if CREAM complains that it doesn't exist anymore

Test aborted on Fri Jan 23 10:30:12 CET 2009

5) Test starts on Wed Jan 21 at 12:45:49 CET 2009 (WMS: devel14)

Description:
  • 300 collections each of 80 jobs
  • One collection every 60 seconds
  • One user
  • max_ice_threads = 40
  • Used the CEs of testbedB (only PD)
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 313"
  • Resubmission is enabled
  • Lease mechanism is not used

Test finishes on Wed Jan 21 at 17:44:58 CET 2009

  • 224 collections submitted in 7416 seconds: 6/33/64 (min/avg/max)
    • 76 submissions failed due to the load limiter

  • Collections correctly submitted: 224 (17920 jobs)
    • DONE OK: 17850 (99.6%)
    • ABORTED: 0 (0.0%)
    • Not finished: 70 (0.4%)
    • Resubmissions: 8 (0.04%)

  • Errors found (8):
    • Cannot move ISB (3 times)
    • lsf_reason=1603 (3 times)
    • BLAH error (2 times)

ice5.png

4) Test starts on Tue Jan 20 at 10:09:58 CET 2009 (WMS: devel14)

Description:
  • 300 collections each of 80 jobs
  • One collection every 60 seconds
  • One user
  • Used the lsf CEs of testbedB (PD+CNAF) (cert-06 at cnaf is not considered)
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 313"
  • Resubmission is enabled
  • Lease mechanism is not used

Test finishes on Tue Jan 20 at 15:13:14 CET 2009

  • 197 collections submitted in 8499 seconds: 9/43/84 (min/avg/max)
    • 103 submissions failed due to the load limiter

  • Collections correctly submitted: 197 (15760 jobs)
    • DONE OK: 15695 (99.6%)
    • ABORTED: 0 (0.0%)
    • Not finished: 65 (0.4%)
    • Resubmissions: 3 (0.02%)

  • Errors found (3):
    • Cannot move OSB (1 time)
    • Cannot move ISB (2 times)

ice4.png

3) Test starts on Mon Jan 19 at 15:22:51 CET 2009 (WMS: devel14)

Description:
  • 2880 collections each of 40 jobs
  • One collection every 30 seconds
  • One user
  • Used the lsf CEs of testbedB (PD+CNAF)
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 313"
  • Resubmission is enabled
  • Lease mechanism is not used

Test was modified on Mon Jan 19 at 17:03:41 CET 2009:

  • 1440 collections each of 80 jobs
  • One collection every 60 seconds

Test finishes on Tue Jan 20 at 01:53:01 CET 2009

  • Collections correctly submitted: 399 (24800 jobs)
    • DONE OK: 24702 (99.6%)
    • ABORTED: 0 (0.0%)
    • Not finished: 98 (0.4%)
    • Resubmissions: 1176 (4.74%)
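Since the collection size changed from 40 to 80 jobs during this test, the reported totals (399 collections, 24800 jobs) imply a specific mix of the two sizes. A sketch of that derivation, assuming every collection had exactly 40 or exactly 80 jobs (the function name is illustrative):

```python
def split_collections(total_collections, total_jobs, small=40, large=80):
    """Solve x + y = total_collections and small*x + large*y = total_jobs.

    Returns (x, y): the number of small and large collections.
    """
    numerator = total_jobs - small * total_collections
    if numerator % (large - small) != 0:
        raise ValueError("totals are not consistent with the two sizes")
    y = numerator // (large - small)
    x = total_collections - y
    if x < 0 or y < 0:
        raise ValueError("totals are not consistent with the two sizes")
    return x, y

small_count, large_count = split_collections(399, 24800)
print(small_count, large_count)
```

Under that assumption the totals correspond to 178 collections of 40 jobs and 221 collections of 80 jobs.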

  • Errors found (1176):
    • Cannot take token (1 time 0.08%)
    • Cannot move ISB (39 times 3.32%)
    • Transfer to CREAM failed (50 times 4.25%)
      • Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] (7 times)
      • Transfer to CREAM failed due to exception: Authentication error: The proxy is EXPIRED! (43 times)
    • lsf_reason=32512 (1085 times 92.27%)
    • lsf_reason=306 (1 time 0.08%)

ice3.png

2) Test starts on Tue Jan 13 at 15:38:11 CET 2009 (WMS: devel14)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • One user
  • Used the CEs of testbedB (PD+CNAF) plus cream-04.pd.infn.it
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 313"
  • Resubmission is enabled
  • Lease mechanism is not used
  • Changes in the software wrt previous test:
    • CEs:
      • Fix for bug #45437
      • Fix for bug #45736
    • ICE
      • Management of serialization error
      • Renewal done at 80 % of lifetime of proxy (or when there are only 20 minutes left)
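The renewal rule in the last item can be sketched as follows. The interpretation is an assumption: renewal is taken to trigger at whichever point comes first, 80% of the proxy lifetime or 20 minutes before expiry. The function name and defaults are illustrative and are not taken from the ICE code.

```python
from datetime import timedelta

def renewal_delay(lifetime, fraction=0.8, min_left=timedelta(minutes=20)):
    """Time after proxy creation at which renewal would be triggered.

    Assumed rule: whichever comes first, `fraction` of the lifetime
    or the moment when only `min_left` remains.
    """
    by_fraction = lifetime * fraction
    by_min_left = max(lifetime - min_left, timedelta(0))
    return min(by_fraction, by_min_left)

# Proxy used in these tests: 5 hours of lifetime.
print(renewal_delay(timedelta(hours=5)))
# A shorter proxy, where the 20-minute rule comes first.
print(renewal_delay(timedelta(hours=1)))
```

For the 5-hour proxy this gives a renewal after 4 hours, which matches the "renewed every 4 hours" figure in the description.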

Test finishes on Sun Jan 18 at 15:42:28 CET 2009

  • 7180 collections submitted in 70789 seconds: 3/9/141 (min/avg/max)
    • 20 submissions failed due to the load limiter

  • Collections correctly submitted: 7180 (287200 jobs)
    • DONE OK: 284838 (99.18%)
    • ABORTED: 0 (0.0%)
    • Not finished: 2362 (0.82%)
    • Resubmissions: 4599 (1.60%)

  • Errors found (4599):
    • blparser service is not alive (578 times 12.57%)
    • BLAH error (288 times 6.26%)
      • no jobId in submission script's output (stdout:) (stderr:) N/A (jobId = [...]) (52 times)
      • send command timeout (2 times)
      • submission command failed (exit code = 120) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = [...]) (2 times)
      • submission command failed (exit code = -15) (stdout:) (stderr:-killed by signal 15-) N/A (jobId = [...]) (219 times)
      • submission command failed (exit code = 1) (stdout:) (stderr:qsub: Invalid credential-) N/A (jobId = [...]) (13 times)
    • Cannot take token (201 times 4.37%)
    • Cannot move OSB (1 time 0.02%)
    • Cannot move ISB (5 times 0.11%)
    • Transfer to CREAM failed (19 times 0.41%)
      • FaultCause=[The problem seems to be related to glexec which reported: java.io.IOException: Too many open files]" (10 times)
      • CREAM Register raised std::exception Connection to service [https://cert-xx.cnaf.infn.it:8443/ce-cream/services/CREAM2] failed: (9 times)
    • lsf_reason=32512 (3505 times 76.22%)
    • Proxy is expired (1 time 0.02%)
    • lsf_reason=306 (1 time 0.02%)

  • Jobs not finished:

Scheduled  Running  Total  CE name
        0        6      6  cert-04.cnaf.infn.it
        6      349    355  cream-34.pd.infn.it
        0        3      3  cream-26.pd.infn.it
        0        2      2  cream-25.pd.infn.it
        5      332    337  cream-28.pd.infn.it
        0        1      1  cream-27.pd.infn.it
        0        2      2  cream-22.pd.infn.it
        0        5      5  cream-04.pd.infn.it
        0        1      1  cream-23.pd.infn.it
        0        2      2  cert-07.cnaf.infn.it
        1        0      1  cert-13.cnaf.infn.it
        6      334    340  cream-29.pd.infn.it
        9      327    336  cream-33.pd.infn.it
        0        5      5  cert-08.cnaf.infn.it
        0        2      2  cert-05.cnaf.infn.it
        6        0      6  cert-06.cnaf.infn.it
        6      307    313  cream-31.pd.infn.it
        4      296    300  cream-32.pd.infn.it
        8      337    345  cream-30.pd.infn.it
       51     2311   2362  Totals
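The per-CE table can be cross-checked against its "Totals" row and against the "Not finished" count (2362). A minimal sketch that parses the whitespace-separated rows as listed above:

```python
TABLE = """
0 6 6 cert-04.cnaf.infn.it
6 349 355 cream-34.pd.infn.it
0 3 3 cream-26.pd.infn.it
0 2 2 cream-25.pd.infn.it
5 332 337 cream-28.pd.infn.it
0 1 1 cream-27.pd.infn.it
0 2 2 cream-22.pd.infn.it
0 5 5 cream-04.pd.infn.it
0 1 1 cream-23.pd.infn.it
0 2 2 cert-07.cnaf.infn.it
1 0 1 cert-13.cnaf.infn.it
6 334 340 cream-29.pd.infn.it
9 327 336 cream-33.pd.infn.it
0 5 5 cert-08.cnaf.infn.it
0 2 2 cert-05.cnaf.infn.it
6 0 6 cert-06.cnaf.infn.it
6 307 313 cream-31.pd.infn.it
4 296 300 cream-32.pd.infn.it
8 337 345 cream-30.pd.infn.it
"""

def parse_rows(text):
    """Parse 'scheduled running total ce_name' rows into tuples."""
    rows = []
    for line in text.strip().splitlines():
        scheduled, running, total, ce = line.split()
        rows.append((int(scheduled), int(running), int(total), ce))
    return rows

rows = parse_rows(TABLE)

# Each row's total should equal scheduled + running.
rows_consistent = all(s + r == t for s, r, t, _ in rows)

scheduled_sum = sum(s for s, _, _, _ in rows)
running_sum = sum(r for _, r, _, _ in rows)
total_sum = sum(t for _, _, t, _ in rows)

print(rows_consistent, scheduled_sum, running_sum, total_sum)
```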

BUGS:

  • CREAM
  • BLAH
    • #45718: Some check on log lines should be added on BLParser code
    • #45983: BLAH can leave children processes behind.

1) Test starts on Wed Jan 7 at 16:01:32 CET 2009 (WMS: devel18)

Description:
  • 7200 collections each of 40 jobs
  • One collection every 60 seconds
  • One user
  • Used the CEs of testbedB (PD+CNAF) plus cream-12.pd.infn.it
  • Used automatic-delegation and proxy renewal service
  • Proxy has 5 hours of lifetime (and it is renewed every 4 hours)
  • The job is a "sleep 313"
  • Resubmission is enabled

Test stopped on Mon Jan 12 because of a serialization error in ICE

Results taken on Mon Jan 12 at 12:52:56 CET 2009

  • Collections correctly submitted: 3733 (149320 jobs)
    • DONE OK: 144004 (96.44%)
    • ABORTED: 446 (0.3%)
    • Not finished: 4870 (3.26%)

  • Errors found:
    • Transfer to CREAM failed due to exception:
      • FaultCause=[org.glite.ce.common.db.DatabaseException: Rollback executed due to: Deadlock found when trying to get lock; try restarting transaction]"
      • Authentication error: Unable to open the file [...]: No such file or directory
      • Connection to service [...] failed:
      • FaultCause=[User [...] not authorized for operation JobRegister]
      • FaultCause=[The problem seems to be related to glexec which reported: java.io.IOException: Too many open files]"
      • FaultCause=[org.glite.ce.common.db.DatabaseException: Server connection failure during transaction. Due to underlying exception: 'java.net.SocketException: Too many open files'.
      • FaultCause=[java.net.UnknownHostException: cream-31.pd.infn.it: cream-31.pd.infn.it]"
      • CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client]
      • Failed to get lease_id for job [...] Exception is Lease renew operation FAILED for lease ID [...] Exception is Connection to service [https://cream-29.pd.infn.it:8443/ce-cream/services/CREAM2] failed:
      • CREAM Start failed due to error MethodName=[JOB_START] Timestamp=[Wed 07 Jan 2009 22:10:43] ErrorCode=[2] Description=[the job has a status not compatible with the JOB_START command!] FaultCause=[N/A]
    • BLAH error:
      • submission command failed (exit code = -15) (stdout:) (stderr:/opt/glite/etc/blah.config: line 54: syntax error near unexpected token `('-/opt/glite/etc/blah.config: line 54: `//Added for test by Enrico Fattibene (07/01/2009)'--killed by signal 15-) N/A (jobId = CREAM251333253)
      • submission command failed (exit code = 120) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = CREAM550710004)
      • submission command failed (exit code = 1) (stdout:) (stderr:Cannot resolve default server host 'cream-28.pd.infn.it' - check server_name file.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15008)-) N/A (jobId = CREAM027575485)
      • submission command failed (exit code = -15) (stdout:) (stderr:-killed by signal 15-) N/A (jobId = CREAM752590056)
      • no jobId in submission script's output (stdout:) (stderr:) N/A (jobId = CREAM988027857)
    • DELEGATION_PROXY_CERT_SANDBOX_PATH not defined!
    • Cannot move ISB [...] The proxy credential [...] expired 0 minutes ago.
    • Proxy is expired; Proxy expired: job killed Terminated Master process killed
    • lsf_reason=32512
    • Lease expired
    • The job cannot be submitted because the blparser service is not alive

BUGS:

  • CREAM
    • #45914: glexec and proxy rotation
    • #45913: Proxy renewal not done for CREAM jobs not yet in IDLE status
    • #45736: Problems in case of resubmissions in the same CREAM CE
    • #45437: Sometimes the jobPurger throws the exception "Too many open files"
  • BLAH
    • #45718: Some check on log lines should be added on BLParser code
    • #45717: BLParserPBS should consider log lines like "unable to run job"

-- AlessioGianelle - 08 Jan 2009

Topic attachments

Attachment  Size   Date              Who              Comment
ice10.png   5.6 K  2009-02-09 11:28  AlessioGianelle  Test 10 ICE submission rate
ice3.png    4.5 K  2009-01-20 16:16  AlessioGianelle  Test 3 ICE submission rate
ice4.png    6.4 K  2009-01-21 10:15  AlessioGianelle  Test 4 ICE submission rate
ice5.png    6.0 K  2009-01-22 08:41  AlessioGianelle  Test 5 ICE submission rate
ice7.png    5.8 K  2009-01-26 17:08  AlessioGianelle  Test 7 ICE submission rate
ice8.png    5.9 K  2009-01-28 11:06  AlessioGianelle  Test 8 ICE submission rate
ice9.png    5.4 K  2009-02-02 11:33  AlessioGianelle  Test 9 ICE submission rate
Topic revision: r60 - 2009-03-04 - AlessioGianelle