---+ TESTs on ICE

---++ 5) Test starts on Wed Jan 21 at 12:45:49 CET 2009 (WMS: devel14)

Description:
   * 300 collections each of 80 jobs
   * One collection every 60 seconds
   * One user
   * %PINK%Set max_ice_threads = 40;%ENDCOLOR%
   * Used the CEs of testbedB (only PD)
   * Used automatic-delegation and proxy renewal service
   * Proxy has a lifetime of 5 hours (and is renewed every 4 hours)
   * The job is a "sleep 313"
   * Resubmission is enabled
   * Lease mechanism is not used

---+++ Test finishes on Wed Jan 21 at 17:44:58 CET 2009
   * 224 collections submitted in 7416 seconds: 64/33/6 (max/avg/min)
   * 76 submissions failed due to load limiter
   * Collections correctly submitted: 224 (17920 jobs)

---++ 4) Test starts on Tue Jan 20 at 10:09:58 CET 2009 (WMS: devel14)

Description:
   * 300 collections each of %PINK%80 jobs%ENDCOLOR%
   * One collection every 60 seconds
   * One user
   * Used the lsf CEs of testbedB (PD+CNAF) (cert-06 at cnaf is not considered)
   * Used automatic-delegation and proxy renewal service
   * Proxy has a lifetime of 5 hours (and is renewed every 4 hours)
   * The job is a "sleep 313"
   * Resubmission is enabled
   * Lease mechanism is not used

---+++ Test finishes on Tue Jan 20 at 15:13:14 CET 2009
   * 197 collections submitted in 8499 seconds: 84/43/9 (max/avg/min)
   * 103 submissions failed due to load limiter
   * Collections correctly submitted: 197 (15760 jobs)
   * DONE OK: %GREEN% 15695 (99.6%) %ENDCOLOR%
   * ABORTED: %RED% 0 (0.0%) %ENDCOLOR%
   * Not finished: %ORANGE% 65 (0.4%) %ENDCOLOR%
   * Resubmissions: %BLUE% 3 (0.02%) %ENDCOLOR%
   * Errors found (3):
      * Cannot move OSB _(1 time)_
      * Cannot move ISB _(2 times)_

%ATTACHURL%/ice4.png

---++ 3) Test starts on Mon Jan 19 at 15:22:51 CET 2009 (WMS: devel14)

Description:
   * 2880 collections each of 40 jobs
   * One collection %PINK%every 30 seconds%ENDCOLOR%
   * One user
   * Used the lsf CEs of testbedB (PD+CNAF)
   * Used automatic-delegation and proxy renewal service
   * Proxy has a lifetime of 5 hours (and is renewed every 4 hours)
   * The job is a "sleep 313"
   * Resubmission is enabled
   * Lease mechanism is not used

Test has been modified on Mon Jan 19 at 17:03:41:
   * 1440 collections each of 80 jobs
   * One collection every 60 seconds

---+++ Test finishes on Tue Jan 20 at 01:53:01 CET 2009
   * Collections correctly submitted: 399 (24800 jobs)
   * DONE OK: %GREEN% 24702 (99.6%) %ENDCOLOR%
   * ABORTED: %RED% 0 (0.0%) %ENDCOLOR%
   * Not finished: %ORANGE% 98 (0.4%) %ENDCOLOR%
   * Resubmissions: %BLUE% 1176 (4.74%) %ENDCOLOR%
   * Errors found (1176):
      * Cannot take token _(1 time 0.08%)_
      * Cannot move ISB _(39 times 3.32%)_
      * Transfer to CREAM failed _(50 times 4.25%)_
         * Transfer to CREAM failed due to exception: CREAM Register raised std::exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] _(7 times)_
         * Transfer to CREAM failed due to exception: Authentication error: The proxy is EXPIRED! _(43 times)_
      * lsf_reason=32512 _(1085 times 92.27%)_
      * lsf_reason=306 _(1 time 0.08%)_

%ATTACHURL%/ice3.png

---++ 2) Test starts on Tue Jan 13 at 15:38:11 CET 2009 (WMS: devel14)

Description:
   * 7200 collections each of 40 jobs
   * One collection every 60 seconds
   * One user
   * Used the CEs of testbedB (PD+CNAF) plus cream-04.pd.infn.it
   * Used automatic-delegation and proxy renewal service
   * Proxy has a lifetime of 5 hours (and is renewed every 4 hours)
   * The job is a "sleep 313"
   * Resubmission is enabled
   * %PINK%Lease mechanism is not used%ENDCOLOR%

---+++ Test finishes on Sun Jan 18 at 15:42:28 CET 2009
   * 7180 collections submitted in 70789 seconds: 141/9/3 (max/avg/min)
   * 20 submissions failed due to load limiter
   * Collections correctly submitted: 7180 (287200 jobs)
   * DONE OK: %GREEN% 284838 (99.18%) %ENDCOLOR%
   * ABORTED: %RED% 0 (0.0%) %ENDCOLOR%
   * Not finished: %ORANGE% 2362 (0.82%) %ENDCOLOR%
   * Resubmissions: %BLUE% 4599 (1.60%) %ENDCOLOR%
   * Errors found (4599):
      * blparser service is not alive _(578 times 12.57%)_
      * BLAH error _(288 times 6.26%)_
         * no jobId in submission script's output (stdout:) (stderr:) N/A (jobId = [...]) _(52 times)_
         * send command timeout _(2 times)_
         * submission command failed (exit code = 120) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = [...]) _(2 times)_
         * submission command failed (exit code = -15) (stdout:) (stderr:-killed by signal 15-) N/A (jobId = [...]) _(219 times)_
         * submission command failed (exit code = 1) (stdout:) (stderr:qsub: Invalid credential-) N/A (jobId = [...]) _(13 times)_
      * Cannot take token _(201 times 4.37%)_
      * Cannot move OSB _(1 time 0.02%)_
      * Cannot move ISB _(5 times 0.11%)_
      * Transfer to CREAM failed _(19 times 0.41%)_
         * FaultCause=[The problem seems to be related to glexec which reported: java.io.IOException: Too many open files]" _(10 times)_
         * CREAM Register raised std::exception Connection to service [https://cert-xx.cnaf.infn.it:8443/ce-cream/services/CREAM2] failed: _(9 times)_
      * lsf_reason=32512 _(3505 times 76.22%)_
      * Proxy is expired _(1 time 0.02%)_
      * lsf_reason=306 _(1 time 0.02%)_
   * Jobs not finished:

| *Scheduled* | *Running* | *Total* | *CE Name* |
| 0 | 6 | 6 | cert-04.cnaf.infn.it |
| 6 | 349 | 355 | cream-34.pd.infn.it |
| 0 | 3 | 3 | cream-26.pd.infn.it |
| 0 | 2 | 2 | cream-25.pd.infn.it |
| 5 | 332 | 337 | cream-28.pd.infn.it |
| 0 | 1 | 1 | cream-27.pd.infn.it |
| 0 | 2 | 2 | cream-22.pd.infn.it |
| 0 | 5 | 5 | cream-04.pd.infn.it |
| 0 | 1 | 1 | cream-23.pd.infn.it |
| 0 | 2 | 2 | cert-07.cnaf.infn.it |
| 1 | 0 | 1 | cert-13.cnaf.infn.it |
| 6 | 334 | 340 | cream-29.pd.infn.it |
| 9 | 327 | 336 | cream-33.pd.infn.it |
| 0 | 5 | 5 | cert-08.cnaf.infn.it |
| 0 | 2 | 2 | cert-05.cnaf.infn.it |
| 6 | 0 | 6 | cert-06.cnaf.infn.it |
| 6 | 307 | 313 | cream-31.pd.infn.it |
| 4 | 296 | 300 | cream-32.pd.infn.it |
| 8 | 337 | 345 | cream-30.pd.infn.it |
| *51* | *2311* | *2362* | *Totals* |

---+++ BUGS:
   * CREAM
      * [[https://savannah.cern.ch/bugs/index.php?46024][#46024]]: Sometimes the AbstractJobExecutor throws the exception "Too many open files"
   * BLAH
      * [[https://savannah.cern.ch/bugs/index.php?45718][#45718]]: Some check on log lines should be added on BLParser code
      * [[https://savannah.cern.ch/bugs/index.php?45983][#45983]]: BLAH can leave children processes behind.

---++ 1) Test starts on Wed Jan 7 at 16:01:32 CET 2009 (WMS: devel18)

Description:
   * 7200 collections each of 40 jobs
   * One collection every 60 seconds
   * One user
   * Used the CEs of testbedB (PD+CNAF) plus cream-12.pd.infn.it
   * Used automatic-delegation and proxy renewal service
   * Proxy has a lifetime of 5 hours (and is renewed every 4 hours)
   * The job is a "sleep 313"
   * Resubmission is enabled

*%PINK%Test stopped on Monday Jan 12 due to a _serialization error_ on ICE%ENDCOLOR%*

---+++ Results taken on Mon Jan 12 at 12:52:56 CET 2009
   * Collections correctly submitted: 3733 (149320 jobs)
   * DONE OK: %GREEN%144004 (96.44%) %ENDCOLOR%
   * ABORTED: %RED%446 (0.3%) %ENDCOLOR%
   * Not finished: %ORANGE%4870 (3.26%) %ENDCOLOR%
   * Errors found:
      * Transfer to CREAM failed due to exception:
         * FaultCause=[org.glite.ce.common.db.DatabaseException: Rollback executed due to: Deadlock found when trying to get lock; try restarting transaction]"
         * Authentication error: Unable to open the file [...]: No such file or directory
         * Connection to service [...] failed:
         * FaultCause=[User [...] not authorized for operation JobRegister]
         * FaultCause=[The problem seems to be related to glexec which reported: java.io.IOException: Too many open files]"
         * FaultCause=[org.glite.ce.common.db.DatabaseException: Server connection failure during transaction. Due to underlying exception: 'java.net.SocketException: Too many open files'.
         * FaultCause=[java.net.UnknownHostException: cream-31.pd.infn.it: cream-31.pd.infn.it]"
      * CREAM Start raised exception Received NULL fault; the error is due to another cause: FaultString=[Client fault] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client]
      * Failed to get lease_id for job [...] Exception is Lease renew operation FAILED for lease ID [...] Exception is Connection to service [https://cream-29.pd.infn.it:8443/ce-cream/services/CREAM2] failed:
      * CREAM Start failed due to error MethodName=[JOB_START] Timestamp=[Wed 07 Jan 2009 22:10:43] ErrorCode=[2] Description=[the job has a status not compatible with the JOB_START command!] FaultCause=[N/A]
      * BLAH error:
         * submission command failed (exit code = -15) (stdout:) (stderr:/opt/glite/etc/blah.config: line 54: syntax error near unexpected token `('-/opt/glite/etc/blah.config: line 54: `//Added for test by Enrico Fattibene (07/01/2009)'--killed by signal 15-) N/A (jobId = CREAM251333253)
         * submission command failed (exit code = 120) (stdout:) (stderr:glexec policy violation: see glexec log for more details-) N/A (jobId = CREAM550710004)
         * submission command failed (exit code = 1) (stdout:) (stderr:Cannot resolve default server host 'cream-28.pd.infn.it' - check server_name file.-qsub: cannot connect to server cream-28.pd.infn.it (errno=15008)-) N/A (jobId = CREAM027575485)
         * submission command failed (exit code = -15) (stdout:) (stderr:-killed by signal 15-) N/A (jobId = CREAM752590056)
         * no jobId in submission script's output (stdout:) (stderr:) N/A (jobId = CREAM988027857)
      * DELEGATION_PROXY_CERT_SANDBOX_PATH not defined!
      * Cannot move ISB [...] The proxy credential [...] expired 0 minutes ago.
      * Proxy is expired; Proxy expired: job killed Terminated Master process killed
      * lsf_reason=32512
      * Lease expired
      * The job cannot be submitted because the blparser service is not alive

---+++ BUGS:
   * CREAM
      * [[https://savannah.cern.ch/bugs/index.php?45914][#45914]]: glexec and proxy rotation
      * [[https://savannah.cern.ch/bugs/index.php?45913][#45913]]: Proxy renewal not done for CREAM jobs not yet in IDLE status
      * [[https://savannah.cern.ch/bugs/index.php?45736][#45736]]: Problems in case of resubmissions in the same CREAM CE
      * [[https://savannah.cern.ch/bugs/index.php?45437][#45437]]: Sometimes the jobPurger throws the exception "Too many open files"
   * BLAH
      * [[https://savannah.cern.ch/bugs/index.php?45718][#45718]]: Some check on log lines should be added on BLParser code
      * [[https://savannah.cern.ch/bugs/index.php?45717][#45717]]: BLParserPBS should consider log lines like "unable to run job"

-- Main.AlessioGianelle - 08 Jan 2009
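The summary figures reported for each test can be cross-checked with a little arithmetic. The sketch below does this for the Test 2 numbers. The helper `pct` and all variable names are hypothetical, introduced only for illustration. It also assumes that the "avg" value in the max/avg/min triple is the total submission time divided by the number of collections, truncated to whole seconds. The page does not state this, but the assumption matches the figures reported for Tests 2, 4 and 5.

```python
def pct(part, total, digits=2):
    """Return `part` as a percentage of `total`, rounded to `digits` decimals."""
    return round(100.0 * part / total, digits)


# Figures copied from the Test 2 results on this page.
collections = 7180
jobs_per_collection = 40
done_ok = 284838
aborted = 0
not_finished = 2362
resubmissions = 4599
total_submit_seconds = 70789

# Every job of a correctly submitted collection is either done, aborted,
# or still not finished, so the three counters must add up to the total.
total_jobs = collections * jobs_per_collection
assert total_jobs == done_ok + aborted + not_finished

print("total jobs:        ", total_jobs)
print("DONE OK %:         ", pct(done_ok, total_jobs))
print("Not finished %:    ", pct(not_finished, total_jobs))
print("Resubmissions %:   ", pct(resubmissions, total_jobs))
# Assumption: "avg" is total seconds over collections, truncated.
print("avg submit time (s):", total_submit_seconds // collections)
```

The same check can be repeated for the other tests by swapping in their figures. For Test 3 the job total is not a simple product, because the collection size changed from 40 to 80 jobs during the run.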
---++ Attachments

| *Attachment* | *Size* | *Date* | *Who* | *Comment* |
| ice3.png | 4.5 K | 2009-01-20 - 16:16 | AlessioGianelle | Test 3 ICE submission rate |
| ice4.png | 6.4 K | 2009-01-21 - 10:15 | AlessioGianelle | Test 4 ICE submission rate |
Topic revision: r22 - 2009-01-21 - AlessioGianelle
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.