TESTS
Modified mpirun: Executing command: /home/dteam029/globus-tmp.griditwn03.7486.0/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fOsslm3cw4T7lgR09qJTR4g/cpi
Process 0 of 1 on griditwn03.na.infn.it
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 10.001266
- Submission of 270 collections of 100 jobs each (10 collections every 30 minutes), using 1 user and a fuzzy rank (used 90 lcg CEs):
- Success > 99.99% OK
- Cancelled about 1800 jobs due to a problem with the CEs at in2p3.fr
Check bugs:
- BUG #13494
: FIXED
- checked by Laurence Field and ARC developers
- BUG #23443
: FIXED
- Required documents are not put into the glite doc template in edms
- BUG #24690
: NOT COMPLETELY FIXED
- The error message that you can find in the wmproxy log (also with level 5) is: edg_wll_JobStat GSSAPI Error
- In any case there is now a dedicated cron script to renew the host-proxy (i.e. it is not included in the cron-purger script)
- BUG #26885
: FIXED
- Checked with two subsequent submissions of 5 collections made of 50 nodes each. ICE does not leave any job with status UNKNOWN behind in the cache
- BUG #27215
: FIXED (for a LCG-CE); NOT fixed for a CREAM-CE
- Set the parameter MaxOutputSandboxSize in the WorkloadManager section of the configuration file /opt/glite/etc/glite_wms.conf on the WMS to 100 and restart the workload manager.
- Submit a jdl like this:
[
Type = "Job";
Executable = "27215_exe.sh";
Arguments = "70";
StdOutput = "test.out";
StdError = "test.err";
Environment = {"GLITE_LOCAL_MAX_OSB_SIZE=35"};
InputSandbox = {"27215_exe.sh"};
OutputSandbox = {"test.err","test.out","out2", "out1"};
usertags = [ bug = "27215" ];
]
where 27215_exe.sh contains
#!/bin/sh
MAX=$1
i=0
while [ $i -lt $MAX ]; do
echo -n "1" >> out1
echo -n "2" >> out2
i=$(($i + 1))
done
- When the job is Done, retrieve the output files; this should be the result of an ls -l of the output dir:
-rw-rw-r-- 1 ale ale 30 Jul 8 16:02 out1.tail
-rw-rw-r-- 1 ale ale 70 Jul 8 16:02 out2
-rw-rw-r-- 1 ale ale 0 Jul 8 16:02 test.err
-rw-rw-r-- 1 ale ale 0 Jul 8 16:02 test.out
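The size check on the retrieved files can also be scripted instead of being read off the ls -l listing. The following is only a minimal sketch: check_osb_sizes is a hypothetical helper (not a gLite command) that takes the output directory and a list of name=size pairs and reports any file that is missing or has a different size in bytes.

```shell
#!/bin/sh
# Hypothetical helper, not part of gLite: compare file sizes in an output
# directory with the expected values.
# Usage: check_osb_sizes DIR name=size [name=size ...]
check_osb_sizes() {
    dir=$1
    shift
    rc=0
    for pair in "$@"; do
        name=${pair%=*}     # text before the last '='
        want=${pair##*=}    # text after the last '='
        if [ ! -f "$dir/$name" ]; then
            echo "MISSING $name"
            rc=1
            continue
        fi
        got=$(wc -c < "$dir/$name" | tr -d ' ')
        if [ "$got" -ne "$want" ]; then
            echo "BAD $name: $got bytes, expected $want"
            rc=1
        fi
    done
    return $rc
}
```

For this bug it could be called as check_osb_sizes "outputdir" out1.tail=30 out2=70 test.err=0 test.out=0, where "outputdir" is the directory written by glite-wms-job-output.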
- BUG #27797
: FIXED
- Submit a jdl like this one:
[
JobType = "parametric";
Executable = "/usr/bin/env";
Environment = {"MYPATH_PARAM_=$PATH:/bin:/usr/bin:$HOME"};
StdOutput = "echo_PARAM_.out";
StdError = "echo_PARAM_.err";
OutputSandbox = {"echo_PARAM_.out","echo_PARAM_.err"};
Parameters = {test, 2};
]
- The generated jdl should contain:
[
requirements = other.GlueCEStateStatus == "Production";
nodes = [ dependencies = { };
Node_test = [ ... ];
Node_2 = [ ... ];
[...]
]
- BUG #27899
: FIXED
- Edit the configuration file
/opt/glite/etc//glite_wmsclient.conf
by changing the virtualorganisation attribute in the JdlDefaultAttributes section to a VO different from the one used to generate the user proxy.
- Submit a job and check that the generated .jdl has the right virtualorganisation defined, that is, the same one used to generate the user proxy
- BUG #28235
: FIXED
- Change the jdl setting the name of an existing CE (in the requirements)
- When the job has finished, run this command:
glite-wms-job-logging-info -v 2 "jobid" | grep -A 2 Match | grep Dest
You should see the name of the previously chosen CE 3 times. (The job must be Aborted with reason: hit job shallow retry count (2))
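Counting the matches by eye can be replaced by a small filter around the same pipeline. This is a sketch only: count_match_dest is a hypothetical helper that reads the logging-info output on standard input and prints how many of the Dest lines following a Match event mention the given CE name. It makes no assumption about the output format beyond what the grep pipeline above already relies on.

```shell
#!/bin/sh
# Hypothetical helper: count the "Dest" lines that follow a "Match" event
# and contain the CE name passed as first argument (fixed-string match).
# Reads the output of glite-wms-job-logging-info from stdin.
count_match_dest() {
    grep -A 2 Match | grep Dest | grep -c -F -- "$1"
}
```

Usage would be glite-wms-job-logging-info -v 2 "jobid" | count_match_dest "ce-name"; for this test the printed count is expected to be 3.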
- BUG #28249
: Hopefully fixed
- bug posted by the developer
- BUG #28498
: FIXED
- compilation error with gcc-4.x
- BUG #28642
: FIXED
- Submit this jdl:
[
Executable = "/usr/bin/env" ;
Stdoutput = "env.out" ;
StdError = "env.err" ;
shallowretrycount = 2;
InputSandbox = { "data/input.txt" };
OutputSandbox = { "env.out" ,"env.err", "input.txt" } ;
Environment={"LD_LIBRARY_PATH=."};
usertags = [ bug = "28642" ];
]
- Get the output of the job. In the output directory you should find the file input.txt, and in the file env.out LD_LIBRARY_PATH should be set to ".".
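The two conditions above can be checked with a few lines of shell. This is a minimal sketch: check_28642_output is a hypothetical helper that takes the retrieved output directory as its only argument and assumes env.out holds the plain output of /usr/bin/env, one VAR=value pair per line.

```shell
#!/bin/sh
# Hypothetical helper: verify the output directory of the job.
# Returns 0 when input.txt exists and env.out has the line LD_LIBRARY_PATH=.
check_28642_output() {
    dir=$1
    if [ ! -f "$dir/input.txt" ]; then
        echo "input.txt is missing"
        return 1
    fi
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF 'LD_LIBRARY_PATH=.' "$dir/env.out"; then
        echo "LD_LIBRARY_PATH is not set to \".\" in env.out"
        return 1
    fi
    return 0
}
```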
- BUG #28657
: FIXED
- Stop ICE: /opt/glite/etc/glite-wms-ice stop
- Corrupt the ICE database, e.g. by overwriting every file in /var/glite/ice/persist_dir except the *proxy* ones:
for f in /var/glite/ice/persist_dir/*; do
case "$f" in *proxy*) continue ;; esac
echo "pippo" > "$f"
done
- Start ICE:
/opt/glite/etc/glite-wms-ice start
- In the ICE log file you should see something like:
2008-07-29 12:44:00,537 FATAL - jobCache::jobCache() - Failed to
initialize the jobDbManager object. Reason is: Db::open: Invalid argument
- BUG #29538
: Hopefully fixed
- bug posted by the developer
- BUG #30289
: FIXED
- Fixed by not using 'clog'
- BUG #30308
: FIXED
- Submit this jdl:
[
requirements = ( other.GlueCEStateStatus == "Production" ) && Member("MPICH",other.GlueHostApplicationSoftwareRunTimeEnvironment) && ( other.GlueCEInfoTotalCPUs >= 4 ) && ( other.GlueCEInfoLRMSType == "torque" || RegExp("pbs",other.GlueCEInfoLRMSType) );
Type = "Job";
NodeNumber = 4;
Executable = "30308_exe.sh";
Arguments = "cpi 4";
StdOutput = "test.out";
StdError = "test.err";
InputSandbox = {"30308_exe.sh", "exe/cpi"};
OutputSandbox = {"test.err","test.out","executable.out"};
usertags = [ bug = "30308" ];
]
Where the 30308_exe.sh should be:
#!/bin/sh
# The first parameter is the binary to be executed
EXE=$1
# The second parameter is the number of CPU's to be reserved for parallel execution
CPU_NEEDED=$2
chmod 777 $EXE
# prints the list of files in the working directory
echo "List files on the working directory:"
ls -alR `pwd`
# execute the user job
mpirun -np $CPU_NEEDED -machinefile $PBS_NODEFILE `pwd`/$EXE >& executable.out
- When the job is Done, retrieve the output and check that the directory
.mpi
is not listed in the test.out output file.
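The check on test.out can be done with grep. The sketch below assumes that test.out contains the ls -alR listing printed by the wrapper script, so that a .mpi directory would appear either as the last field of a listing line or in a "path/.mpi:" header. mpi_dir_listed is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a ".mpi" directory appears in the
# ls -alR listing saved in the given file, fail otherwise.
mpi_dir_listed() {
    # ".mpi" must be a whole path component: preceded by start of line,
    # a space or a slash, and followed by ":", "/" or end of line.
    grep -Eq '(^|[ /])\.mpi(:|/|$)' "$1"
}
```

The test passes when mpi_dir_listed test.out fails, for example: if mpi_dir_listed test.out; then echo "FAIL: .mpi is listed"; fi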
- BUG #30816
: FIXED
- Already fixed and working on the production wms using patch #1491
- BUG #30896
: FIXED
- Set maxInputSandboxFiles = 2; in the WorkloadManagerProxy section of the configuration file on the WMS, and restart the wmproxy.
- Submit a job with more than 2 files listed in the InputSandbox parameter
- Check if the job is immediately set as Aborted and if the reason of the status is:
The Operation is not allowed: The maximum number of input sandbox files is reached
- Set maxOutputSandboxFiles = 2; in the WorkloadManagerProxy section of the configuration file on the WMS, and restart the wmproxy.
- Submit a job with more than 2 files listed in the OutputSandbox parameter
- Check if the job is immediately set as Aborted and if the reason of the status is:
The Operation is not allowed: The maximum number of output sandbox files is reached
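A test job exceeding the input sandbox limit can be generated rather than written by hand. The following is a sketch under stated assumptions: make_isb_jdl is a hypothetical helper, the isb_N.txt file names are invented for illustration, and the JDL uses only attributes that already appear in the other examples on this page.

```shell
#!/bin/sh
# Hypothetical helper: create N empty files isb_1.txt .. isb_N.txt in the
# current directory and print on stdout a JDL listing all of them in the
# InputSandbox.
make_isb_jdl() {
    n=$1
    list=""
    i=1
    while [ "$i" -le "$n" ]; do
        : > "isb_$i.txt"
        # append the quoted file name, comma-separated after the first one
        list="$list${list:+, }\"isb_$i.txt\""
        i=$((i + 1))
    done
    cat <<EOF
[
Executable = "/bin/ls";
StdOutput = "ls.out";
InputSandbox = { $list };
OutputSandbox = { "ls.out" };
]
EOF
}
```

With maxInputSandboxFiles = 2, running make_isb_jdl 3 > isb.jdl produces a job with one file more than allowed.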
- BUG #31026
: FIXED
- Simply check the
/opt/glite/etc/templates/template.sh
file on a WMS
- BUG #31278
: FIXED
- Using the command
glite-wms-job-info --jdl "jobid" | grep -i requirements
check if the expression RegExp(".*sdj$",other.GlueCEUniqueID);
is present (the exact expression should be found in the configuration file on the WMS, section: WorkloadManagerProxy, parameter: SDJRequirements)
- Setting
ShortDeadlineJob=false;
in the jdl, the output of the previous command should contain the expression !RegExp(".*sdj$",other.GlueCEUniqueID)
- BUG #32078
: FIXED
- Set on the WMS conf file:
II_Contact = "lcg-bdii.cern.ch";
- Do a list-match using this jdl:
[
Requirements = RegExp(".manchester.ac.uk:2119.*",other.GlueCEUniqueID) && anyMatch(other.storage.CloseSEs,target.GlueSEStatus == "unset");
Executable = "/bin/ls";
prologue = "/bin/false";
]
the output should be:
- ce01.tier2.hep.manchester.ac.uk:2119/jobmanager-lcgpbs-dteam
- BUG #32345
: FIXED
- Reproduced the problem by inserting a 500 sec sleep in the dirmanager and killing it by hand while unzipping the ISB. The job stays in status 'waiting' and is not forwarded to the WM.
- BUG #32528
: FIXED
- Set a very low timeout for the BDII on the WMS conf file:
II_Timeout = 3;
- Then set in the WMS conf file:
IsmIILDAPSearchAsync = false;
- You should see in the log file of the workload_manager (if you use a populated BDII):
[Warning] fetch_bdii_ce_info(ldap-utils.cpp:640): Timed out
[Warning] fetch_bdii_se_info(ldap-utils.cpp:308): Timed out
[Debug] do_purchase(ism-ii-purchaser.cpp:176): BDII fetching completed in 4 seconds
[Info] do_purchase(ism-ii-purchaser.cpp:193): Total VO_Views entries in ISM : 0
[Info] do_purchase(ism-ii-purchaser.cpp:194): Total SE entries in ISM : 0
- Setting:
IsmIILDAPSearchAsync = true;
you should obtain more (>0) VO_Views entries, e.g.:
[Debug] fetch_bdii_ce_info(ldap-utils-asynch.cpp:628): #1652 LDAP entries received in 5 seconds
[Debug] fetch_bdii_ce_info(ldap-utils-asynch.cpp:781): ClassAd reppresentation built in 0 seconds
[Debug] fetch_bdii_se_info(ldap-utils-asynch.cpp:444): #2381 LDAP entries received in 5 seconds
[Debug] fetch_bdii_se_info(ldap-utils-asynch.cpp:504): ClassAd reppresentation built in 0 seconds
[Debug] do_purchase(ism-ii-purchaser.cpp:176): BDII fetching completed in 10 seconds
[Info] do_purchase(ism-ii-purchaser.cpp:193): Total VO_Views entries in ISM : 53
[Info] do_purchase(ism-ii-purchaser.cpp:194): Total SE entries in ISM : 61
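The comparison between the two runs comes down to the number reported in the "Total VO_Views entries in ISM" line. A minimal sketch for extracting it from the workload manager log follows; ism_vo_views is a hypothetical helper, and the log file path must be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: print the VO_Views count from the most recent
# "Total VO_Views entries in ISM : N" line in the given log file.
# Prints nothing if the log has no such line.
ism_vo_views() {
    sed -n 's/.*Total VO_Views entries in ISM : \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

With the synchronous search and the 3 second timeout the expected value is 0; with IsmIILDAPSearchAsync = true; it should be greater than 0.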
- BUG #32980
: FIXED
- Submit a jdl
- Look into the SandBox dir of the job (on the WMS) until you see the
Maradona
file
- Put the condor job (equivalent to your job previously submitted) on hold, this should trigger a resubmission
- When the job has been resubmitted check if the old
Maradona
file has been removed
- BUG #33026
: FIXED
- Set the II_Timeout parameter in the NetworkServer section of the glite_wms.conf file on the WMS to a very low value, for example:
II_Timeout = 2;
- Restart the WM and check that the file
$GLITE_WMS_LOCATION_VAR/workload_manager/ismdump.fl
does not get emptied
- Perform some job-list-match operations and check that they return some match results
- BUG #33103
: FIXED
- Add this parameter to section WorkloadManager of the glite_wms.conf configuration file (using for example vo "cms" as filter):
IsmIILDAPCEFilterExt = "(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))"
- Restart the WM
- Doing a list-match using a voms proxy of a different VO (e.g. dteam) you should obtain "no resource available".
- BUG #33378
: FIXED
- Remove, if present, the directory
$GLITE_WMS_LOCATION_VAR/workload_manager/jobdir
on the WMS
- Restart the WM and check if the previous directory is recreated.
- BUG #34508
: FIXED
- Stop the WM on the WMS.
- Submit a collection
- Restart the WM
- Check if the status of the collection changes to Running
- BUG #35156
: FIXED
- Check if the proxy file name is hardcoded in
$GLITE_WMS_LOCATION/sbin/glite-wms-purgeStorage.sh
- BUG #35878
: FIXED
- compilation error with gcc-4.x
- BUG #36341
: Hopefully fixed
- bug posted by the developer
- BUG #36466
: Hopefully fixed
- bug posted by the developer
- BUG #36496
: FIXED
- Restart wmproxy: /opt/glite/etc/init.d/glite-wms-wmproxy restart
- Try to issue some commands (e.g. glite-wms-job-list-match, glite-wms-job-submit, glite-wms-job-delegate-proxy, etc.) towards that WMS. They should succeed with any proxy
- BUG #36536
: FIXED
- Submitted a normal job
- Waited until finished successfully
- Checked the job record is in the LBProxy mysql DB (e.g.:
mysql# select * from jobs where jobid like '%hLrG4YYebvYB0xsrPO4q8A%';
where https://devel17.cnaf.infn.it:9000/hLrG4YYebvYB0xsrPO4q8A
is the jobid)
- Retrieved the output via 'glite-wms-job-output'
- Checked the job record is no longer in the LBProxy mysql DB (e.g.: the previous query should return:
Empty set
)
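The string used in the like clause is the last path component of the jobid URL. As a small sketch, the query can be built from the jobid with plain shell parameter expansion; lb_query_for is a hypothetical helper name and the query text is the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the LBProxy query for a given jobid URL.
lb_query_for() {
    id=${1##*/}    # keep only the part after the last '/'
    printf "select * from jobs where jobid like '%%%s%%';\n" "$id"
}
```

For the jobid above, lb_query_for https://devel17.cnaf.infn.it:9000/hLrG4YYebvYB0xsrPO4q8A prints the same select statement used in the check.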
- BUG #36870
: FIXED
- Fixed by removing the spec file
- BUG #36876
: Hopefully fixed
- bug posted by the developer
- BUG #37756
: NOT COMPLETELY FIXED
- Tested by using a short proxy to submit a longer job: ICE does not resubmit it, but afterwards ICE does not update the status to Done because of another bug, #39807
- BUG #37916
: Hopefully fixed
- bug posted by the developer
- BUG #38359
: FIXED
- Set the parameter
MaxOutputSandboxSize
in the WorkloadManager section of the configuration file /opt/glite/etc/glite_wms.conf on the WMS to 100 and restart the workload manager.
- Submit this jdl:
[
Type = "Job";
Executable = "38359_exe.sh";
Arguments = "50";
StdOutput = "test.out";
StdError = "test.err";
InputSandbox = {"38359_exe.sh"};
OutputSandbox = {"test.err","test.out","out3", "out1", "out4", "out2"};
usertags = [ bug = "38359" ];
]
where 38359_exe.sh is:
#!/bin/sh
MAX=$1
i=0
while [ $i -lt $MAX ]; do
echo -n "1" >> out1
echo -n "2" >> out2
echo -n "3" >> out3
echo -n "4" >> out4
i=$(($i + 1))
done
i=200
while [ $i -lt 100 ]; do
echo -n "1" >> out1
echo -n "2" >> out2
echo -n "3" >> out3
echo -n "4" >> out4
i=$(($i + 1))
done
- When the job is Done, retrieve the output files; this should be the result of an ls -l of the output dir:
-rw-rw-r-- 1 ale ale 50 Jul 8 12:06 out1
-rw-rw-r-- 1 ale ale 0 Jul 8 12:06 out2.tail
-rw-rw-r-- 1 ale ale 50 Jul 8 12:06 out3
-rw-rw-r-- 1 ale ale 0 Jul 8 12:06 out4.tail
-rw-rw-r-- 1 ale ale 0 Jul 8 12:06 test.err
-rw-rw-r-- 1 ale ale 0 Jul 8 12:06 test.out
- BUG #38366
: FIXED
- Log on the WMS. Stop the workload manager. Put in the directory
$GLITE_WMS_LOCATION_VAR/workload_manager/jobdir/new/
this list-match request:
[root@devel19 glite]# cat /var/glite/workload_manager/jobdir/tmp/20080625T133135.906497_3085874880
[ arguments = [ ad = [ requirements = ( other.GlueCEStateStatus =="Production" || other.GlueCEStateStatus == "CREAMPreCertTests" ) &&
!RegExp(".*sdj$",other.GlueCEUniqueID); RetryCount = 3; Arguments = "/tmp"; MyProxyServer = "myproxy.cnaf.infn.it"; AllowZippedISB = true; JobType =
"normal"; InputSandboxDestFileName = { "pippo","pluto" }; SignificantAttributes = { "Requirements","Rank" }; FuzzyRank = true;
Executable = "/bin/ls"; CertificateSubject = "/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle"; X509UserProxy =
"/tmp/user.proxy.6056.20080625153135905"; Stdoutput = "ls.out"; VOMS_FQAN = "/dteam/Role=NULL/Capability=NULL"; OutputSandbox = { "ls.out" };
VirtualOrganisation = "dteam"; usertags = [ exe = "ls" ]; rank =-other.GlueCEStateEstimatedResponseTime; Type = "job"; ShallowRetryCount = 3;
InputSandbox = {"protocol://address/input/pippo","protocol://address/input/pluto" }; Fuzzyparameter = 1.000000000000000E-01 ]; include_brokerinfo = false; file =
"/tmp/6056.20080625153135905"; number_of_results = -1 ]; command = "match"; version = "1.0.0" ]
- Start the workload manager and check that it works.
- BUG #38828
: FIXED
- Proceed as in the previous bug: #38816
- BUG #39215
: FIXED
- You need to check the code of $GLITE_WMS_LOCATION/sbin/glite-wms-purgeStorage.sh as specified in the bug
- BUG #39501
: FIXED
- Submit a job through ICE, using this requirement:
Requirements = RegExp("cream",other.GlueCEUniqueID);
- Remove the job directory from the WMS
- Check in the log whether ICE figures out that the proxy has disappeared.
--
AlessioGianelle - 27 Jun 2008