Summary
- Product: BLAH 1.16.4
- Release Task: Task #25000
- ETICS Subsystem Configuration Name: emi-cream-ce_R_1_13_7_1
- VCS Tag: emi-blahp_R_1_16_4_1
- EMI Major Release: EMI 1 (Kebnekaise)
- Platform: SL5 epel
- Authors: Sara Bertocco and Alessio Gianelle
- Testing report:
- Certification report:
- Outcome: In certification...
Deployment tests
Clean Installation
Upgrade Installation
LSF CE
PBS CE
Unit Tests
Not Available. The plan is to provide some unit tests starting with EMI-2.
System tests
Functionality tests
Test submission
- Test result for LSF is available here: PASSED
- Test result for PBS is available here: PASSED
BLParser test
Old BLParser
- Job which finishes normally
- LSF PASSED
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-22.pd.infn.it:8443/cream-lsf-cert cream.jdl
https://cream-22.pd.infn.it:8443/CREAM053260410
[ale@cream-12 UI]$ glite-ce-job-status https://cream-22.pd.infn.it:8443/CREAM053260410
****** JobID=[https://cream-22.pd.infn.it:8443/CREAM053260410]
Status = [DONE-OK]
ExitCode = [0]
[root@cream-22 ~]# grep 053260410 /var/log/cream/glite-lsfparser.log
2011-12-20 13:47:29 Adding: ID:551886 Type:BLAHPNAME Value:"cre22_053260410"
2011-12-20 13:47:29 Sent for Cream:[BatchJobId="551886"; JobStatus=1; BlahJobName="cre22_053260410"; ClientJobId="053260410"; ChangeTime="2011-12-20 13:47:28";]
2011-12-20 13:47:33 Sent for Cream:[BatchJobId="551886"; JobStatus=2; BlahJobName="cre22_053260410"; ClientJobId="053260410"; WorkerNode="prod-wn-002"; ChangeTime="2011-12-20 13:47:33";]
2011-12-20 13:49:14 Sent for Cream:[BatchJobId="551886"; JobStatus=4; BlahJobName="cre22_053260410"; ClientJobId="053260410"; WorkerNode="prod-wn-002"; Reason="lsf_reason=0"; ChangeTime="2011-12-20 13:49:14";]
- PBS/Torque PASSED
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-submit -a -r cream-41.pd.infn.it:8443/cream-pbs-cert test1.jdl
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM228600831
****** JobID=[https://cream-41.pd.infn.it:8443/CREAM228600831]
Status = [DONE-OK]
ExitCode = [0]
[root@cream-41 ~]# grep 228600831 /var/log/cream/glite-pbsparser.log
2011-12-20 15:45:34 Adding: ID:41080 Type:BLAHPNAME Value:cre41_228600831
2011-12-20 15:45:34 Sent for Cream:[BatchJobId="41080"; JobStatus=1; BlahJobName="cre41_228600831"; ClientJobId="228600831"; ChangeTime="2011-12-20 15:45:34";]
2011-12-20 15:45:35 Sent for Cream:[BatchJobId="41080"; JobStatus=2; BlahJobName="cre41_228600831"; ClientJobId="228600831"; ChangeTime="2011-12-20 15:45:35";]
2011-12-20 15:47:16 Sent for Cream:[BatchJobId="41080"; JobStatus=4; BlahJobName="cre41_228600831"; ClientJobId="228600831"; Reason="pbs_reason=0"; ChangeTime="2011-12-20 15:47:15";]
- Job which is cancelled
- LSF PASSED
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-22.pd.infn.it:8443/cream-lsf-cert cream.jdl
https://cream-22.pd.infn.it:8443/CREAM142681575
[ale@cream-12 UI]$ glite-ce-job-cancel https://cream-22.pd.infn.it:8443/CREAM142681575
Are you sure you want to cancel specified job(s) [y/n]: y
[ale@cream-12 UI]$ glite-ce-job-status https://cream-22.pd.infn.it:8443/CREAM142681575
****** JobID=[https://cream-22.pd.infn.it:8443/CREAM142681575]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
[root@cream-22 ~]# grep 142681575 /var/log/cream/glite-lsfparser.log
2011-12-20 13:51:28 Adding: ID:551887 Type:BLAHPNAME Value:"cre22_142681575"
2011-12-20 13:51:28 Sent for Cream:[BatchJobId="551887"; JobStatus=1; BlahJobName="cre22_142681575"; ClientJobId="142681575"; ChangeTime="2011-12-20 13:51:27";]
2011-12-20 13:51:29 Sent for Cream:[BatchJobId="551887"; JobStatus=2; BlahJobName="cre22_142681575"; ClientJobId="142681575"; WorkerNode="prod-wn-003"; ChangeTime="2011-12-20 13:51:29";]
2011-12-20 13:51:55 Sent for Cream:[BatchJobId="551887"; JobStatus=3; BlahJobName="cre22_142681575"; ClientJobId="142681575"; WorkerNode="prod-wn-003"; ChangeTime="2011-12-20 13:51:54";]
- PBS/Torque PASSED
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-submit -a -r cream-41.pd.infn.it:8443/cream-pbs-cert test1.jdl
https://cream-41.pd.infn.it:8443/CREAM097109154
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-status https://cream-41.pd.infn.it:8443/CREAM097109154
****** JobID=[https://cream-41.pd.infn.it:8443/CREAM097109154]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
[root@cream-41 ~]# grep 097109154 /var/log/cream/glite-pbsparser.log
2011-12-20 16:04:01 Adding: ID:41082 Type:BLAHPNAME Value:cre41_097109154
2011-12-20 16:04:01 Sent for Cream:[BatchJobId="41082"; JobStatus=1; BlahJobName="cre41_097109154"; ClientJobId="097109154"; ChangeTime="2011-12-20 16:04:00";]
2011-12-20 16:04:01 Sent for Cream:[BatchJobId="41082"; JobStatus=2; BlahJobName="cre41_097109154"; ClientJobId="097109154"; ChangeTime="2011-12-20 16:04:01";]
2011-12-20 16:04:29 Sent for Cream:[BatchJobId="41082"; JobStatus=3; BlahJobName="cre41_097109154"; ClientJobId="097109154"; ChangeTime="2011-12-20 16:04:28";]
2011-12-20 16:04:29 Sent for Cream:[BatchJobId="41082"; JobStatus=4; BlahJobName="cre41_097109154"; ClientJobId="097109154"; Reason="pbs_reason=271"; ExitReason="Killed by Resource Management System"; ChangeTime="2011-12-20 16:04:28";]
- Job which is suspended and then resumed
New BLParser
- Job which finishes normally
- LSF PASSED
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-20.pd.infn.it:8443/cream-lsf-cert cream.jdl
https://cream-20.pd.infn.it:8443/CREAM587966276
[ale@cream-12 UI]$ glite-ce-job-status https://cream-20.pd.infn.it:8443/CREAM587966276
****** JobID=[https://cream-20.pd.infn.it:8443/CREAM587966276]
Status = [DONE-OK]
ExitCode = [0]
[root@cream-20 ~]# grep 587966276 /var/log/cream/glite-ce-bnotifier.log
2011-12-20 15:17:00 Sent for Cream:[BatchJobId="551901"; JobStatus=2; ChangeTime="2011-12-20 15:16:28"; WorkerNode="prod-wn-003"; ClientJobId="587966276"; BlahJobName="cre20_587966276";]
2011-12-20 15:18:41 Sent for Cream:[BatchJobId="551901"; JobStatus=4; ChangeTime="2011-12-20 15:18:09"; WorkerNode="prod-wn-003"; JwExitCode=0; Reason="reason=0"; ClientJobId="587966276"; BlahJobName="cre20_587966276";]
- PBS/Torque PASSED
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-submit -a -r cream-40.pd.infn.it:8443/cream-pbs-cert test1.jdl
https://cream-40.pd.infn.it:8443/CREAM509451203
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-status https://cream-40.pd.infn.it:8443/CREAM509451203
****** JobID=[https://cream-40.pd.infn.it:8443/CREAM509451203]
Status = [DONE-OK]
ExitCode = [0]
[root@cream-40 ~]# grep 509451203 /var/log/cream/glite-ce-bnotifier.log
2011-12-20 16:42:32 Sent for Cream:[BatchJobId="41349.cream-40.pd.infn.it"; JobStatus=2; ChangeTime="2011-12-20 16:41:52"; WorkerNode="cream-wn-040.pn.pd.infn.it"; ClientJobId="509451203"; BlahJobName="cre40_509451203";]
2011-12-20 16:44:13 Sent for Cream:[BatchJobId="41349.cream-40.pd.infn.it"; JobStatus=4; ChangeTime="2011-12-20 16:43:32"; WorkerNode="cream-wn-040.pn.pd.infn.it"; JwExitCode=0; Reason="reason=0"; ClientJobId="509451203"; BlahJobName="cre40_509451203";]
- Job which is cancelled
- LSF PASSED
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-20.pd.infn.it:8443/cream-lsf-cert cream.jdl
https://cream-20.pd.infn.it:8443/CREAM260205441
[ale@cream-12 UI]$ glite-ce-job-cancel https://cream-20.pd.infn.it:8443/CREAM260205441
Are you sure you want to cancel specified job(s) [y/n]: y
[ale@cream-12 UI]$ glite-ce-job-status https://cream-20.pd.infn.it:8443/CREAM260205441
****** JobID=[https://cream-20.pd.infn.it:8443/CREAM260205441]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
[root@cream-20 ~]# grep 260205441 /var/log/cream/glite-ce-bnotifier.log
2011-12-20 15:20:26 Sent for Cream:[BatchJobId="551902"; JobStatus=3; ChangeTime="2011-12-20 15:20:02"; JwExitCode=-999; Reason="reason=-999"; ClientJobId="260205441"; BlahJobName="cre20_260205441";]
- PBS/Torque PASSED
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-submit -a -r cream-40.pd.infn.it:8443/cream-pbs-cert test1.jdl
https://cream-40.pd.infn.it:8443/CREAM887070151
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-cancel https://cream-40.pd.infn.it:8443/CREAM887070151
Are you sure you want to cancel specified job(s) [y/n]: y
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-status https://cream-40.pd.infn.it:8443/CREAM887070151
****** JobID=[https://cream-40.pd.infn.it:8443/CREAM887070151]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
[root@cream-40 ~]# grep 887070151 /var/log/cream/glite-ce-bnotifier.log
2011-12-20 16:58:33 Sent for Cream:[BatchJobId="41350.cream-40.pd.infn.it"; JobStatus=3; ChangeTime="2011-12-20 16:58:20"; JwExitCode=-999; Reason="reason=-999"; ClientJobId="887070151"; BlahJobName="cre40_887070151";]
- Job which is suspended and then resumed
- LSF PASSED
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-20.pd.infn.it:8443/cream-lsf-cert cream.jdl
https://cream-20.pd.infn.it:8443/CREAM908720454
[ale@cream-12 UI]$ glite-ce-job-suspend https://cream-20.pd.infn.it:8443/CREAM908720454
Are you sure you want to suspend specified job(s) [y/n]: y
[ale@cream-12 UI]$ glite-ce-job-status https://cream-20.pd.infn.it:8443/CREAM908720454
****** JobID=[https://cream-20.pd.infn.it:8443/CREAM908720454]
Status = [HELD]
[ale@cream-12 UI]$ glite-ce-job-resume https://cream-20.pd.infn.it:8443/CREAM908720454
Are you sure you want to resume specified job(s) [y/n]: y
[ale@cream-12 UI]$ glite-ce-job-status https://cream-20.pd.infn.it:8443/CREAM908720454
****** JobID=[https://cream-20.pd.infn.it:8443/CREAM908720454]
Status = [DONE-OK]
ExitCode = [0]
[root@cream-20 ~]# grep 908720454 /var/log/cream/glite-ce-bnotifier.log
2011-12-20 15:23:27 Sent for Cream:[BatchJobId="551904"; JobStatus=5; ChangeTime="2011-12-20 15:23:05"; ClientJobId="908720454"; BlahJobName="cre20_908720454";]
2011-12-20 15:31:12 Sent for Cream:[BatchJobId="551904"; JobStatus=4; ChangeTime="2011-12-20 15:30:33"; JwExitCode=0; Reason="reason=0"; ClientJobId="908720454"; BlahJobName="cre20_908720454";]
- PBS/Torque PASSED
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-submit -a -r cream-40.pd.infn.it:8443/cream-pbs-cert test1.jdl
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-status https://cream-40.pd.infn.it:8443/CREAM089011081
****** JobID=[https://cream-40.pd.infn.it:8443/CREAM089011081]
Status = [IDLE]
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-suspend https://cream-40.pd.infn.it:8443/CREAM089011081
Are you sure you want to suspend specified job(s) [y/n]: y
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-status https://cream-40.pd.infn.it:8443/CREAM089011081
****** JobID=[https://cream-40.pd.infn.it:8443/CREAM089011081]
Status = [HELD]
[bertocco@cream-12 mie_certificazioni]$ glite-ce-job-resume https://cream-40.pd.infn.it:8443/CREAM089011081
Are you sure you want to resume specified job(s) [y/n]: y
[root@cream-40 ~]# grep 089011081 /var/log/cream/glite-ce-bnotifier.log
[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=1; ChangeTime="2011-12-20 17:13:10"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]
2011-12-20 17:14:35 Sent for Cream:[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=5; ChangeTime="2011-12-20 17:14:27"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]
2011-12-20 17:21:17 Sent for Cream:[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=2; ChangeTime="2011-12-20 17:21:14"; WorkerNode="cream-wn-040.pn.pd.infn.it"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]
2011-12-20 17:23:17 Sent for Cream:[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=4; ChangeTime="2011-12-20 17:22:55"; WorkerNode="cream-wn-040.pn.pd.infn.it"; JwExitCode=0; Reason="reason=0"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]
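The BLParser checks above all follow the same pattern: grep the notifier log for the CREAM job ID and read off the sequence of JobStatus values. A minimal Python sketch of that check is given below. The status names in STATUS_NAMES are inferred from the transcripts in this report (for example, JobStatus=5 is logged while the job is HELD); they, together with the helper names parse_notification and status_sequence, are assumptions made for illustration and are not part of BLAH or CREAM.

```python
import re

# Status names inferred from the transcripts in this report (assumption,
# not taken from the BLAH documentation).
STATUS_NAMES = {1: "IDLE", 2: "RUNNING", 3: "REMOVED", 4: "COMPLETED", 5: "HELD"}

# One 'Key=value;' pair; the value is either double-quoted or a bare token.
FIELD_RE = re.compile(r'(\w+)=("([^"]*)"|[^;\]]+);')


def parse_notification(line):
    """Return the key/value pairs of one '[BatchJobId=...;]' record, or None."""
    start = line.find("[BatchJobId=")
    if start < 0:
        return None  # e.g. the 'Adding: ID:...' lines of the old parser
    fields = {}
    for key, raw, quoted in FIELD_RE.findall(line[start + 1:]):
        fields[key] = quoted if raw.startswith('"') else raw.strip()
    return fields


def status_sequence(log_lines, client_job_id):
    """List the status names reported for one CREAM job, in log order."""
    names = []
    for line in log_lines:
        fields = parse_notification(line)
        if fields and fields.get("ClientJobId") == client_job_id:
            code = int(fields["JobStatus"])
            names.append(STATUS_NAMES.get(code, str(code)))
    return names


# Log lines copied from the PBS suspend/resume test above.
SAMPLE_LOG = [
    '[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=1; ChangeTime="2011-12-20 17:13:10"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]',
    '2011-12-20 17:14:35 Sent for Cream:[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=5; ChangeTime="2011-12-20 17:14:27"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]',
    '2011-12-20 17:21:17 Sent for Cream:[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=2; ChangeTime="2011-12-20 17:21:14"; WorkerNode="cream-wn-040.pn.pd.infn.it"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]',
    '2011-12-20 17:23:17 Sent for Cream:[BatchJobId="41400.cream-40.pd.infn.it"; JobStatus=4; ChangeTime="2011-12-20 17:22:55"; WorkerNode="cream-wn-040.pn.pd.infn.it"; JwExitCode=0; Reason="reason=0"; ClientJobId="089011081"; BlahJobName="cre40_089011081";]',
]

print(status_sequence(SAMPLE_LOG, "089011081"))
```

For a suspended and then resumed job, the expected sequence under these assumed names is IDLE, HELD, RUNNING, COMPLETED, which matches the glite-ce-job-status output shown in the transcript.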
Regression tests
Verification of attached bugs
Submit 1000 jobs, one every 3 seconds, while monitoring the used RSS memory of the /usr/bin/BUpdaterLSF process.
Submit 1000 jobs, one every 3 seconds, while monitoring the used RSS memory of the /usr/bin/BUpdaterPBS process.
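The report does not say which tool was used to record the RSS memory. As one possible approach, the sketch below submits jobs at a fixed interval and samples the VmRSS field of /proc/<pid>/status after each submission. It assumes a Linux host; the function names (rss_kib, read_rss_kib, submit_and_sample, rss_growth) are hypothetical, and the submit command is passed in by the caller.

```python
import re
import subprocess
import time


def rss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return rss_kib(handle.read())


def submit_and_sample(pid, submit_cmd, jobs=1000, interval=3.0):
    """Submit `jobs` jobs, one every `interval` seconds, sampling RSS each time."""
    samples = []
    for _ in range(jobs):
        subprocess.run(submit_cmd, check=True)  # e.g. a glite-ce-job-submit call
        samples.append(read_rss_kib(pid))
        time.sleep(interval)
    return samples


def rss_growth(samples):
    """Return last-minus-first RSS in KiB; a steady process stays near zero."""
    return samples[-1] - samples[0] if samples else 0
```

A call could look like submit_and_sample(pid, ["glite-ce-job-submit", "-a", "-r", "cream-40.pd.infn.it:8443/cream-pbs-cert", "test1.jdl"]), where pid is the process ID of /usr/bin/BUpdaterPBS on the CE. A memory leak would show up as samples that keep growing over the 1000 submissions.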
Verification of old bugs
Configure /etc/blah.config:
[root@cream-40 ~]# tail -4 /etc/blah.config
# Verify fix for bug #77776
pbs_batch_caching_enabled=yes
batch_command_caching_filter=/usr/bin/runcmd.pl
Where runcmd.pl is:
#!/usr/bin/perl
#---------------------#
# PROGRAM: argv.pl #
#---------------------#
# Append the arguments received from BLAH to /tmp/xyz, one invocation per line.
use strict;
use warnings;

open(my $log, '>>', '/tmp/xyz') or die "Cannot open /tmp/xyz: $!";
foreach my $arg (@ARGV) {
    print $log "$arg ";
}
print $log "\n";
close($log);
Restart the services and submit 10 jobs to the CE.
[root@cream-40 ~]# cat /tmp/xyz
/usr/bin/qstat -f
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43450.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43451.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43452.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43453.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43454.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43455.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43456.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43457.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43458.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43459.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43460.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43461.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43462.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43463.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43464.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43465.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43466.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43467.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43468.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43469.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43470.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43471.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43472.cream-40.pd.infn.it
/usr/bin/qstat -f
Test PASSED
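Since the filter script writes one line per invocation, /tmp/xyz can be summarized by counting how often each batch command was routed through the filter. The following Python sketch does this; the function name tally_commands is hypothetical, and the sample text is a shortened copy of the output above.

```python
import os
from collections import Counter


def tally_commands(trace_text):
    """Count filter invocations per executable name, one invocation per line."""
    counts = Counter()
    for line in trace_text.splitlines():
        words = line.split()
        if words:  # skip blank lines
            counts[os.path.basename(words[0])] += 1
    return counts


# Shortened copy of the /tmp/xyz content shown above.
SAMPLE_TRACE = """/usr/bin/qstat -f
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43450.cream-40.pd.infn.it
/usr/bin/tracejob -p /var/torque -m -l -a -n 2 43451.cream-40.pd.infn.it
/usr/bin/qstat -f
"""

for command, count in sorted(tally_commands(SAMPLE_TRACE).items()):
    print(command, count)
```

On the real file, the text would be read with open("/tmp/xyz").read() on the CE.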