
Regression Test Work Plan

BLAH

Bug #75854 (Problems related to the growth of the blah registry) Not implemented

Configure a CREAM CE using the new BLparser.

Verify that in /etc/blah.config there is: job_registry_use_mmap=yes (default scenario).

Submit 5000 jobs on a CREAM CE using the following JDL:

[
executable="/bin/sleep";
arguments="100";
]
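
The bulk submission can be scripted from a UI with a loop over glite-ce-job-submit (a sketch: the JDL above is assumed saved as sleep.jdl, and the endpoint and queue must be replaced with the real ones):

for i in $(seq 1 5000); do
    glite-ce-job-submit -a -r <CREAM CE host>:8443/cream-pbs-<queue> sleep.jdl
done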

Monitor the BLAH processes. Verify that none of them uses more than 50 MB.
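
The resident memory of the BLAH processes can be checked e.g. with ps (a sketch, assuming a Torque/PBS setup where the relevant processes are blahpd, BUpdaterPBS and BNotifier; 50 MB corresponds to an RSS of about 51200 kB):

ps -C blahpd,BUpdaterPBS,BNotifier -o pid,rss,cmd --sort=-rss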

Bug #77776 (BUpdater should have an option to use cached batch system commands) Not implemented

Add the following lines to /etc/blah.config:

lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/bin/runcmd.pl

Create /usr/bin/runcmd.pl with the following content:

#!/usr/bin/perl
#-----------------------------------------------------#
#  PROGRAM: runcmd.pl                                 #
#  Records every batch system query issued by the     #
#  BUpdater by appending its arguments to /tmp/xyz.   #
#-----------------------------------------------------#

# Append each argument of this invocation to /tmp/xyz,
# one argument per line (as in the sample output below).
open (MYFILE, '>>/tmp/xyz');
foreach $argnum (0 .. $#ARGV) {
    print MYFILE "$ARGV[$argnum]\n";
}
close (MYFILE);
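
The script presumably needs to be executable so that the BUpdater can invoke it:

chmod +x /usr/bin/runcmd.pl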

Submit some jobs. Check that the queries to the batch system are recorded in /tmp/xyz. E.g. for LSF, something like this should be reported:

/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs
-u
all
-l
/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs
-u
all
-l
...

Bug #80805 (BLAH job registry permissions should be improved) Not implemented

Check permissions and ownership under /var/blah (e.g. with ls -lR /var/blah). They should be:

/var/blah:
total 12
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bnotifier.pid
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bupdater.pid
drwxrwx--t 4 tomcat tomcat 4096 Oct 18 07:38 user_blah_job_registry.bjr

/var/blah/user_blah_job_registry.bjr:
total 16
-rw-rw-r-- 1 tomcat tomcat 1712 Oct 18 07:38 registry
-rw-r--r-- 1 tomcat tomcat  260 Oct 18 07:38 registry.by_blah_index
-rw-rw-rw- 1 tomcat tomcat    0 Oct 18 07:38 registry.locktest
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.npudir
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.proxydir
-rw-rw-r-- 1 tomcat tomcat    0 Oct 18 07:32 registry.subjectlist

/var/blah/user_blah_job_registry.bjr/registry.npudir:
total 0

/var/blah/user_blah_job_registry.bjr/registry.proxydir:
total 0

Bug #81354 (Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute causes thread to loop) Not implemented

Log in to a CREAM CE as user tomcat. Create a proxy of yours and copy it to /tmp/proxy (changing its ownership to tomcat:tomcat).
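
A possible sequence (a sketch: the proxy is created on a host where your user certificate is available, and voms-proxy-init writes it to /tmp/x509up_u<uid> by default; dteam is just the VO used in the examples below):

$ voms-proxy-init --voms dteam
$ scp /tmp/x509up_u$(id -u) root@<CREAM CE host>:/tmp/proxy
# chown tomcat:tomcat /tmp/proxy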

Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).

Submit a job directly via blah (in the following, replace pbs and creamtest2 with the relevant batch system and queue names):

$ /usr/bin/blahpd
$GahpVersion: 1.16.2 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
blah_job_submit 1 [cmd="/bin/cp";Args="fstab\ fstab.out";TransferInput="/home/dteam001/dir1/fstab";TransferOutput="fstab.out";TransferOutputRemaps="fstab.out=/home/dteam001/dir1/fstab.out";gridtype="pbs";queue="creamtest2";x509userproxy="/tmp/proxy"]
S
results
S 1
1 0 No\ error pbs/20111010/304.cream-38.pd.infn.it

Finally, check the content of /home/dteam001/dir1/, where you should see both fstab and fstab.out:

$ ls /home/dteam001/dir1/
fstab  fstab.out

Bug #81824 (yaim-cream-ce should manage the attribute bupdater_loop_interval) Not implemented

Set BUPDATER_LOOP_INTERVAL to 30 in siteinfo.def and reconfigure via yaim. Then verify that in blah.config there is:

bupdater_loop_interval=30
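
For example (a sketch: the yaim node list depends on the actual deployment; here a CREAM CE co-located with a Torque server is assumed):

# /opt/glite/yaim/bin/yaim -c -s <path to siteinfo.def> -n creamCE -n TORQUE_server -n TORQUE_utils
# grep bupdater_loop_interval /etc/blah.config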

Bug #82281 (blahp.log records should always contain CREAM job ID) Not implemented

Submit a job directly to CREAM using CREAM-CLI. Then submit a job to CREAM through the WMS.

In both cases, in the accounting log file (/var/log/cream/accounting/blahp.log-<date>), the clientID field should end with the numeric part of the CREAM job ID, e.g.:

"timestamp=2011-10-10 14:37:38" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=CREAM956286045" "lrmsID=300.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_956286045"

"timestamp=2011-10-10 14:39:57" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=https://devel19.cnaf.infn.it:9000/dLvm84LvD7w7QXtLZK4L0A" "lrmsID=302.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_315532638"

Bug #82297 (blahp.log rotation period is too short) Not implemented

Check that in /etc/logrotate.d/blahp-logrotate rotate is equal to 365:

# cat /etc/logrotate.d/blahp-logrotate
/var/log/cream/accounting/blahp.log {
        copytruncate
        rotate 365
        size = 10M
        missingok
        nomail
}

Bug #83275 (Problem in updater with very short jobs that can cause no notification to cream) Not implemented

Configure a CREAM CE using the new blparser. Submit a job using the following JDL:

[
executable="/bin/echo";
arguments="ciao";
]

Check in the bnotifier log file (/var/log/cream/glite-ce-bnotifier.log) that at least one notification is sent for this job, e.g.:

2011-11-04 14:11:11 Sent for Cream:[BatchJobId="927.cream-38.pd.infn.it"; JobStatus=4; ChangeTime="2011-11-04 14:08:55"; JwExitCode=0; Reason="reason=0"; ClientJobId="622028514"; BlahJobName="cre38_622028514";]

Bug #83347 (Incorrect special character handling for BLAH Arguments and Environment attributes) Not implemented

Log in to a CREAM CE as user tomcat. Create a proxy of yours and copy it to /tmp/proxy (changing its ownership to tomcat:tomcat), as described above.

Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).

Submit a job directly via blah (in the following, replace pbs and creamtest1 with the relevant batch system and queue names):

BLAH_JOB_SUBMIT 1 [Cmd="/bin/echo";Args="$HOSTNAME";Out="/tmp/stdout_l15367";In="/dev/null";GridType="pbs";Queue="creamtest1";x509userproxy="/tmp/proxy";Iwd="/tmp";TransferOutput="output_file";TransferOutputRemaps="output_file=/tmp/stdout_l15367";GridResource="blah"]

Verify that the output file (/tmp/stdout_l15367) contains the hostname of the WN.

Bug #87419 (blparser_master add some spurious character in the BLParser command line) Not implemented

Configure a CREAM CE using the old blparser. Check the blparser process using ps. It shouldn't show spurious characters:

root     26485  0.0  0.2 155564  5868 ?        Sl   07:36   0:00 /usr/bin/BLParserPBS -d 1 -l /var/log/cream/glite-pbsparser.log -s /var/torque -p 33333 -m 56565

Bug #88974 (BUpdaterSGE and BNotifier don't start if sge_helperpath var is not fixed) Not implemented

Install and configure (via yaim) a CREAM-CE using GE as batch system.

Make sure that in /etc/blah.config the variable sge_helperpath is commented out or not present.

Try to restart the blparser: /etc/init.d/glite-ce-blahparser restart

It should work without problems. In particular it should not report the following error:

Starting BNotifier: /usr/bin/BNotifier: sge_helperpath not defined. Exiting
[FAILED]
Starting BUpdaterSGE: /usr/bin/BUpdaterSGE: sge_helperpath not defined. Exiting
[FAILED] 

Bug #89859 (There is a memory leak in the updater for LSF, PBS and Condor) Not implemented

Configure a CREAM CE using the new blparser.

Submit 1000 jobs using e.g. this JDL:

[
executable="/bin/sleep";
arguments="100";
]

Keep monitoring the memory used by the BUpdater process. It should remain essentially constant.
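
One way to watch the trend (a sketch; the actual process name is BUpdaterPBS, BUpdaterLSF or BUpdaterCondor depending on the batch system):

while true; do
    ps -C BUpdaterPBS -o pid,rss,etime,cmd
    sleep 60
done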

The test should be done for both LSF and Torque/PBS.


CREAM

Bug #69857 Job submission to CreamCE is enabled by restart of service even if it was previously disabled - Implemented

STATUS: Implemented

To test the fix:

  • disable the submission on the CE
This can be achieved via the `glite-ce-disable-submission host:port` command (provided by the CREAM CLI package installed on the UI), which can be issued only by a CREAM CE administrator, i.e. a person whose DN is listed in the /etc/grid-security/admin-list file of the CE.

Output should be: "Operation for disabling new submissions succeeded"

  • restart tomcat on the CREAM CE (service tomcat restart - on CE)

  • verify that the submission is still disabled (glite-ce-allowed-submission)
This can be checked via the `glite-ce-allowed-submission host:port` command (provided by the CREAM CLI package installed on the UI): it should report that job submissions are disabled.

  • re-enable the submission via the `glite-ce-enable-submission host:port` command

Output of a subsequent glite-ce-allowed-submission should be: "Job submission to this CREAM CE is enabled"

Bug #81561 Make JobDBAdminPurger script compliant with CREAM EMI environment. - Implemented

STATUS: Implemented

To test the fix, simply run JobDBAdminPurger.sh as root on the CREAM CE. E.g.:

# JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml -u <user> -p <passwd> -s DONE-FAILED,0 
START jobAdminPurger

It should work without reporting error messages:

-----------------------------------------------------------
Job CREAM595579358 is going to be purged ...
- Job deleted. JobId = CREAM595579358
CREAM595579358 has been purged!
-----------------------------------------------------------

STOP jobAdminPurger

Bug #81824 yaim-cream-ce should manage the attribute bupdater_loop_interval. - Implemented

STATUS: Implemented

To test the fix, check in the CREAM CE if the file /etc/blah.config contains the parameter bupdater_loop_interval.

Bug #83238 Sometimes CREAM does not update the state of a failed job. - Implemented

STATUS: Implemented

To test the fix, kill a job by hand directly on the batch system.
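
For example, on a Torque/PBS batch system (a sketch; the LRMS job id can be obtained e.g. from glite-ce-job-status with verbosity level 2):

# qdel 304.cream-38.pd.infn.it

Then keep polling the job status with glite-ce-job-status.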

The status of the job should eventually be:

   Status        = [DONE-FAILED]
   ExitCode      = [N/A]
   FailureReason = [Job has been terminated (got SIGTERM)]

Bug #83749 JobDBAdminPurger cannot purge jobs if configured sandbox dir has changed. - Not implemented

STATUS: Not implemented

To test the fix, submit some jobs and then reconfigure the service with a different value of CREAM_SANDBOX_PATH. Then try, with the JobDBAdminPurger.sh script, to purge some jobs submitted before the switch.

It must be verified:

  • that the jobs have been purged from the CREAM DB (i.e. a glite-ce-job-status should not find them anymore; see the sketch after this list)
  • that the relevant CREAM sandbox directories have been deleted
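
For the first check, a glite-ce-job-status on one of the purged job IDs should fail with a "job not found" fault (a sketch, reusing the job id from the example above):

$ glite-ce-job-status https://<CREAM CE host>:8443/CREAM595579358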

Bug #84374 yaim-cream-ce: GlueForeignKey: GlueCEUniqueID published using ':' instead of '='. - Implemented

STATUS: Implemented

To test the fix, query the resource bdii of the CREAM-CE:

ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | grep -i foreignkey | grep -i glueceuniqueid

Entries such as:

GlueForeignKey: GlueCEUniqueID=cream-35.pd.infn.it:8443/cream-lsf-creamtest1

i.e.:

GlueForeignKey: GlueCEUniqueID=<CREAM CE ID>

should appear.

Bug #86191 No info published by the lcg-info-dynamic-scheduler for one VOView - Implemented

STATUS: Implemented

To test the fix, issue the following ldapsearch query towards the resource bdii of the CREAM-CE:

$ ldapsearch -h cream-35 -x -p 2170 -b "o=grid" | grep -i GlueCEStateWaitingJobs | grep -i 444444

It should not find anything.

Bug #87361 The attribute cream_concurrency_level should be configurable via yaim. - Implemented

STATUS: Implemented

To test the fix, set in siteinfo.def the variable CREAM_CONCURRENCY_LEVEL to a certain number (n). After configuration, verify that in /etc/glite-ce-cream/cream-config.xml there is:

         cream_concurrency_level="n"
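
A quick check after the reconfiguration:

# grep cream_concurrency_level /etc/glite-ce-cream/cream-config.xml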

Bug #87492 CREAM doesn't handle correctly the jdl attribute "environment". - Not implemented

STATUS: Not implemented

To test the fix, submit the following JDL using glite-ce-job-submit:

[
Environment = {
"GANGA_LCG_VO='camont:/camont/Role=lcgadmin'",
"LFC_HOST='lfc0448.gridpp.rl.ac.uk'",
"GANGA_LOG_HANDLER='WMS'"
};
executable="/bin/env";
stdoutput="out.out";
outputsandbox={"out.out"};
outputsandboxbasedesturi="gsiftp://localhost";
]

When the job is done, retrieve the output and check that in out.out the variables GANGA_LCG_VO, LFC_HOST and GANGA_LOG_HANDLER have exactly the values defined in the JDL.
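
The output can be retrieved with glite-ce-job-output, which saves the sandbox files in a local directory (a sketch; replace the job ID and directory with the actual ones):

$ glite-ce-job-output <CREAM job ID>
$ grep -E 'GANGA_LCG_VO|LFC_HOST|GANGA_LOG_HANDLER' <output dir>/out.out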

gLite-CLUSTER

Bug #69318 The cluster publisher needs to publish in GLUE 2 too Not implemented

  • Check if the resource BDII publishes the GLUE2ComputingService and GLUE2Manager objectclasses:

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager

  • Check if the resource BDII publishes GLUE2Share objectclasses. There should be one GLUE2Share objectclass for each VO view.

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share

  • Check if the resource BDII publishes GLUE2ExecutionEnvironment objectclasses:

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment

  • Check if the resource BDII publishes a GLUE2ComputingEndPoint objectclass for the ApplicationPublisher interface:

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #86512 YAIM Cluster Publisher incorrectly configures GlueClusterService and GlueForeignKey for CreamCEs - Not implemented

To test the fix, issue an ldapsearch such as:

ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueClusterService

Then issue an ldapsearch such as:

ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueForeignKey | grep -v Site

Verify that for each returned line, the format is:

<hostname>:8443/cream-<lrms>-<queue>

Bug #87691 Not possible to map different queues of the same CE to different clusters - Not implemented

To test this fix, configure a gLite-CLUSTER with at least two different queues mapped to different clusters (use the yaim variables QUEUE_<queue-name>_CLUSTER_UniqueID), e.g.:

QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id

Then query the resource bdii of the gLite-CLUSTER and verify that:

  • for the GlueCluster objectclass with GlueClusterUniqueID equal to cl1id, the attributes GlueClusterService and GlueForeignKey refer to CEIds with creamtest1 as queue
  • for the GlueCluster objectclass with GlueClusterUniqueID equal to cl2id, the attributes GlueClusterService and GlueForeignKey refer to CEIds with creamtest2 as queue

Bug #87799 Add yaim variables to configure the GLUE 2 WorkingArea attributes - Not implemented

Set all (or some) of the following yaim variables:

WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME

and then configure via yaim. Then query the resource bdii of the gLite-CLUSTER and verify that the relevant attributes of the GLUE 2 ComputingManager object are set.
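
For example, following the same query pattern used above:

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingManager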

CREAM Torque module

Bug #17325 Default time limits not taken into account - Not implemented

To test the fix for this bug, consider a PBS installation where for a certain queue both default and max values are specified, e.g.:

resources_max.cput = A
resources_max.walltime = B
resources_default.cput = C
resources_default.walltime = D

Verify that the published value for GlueCEPolicyMaxCPUTime is C and that the published value for GlueCEPolicyMaxWallClockTime is D.
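
The queue settings and the published values can be compared e.g. as follows (a sketch, assuming Torque's qmgr on the batch server and the resource BDII on port 2170):

# qmgr -c "print queue <queue name>"
$ ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | grep -E 'GlueCEPolicyMaxCPUTime|GlueCEPolicyMaxWallClockTime'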

Bug #49653 lcg-info-dynamic-pbs should check pcput in addition to cput - Not implemented

To test the fix for this bug, consider a PBS installation where for a certain queue both cput and pcput max values are specified, e.g.:

resources_max.cput = A
resources_max.pcput = B

Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum between A and B.

Then consider a PBS installation where for a certain queue both cput and pcput max and default values are specified, e.g.:

resources_max.cput = C
resources_default.cput = D
resources_max.pcput = E
resources_default.pcput = F

Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum between D and F.

Bug #76162 YAIM for APEL parsers to use the BATCH_LOG_DIR for the batch system log location - Not implemented

To test the fix for this bug, set the yaim variable BATCH_ACCT_DIR and configure via yaim.

Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify the section:

<Logs searchSubDirs="yes" reprocess="no">
            <Dir>X</Dir>

X should be the value specified for BATCH_ACCT_DIR.

Then reconfigure without setting BATCH_ACCT_DIR.

Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify that the directory name is ${TORQUE_VAR_DIR}/server_priv/accounting.
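
A quick way to inspect the configured directory in both cases (a sketch):

# grep -A 1 searchSubDirs /etc/glite-apel-pbs/parser-config-yaim.xml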

Bug #77106 PBS info provider doesn't allow - in a queue name - Not implemented

To test the fix, configure a CREAM CE in a PBS installation where at least one queue has a '-' in its name.

Then log as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Check if the returned information is correct.

CREAM LSF module

Bug #88720 Too many '9' in GlueCEPolicyMaxCPUTime for LSF - Not implemented

To test the fix, query the CREAM CE resource bdii in the following way:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEPolicyMaxCPUTime | grep 9999999999

This shouldn't return anything.

Bug #89767 The LSF dynamic infoprovider shouldn't publish GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots - Not implemented

To test the fix, log as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Among the returned information, there shouldn't be GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots.

Bug #89794 LSF info provider doesn't allow - in a queue name - Not implemented

To test the fix, configure a CREAM CE in an LSF installation where at least one queue has a '-' in its name.

Then log as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Check if the returned information is correct.

Bug #90113 missing yaim check for batch system - Not implemented

To test the fix, configure a CREAM CE without having LSF installed.

The yaim configuration should fail, reporting that there were problems with the LSF installation.

-- MassimoSgaravatto - 2011-11-07
