Regression Test Work Plan

BLAH

Fixes provided with BLAH 1.18.0

Bug #86238 blahpd doesn't check the status of its daemons when idling TBD

Bug #86918 Request for passing all submit command attributes to the local configuration script. TBD

Bug #89504 Repeated notification problem for BLParserLSF TBD

Bug #89527 BLAHP produced -W stage(in/out) directives are incompatible with Torque 2.5.8 Not implemented

To test this fix, configure a CREAM CE with PBS/Torque 2.5.8.

If this is not possible and you have another Torque version, apply the change documented at:

https://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide#5_1_Saving_the_batch_job_submiss

to save the submission script.

Submit a job and check the PBS job submission script in /tmp.

It should contain something like:

#PBS -W stagein=\'CREAM610186385_jobWrapper.sh.18757.13699.1328001723@cream-38.pd.infn.it:/var/c\
ream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role\
_NULL_Capability_NULL_dteam042/61/CREAM610186385/CREAM610186385_jobWrapper.sh,cre38_610186385.pr\
oxy@cream-38.pd.infn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Cert\
ificate_O_INFN_C_IT_dteam_Role_NULL_Capability_NULL_dteam042/proxy/5a34c64e2a8db2569284306e9a472\
3d2d40045a7_13647008746533\'
#PBS -W stageout=\'out_cre38_610186385_StandardOutput@cream-38.pd.infn.it:/var/cream_sandbox/dte\
am/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role_NULL_Capability\
_NULL_dteam042/61/CREAM610186385/StandardOutput,err_cre38_610186385_StandardError@cream-38.pd.in\
fn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_I\
T_dteam_Role_NULL_Capability_NULL_dteam042/61/CREAM610186385/StandardError\'

i.e. stagein and stageout directives, with escaped quotes around the whole lists.

Bug #90082 BUpdaterPBS workaround if tracejob is in infinite loop TBD

Bug #90085 Suspend command doesn't work with old parser Not implemented

To test the fix, configure a CREAM CE with the old blparser.

Then submit a job and after a while suspend it using the glite-ce-job-suspend command.

Check the job status, which should eventually become HELD.

Bug #90101 Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute may cause thread to loop TBD

Bug #90927 Problem with init script for blparser TBD

Fixes provided with BLAH 1.16.4

Bug #88974 BUpdaterSGE and BNotifier don't start if sge_helperpath var is not fixed Not implemented

Install and configure (via yaim) a CREAM-CE using GE as batch system.

Make sure that in /etc/blah.config the variable sge_helperpath is commented out or absent.

Try to restart the blparser: /etc/init.d/glite-ce-blahparser restart

It should work without problems. In particular it should not report the following error:

Starting BNotifier: /usr/bin/BNotifier: sge_helperpath not defined. Exiting
[FAILED]
Starting BUpdaterSGE: /usr/bin/BUpdaterSGE: sge_helperpath not defined. Exiting
[FAILED] 

Bug #89859 There is a memory leak in the updater for LSF, PBS and Condor Not implemented

Configure a CREAM CE using the new blparser.

Submit 1000 jobs using e.g. this JDL:

[
executable="/bin/sleep";
arguments="100";
]
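
The submission can be scripted; a minimal sketch, assuming the JDL above is saved as sleep.jdl (replace the CE endpoint with a real one):

for i in $(seq 1 1000); do
    glite-ce-job-submit -a -r <CE host>:8443/cream-<lrms>-<queue> sleep.jdl
done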

Keep monitoring the memory used by the bupdaterxxx process (the BUpdater process for your batch system). It should essentially not increase; see the sketch below.

The test should be done for both LSF and Torque/PBS.
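
A minimal monitoring sketch, assuming Torque/PBS and hence a process named BUpdaterPBS (for LSF use the corresponding BUpdater process name):

while true; do
    echo "$(date +%T) $(ps -o rss= -C BUpdaterPBS)"
    sleep 60
done >> /tmp/bupdater_mem.log

The logged RSS values (in KB) should stay roughly flat while the jobs run.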

Fixes provided with BLAH 1.16.3

Bug #75854 (Problems related to the growth of the blah registry) Not implemented

Configure a CREAM CE using the new BLparser.

Verify that in /etc/blah.config there is: job_registry_use_mmap=yes (default scenario).

Submit 5000 jobs on a CREAM CE using the following JDL:

[
executable="/bin/sleep";
arguments="100";
]

Monitor the BLAH processes. Verify that each of them doesn't use more than 50 MB.
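
A simple way to sample the resident memory of the BLAH processes (the process names below are indicative; adjust them to your batch system):

ps -o pid,rss,comm -C blahpd,BUpdaterPBS,BNotifier

RSS is printed in KB, so each value should stay below roughly 51200 (i.e. 50 MB).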

Bug #77776 (BUpdater should have an option to use cached batch system commands) Not implemented

Add the following lines to /etc/blah.config:

lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/bin/runcmd.pl

Create /usr/bin/runcmd.pl with the following content and make it executable (e.g. chmod 755 /usr/bin/runcmd.pl):

#!/usr/bin/perl
#---------------------#
#  PROGRAM:  argv.pl  #
#---------------------#

# Append every argument this script is invoked with to /tmp/xyz,
# so that the batch system commands issued by BLAH can be inspected.
$numArgs = $#ARGV + 1;
open (MYFILE, '>>/tmp/xyz');
foreach $argnum (0 .. $#ARGV) {
    print MYFILE "$ARGV[$argnum] ";
}
print MYFILE "\n";
close (MYFILE);

Submit some jobs. Check that the queries to the batch system are recorded in /tmp/xyz. E.g. for LSF something like this should be reported:

/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs
-u
all
-l
/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs
-u
all
-l
...

Bug #80805 (BLAH job registry permissions should be improved) Not implemented

Check permissions and ownership under /var/blah (e.g. with ls -lR /var/blah). They should be:

/var/blah:
total 12
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bnotifier.pid
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bupdater.pid
drwxrwx--t 4 tomcat tomcat 4096 Oct 18 07:38 user_blah_job_registry.bjr

/var/blah/user_blah_job_registry.bjr:
total 16
-rw-rw-r-- 1 tomcat tomcat 1712 Oct 18 07:38 registry
-rw-r--r-- 1 tomcat tomcat  260 Oct 18 07:38 registry.by_blah_index
-rw-rw-rw- 1 tomcat tomcat    0 Oct 18 07:38 registry.locktest
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.npudir
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.proxydir
-rw-rw-r-- 1 tomcat tomcat    0 Oct 18 07:32 registry.subjectlist

/var/blah/user_blah_job_registry.bjr/registry.npudir:
total 0

/var/blah/user_blah_job_registry.bjr/registry.proxydir:
total 0

Bug #81354 (Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute causes thread to loop) Not implemented

Log in to the CREAM CE as user tomcat. Create a proxy of yours and copy it to /tmp/proxy (changing its ownership to tomcat.tomcat).

Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).

Submit a job directly via blah (in the following, replace pbs and creamtest2 with the relevant batch system and queue names):

$ /usr/bin/blahpd
$GahpVersion: 1.16.2 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
blah_job_submit 1 [cmd="/bin/cp";Args="fstab\ fstab.out";TransferInput="/home/dteam001/dir1/fstab";TransferOutput="fstab.out";TransferOutputRemaps="fstab.out=/home/dteam001/dir1/fstab.out";gridtype="pbs";queue="creamtest2";x509userproxy="/tmp/proxy"]
S
results
S 1
1 0 No\ error pbs/20111010/304.cream-38.pd.infn.it

Finally, check the content of /home/dteam001/dir1/, where you should see both fstab and fstab.out:

$ ls /home/dteam001/dir1/
fstab  fstab.out

Bug #81824 (yaim-cream-ce should manage the attribute bupdater_loop_interval) Implemented

Set BUPDATER_LOOP_INTERVAL to 30 in siteinfo.def and reconfigure via yaim. Then verify that in blah.config there is:

bupdater_loop_interval=30
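
A quick check, assuming the default location of the BLAH configuration file:

grep bupdater_loop_interval /etc/blah.config

This should print exactly the line above.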

Bug #82281 (blahp.log records should always contain CREAM job ID) Not implemented

Submit a job directly to CREAM using CREAM-CLI. Then submit a job to CREAM through the WMS.

In both cases, in the accounting log file (/var/log/cream/accounting/blahp.log-<date>) the clientID field should end with the numeric part of the CREAM job ID, e.g.:

"timestamp=2011-10-10 14:37:38" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=CREAM956286045" "lrmsID=300.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_956286045"

"timestamp=2011-10-10 14:39:57" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=https://devel19.cnaf.infn.it:9000/dLvm84LvD7w7QXtLZK4L0A" "lrmsID=302.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_315532638"

Bug #82297 (blahp.log rotation period is too short) Not implemented

Check that in /etc/logrotate.d/blahp-logrotate rotate is equal to 365:

# cat /etc/logrotate.d/blahp-logrotate
/var/log/cream/accounting/blahp.log {
        copytruncate
        rotate 365
        size = 10M
        missingok
        nomail
}

Bug #83275 (Problem in updater with very short jobs that can cause no notification to cream) Not implemented

Configure a CREAM CE using the new blparser. Submit a job using the following JDL:

[
executable="/bin/echo";
arguments="ciao";
]

Check in the bnotifier log file (/var/log/cream/glite-ce-bnotifier.log) that at least one notification is sent for this job, e.g.:

2011-11-04 14:11:11 Sent for Cream:[BatchJobId="927.cream-38.pd.infn.it"; JobStatus=4; ChangeTime="2011-11-04 14:08:55"; JwExitCode=0; Reason="reason=0"; ClientJobId="622028514"; BlahJobName="cre38_622028514";]

Bug #83347 (Incorrect special character handling for BLAH Arguments and Environment attributes) Not implemented

Log in to the CREAM CE as user tomcat. Create a proxy of yours and copy it to /tmp/proxy (changing its ownership to tomcat.tomcat).

Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).

Submit a job directly via blah (in the following, replace pbs and creamtest1 with the relevant batch system and queue names):

BLAH_JOB_SUBMIT 1 [Cmd="/bin/echo";Args="$HOSTNAME";Out="/tmp/stdout_l15367";In="/dev/null";GridType="pbs";Queue="creamtest1";x509userproxy="/tmp/proxy";Iwd="/tmp";TransferOutput="output_file";TransferOutputRemaps="output_file=/tmp/stdout_l15367";GridResource="blah"]

Verify that the output file contains the hostname of the WN.

Bug #87419 (blparser_master add some spurious character in the BLParser command line) Not implemented

Configure a CREAM CE using the old blparser. Check the blparser process using ps. It shouldn't show spurious characters:

root     26485  0.0  0.2 155564  5868 ?        Sl   07:36   0:00 /usr/bin/BLParserPBS -d 1 -l /var/log/cream/glite-pbsparser.log -s /var/torque -p 33333 -m 56565


CREAM

Fixes provided with CREAM 1.14

Bug #59871 lcg-info-dynamic-software must split tag lines on white space - Not Implemented

To verify the fix, edit a VO.list file under /opt/glite/var/info/cream-38.pd.infn.it/VO adding:

tag1 tag2
tag3

Then query the resource bdii, where you should see:

...
GlueHostApplicationSoftwareRunTimeEnvironment: tag1
GlueHostApplicationSoftwareRunTimeEnvironment: tag2
GlueHostApplicationSoftwareRunTimeEnvironment: tag3
...
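
The query can be performed, for example, with the same pattern used for bug #68968 below:

ldapsearch -h <CE host> -x -p 2170 -b "o=grid" | grep -i RunTimeEnvironment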

Bug #68968 lcg-info-dynamic-software should protect against duplicate RTE tags - Not Implemented

To verify the fix, edit a VO.list file under /opt/glite/var/info/cream-38.pd.infn.it/VO adding:

tag1
tag2
TAG1
tag1

Then query the resource bdii:

ldapsearch -h <CE host> -x -p 2170 -b "o=grid" | grep -i tag

This should return:

GlueHostApplicationSoftwareRunTimeEnvironment: tag1
GlueHostApplicationSoftwareRunTimeEnvironment: tag2

Bug #69854 CreamCE should publish non-production state when job submission is disabled - Not Implemented

Disable job submission with glite-ce-disable-submission. Wait 3 minutes and then perform the following ldap query:

# ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus

For each GlueCE this should return:

GlueCEStateStatus: Draining

Then re-enable the submission. Edit the script /usr/bin/glite_cream_load_monitor to trigger job submission disabling. E.g. change:

$MemUsage  = 95;

to:

$MemUsage  = 1;

Wait 15 minutes and then perform the following ldap query:

# ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus

For each GlueCE this should return:

GlueCEStateStatus: Draining

Bug #69857 Job submission to CreamCE is enabled by restart of service even if it was previously disabled - Implemented

STATUS: Implemented

To test the fix:

  • disable the submission on the CE
This can be achieved via the `glite-ce-disable-submission host:port` command (provided by the CREAM CLI package installed on the UI), which can be issued only by a CREAM CE administrator, that is, by a person whose DN is listed in the /etc/grid-security/admin-list file on the CE.

Output should be: "Operation for disabling new submissions succeeded"

  • restart tomcat on the CREAM CE (service tomcat restart - on CE)

  • verify if the submission is disabled (glite-ce-allowed-submission)
This can be achieved via the `glite-ce-allowed-submission host:port` command (provided by the CREAM CLI package installed on the UI).

Output should be: "Job submission to this CREAM CE is disabled"

Bug #77791 CREAM installation does not fail if sudo is not installed - Not Implemented

Try to configure via yaim a CREAM-CE where the sudo executable is not installed.

The configuration should fail saying:

 ERROR: sudo probably not installed !

Bug #79362 location of python files provided with lcg-info-dynamic-scheduler-generic-2.3.5-0.sl5 - Not Implemented

To verify the fix, do a:

rpm -ql dynsched-generic

and verify that the files are installed in /usr/lib/python2.4 and no longer in /usr/lib/python.

Bug #80410 CREAM bulk submission CLI is desirable - Not Implemented

To test the fix, specify multiple JDLs in the glite-ce-job-submit command, e.g.:

glite-ce-job-submit --debug -a -r cream-47.pd.infn.it:8443/cream-lsf-creamtest1 jdl1.jdl jdl2.jdl jdl3.jdl

Considering the above example, verify that 3 jobs are submitted and 3 jobids are returned.

Bug #81734 removed conf file retrieve from old path that is not EMI compliant - Not Implemented

To test the fix, create the conf file /etc/glite_cream.conf with the following content:

[
CREAM_URL_PREFIX="abc://";
]

Try then e.g. the following command:

glite-ce-job-list --debug cream-47.pd.infn.it

It should report that it is trying to contact abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2:

2012-01-13 14:44:39,028 DEBUG - Service address=[abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2]

Move the conf file to /etc/VO/glite_cream.conf and repeat the test, which should give the same result.

Then move the conf file to ~/.glite/VO/glite_cream.conf and repeat the test, which should again give the same result.

Bug #82206 yaim-cream-ce: BATCH_LOG_DIR missing among the required attributes - Not Implemented

Try to configure a CREAM CE with Torque using yaim without setting BLPARSER_WITH_UPDATER_NOTIFIER and without setting BATCH_LOG_DIR.

It should fail saying:

 INFO: Executing function: config_cream_blah_check 
 ERROR: BATCH_LOG_DIR is not set
 ERROR: Error during the execution of function: config_cream_blah_check

Bug #83314 Information about the RTEpublisher service should be available also in glue2 - Not Implemented

Check whether the resource BDII publishes GLUE 2 GLUE2ComputingEndPoint objectclasses with GLUE2EndpointInterfaceName equal to org.glite.ce.ApplicationPublisher. If the CE is configured in no-cluster mode, there should be one such objectclass. If the CE is configured in cluster mode and the gLite-CLUSTER is deployed on a different node, there shouldn't be any such objectclass.

ldapsearch -h  <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #83338 endpointType (in GLUE2ServiceComplexity) hardwired to 1 in CREAM CE is not always correct - Not Implemented

Perform the following query on the resource bdii of the CREAM CE:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" | grep -i endpointtype

endpointtype should be 3 if CEMon is deployed (USE_CEMON is true), and 2 otherwise.

Bug #83474 Some problems concerning glue2 publications of CREAM CE configured in cluster mode - Not Implemented

Configure a CREAM CE in cluster mode, with the gLite-CLUSTER configured on a different host. Then run the following ldapsearch queries against the resource BDII of the CREAM CE:

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.CREAM))"

  • Check if the resource BDII publishes glue 2 GLUE2Manager objectclasses. There shouldn't be any GLUE2Manager objectclass.

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager

  • Check if the resource BDII publishes glue 2 GLUE2Share objectclasses. There shouldn't be any GLUE2Share objectclass.

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment

ldapsearch -h  <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #83592 CREAM client doesn't allow the delegation of RFC proxies - Not Implemented

Create an RFC proxy, e.g.:

voms-proxy-init -voms dteam -rfc

and then submit, using glite-ce-job-submit, a job with input and output sandboxes, e.g.:

[
executable="ssh1.sh";
inputsandbox={"file:///home/sgaravat/JDLExamples/ssh1.sh", "file:///home/sgaravat/a"};
stdoutput="out3.out";
stderror="err2.err";
outputsandbox={"out3.out", "err2.err", "ssh1.sh", "a"};
outputsandboxbasedesturi="gsiftp://localhost";
]

Verify that the final status is DONE-OK.

Bug #83593 Problems limiting RFC proxies in CREAM - Not Implemented

Repeat the same test done for bug #83592.

Bug #84308 Error on glite_cream_load_monitor if cream db is on another host - Not Implemented

Configure a CREAM CE with the database installed on a different host than the CREAM CE.

Run:

/usr/bin/glite_cream_load_monitor --show

which shouldn't report any error.

Bug #86522 glite-ce-job-submit authorization error message difficult to understand - Not Implemented

TBD

Bug #86609 yaim variable CE_OTHERDESCR not properly managed for Glue2 - Not Implemented

Try to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=1"

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=1

Try then to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=1,Benchmark=150-HEP-SPEC06"

and reconfigure via yaim.

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=1

Then perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark 

This should return something like:

dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=cream-47.pd.infn.it_ComputingElement,GLUE2GroupID=re
 source,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T14:04:48Z
GLUE2BenchmarkValue: 150
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: cream-47.pd.infn.it_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06

Bug #86694 A different port number than 9091 should be used for LRMS_EVENT_LISTENER - Not Implemented

On a running CREAM CE, perform the following command:

netstat -an | grep -i 9091

This shouldn't return anything.

Then perform the following command:

netstat -an | grep -i 49152

This should return:

tcp        0      0 :::49152                    :::*                        LISTEN      

[root@cream-47 ~]# netstat -an | grep -i 49153
[root@cream-47 ~]# netstat -an | grep -i 49154
[root@cream-47 ~]# netstat -an | grep -i 9091

Bug #86697 User application's exit code not recorded in the CREAM log file - Not Implemented

Submit a job and wait for its completion. Then check the glite-ce-cream.log file on the CREAM CE. The user exit code should be reported (field exitCode), e.g.:

13 Jan 2012 15:22:52,966 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM124031222 STATUS CHANGED: REALLY-RUNNING => DONE-OK [failureReason=reason=0] [exitCode=23] [localUser=dteam004] [workerNode=prod-wn-001.pn.pd.infn.it] [delegationId=7a52772caaeea96628a1ff9223e67a1f6c6dde9f]
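
To locate such records quickly (assuming the default log location /var/log/cream/glite-ce-cream.log), something like the following can be used:

grep 'exitCode=' /var/log/cream/glite-ce-cream.log | tail -5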

Bug #86737 A different port number than 9909 should be used for CREAM_JOB_SENSOR - Not Implemented

On a running CREAM CE, perform the following command:

netstat -an | grep -i 9909

This shouldn't return anything.

Bug #86773 wrong /etc/glite-ce-cream/cream-config.xml with multiple ARGUS servers set - Not Implemented

To test the fix, set in the siteinfo.def:

USE_ARGUS=yes
ARGUS_PEPD_ENDPOINTS="https://cream-46.pd.infn.it:8154/authz https://cream-46-1.pd.infn.it:8154/authz"
CREAM_PEPC_RESOURCEID="http://pd.infn.it/cream-47"

i.e. 2 values for ARGUS_PEPD_ENDPOINTS.

Then configure via yaim.

In /etc/glite-ce-cream/cream-config.xml there should be:

 <argus-pep name="pep-client1"
             resource_id="http://pd.infn.it/cream-47"
             cert="/etc/grid-security/tomcat-cert.pem"
             key="/etc/grid-security/tomcat-key.pem"
             passwd=""
             mapping_class="org.glite.ce.cream.authz.argus.ActionMapping">
    <endpoint url="https://cream-46.pd.infn.it:8154/authz" />
    <endpoint url="https://cream-46-1.pd.infn.it:8154/authz" />
  </argus-pep>

Bug #87690 Not possible to map different queues to different clusters for CREAM configured in cluster mode - Not Implemented

Configure via yaim a CREAM CE in cluster mode with different queues mapped to different clusters, e.g.:

CREAM_CLUSTER_MODE=yes
CE_HOST_cream_47_pd_infn_it_QUEUES="creamtest1 creamtest2"
QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id

Then query the resource bdii of the CREAM CE, and check the GlueForeignKey attributes of the different GlueCEs: they should refer to the specified clusters:

ldapsearch -h cream-47.pd.infn.it -p 2170 -x -b o=grid objectclass=GlueCE GlueForeignKey
# extended LDIF
#
# LDAPv3
# base <o=grid> with scope subtree
# filter: objectclass=GlueCE
# requesting: GlueForeignKey 
#

# cream-47.pd.infn.it:8443/cream-lsf-creamtest2, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest2,Mds-Vo-name=r
 esource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl2id

# cream-47.pd.infn.it:8443/cream-lsf-creamtest1, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest1,Mds-Vo-name=r
 esource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl1id

Bug #87799 Add yaim variables to configure the GLUE 2 WorkingArea attributes - Not Implemented

Set all (or some) of the following yaim variables:

WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME

and then configure via yaim. Then query the resource bdii of the CREAM CE and verify that the relevant attributes of the glue2 ComputingManager object are set.
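
A possible verification query (the relevant GLUE 2 ComputingManager attribute names all contain the string WorkingArea):

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingManager | grep -i WorkingArea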

Bug #88078 CREAM DB names should be configurable - Not Implemented

Configure from scratch a CREAM CE setting the yaim variables: CREAM_DB_NAME and DELEGATION_DB_NAME, e.g.:

CREAM_DB_NAME=abc
DELEGATION_DB_NAME=xyz

and then configure via yaim.

Then check if the two databases have been created:

# mysql -u xxx -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7176
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| abc                |
| test               |
| xyz                |
+--------------------+
4 rows in set (0.02 sec)

Try also a job submission to verify if everything works properly.

Bug #89489 yaim plugin for CREAM CE does not execute a check function due to name mismatch - Not Implemented

Configure a CREAM CE via yaim and save the yaim output. It should contain the string:

INFO: Executing function: config_cream_gip_scheduler_plugin_check
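
For example (the siteinfo path and the creamCE node type below are assumptions; adapt them to your setup):

/opt/glite/yaim/bin/yaim -c -s /root/siteinfo/site-info.def -n creamCE 2>&1 | tee /tmp/yaim.log
grep config_cream_gip_scheduler_plugin_check /tmp/yaim.log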

Bug #89664 yaim-cream-ce doesn't manage spaces in CE_OTHERDESCR - Not Implemented

Try to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=1"

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=1

Try then to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=2, Benchmark=4-HEP-SPEC06"

and reconfigure via yaim.

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=2

Then perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark 

This should return something like:

# cream-47.pd.infn.it_hep-spec06, cream-47.pd.infn.it, ppp, resource, glue
dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=ppp,GLUE2GroupID=resource,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T17:07:52Z
GLUE2BenchmarkValue: 4
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: ppp_Manager
GLUE2EntityName: Benchmark hep-spec06

Bug #89784 Improve client side description of authorization failure - Not Implemented

Try to remove the lsc files for your VO and try a submission to that CE.

It should return an authorization error.

Then check the glite-ce-cream.log. It should report something like:

13 Jan 2012 18:21:21,270 org.glite.voms.PKIVerifier - Cannot find usable certificates to validate the AC. Check that the voms server host certificate is in your vomsdir directory.
13 Jan 2012 18:21:21,602 org.glite.ce.commonj.authz.gjaf.LocalUserPIP - glexec error: [gLExec]:   LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
13 Jan 2012 18:21:21,603 org.glite.ce.commonj.authz.gjaf.ServiceAuthorizationChain - Failed to get the local user id via glexec: glexec error: [gLExec]:   LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
org.glite.ce.commonj.authz.AuthorizationException: Failed to get the local user id via glexec: glexec error: [gLExec]:   LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.

Fixes provided with CREAM 1.13.3

Bug #81561 Make JobDBAdminPurger script compliant with CREAM EMI environment. - Implemented

STATUS: Implemented

To test the fix, simply run JobDBAdminPurger.sh as root on the CREAM CE, e.g.:

# JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml -u <user> -p <passwd> -s DONE-FAILED,0 
START jobAdminPurger

It should work without reporting error messages:

-----------------------------------------------------------
Job CREAM595579358 is going to be purged ...
- Job deleted. JobId = CREAM595579358
CREAM595579358 has been purged!
-----------------------------------------------------------

STOP jobAdminPurger

Bug #83238 Sometimes CREAM does not update the state of a failed job. - Implemented

STATUS: Implemented

To test the fix, try to kill a job by hand.

The status of the job should eventually be:

   Status        = [DONE-FAILED]
   ExitCode      = [N/A]
   FailureReason = [Job has been terminated (got SIGTERM)]

Bug #83749 JobDBAdminPurger cannot purge jobs if configured sandbox dir has changed. - Implemented

STATUS: Not implemented

To test the fix, submit some jobs and then reconfigure the service with a different value of CREAM_SANDBOX_PATH. Then try, with the JobDBAdminPurger.sh script, to purge some jobs submitted before the switch.

It must be verified:

  • that the jobs have been purged from the CREAM DB (i.e. a glite-ce-job-status should not find them anymore)
  • that the relevant CREAM sandbox directories have been deleted

Bug #84374 yaim-cream-ce: GlueForeignKey: GlueCEUniqueID: published using : instead of =. - Implemented

STATUS: Implemented

To test the fix, query the resource bdii of the CREAM-CE:

ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | grep -i foreignkey | grep -i glueceuniqueid

Entries such as:

GlueForeignKey: GlueCEUniqueID=cream-35.pd.infn.it:8443/cream-lsf-creamtest1

i.e.:

GlueForeignKey: GlueCEUniqueID=<CREAM CE ID>

should appear.

Bug #86191 No info published by the lcg-info-dynamic-scheduler for one VOView - Implemented

STATUS: Implemented

To test the fix, issue the following ldapsearch query towards the resource bdii of the CREAM-CE:

$ ldapsearch -h cream-35 -x -p 2170 -b "o=grid" | grep -i GlueCEStateWaitingJobs | grep -i 444444

It should not find anything.

Bug #87361 The attribute cream_concurrency_level should be configurable via yaim. - Implemented

STATUS: Implemented

To test the fix, set in siteinfo.def the variable CREAM_CONCURRENCY_LEVEL to a certain number (n). After configuration verify that in /etc/glite-ce-cream/cream-config.xml there is:

         cream_concurrency_level="n"
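
For example, having set CREAM_CONCURRENCY_LEVEL=50 in siteinfo.def:

grep cream_concurrency_level /etc/glite-ce-cream/cream-config.xml

should print cream_concurrency_level="50".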

Bug #87492 CREAM doesn't handle correctly the jdl attribute "environment". - Implemented

STATUS: Implemented

To test the fix, submit the following JDL using glite-ce-job-submit:

[
Environment = {
"GANGA_LCG_VO='camont:/camont/Role=lcgadmin'",
"LFC_HOST='lfc0448.gridpp.rl.ac.uk'",
"GANGA_LOG_HANDLER='WMS'"
};
executable="/bin/env";
stdoutput="out.out";
outputsandbox={"out.out"};
outputsandboxbasedesturi="gsiftp://localhost";
]
When the job is done, retrieve the output and check that in out.out the variables GANGA_LCG_VO, LFC_HOST and GANGA_LOG_HANDLER have exactly the values defined in the JDL.
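
A sketch of the check (glite-ce-job-output downloads the output sandbox into a local directory whose name is derived from the job ID):

glite-ce-job-output <CREAM job ID>
grep -E 'GANGA_LCG_VO|LFC_HOST|GANGA_LOG_HANDLER' <downloaded dir>/out.out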

gLite-CLUSTER

Bug #69318 The cluster publisher needs to publish in GLUE 2 too Not implemented

To test the fix, run the following ldapsearch queries against the resource BDII of the gLite-CLUSTER node:

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager

  • Check if the resource BDII publishes glue 2 GLUE2Share objectclasses. There should be one GLUE2Share objectclass for each VOview.

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #86512 YAIM Cluster Publisher incorrectly configures GlueClusterService and GlueForeignKey for CreamCEs - Not implemented

To test the fix, issue an ldapsearch such as:

ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueClusterService

Then issue an ldapsearch such as:

ldapsearch -h  <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueForeignKey | grep -v Site

Verify that for each returned line, the format is:

<hostname>:8443/cream-<lrms>-<queue>

Bug #87691 Not possible to map different queues of the same CE to different clusters - Not implemented

To test this fix, configure a gLite-CLUSTER with at least two different queues mapped to different clusters (use the yaim variables QUEUE_<queue name>_CLUSTER_UniqueID), e.g.:

QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id

Then query the resource bdii of the gLite-CLUSTER and verify that:

  • for the GlueCluster objectclass with GlueClusterUniqueID equal to cl1id, the attributes GlueClusterService and GlueForeignKey refer to CEIds with creamtest1 as queue
  • for the GlueCluster objectclass with GlueClusterUniqueID equal to cl2id, the attributes GlueClusterService and GlueForeignKey refer to CEIds with creamtest2 as queue

Bug #87799 Add yaim variables to configure the GLUE 2 WorkingArea attributes - Not implemented

Set all (or some) of the following yaim variables:

WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME

and then configure via yaim. Then query the resource bdii of the gLite cluster and verify that the relevant attributes of the glue2 ComputingManager object are set.

CREAM Torque module

Bug #17325 Default time limits not taken into account - Not implemented

To test the fix for this bug, consider a PBS installation where for a certain queue both default and max values are specified, e.g.:

resources_max.cput = A
resources_max.walltime = B
resources_default.cput = C
resources_default.walltime = D

Verify that the published value for GlueCEPolicyMaxCPUTime is C and that the published value for GlueCEPolicyMaxWallClockTime is D.
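
The queue limits can be inspected, for instance, with a Torque/PBS client (the grep patterns are illustrative):

qstat -Qf <queue name> | grep -E 'resources_(max|default)'

and the published values with:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep -E 'GlueCEPolicyMax(CPU|WallClock)Time'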

Bug #49653 lcg-info-dynamic-pbs should check pcput in addition to cput - Not implemented

To test the fix for this bug, consider a PBS installation where for a certain queue both cput and pcput max values are specified, e.g.:

resources_max.cput = A
resources_max.pcput = B

Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum between A and B.

Then consider a PBS installation where for a certain queue both cput and pcput max and default values are specified, e.g.:

resources_max.cput = C
resources_default.cput = D
resources_max.pcput = E
resources_default.pcput = F

Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum between D and F.

Bug #76162 YAIM for APEL parsers to use the BATCH_LOG_DIR for the batch system log location - Not implemented

To test the fix for this bug, set the yaim variable BATCH_ACCT_DIR and configure via yaim.

Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify the section:

<Logs searchSubDirs="yes" reprocess="no">
            <Dir>X</Dir>

X should be the value specified for BATCH_ACCT_DIR.

Then reconfigure without setting BATCH_ACCT_DIR.

Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify that the directory name is ${TORQUE_VAR_DIR}/server_priv/accounting

Bug #77106 PBS info provider doesn't allow - in a queue name - Not implemented

To test the fix, configure a CREAM CE in a PBS installation where at least one queue has a '-' in its name.

Then log in as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Check if the returned information is correct.

CREAM LSF module

Bug #88720 Too many '9' in GlueCEPolicyMaxCPUTime for LSF - Not implemented

To test the fix, query the CREAM CE resource bdii in the following way:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEPolicyMaxCPUTime | grep 9999999999

This shouldn't return anything.

Bug #89767 The LSF dynamic infoprovider shouldn't publish GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots - Not implemented

To test the fix, log in as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Among the returned information, there shouldn't be GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots.

Bug #89794 LSF info provider doesn't allow - in a queue name - Not implemented

To test the fix, configure a CREAM CE in an LSF installation where at least one queue has a '-' in its name.

Then log in as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Check if the returned information is correct.

Bug #90113 missing yaim check for batch system - Not implemented

To test the fix, configure a CREAM CE without LSF installed.

The yaim configuration should fail, reporting problems with the LSF installation.

-- MassimoSgaravatto - 2011-11-07

