Regression Test Work Plan

BLAH

Fixes provided with BLAH 1.20.3

Bug #CREAM-112 sge_local_submit_attributes.sh problem with memory requirements Not Implemented

Fixes provided with BLAH 1.20.2

Bug #CREAM-88 Memory leak in BNotifier Not Implemented

Bug #CREAM-94 Command not found error in /usr/libexec/sge_cancel.sh Not Implemented

Bug #CREAM-105 Env vars in blah.config should be exported also in daemons Not Implemented

Fixes provided with BLAH 1.18.4

Bug #CREAM-94 Command not found error in /usr/libexec/sge_cancel.sh Not Implemented

Bug #CREAM-105 Env vars in blah.config should be exported also in daemons Not Implemented

Bug #CREAM-112 sge_local_submit_attributes.sh problem with memory requirements Not Implemented

Fixes provided with BLAH 1.18.2

Bug #97491 (BUpdaterLSF should not execute any bhist query if all the bhist related conf parameter are set to "no") Hard to test - Not implemented

  • the fix has been fully tested by CERN
  • the following is the report provided by Ulrich:
We are running this patch in production at CERN. The patch is motivated by the
fact that bhist calls are very expensive and calls to the command don't scale.
Running the command makes both the CE and the batch master unresponsive
and has therefore a severe impact on the system performance. 
While discussing the issue with the CREAM developers we found out that it
is possible to obsolete these calls and replace them with less expensive batch
system queries. For that we use the btools tool suite which provides a set
of additional LSF batch system queries which return output in machine readable
form.

Pre-requisites:
---------------
Note on btools: btools are shipped in source code with the LSF information
provider plugin. Compiling them requires LSF header files. Binaries depend on
the LSF version which is used, therefore they cannot be shipped or
automatically built due to licensing reasons.

Building Instructions:

- ensure that the LSF headers are installed on your build host

# yum install gcc rpmbuild automake autoconf info-dynamic-scheduler-lsf-btools-2.2.0-1.noarch.rpm
# cd /tmp
# tar -zxvf /usr/src/egi/btools.src.tgz
# cd btools
# ./autogen.sh
# make rpm
and install the resulting rpm

Patch version:
--------------
On EMI1 CEs (our production version) we are using a private build of the
patch glite-ce-blahp-1.16.99-0_201208291258.slc5
On EMI2 we've been testing a new build glite-ce-blahp-1.18.1-2
(Both rpms are private builds we got from Massimo Mezzadri.)

Configuration of the patch
--------------------------
in /etc/blah.conf we set:

# use btools to obsolete bhist calls
bupdater_use_btools=yes
bupdater_btools_path=/usr/bin

#
bupdater_use_bhist_for_susp=no
bupdater_use_bhist_for_killed=no
bupdater_use_bhist_for_idle=no
bupdater_use_bhist_time_constraint=no
 
# CERN add caching for LSF queries
lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/libexec/runcmd

The runcmd command is shipped with the LSF information providers. You need
at least info-dynamic-scheduler-lsf-2.2.0-1. In our configuration we cache
all batch system responses and share them using  an NFS file system. The cache
directory is a convenient way to check if any bhist calls are done by any of the
CEs by just checking for a cache file. With the above settings there are no
such calls any longer.
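
The bhist-related settings above can also be checked mechanically. The following is a minimal sketch and not part of the CERN report: the function name bhist_keys_enabled is hypothetical, the file is assumed to contain plain key=value lines, and keys that are absent are assumed to default to "no".

```shell
# Hypothetical helper, not part of BLAH: print every uncommented
# bupdater_use_bhist_* assignment whose value is not "no".
# Returns 0 when none is found (absent keys are assumed to default to "no").
bhist_keys_enabled() {
    conf=$1
    [ -r "$conf" ] || return 2
    enabled=$(grep -E '^[[:space:]]*bupdater_use_bhist_[a-z_]+=' "$conf" \
        | grep -Ev '=[[:space:]]*"?no"?[[:space:]]*$' || true)
    if [ -n "$enabled" ]; then
        printf '%s\n' "$enabled"
        return 1
    fi
    return 0
}

# Usage on the CE (not executed here); pass the path of your BLAH configuration file:
#   bhist_keys_enabled /etc/blah.config && echo "bhist fully disabled"
```

The function only inspects the configuration; the cache directory check described above remains the way to confirm that no bhist call is actually made.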

Bug #95385 (Misleading message when Cream SGE aborts jobs requesting more than one CPU) Not implemented

Fully tested by CERN: the test requires the CERN environment based on SGE.

Fixes provided with BLAH 1.18.1

Bug #94414 (BLParserLSF could crash if a suspend on an idle job is done) Implemented

Try to suspend a job whose status is "IDLE" and verify that the daemon BLParserLSF doesn't crash.

Bug #94519 (Updater for LSF can misidentify killed jobs as finished) Implemented

Verify that bupdater_use_bhist_for_killed is set to yes, then submit a job and cancel it. The job status should be reported as "jobstatus=3".

Bug #94712 (Due to a timestamp problem bupdater for LSF can leave job in IDLE state) Hard to Reproduce - Not implemented

Not easy to reproduce (unpredictable behaviour)

Bug #95392 Heavy usage of 'bjobsinfo' still hurts LSF @ CERN Cannot be reproduced - Not implemented

This is a cosmetic update required by CERN team; it can be reproduced only using the tools developed at CERN.

Fixes provided with BLAH 1.18.0

Bug #84261 BNotifier on CREAM CE seems to not restart cleanly Implemented

To test the fix, configure a CREAM CE using the new blparser.

Then try a:

service gLite restart

It shouldn't report the error message:

Starting BNotifier: /opt/glite/bin/BNotifier: Error creating and binding socket: Address already in use
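
The check on the restart output can be scripted. This is a sketch only: the function name restart_output_ok is hypothetical, and it looks just for the error text quoted above.

```shell
# Hypothetical helper: reads the output of the restart command on stdin and
# fails if the BNotifier socket error quoted in the bug report is present.
restart_output_ok() {
    if grep -q 'Error creating and binding socket: Address already in use'; then
        return 1
    fi
    return 0
}

# Usage on the CE (not executed here):
#   service gLite restart 2>&1 | restart_output_ok && echo PASS || echo FAIL
```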

Bug #86238 blahpd doesn't check the status of its daemons when idling Implemented

To test the fix configure a CREAM CE with the new blparser.

Don't use it (i.e. do not submit jobs nor issue any other commands).

Kill the bupdater and bnotifier processes.

Wait for 1 minute: you should see that the bupdater and bnotifier have been restarted.
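
The final check can be sketched as follows. The function name missing_blah_daemons is hypothetical, and matching the daemons by the name prefixes BUpdater and BNotifier is an assumption based on the process names used elsewhere in this plan.

```shell
# Hypothetical helper: reads a process listing on stdin (one command name per
# line) and prints each expected BLAH daemon that does not appear in it.
missing_blah_daemons() {
    listing=$(cat)
    for d in BUpdater BNotifier; do
        printf '%s\n' "$listing" | grep -q "$d" || echo "$d"
    done
}

# Usage on the CE (not executed here):
#   pkill BUpdater; pkill BNotifier
#   sleep 60
#   ps -eo comm= | missing_blah_daemons    # no output means both were restarted
```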

Bug #86918 Request for passing all submit command attributes to the local configuration script. Not implemented

To test this fix, create/edit the /usr/libexec/pbs_local_submit_attributes.sh (for PBS) script adding:

export gridType x509UserProxyFQAN uniquejobid queue ceid VirtualOrganisation ClientJobId x509UserProxySubject
env > /tmp/filewithenv

Edit the /etc/blah.config file adding:

blah_pass_all_submit_attributes=yes

Submit a job. On the CREAM CE, the file /tmp/filewithenv should be created, and it should contain the settings of several variables, including the ones exported in the /usr/libexec/pbs_local_submit_attributes.sh script.

Then edit the /etc/blah.config file, removing the previously added line, and adding:

blah_pass_submit_attributes[0]="x509UserProxySubject"
blah_pass_submit_attributes[1]="x509UserProxyFQAN"

Submit a job. On the CREAM CE, the file /tmp/filewithenv should be created, and it should contain the settings of several variables, including x509UserProxySubject and x509UserProxyFQAN.
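
Both checks on /tmp/filewithenv amount to looking for NAME=value lines in the env dump. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: print each requested variable that has no NAME=...
# line in the given env dump; return non-zero if any is missing.
missing_env_vars() {
    file=$1
    shift
    rc=0
    for v in "$@"; do
        grep -q "^${v}=" "$file" || { echo "$v"; rc=1; }
    done
    return $rc
}

# Usage on the CE (not executed here):
#   missing_env_vars /tmp/filewithenv x509UserProxySubject x509UserProxyFQAN
```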

Bug #90085 Suspend command doesn't work with old parser Implemented

To test the fix configure a CREAM CE with the old blparser.

Then submit a job and after a while suspend it using the glite-ce-job-suspend command.

Check the job status which eventually should be HELD.

Bug #90331 Not implemented

To test the fix, submit a job, ban yourself on the CE (check here how to ban a user), and run glite-ce-job-status. It should return an authorization fault.

Bug #90927 Problem with init script for blparser Not implemented

To check the fix, try to stop/start the blparser:

service glite-ce-blparser start / stop 

Then verify that the blparser has indeed been started/stopped

Bug #91318 Request to change functions in blah_common_submit_functions.sh Not implemented

Verify that in /usr/libexec/blah_common_submit_functions.sh there is this piece of code:

function bls_add_job_wrapper ()
{
  bls_start_job_wrapper >> $bls_tmp_file
  bls_finish_job_wrapper >> $bls_tmp_file
  bls_test_working_dir
}
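
Instead of reading the file by eye, the function body can be extracted and checked. This is a sketch with hypothetical helper names; it assumes the function is laid out as shown above, with the closing brace at the start of a line.

```shell
# Hypothetical helper: print the body of bls_add_job_wrapper from the given file.
wrapper_body() {
    awk '/^function bls_add_job_wrapper/ {inside = 1; next}
         inside && /^}/ {exit}
         inside {print}' "$1"
}

# Succeeds only if the three expected calls appear in that body.
check_add_job_wrapper() {
    body=$(wrapper_body "$1")
    for call in bls_start_job_wrapper bls_finish_job_wrapper bls_test_working_dir; do
        printf '%s\n' "$body" | grep -q "$call" || return 1
    done
}

# Usage on the CE (not executed here):
#   check_add_job_wrapper /usr/libexec/blah_common_submit_functions.sh && echo PASS
```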

Bug #90101: Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute may cause thread to loop TBD

Bug #92554: BNotifier problem can leave connection in CLOSE_WAIT state TBD

Bug #89504: Repeated notification problem for BLParserLSF TBD

Bug #90082: BUpdaterPBS workaround if tracejob is in infinite loop TBD

Fixes provided with BLAH 1.16.6

See Fixes provided with BLAH 1.18.1

Fixes provided with BLAH 1.16.5

Bug #89527 BLAHP produced -W stage(in/out) directives are incompatible with Torque 2.5.8 Not implemented

To test this fix, configure a CREAM CE with PBS/Torque 2.5.8.

If this is not possible and you have another torque version, apply the change documented at:

https://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide#5_1_Saving_the_batch_job_submiss

to save the submission script.

Submit a job and check in /tmp the pbs job submission script.

It should contain something like:

#PBS -W stagein=\'CREAM610186385_jobWrapper.sh.18757.13699.1328001723@cream-38.pd.infn.it:/var/c\
ream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role\
_NULL_Capability_NULL_dteam042/61/CREAM610186385/CREAM610186385_jobWrapper.sh,cre38_610186385.pr\
oxy@cream-38.pd.infn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Cert\
ificate_O_INFN_C_IT_dteam_Role_NULL_Capability_NULL_dteam042/proxy/5a34c64e2a8db2569284306e9a472\
3d2d40045a7_13647008746533\'
#PBS -W stageout=\'out_cre38_610186385_StandardOutput@cream-38.pd.infn.it:/var/cream_sandbox/dte\
am/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role_NULL_Capability\
_NULL_dteam042/61/CREAM610186385/StandardOutput,err_cre38_610186385_StandardError@cream-38.pd.in\
fn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_I\
T_dteam_Role_NULL_Capability_NULL_dteam042/61/CREAM610186385/StandardError\'

i.e. one stagein and one stageout directive, each with escaped quotes around the whole list.
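
A quick check of the saved submission script could look like this. It is a sketch with a hypothetical function name, and it verifies only that each directive value opens with a backslash-escaped single quote.

```shell
# Hypothetical helper: succeed only if the script contains both a stagein and
# a stageout directive whose value starts with an escaped quote (\').
check_stage_directives() {
    script=$1
    for kind in stagein stageout; do
        grep -q "^#PBS -W ${kind}=[\\]'" "$script" || return 1
    done
}

# Usage (not executed here), with the path of the script saved in /tmp:
#   check_stage_directives /tmp/<saved submission script> && echo PASS
```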

Bug #91037 BUpdaterLSF should use bjobs to detect final job state Not implemented

To test the fix, configure a CREAM CE with LSF with the new blparser.

Then edit blah.config setting:

 bupdater_debug_level=3

Delete the bupdater log file and restart the blparser.

Submit a job, wait for it to complete, and then wait until a notification with status 4 is logged in the bnotifier log file.

Grep the bupdater log file for the string bhist. It should not be found, apart from lines like:

2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
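
The grep can be written so that the expected "using the default" lines are filtered out and only genuine bhist activity remains. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: print the lines of the bupdater log that mention bhist,
# excluding the expected "key ... not found - using the default" messages.
unexpected_bhist_lines() {
    grep 'bhist' "$1" | grep -v 'not found - using the default' || true
}

# Usage on the CE (not executed here); empty output means the test passes:
#   unexpected_bhist_lines <path of the bupdater log file>
```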

Fixes provided with BLAH 1.16.4

Bug #88974 BUpdaterSGE and BNotifier don't start if sge_helperpath var is not fixed Not implemented

Install and configure (via yaim) a CREAM-CE using GE as batch system.

Make sure that in /etc/blah.config the variable sge_helperpath is commented/is not there.

Try to restart the blparser: /etc/init.d/glite-ce-blahparser restart

It should work without problems. In particular it should not report the following error:

Starting BNotifier: /usr/bin/BNotifier: sge_helperpath not defined. Exiting
[FAILED]
Starting BUpdaterSGE: /usr/bin/BUpdaterSGE: sge_helperpath not defined. Exiting
[FAILED] 

Bug #89859 There is a memory leak in the updater for LSF, PBS and Condor Not implemented

Configure a CREAM CE using the new blparser.

Submit 1000 jobs using e.g. this JDL:

[
executable="/bin/sleep";
arguments="100";
]

Keep monitoring the memory used by the bupdaterxxx process. It should basically not increase.

The test should be done for both LSF and Torque/PBS.
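
Memory monitoring can be automated by sampling the resident set size of the updater process. The sketch below is Linux-specific (it reads /proc), the function names are hypothetical, and taking RSS as "the memory used" is an assumption.

```shell
# Hypothetical helper: print the resident set size (KB) of a PID, read from /proc.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Sample a PID SAMPLES times, INTERVAL seconds apart, and print the
# difference in KB between the last and the first sample.
rss_growth_kb() {
    pid=$1
    samples=$2
    interval=$3
    first=$(rss_kb "$pid")
    last=$first
    i=1
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        last=$(rss_kb "$pid")
        i=$((i + 1))
    done
    echo $((last - first))
}

# Usage on the CE (not executed here): 60 samples, 10 seconds apart
#   rss_growth_kb "$(pgrep -o BUpdater)" 60 10
```

A growth that stays near zero while the 1000 jobs run is the expected outcome.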

Fixes provided with BLAH 1.16.3

Bug #75854 (Problems related to the growth of the blah registry) Not implemented

Configure a CREAM CE using the new BLparser.

Verify that in /etc/blah.config there is: job_registry_use_mmap=yes (default scenario).

Submit 5000 jobs on a CREAM CE using the following JDL:

[
executable="/bin/sleep";
arguments="100";
]

Monitor the BLAH processes. Verify that none of them uses more than 50 MB.
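
The 50 MB check can be applied to a process listing as sketched below. The function name is hypothetical; treating blahpd, BUpdater*, BNotifier and BLParser* as "the BLAH processes" and RSS as the memory figure are both assumptions.

```shell
# Hypothetical helper: stdin carries lines of "RSS_KB COMMAND"; print the BLAH
# processes whose RSS exceeds the limit given in KB.
blah_procs_over_limit() {
    limit_kb=$1
    awk -v limit="$limit_kb" \
        '$2 ~ /^(blahpd|BUpdater|BNotifier|BLParser)/ && $1 > limit {print $2, $1}'
}

# Usage on the CE (not executed here); 50 MB = 51200 KB, empty output means PASS:
#   ps -eo rss=,comm= | blah_procs_over_limit 51200
```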

Bug #77776 (BUpdater should have an option to use cached batch system commands) Not implemented

Add the following lines to /etc/blah.config:

lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/bin/runcmd.pl

Create and fill /usr/bin/runcmd.pl with the following content:

#!/usr/bin/perl
#---------------------#
#  PROGRAM:  argv.pl  #
#---------------------#
# Append the command line received from BLAH to /tmp/xyz,
# one line per invocation, arguments separated by spaces.
use strict;
use warnings;

open(my $log, '>>', '/tmp/xyz') or die "Cannot open /tmp/xyz: $!";
print {$log} "$_ " for @ARGV;
print {$log} "\n";
close($log);

Submit some jobs. Check that the queries to the batch system are recorded in /tmp/xyz. E.g. for LSF, something like the following should be recorded:

/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs
-u
all
-l
/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs
-u
all
-l
...

Bug #80805 (BLAH job registry permissions should be improved) Not implemented

Check permissions and ownership under /var/blah. They should be:

/var/blah:
total 12
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bnotifier.pid
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bupdater.pid
drwxrwx--t 4 tomcat tomcat 4096 Oct 18 07:38 user_blah_job_registry.bjr

/var/blah/user_blah_job_registry.bjr:
total 16
-rw-rw-r-- 1 tomcat tomcat 1712 Oct 18 07:38 registry
-rw-r--r-- 1 tomcat tomcat  260 Oct 18 07:38 registry.by_blah_index
-rw-rw-rw- 1 tomcat tomcat    0 Oct 18 07:38 registry.locktest
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.npudir
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.proxydir
-rw-rw-r-- 1 tomcat tomcat    0 Oct 18 07:32 registry.subjectlist

/var/blah/user_blah_job_registry.bjr/registry.npudir:
total 0

/var/blah/user_blah_job_registry.bjr/registry.proxydir:
total 0
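
The listing above translates into octal modes that can be checked one path at a time. This is a sketch: the function name check_mode is hypothetical, and stat -c is the GNU coreutils syntax. In the listing, drwxrwx--t corresponds to 1771 and drwxrwx-wt to 1773.

```shell
# Hypothetical helper: compare the octal mode of a path with the expected one.
check_mode() {
    path=$1
    want=$2
    got=$(stat -c '%a' "$path") || return 2
    [ "$got" = "$want" ] || { echo "$path: mode $got, expected $want"; return 1; }
}

# Usage on the CE (not executed here):
#   check_mode /var/blah/user_blah_job_registry.bjr 1771
#   check_mode /var/blah/user_blah_job_registry.bjr/registry 664
#   check_mode /var/blah/user_blah_job_registry.bjr/registry.npudir 1773
```

Ownership can be inspected in the same way with stat -c '%U:%G', which should print tomcat:tomcat for every entry.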

Bug #81354 (Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute causes thread to loop) Not implemented

Log in to a CREAM CE as user tomcat. Create a proxy for yourself and copy it to /tmp/proxy (change its ownership to tomcat.tomcat).

Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).

Submit a job directly via blah (in the following, replace pbs and creamtest2 with the relevant batch system and queue names):

$ /usr/bin/blahpd
$GahpVersion: 1.16.2 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
blah_job_submit 1 [cmd="/bin/cp";Args="fstab\ fstab.out";TransferInput="/home/dteam001/dir1/fstab";TransferOutput="fstab.out";TransferOutputRemaps="fstab.out=/home/dteam001/dir1/fstab.out";gridtype="pbs";queue="creamtest2";x509userproxy="/tmp/proxy"]
S
results
S 1
1 0 No\ error pbs/20111010/304.cream-38.pd.infn.it

Finally, check the content of /home/dteam001/dir1/, where you should see both fstab and fstab.out:

$ ls /home/dteam001/dir1/
fstab  fstab.out

Bug #81824 (yaim-cream-ce should manage the attribute bupdater_loop_interval) Implemented

Set BUPDATER_LOOP_INTERVAL to 30 in siteinfo.def and reconfigure via yaim. Then verify that in blah.config there is:

bupdater_loop_interval=30
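
Reading a single key back from blah.config can be sketched as follows. The function name blah_conf_get is hypothetical, and the sketch assumes plain key=value lines starting in the first column, with the last assignment taking precedence.

```shell
# Hypothetical helper: print the value of KEY in a key=value file, with
# surrounding double quotes removed. Prints an empty line if the key is absent.
blah_conf_get() {
    key=$1
    conf=$2
    awk -F= -v k="$key" '
        $1 == k { v = substr($0, length(k) + 2) }
        END     { gsub(/^"|"$/, "", v); print v }' "$conf"
}

# Usage on the CE (not executed here):
#   [ "$(blah_conf_get bupdater_loop_interval /etc/blah.config)" = "30" ] && echo PASS
```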

Bug #82281 (blahp.log records should always contain CREAM job ID) Not implemented

Submit a job directly to CREAM using CREAM-CLI. Then submit a job to CREAM through the WMS.

In the accounting log file (/var/log/cream/accounting/blahp.log-<date>) in both cases the clientID field should end with the numeric part of the CREAM jobid, e.g.:

"timestamp=2011-10-10 14:37:38" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=CREAM956286045" "lrmsID=300.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_956286045"

"timestamp=2011-10-10 14:39:57" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=https://devel19.cnaf.infn.it:9000/dLvm84LvD7w7QXtLZK4L0A" "lrmsID=302.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_315532638"

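The clientID check on the accounting records can be sketched as two filters. The function names are hypothetical, and the second filter assumes that jobID precedes clientID within a record, as in the samples above.

```shell
# Hypothetical helper: print the records whose clientID does not end in _<digits>.
bad_client_ids() {
    grep -Ev '"clientID=[^"]*_[0-9]+"' || true
}

# For direct submissions (jobID=CREAM<digits>), print "<jobid digits> <clientID digits>"
# for every record where the two numbers differ.
mismatched_direct_ids() {
    sed -n 's/.*"jobID=CREAM\([0-9][0-9]*\)".*"clientID=[^"]*_\([0-9][0-9]*\)".*/\1 \2/p' \
        | awk '$1 != $2'
}

# Usage on the CE (not executed here); empty output from both means PASS:
#   bad_client_ids < /var/log/cream/accounting/blahp.log-<date>
#   mismatched_direct_ids < /var/log/cream/accounting/blahp.log-<date>
```
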
Bug #82297 (blahp.log rotation period is too short) Not implemented

Check that in /etc/logrotate.d/blahp-logrotate rotate is equal to 365:

# cat /etc/logrotate.d/blahp-logrotate
/var/log/cream/accounting/blahp.log {
        copytruncate
        rotate 365
        size = 10M
        missingok
        nomail
}
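
The rotate value can be extracted directly rather than read by eye. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: print the argument of every "rotate" directive in a
# logrotate configuration file.
rotate_count() {
    awk '$1 == "rotate" {print $2}' "$1"
}

# Usage on the CE (not executed here):
#   [ "$(rotate_count /etc/logrotate.d/blahp-logrotate)" = "365" ] && echo PASS
```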

Bug #83275 (Problem in updater with very short jobs that can cause no notification to cream) Not implemented

Configure a CREAM CE using the new blparser. Submit a job using the following JDL:

[
executable="/bin/echo";
arguments="ciao";
]

Check in the bnotifier log file (/var/log/cream/glite-ce-bnotifier.log) that at least one notification is sent for this job, e.g.:

2011-11-04 14:11:11 Sent for Cream:[BatchJobId="927.cream-38.pd.infn.it"; JobStatus=4; ChangeTime="2011-11-04 14:08:55"; JwExitCode=0; Reason="reason=0"; ClientJobId="622028514"; BlahJobName="cre38_622028514";]

Bug #83347 (Incorrect special character handling for BLAH Arguments and Environment attributes) Not implemented

Log in to a CREAM CE as user tomcat. Create a proxy for yourself and copy it to /tmp/proxy (change its ownership to tomcat.tomcat).

Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).

Submit a job directly via blah (in the following, replace pbs and creamtest1 with the relevant batch system and queue names):

BLAH_JOB_SUBMIT 1 [Cmd="/bin/echo";Args="$HOSTNAME";Out="/tmp/stdout_l15367";In="/dev/null";GridType="pbs";Queue="creamtest1";x509userproxy="/tmp/proxy";Iwd="/tmp";TransferOutput="output_file";TransferOutputRemaps="output_file=/tmp/stdout_l15367";GridResource="blah"]

Verify that in the output file there is the hostname of the WN.

Bug #87419 (blparser_master add some spurious character in the BLParser command line) Not implemented

Configure a CREAM CE using the old blparser. Check the blparser process using ps. It shouldn't show spurious characters:

root     26485  0.0  0.2 155564  5868 ?        Sl   07:36   0:00 /usr/bin/BLParserPBS -d 1 -l /var/log/cream/glite-pbsparser.log -s /var/torque -p 33333 -m 56565


CREAM

Fixes provided with CREAM 1.16.2

Bug #CREAM-113 Enhancement of the CREAM DB API (Sec. Vuln.) Not Implemented

Bug #CREAM-124 Fix a bug related to gridsite (Sec. Vuln.) Not Implemented

Bug #CREAM-125 SOAP Header is not set Not Implemented

Fixes provided with CREAM 1.16.1

Bug #CREAM-107 Bad timezone format Not Implemented

Bug #CREAM-103 Wrong symlinks upgrading from EMI-2 to EMI-3 Not Implemented

Bug #CREAM-101 Wrong time format for MaxWallClockTime Not Implemented

Bug #CREAM-99 Python error on ERT calculation: local variable 'est' referenced before assignment Not Implemented

Bug #CREAM-82 Check permission for /var/cream_sandbox Not Implemented

Bug #CREAM-78 Check for /etc/lrms in config_cream_gip_scheduler_plugin Not Implemented

Bug #CREAM-77 List index error from persistent estimator Not Implemented

Bug #CREAM-74 Remove trustmanager from the provides list of cream-common Not Implemented

Bug #CREAM-75 CREAM should avoid to log the error messages by including even the full stack trace (i.e printStackTrace()) Not Implemented

Bug #CREAM-83 CREAM should be able to immediately kill also queued jobs, when the proxy is expiring Not Implemented

Bug #CREAM-84 The DelegationPurger may cause a java.lang.OutOfMemoryError exception Not Implemented

Bug #CREAM-89 --leaseId option doesn't work Not Implemented

Bug #CREAM-90 --help option doesn't work on glite-ce-job-submit and glite-ce-event-query Not Implemented

Bug #CREAM-91 man page missing for glite-ce-job-lease Not Implemented

TODO Bug #CREAM-102 cream job-wrapper gets stuck in EMI-3 if perusal is enabled Not Implemented

Fixes provided with CREAM 1.15.2

Bug #101221 CREAM sends wrong authorization requests to Argus containing attributes with empty values. Not Implemented

  • The bug is not reproducible because it is triggered only by the certificate of a specific user

Bug #101108 Minor issues from INFN-T1 Not Implemented

  • On an SL6 host, install the CREAM CE and configure it with YAIM, disabling both ES and CEMonitor
  • In the YAIM log, verify that the tomcat server has been restarted only twice: first for the configuration of all Web Services, and then for the configuration of secure-tomcat
  • Verify that the files ce-cream-es.xml and ce-monitor.xml are not present in the directory /etc/tomcat6/Catalina/localhost/
  • Increase the verbosity of the CREAM log from info to debug, restart the service, submit a large batch of jobs, and verify that the size of a log file is about 1 MB.

Fixes provided with CREAM 1.14.6

TODO Bug #CREAM-127 - Mount info item not parsed if it contains colon - Not Implemented

Bug #CREAM-113 Enhancement of the CREAM DB API (Sec. Vuln.) Not Implemented

Bug #CREAM-124 Fix a bug related to gridsite (Sec. Vuln.) Not Implemented

Fixes provided with CREAM 1.14.5

TODO Bug #CREAM-111 Wrong tokenization for epilogue argument list Not Implemented

TODO Bug #CREAM-101 Wrong time format for MaxWallClockTime Not Implemented

TODO Bug #CREAM-99 Python error on ERT calculation: local variable 'est' referenced before assignment Not Implemented

TODO Bug #CREAM-84 The DelegationPurger may cause a java.lang.OutOfMemoryError exception Not Implemented

TODO Bug #CREAM-83 CREAM should be able to immediately kill also queued jobs, when the proxy is expiring Not Implemented

TODO Bug #CREAM-82 Check permission for /var/cream_sandbox Not Implemented

Bug #CREAM-78 Check for /etc/lrms in config_cream_gip_scheduler_plugin Not Implemented

TODO Bug #CREAM-77 List index error from persistent estimator Not Implemented

Bug #CREAM-75 CREAM should avoid to log the error messages by including even the full stack trace (i.e printStackTrace()) Not Implemented

Fixes provided with CREAM 1.14.3

Bug #99740 updateDelegationProxyInfo error: Rollback executed due to Deadlock Not Implemented

  • delegate the proxy credentials by specifying the ID of the delegated proxy
    > glite-ce-delegate-proxy -e cream-27.pd.infn.it:8443 myproxyid
  • submit ~10K long jobs to cream by forcing the submission to use the previously delegated user credentials, identified with the specified ID
    > glite-ce-job-submit --delegationId myproxyid -a -r cream-27.pd.infn.it:8443/cream-lsf-grid02 myjob.jdl
  • during the submission, at ~7000 jobs, execute in parallel several proxy renewal commands referring to the previously delegated user credentials
    > glite-ce-proxy-renew -e cream-27.pd.infn.it:8443 myproxyid
  • check CREAM's logs for the error message "updateDelegationProxyInfo error: Rollback executed due to: Deadlock found when trying to get lock; try restarting transaction": this message should not be present.
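
The submission and renewal steps above can be driven from a small script. This is a sketch only: the helper names and the DRY_RUN switch are hypothetical, while the commands and their options are the ones listed in the steps.

```shell
# Endpoint, queue and delegation ID taken from the steps above; override as needed.
CE=${CE:-cream-27.pd.infn.it:8443}
QUEUE=${QUEUE:-cream-lsf-grid02}
DELEG_ID=${DELEG_ID:-myproxyid}

# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# submit_jobs COUNT JDL: submit COUNT jobs with the delegated credentials.
submit_jobs() {
    n=0
    while [ "$n" -lt "$1" ]; do
        run glite-ce-job-submit --delegationId "$DELEG_ID" -a -r "$CE/$QUEUE" "$2"
        n=$((n + 1))
    done
}

# renew_proxy TIMES: renew the delegated proxy TIMES times.
renew_proxy() {
    r=0
    while [ "$r" -lt "$1" ]; do
        run glite-ce-proxy-renew -e "$CE" "$DELEG_ID"
        r=$((r + 1))
    done
}

# Usage (not executed here): renew in the background once ~7000 jobs are in.
#   submit_jobs 7000 myjob.jdl
#   renew_proxy 20 &
#   submit_jobs 3000 myjob.jdl
```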

TODO Bug #99738 Under stress conditions due to job submissions, the command queue may accumulate thousand of job purging commands Not Implemented

  • submit ~5000 very short jobs to cream and wait for them to reach a terminal state (e.g. DONE-OK)
  • edit the cream's configuration file (i.e. /etc/glite-ce-cream/cream-config.xml)
  • change the JOB_PURGE_RATE parameter value to 2 minutes
    <parameter name="JOB_PURGE_RATE" value="2" /> <!-- minutes -->
  • change the JOB_PURGE_POLICY parameter value to "ABORTED 2 minutes; CANCELLED 2 minutes; DONE-OK 2 minutes; DONE-FAILED 2 minutes; REGISTERED 2 days;"
    <parameter name="JOB_PURGE_POLICY" value="ABORTED 2 minutes; CANCELLED 2 minutes; DONE-OK 2 minutes; DONE-FAILED 2 minutes; REGISTERED 2 days;" />
  • restart cream (i.e. tomcat)
    > service tomcat6 restart
  • submit further jobs to cream
  • meanwhile, during the submission, check the cream log and observe the messages about the JobPurger
  • every 2 minutes the JobPurger activity should be logged:
JobPurger - purging 0 jobs with status REGISTERED <= Wed Jan 16 16:55:55 CET 2013
JobPurger - purging 0 jobs with status ABORTED <= Tue Jan 08 16:56:55 CET 2013
JobPurger - purging 0 jobs with status CANCELLED <= Fri Jan 18 16:51:55 CET 2013
JobPurger - purging 500 jobs with status DONE-OK <= Fri Jan 18 16:51:55 CET 2013
JobPurger - purging 0 jobs with status DONE-FAILED <= Tue Jan 08 16:56:55 CET 2013
  • access the cream database
  • execute
    use creamdb; select * from command_queue where name="PROXY_RENEW";
    the result should always be "Empty set (0.00 sec)"

Bug #98144 The switching off of the JobSubmissionManager makes the CREAM service not available for the users Not Implemented

  • Switch off the JobSubmissionManager in the CREAM configuration file (/etc/glite-ce-cream/cream-config.xml)
<parameter name="JOB_SUBMISSION_MANAGER_ENABLE" value="false" />

  • Restart tomcat
service tomcat5 restart

  • Submit a job using the CREAM UI.

  • Check that an error message like the following doesn't appear:
"Received NULL fault; the error is due to another cause: FaultString=[CREAM service not available: configuration failed!] - FaultCode=[SOAP-ENV:Server] - FaultSubCode=[SOAP-ENV:Server]"

TODO Bug #88134 JobWrapper doesn't handle correctly the jdl attribute “PerusalListFileURI” Not Implemented

  • Create the following two files:

perusal.jdl

[
Type="Job";
JobType="Normal";
Executable = "perusal.sh";
StdOutput = "stdout.log";
StdError = "stderr.log";
InputSandbox = "perusal.sh";
OutputSandbox = {"stdout.log", "stderr.log", "results.txt"};
PerusalFilesDestURI="gsiftp://cream-05.pd.infn.it/tmp";
PerusalFileEnable = true;
PerusalTimeInterval = 20;
outputsandboxbasedesturi="gsiftp://localhost";
PerusalListFileURI="gsiftp://cream-05.pd.infn.it/tmp/filelist.txt"
]

perusal.sh

#!/bin/sh
# Print the date, the proxy info and the disk usage every 10 seconds, 10 times.
i=0
while [ "$i" -lt 10 ]
do
date
voms-proxy-info --all >&2
df >> results.txt
sleep 10
i=$((i + 1))
echo i = $i
done

N.B: For this test, the file "gsiftp://cream-05.pd.infn.it/tmp/filelist.txt" must not exist!

  • Submit the job

  • Check that, after about two minutes, the job has terminated successfully.

Bug #95637 glite-ce-job-submit --help doesn't print out anything Not Implemented

  • execute the command:
    glite-ce-job-submit --help
  • the error message "JDL file not specified in the command line arguments. Stop." should no longer appear. Instead, the inline help is shown:
    >glite-ce-job-submit --help
    CREAM User Interface version 1.2.0
    
    glite-ce-job-submit allows the user to submit a job for execution on a CREAM based CE
    
    Usage: glite-ce-job-submit [options] -r <CEID> <JDLFile>
    
      --resource, -r CEID  Select the CE to send the JDL to. Format must be <host>[:<port>]/cream-<lrms-system-name>-<queue-name>
    
      <JDLFile>  Is the file containing the JDL directives for job submission;
    
    Options:
      --help, -h                
    [...]

Bug #95738 glite-ce-job-submit: error message to be improved if JDL file is missing Not Implemented

  • execute the command (remark: the pippo.jdl file doesn't exist):
    glite-ce-job-submit -a -r cream-23.pd.infn.it:8443/cream-lsf-creamtest1 pippo.jdl
  • the error message "Error while processing file [pippo.jdl]: Syntax error. Will not submit this JDL" should no longer appear. Instead, the message "JDL file [pippo.jdl] does not exist. Skipping..." should be printed:
    >glite-ce-job-submit -a -r cream-23.pd.infn.it:8443/cream-lsf-creamtest1 pippo.jdl
    2013-01-21 17:49:51,196 ERROR - JDL file [pippo.jdl] does not exist. Skipping...
    

TODO Bug #95041 YAIM could check the format of CE_OTHERDESCR Not Implemented

  • in the site-info.def define the variable CE_OTHERDESCR="Cores=10.5 , Benchmark=8.0-HEP-SPEC06"; run yaim and verify that no error is reported
  • define the variable CE_OTHERDESCR="Cores=10.5"; run yaim and verify that no error is reported
  • define the variable CE_OTHERDESCR="Cores=10.5,Benchmark=8.0", run yaim and verify that an error is reported
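
The three cases above can be reproduced with a regular expression. The pattern below is an assumption inferred only from these examples (a numeric Cores value, optionally followed by a Benchmark value carrying the -HEP-SPEC06 suffix); the check actually performed by YAIM may differ, and the function name is hypothetical.

```shell
# Hypothetical validator: accepts "Cores=<n>" optionally followed by
# ", Benchmark=<n>-HEP-SPEC06" (pattern inferred from the three test cases).
valid_ce_otherdescr() {
    printf '%s\n' "$1" | grep -Eq \
        '^Cores=[0-9]+(\.[0-9]+)?([[:space:]]*,[[:space:]]*Benchmark=[0-9]+(\.[0-9]+)?-HEP-SPEC06)?$'
}

# Usage (not executed here):
#   valid_ce_otherdescr "$CE_OTHERDESCR" || echo "CE_OTHERDESCR has a wrong format"
```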

TODO Bug #98440 Missing revision number in EndpointImplementationVersion Not Implemented

  • run the following command
    ldapsearch -x -H ldap://hostname:2170 -b o=glue '(&(objectclass=GLUE2Endpoint)(GLUE2EndpointImplementationName=CREAM))' GLUE2EndpointImplementationVersion
    and verify that the GLUE2EndpointImplementationVersion reports the revision number

TODO Bug #98850 Empty ACBR list in SHARE variable Not Implemented

  • set the YAIM variable FQANVOVIEWS to "no"
  • set one of the YAIM variables <QUEUE>_GROUP_ENABLE so that it contains several FQANs for the same VO (for example "atlas /atlas/ROLE=lcgadmin /atlas/ROLE=production /atlas/ROLE=pilot")
  • run the YAIM configurator and verify that in the file /etc/glite-ce-glue2/glite-ce-glue2.conf the items SHARE_<QUEUE>_<VO>_ACBRS don't contain any empty list

Bug #99072 Hard-coded reference to tomcat5.pid Not Implemented

  • on SL6 run the following command
    /usr/bin/glite_cream_load_monitor /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --show
  • verify that the value of "Threshold for tomcat FD" is not zero

Bug #99085 Improve parsing of my.cnf Not Implemented

  • in the file /etc/my.cnf specify the following value
    max_connections = 256
  • run the YAIM configurator and verify that the function config_cream_db works correctly

TODO Bug #99282 Wrong regular expression for group.conf parsing Not Implemented

  • define YAIM variables so that one VO name is the prefix or the suffix of another one (for example VOS=" ops dgops ")
  • run the YAIM configurator and verify that in the file /var/lib/bdii/gip/ldif/ComputingShare.ldif no references to one VO appear as attributes of the Shares or Policies of the other

TODO Bug #99747 glite-info-dynamic-ce does not update GLUE2ComputingShareServingState Not Implemented

  • disable the submissions to the CE
  • run the command
    /sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-cream-glue2" |grep ServingState
  • verify that the values for each "GLUE2ComputingShareServingState" are set to "draining"
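
The last step can be scripted. The following is a minimal sketch, not part of the CREAM distribution: the function name all_shares_draining is hypothetical, and it assumes that the plugin prints one "GLUE2ComputingShareServingState: <value>" line per share.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CREAM).
# Reads the plugin output on stdin and succeeds only if at least one
# GLUE2ComputingShareServingState line is present and all of them
# report the value "draining".
all_shares_draining() {
    awk '
        $1 == "GLUE2ComputingShareServingState:" {
            seen = 1
            if ($2 != "draining") {
                print "unexpected state: " $2
                bad = 1
            }
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}
```

To use it, pipe the output of the runuser command above into all_shares_draining and check the exit status; any share not in the draining state is printed.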

Bug #99823 SHA-1 algorithm for PKCS10 generation in CREAM delegation service Not Implemented

  • insert the following line
    log4j.logger.org.glite.ce.cream.delegationmanagement.cmdexecutor=debug, fileout
    in the file /etc/glite-ce-cream/log4j.properties
  • restart tomcat
  • delegate a proxy on the CE, authenticating the client with a voms-proxy whose signature algorithm is based on SHA-2; the algorithm can be checked with
    openssl x509 -noout -text -in [voms-proxy]
  • verify that the value reported in the log for "Signature algorithm to be used for pkcs10" is the same as the one used for the voms-proxy

Fixes provided with CREAM 1.14.2

TODO Bug #95328 In cluster mode, YAIM does not set GlueCEInfoHostName for CREAMs Not Implemented

  • Configure a CREAM CE in cluster mode, run the command
    ldapsearch -h cream-36.pd.infn.it -x -p 2170 -b "o=grid" objectclass=GlueCE | grep InfoHostName
    and verify that one or more items exist.

TODO Bug #95973 Missing Glue capability in GLUE2EntityOtherInfo Not Implemented

  • Define the YAIM variable CE_CAPABILITY="CPUScalingReferenceSI00=10 SNMPSupport=yes"
  • Configure with YAIM and run the command
    ldapsearch -h cream-36.pd.infn.it -x -p 2170 -b "o=glue" | grep GLUE2EntityOtherInfo
  • Verify that the attributes defined above are reported separately

Bug #96306 Wrong lowercase conversion for VO Tags Implemented

  • define a tag with uppercase in the file /opt/glite/var/info/[hostname]/[vo]/[vo].list
  • run the command
    ldapsearch -h cream-36.pd.infn.it -x -p 2170 -b "o=glue" | grep TESTTAG
  • verify that the attributes GLUE2ApplicationEnvironmentID and GLUE2ApplicationEnvironmentAppName are uppercase.

Bug #96310 Wrong lowercase conversion for Glue-1 VO Tags. Not Implemented

  • Put case-insensitive duplicated items, for example "RMON3.1 RMon3.1", in the file /opt/glite/var/info/[hostname]/[vo name]/[voname].list and in the file /opt/edg/var/info/[vo name]/[voname].list
  • Put case-insensitive duplicated attributes in /var/lib/bdii/gip/ldif/static-file-Cluster.ldif, for example:
    GlueHostApplicationSoftwareRunTimeEnvironment: MPIch
    GlueHostApplicationSoftwareRunTimeEnvironment: MPICH
    
  • Run the wrapper script /var/lib/bdii/gip/plugin/glite-info-dynamic-software-wrapper
  • Verify that no duplicated attributes are printed on stdout

Bug #97441 CREAM: Unwanted auto-updating of the field "creationTime" on the creamdb database Implemented

  • 1) access the CREAM DB
  • 2) execute the following SQL command:
    use creamdb;
  • 3) execute the following SQL query and note the result:
    select startUpTime, creationTime from db_info;
  • 4) configure CREAM with YAIM
  • 5) repeat steps 1, 2 and 3
  • 6) compare the query results obtained at steps 3 and 5: the "creationTime" value should be the same in both results, while the "startUpTime" value should have changed.

Bug #96512 JobDBAdminPurger can't find commons-logging.jar Implemented

  • on the CREAM node, try to purge a job using the /usr/sbin/JobDBAdminPurger.sh script (no error should be reported)

Bug #97106 CREAM JW - fatal_error: command not found. Not Implemented

  • select an EMI-2 WN and uninstall the "glite-lb-client-progs" rpm in order to remove the /usr/bin/glite-lb-logevent program
  • submit several jobs to CREAM and check that at least one of them has been executed on the previously modified WN (there is no way to force the submission to a specific WN through CREAM)
  • open the StandardError file in the job sandbox: it should contain just the "Cannot find lb_logevent command" message.

TODO Bug #94418 The SIGTERM signal should be issued to all the processes belonging to the job. Not Implemented

  • Connect to the ce under test with your pool account user
  • Edit the file /tmp/test_bug94418.sh (as your pool account user) and paste the following text into it:
    #!/bin/bash                                                                                
                                                                                               
    OUTF="/tmp/sdpgu.out"                                                                      
                                                                                               
    MYPID=$$
    
    sleep 3600 & PID1=$!
    
    sleep 3600 & PID2=$!
    echo "MYPID=${MYPID}, PID1=${PID1}, PID2=${PID2}" > $OUTF
    echo "MYPID=${MYPID}, PID1=${PID1}, PID2=${PID2}"
    # supposedly this should kill the child processes on SIGTERM.
    trap "kill $PID1 $PID2" SIGTERM
    
    wait
    
  • On the UI prepare a jdl for executing the above script: example
    [
       Type = "job";
       JobType = "normal";
       VirtualOrganisation = "dteam";
       executable="test_bug94418.sh";
       InputSandbox = {"test_bug94418.sh"};
       InputSandboxBaseURI = "gsiftp://cream-XX.pd.infn.it/tmp"
    ]
  • submit the jdl
  • wait until the job reaches the REALLY-RUNNING state
  • check the name of the WN on which the job is running
  • on the WN, read the file /tmp/sdpgu.out which contains the PID of three processes: example
    MYPID=10551, PID1=10553, PID2=10554
  • check if all three processes exist
  • cancel the job
  • wait for its terminal state (CANCELLED)
  • check again on the WN whether the three processes exist (they should all have disappeared)
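
The two process checks on the WN can be done with a small helper. This is only a sketch under stated assumptions: the function name report_job_pids is hypothetical, the input is the single line found in /tmp/sdpgu.out, and the WN is assumed to be a Linux host where every running process has a /proc/<pid> directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the test suite).
# Takes the line stored in /tmp/sdpgu.out as its only argument and prints,
# for each PID found in it, whether the process still exists on this host.
# Assumption: Linux, where every running process has a /proc/<pid> directory.
report_job_pids() {
    # Split the line on commas and keep the number that follows each "=".
    for pid in $(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/.*=\([0-9][0-9]*\).*/\1/p'); do
        if [ -d "/proc/$pid" ]; then
            echo "$pid alive"
        else
            echo "$pid gone"
        fi
    done
}
```

On the WN it could be called as report_job_pids "$(cat /tmp/sdpgu.out)", once while the job is running and once after the job has reached the CANCELLED state.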

Bug #98707 Wrong warning message form ArgusPEPClient configuration Not Implemented

  • Run the CREAM service with default priority log level (Info)
  • Perform a simple operation, for example a job list
  • Verify that no warning such as "Missing or wrong argument " appears in the log

Fixes provided with CREAM 1.14.1

TODO Bug #89153 JobDBAdminPurger cannot purge jobs if CREAM DB is on another host - Not Implemented

  • on the CREAM node, which MUST be a different machine from the one where the creamdb is installed (DB node), edit the cream-config.xml
  • find the "url" field within the datasource_creamdb element (e.g. url="jdbc:mysql://localhost:3306/creamdb?autoReconnect=true")
  • replace the "localhost" with the name of the remote machine (DB node) hosting the creamdb (e.g. "jdbc:mysql://cream-47.pd.infn.it:3306/creamdb?autoReconnect=true")
  • find and note the value of the "username" field defined in the same xml element
  • on the CREAM DB node execute the following sql commands as root (NB: substitute "username" with the value noted previously): GRANT ALL PRIVILEGES ON creamdb.* to username@'%' IDENTIFIED BY 'username'; FLUSH PRIVILEGES;
  • on the CREAM node execute the JobDBAdminPurger.sh script (no error should be reported)

Bug #95356 Better parsing for static definition files in lcg-info-dynamic-scheduler - Implemented

  • Insert in the file /var/lib/bdii/gip/ldif/ComputingShare.ldif an empty or corrupted attribute GLUE2PolicyRule (i.e. "GLUE2PolicyRule:" or "GLUE2PolicyRule: test").
  • Verify that in the file /etc/lrms/scheduler.conf the attribute "outputformat" is set to glue2 or both.
  • Run the command
    /var/lib/bdii/gip/plugin/glite-info-dynamic-scheduler-wrapper
    and verify that no exceptions are raised.

Bug #95480 CREAM doesn't transfert the output files remotely under well known conditions - Not Implemented

  • edit the cream-config.xml and set SANDBOX_TRANSFER_METHOD="LRMS"
  • restart the cream service
  • submit the following jdl from a WMS having the URL lexicographically greater than "gsiftp://localhost" (e.g. wms01.grid.hep.ph.ic.ac.uk)
[
Type = "Job";
#VAR1 = "test1";
#VAR2 = "test2";
executable = "/bin/echo";
Arguments = "hello world!!!";
StdOutput="stdout";
StdError="stderr";
OutputSandbox = {"stdout","stderr"};
]
  • glite-wms-job-output (retrieve and check the output files)

Bug #95552 Malformed URL from glite-ce-glue2-endpoint-static - Implemented

Run the command

/usr/libexec/glite-ce-glue2-endpoint-static /etc/glite-ce-glue2/glite-ce-glue2.conf | grep GLUE2EndpointURL
and verify that the URL is correctly defined (contains ":")

Example of the error:

GLUE2EndpointURL: https://cream-48.pd.infn.it8443/ce-cream/services
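
The "contains ':'" check can be made stricter than a visual inspection. The sketch below is illustrative only: check_endpoint_urls is a hypothetical name, and the pattern assumes that a well-formed URL has the shape scheme://host:port followed by an optional path.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CREAM).
# Reads the output of glite-ce-glue2-endpoint-static on stdin and succeeds
# only if every GLUE2EndpointURL value has the form scheme://host:port[/path].
# Malformed values are printed.
check_endpoint_urls() {
    awk '
        $1 == "GLUE2EndpointURL:" {
            seen = 1
            # The host part must not contain "/" or ":" and must be
            # followed by ":" and a numeric port.
            if ($2 !~ "^[a-z]+://[^/:]+:[0-9]+(/|$)") {
                print "malformed: " $2
                bad = 1
            }
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}
```

The output of the command above can be piped into check_endpoint_urls in place of the grep; the erroneous URL shown in the example is reported as malformed because the port is not separated from the host name.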

TOREMOVE Bug #95593 CREAM cannot insert in the command queue if the lenght of the localUser field is > 14 chars - Not Implemented

  • create a new local pool account whose name is longer than 14 characters
  • reconfigure the CE with YAIM (define USE_ARGUS=no if you don't want to configure ARGUS for handling the new local pool account)
  • execute any asynchronous CREAM command (e.g. jobStart, jobCancel, etc) by using the proper grid credentials which will be mapped to the new local user
  • check the CREAM log file: no error message like "Cannot enqueue the command id=-1: Data truncation: Data too long for column 'commandGroupId' at row 1 (rollback performed)" should be reported

Bug #96055 Wrong DN format in logfiles for accounting - Not Implemented

  • submit a job and wait for success
  • verify that in the file /var/log/cream/accounting/blahp.log-yyyymmdd the value of the attribute userDN is published in X500 format (i.e. "/C=../O=...")

Bug #93091 Add some resubmission machinery to CREAM - Not Implemented

  • edit the blah script /usr/libexec/lsf_submit.sh
  • add "sleep 10m" at the top of the script
  • submit a job
  • after 200 seconds, the following error should appear in the cream log file:
"org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - submission to BLAH failed [jobId=CREAMXXXXXXXXX; reason=BLAH error: submission command failed (exit code = 143) (stdout:) (stderr: <blah> execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = CREAMXXXXXXXXX); retry count=1/3]
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - sleeping 10 sec...
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - sleeping 10 sec... done"
  • restore the original lsf_submit.sh script
  • after a while the job should be successfully submitted (i.e. ... JOB CREAMXXXXXXXXX STATUS CHANGED: PENDING => IDLE)
  • if you don't restore the script, the submission will be tried 3 times, then the job will abort: "JOB CREAMXXXXXXXXX STATUS CHANGED: PENDING => ABORTED [description=submission to BLAH failed [retry count=3]] [failureReason=BLAH error: submission command failed (exit code = 143) (stdout:) (stderr: execute_cmd: 200 seconds timeout expired, killing child process.-) N/A ..."

Fixes provided with CREAM 1.14

Bug #59871 lcg-info-dynamic-software must split tag lines on white space - Implemented

To verify the fix edit a VO.list file under /opt/glite/var/info/cream-38.pd.infn.it/VO adding:

tag1 tag2
tag3

Wait 3 minutes and then query the resource bdii, where you should see:

...
GlueHostApplicationSoftwareRunTimeEnvironment: tag1
GlueHostApplicationSoftwareRunTimeEnvironment: tag2
GlueHostApplicationSoftwareRunTimeEnvironment: tag3
...

TODO Bug #68968 lcg-info-dynamic-software should protect against duplicate RTE tags - Not Implemented

To verify the fix edit a VO.list file under /opt/glite/var/info/cream-38.pd.infn.it/VO adding:

tag1
tag2
TAG1
tag1

Then query the resource bdii:

ldapsearch -h <CE host> -x -p 2170 -b "o=grid" | grep -i tag

This should return:

GlueHostApplicationSoftwareRunTimeEnvironment: tag1
GlueHostApplicationSoftwareRunTimeEnvironment: tag2
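
Rather than reading the grep output by eye, duplicates can be detected automatically. A minimal sketch, assuming the ldapsearch output contains one "GlueHostApplicationSoftwareRunTimeEnvironment: <tag>" line per published tag; the function name find_duplicate_rte_tags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with lcg-info-dynamic-software).
# Reads ldapsearch output on stdin and prints, in lower case, every
# GlueHostApplicationSoftwareRunTimeEnvironment value that occurs more
# than once when case is ignored. Empty output means no duplicates.
find_duplicate_rte_tags() {
    awk -F': ' '
        $1 == "GlueHostApplicationSoftwareRunTimeEnvironment" {
            key = tolower($2)
            # Report each duplicated tag only once, at its second occurrence.
            if (++count[key] == 2) print key
        }
    '
}
```

Piping the ldapsearch command above (without the grep) into find_duplicate_rte_tags should print nothing once the fix is in place.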

TODO Bug #69854 CreamCE should publish non-production state when job submission is disabled - Not Implemented

Disable job submission with glite-ce-disable-submission. Wait 3 minutes and then perform the following ldap query:

# ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus

For each GlueCE this should return:

GlueCEStateStatus: Draining

Then re-enable the submission. Edit the configuration file /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf to trigger job submission disabling. E.g. change:

$MemUsage  = 95;

with:

$MemUsage  = 1;

Wait 15 minutes and then perform the following ldap query:

# ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus

For each GlueCE this should return:

GlueCEStateStatus: Draining

Bug #69857 Job submission to CreamCE is enabled by restart of service even if it was previously disabled - Implemented

STATUS: Implemented

To test the fix:

  • disable the submission on the CE
This can be achieved via the `glite-ce-disable-submission host:port` command (provided by the CREAM CLI package installed on the UI). The command can be issued only by a CREAM CE administrator, i.e. the DN of this person must be listed in the /etc/grid-security/admin-list file of the CE.

Output should be: "Operation for disabling new submissions succeeded"

  • restart tomcat on the CREAM CE (service tomcat restart - on CE)

  • verify if the submission is disabled (glite-ce-allowed-submission)
This can be achieved via the `glite-ce-allowed-submission host:port` command (provided by the CREAM CLI package installed on the UI).

Output should be: "Job submission to this CREAM CE is disabled"

Bug #77791 CREAM installation does not fail if sudo is not installed - Not Implemented

Try to configure via yaim a CREAM-CE where the sudo executable is not installed.

The configuration should fail saying:

 ERROR: sudo probably not installed !

TOREMOVE Bug #79362 location of python files provided with lcg-info-dynamic-scheduler-generic-2.3.5-0.sl5 - Not Implemented - obsolete

To verify the fix, do a:

rpm -ql dynsched-generic

and verify that the files are installed in /usr/lib/python2.4 and no longer in /usr/lib/python.

TOREMOVE Bug #80295 Allow dynamic scheduler to function correctly when login shell is false - Not Implemented - overridden by https://savannah.cern.ch/bugs/?99747

To verify the fix, log on the CREAM CE as user root and run:

/sbin/runuser -s /bin/sh ldap -c "/usr/libexec/lcg-info-dynamic-scheduler -c /etc/lrms/scheduler.conf"

It should return some information in ldif format

Bug #80410 CREAM bulk submission CLI is desirable - Not Implemented

To test the fix, specify multiple JDLs in the glite-ce-job-submit command, e.g.:

glite-ce-job-submit --debug -a -r cream-47.pd.infn.it:8443/cream-lsf-creamtest1 jdl1.jdl jdl2.jdl jdl3.jdl

Considering the above example, verify that 3 jobs are submitted and 3 jobids are returned.
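
Counting the returned job IDs can be automated. This is a sketch only: count_job_ids is a hypothetical name, and it assumes that each job ID is printed on its own line in the form https://<host>:<port>/CREAM<digits>, while debug messages do not have that shape.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CREAM CLI).
# Reads the output of glite-ce-job-submit on stdin and prints the number
# of lines that look like a CREAM job ID.
# Assumption: a job ID is a whole line of the form
#   https://<host>:<port>/CREAM<digits>
count_job_ids() {
    grep -Ec '^https://[^/]+/CREAM[0-9]+$'
}
```

For the example above, piping the command output into count_job_ids should print 3.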

Bug #81734 removed conf file retrieve from old path that is not EMI compliant - Not Implemented

To test the fix, create the conf file /etc/glite_cream.conf with the following content:

[
CREAM_URL_PREFIX="abc://";
]

Try then e.g. the following command:

glite-ce-job-list --debug cream-47.pd.infn.it

It should report that it is trying to contact abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2:

2012-01-13 14:44:39,028 DEBUG - Service address=[abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2]

Move the conf file to /etc/VO/glite_cream.conf and repeat the test, which should give the same result.

Then move the conf file to ~/.glite/VO/glite_cream.conf and repeat the test, which should give the same result.

Bug #82206 yaim-cream-ce: BATCH_LOG_DIR missing among the required attributes - Not Implemented

Try to configure a CREAM CE with Torque using yaim without setting BLPARSER_WITH_UPDATER_NOTIFIER and without setting BATCH_LOG_DIR.

It should fail saying:

 INFO: Executing function: config_cream_blah_check 
 ERROR: BATCH_LOG_DIR is not set
 ERROR: Error during the execution of function: config_cream_blah_check

Bug #83314 Information about the RTEpublisher service should be available also in glue2 - Not Implemented

Check if the resource BDII publishes glue 2 GLUE2ComputingEndPoint objectclasses with GLUE2EndpointInterfaceName equal to org.glite.ce.ApplicationPublisher. If the CE is not configured in cluster mode, there should be one such objectclass. If the CE is configured in cluster mode and the gLite-CLUSTER is deployed on a different node, there shouldn't be any such objectclass.

ldapsearch -h  <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #83338 endpointType (in GLUE2ServiceComplexity) hardwired to 1 in CREAM CE is not always correct - Implemented

Perform the following query on the resource bdii of the CREAM CE:

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" | grep -i endpointtype

endpointtype should be 3 if CEMon is deployed (USE_CEMON is true), 2 otherwise.

Bug #83474 Some problems concerning glue2 publications of CREAM CE configured in cluster mode - Not Implemented

Configure a CREAM CE in cluster mode, with the gLite-CLUSTER configured on a different host.

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.CREAM))"

  • Check if the resource BDII publishes glue 2 GLUE2Manager objectclasses. There shouldn't be any GLUE2Manager objectclass.

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager

  • Check if the resource BDII publishes glue 2 GLUE2Share objectclasses. There shouldn't be any GLUE2Share objectclass.

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share

ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment

ldapsearch -h  <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #83592 CREAM client doesn't allow the delegation of RFC proxies - Implemented

Create an RFC proxy, e.g.:

voms-proxy-init -voms dteam -rfc

and then submit using glite-ce-job-submit a job using ISB and OSB, e.g.:

[
executable="ssh1.sh";
inputsandbox={"file:///home/sgaravat/JDLExamples/ssh1.sh", "file:///home/sgaravat/a"};
stdoutput="out3.out";
stderror="err2.err";
outputsandbox={"out3.out", "err2.err", "ssh1.sh", "a"};
outputsandboxbasedesturi="gsiftp://localhost";
]

Verify that the final status is DONE-OK

Bug #83593 Problems limiting RFC proxies in CREAM - Implemented

Consider the same test done for bug #83592

TODO Bug #84308 Error on glite_cream_load_monitor if cream db is on another host - Not Implemented

Configure a CREAM CE with the database installed on a different host than the CREAM CE.

Run:

/usr/bin/glite_cream_load_monitor /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --show

which shouldn't report any error.

Bug #86522 glite-ce-job-submit authorization error message difficoult to understand - Not Implemented

To check this fix, try a submission towards a CREAM CE configured to use ARGUS when you are not authorized. You should see an error message like:

$  glite-ce-job-submit -a -r emi-demo13.cnaf.infn.it:8443/cream-lsf-demo oo.jdl
2012-05-07 20:26:51,649 FATAL - CN=Massimo Sgaravatto,L=Padova,OU=Personal Certificate,O=INFN,C=IT not authorized for {http://www.gridsite.org/namespaces/delegation-2}getProxyReq

and not like the one reported in the savannah bug.

TOREMOVE Bug #86609 yaim variable CE_OTHERDESCR not properly managed for Glue2 - Not Implemented - merge with https://savannah.cern.ch/bugs/?95041

Try to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=1"

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=1

Try then to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=1,Benchmark=150-HEP-SPEC06"

and reconfigure via yaim.

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=1

Then perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark 

This should return something like:

dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=cream-47.pd.infn.it_ComputingElement,GLUE2GroupID=re
 source,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T14:04:48Z
GLUE2BenchmarkValue: 150
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: cream-47.pd.infn.it_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06

TOREMOVE Bug #86694 A different port number than 9091 should be used for LRMS_EVENT_LISTENER - Not Implemented

On a running CREAM CE, perform the following command:

netstat -an | grep -i 9091

This shouldn't return anything.

Then perform the following command:

netstat -an | grep -i 49152

This should return:

tcp        0      0 :::49152                    :::*                        LISTEN      

[root@cream-47 ~]# netstat -an | grep -i 49153
[root@cream-47 ~]# netstat -an | grep -i 49154
[root@cream-47 ~]# netstat -an | grep -i 9091

Bug #86697 User application's exit code not recorded in the CREAM log file - Not Implemented

Submit a job and wait for its completion. Then check the glite-ce-cream.log file on the CREAM CE. The user exit code should be reported (field exitCode), e.g.:

13 Jan 2012 15:22:52,966 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM124031222 STATUS CHANGED: REALLY-RUNNING => DONE-OK [failureReason=reason=0] [exitCode=23] [localUser=dteam004] [workerNode=prod-wn-001.pn.pd.infn.it] [delegationId=7a52772caaeea96628a1ff9223e67a1f6c6dde9f]

Bug #86737 A different port number than 9909 should be used for CREAM_JOB_SENSOR - Not Implemented

On a running CREAM CE, perform the following command:

netstat -an | grep -i 9909

This shouldn't return anything.
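
A plain grep on the port number also matches other columns and longer port numbers that merely contain the same digits. The sketch below is stricter; port_is_listening is a hypothetical name, and it assumes the usual Linux `netstat -an` column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CREAM).
# Usage: netstat -an | port_is_listening PORT
# Succeeds only if a TCP socket in state LISTEN has exactly PORT as the
# port of its local address.
# Assumption: Linux netstat layout, local address in column 4, state in column 6.
port_is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is the last ":"-separated component of the local
            # address (this also works for IPv6 forms such as :::49152).
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END {
            if (found) exit 0
            exit 1
        }
    '
}
```

For this test, `netstat -an | port_is_listening 9909` should fail, i.e. return a non-zero exit status.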

Bug #86773 wrong /etc/glite-ce-cream/cream-config.xml with multiple ARGUS servers set - Not Implemented

To test the fix, set in the siteinfo.def:

USE_ARGUS=yes
ARGUS_PEPD_ENDPOINTS="https://cream-46.pd.infn.it:8154/authz https://cream-46-1.pd.infn.it:8154/authz"
CREAM_PEPC_RESOURCEID="http://pd.infn.it/cream-47"

i.e. 2 values for ARGUS_PEPD_ENDPOINTS.

Then configure via yaim.

In /etc/glite-ce-cream/cream-config.xml there should be:

 <argus-pep name="pep-client1"
             resource_id="http://pd.infn.it/cream-47"
             cert="/etc/grid-security/tomcat-cert.pem"
             key="/etc/grid-security/tomcat-key.pem"
             passwd=""
             mapping_class="org.glite.ce.cream.authz.argus.ActionMapping">
    <endpoint url="https://cream-46.pd.infn.it:8154/authz" />
    <endpoint url="https://cream-46-1.pd.infn.it:8154/authz" />
  </argus-pep>

TOREMOVE Bug #87690 Not possible to map different queues to different clusters for CREAM configured in cluster mode - Not Implemented - just one cluster supported

Configure via yaim a CREAM CE in cluster mode with different queues mapped to different clusters, e.g.:

CREAM_CLUSTER_MODE=yes
CE_HOST_cream_47_pd_infn_it_QUEUES="creamtest1 creamtest2"
QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id

Then query the resource bdii of the CREAM, and check the GlueForeignKey attributes of the different glueCEs: they should refer to the specified clusters:

ldapsearch -h cream-47.pd.infn.it -p 2170 -x -b o=grid objectclass=GlueCE GlueForeignKey
# extended LDIF
#
# LDAPv3
# base <o=grid> with scope subtree
# filter: objectclass=GlueCE
# requesting: GlueForeignKey 
#

# cream-47.pd.infn.it:8443/cream-lsf-creamtest2, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest2,Mds-Vo-name=r
 esource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl2id

# cream-47.pd.infn.it:8443/cream-lsf-creamtest1, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest1,Mds-Vo-name=r
 esource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl1id

Bug #87799 Add yaim variables to configure the GLUE 2 WorkingArea attributes - Not Implemented

Set all (or some) of the following yaim variables:

WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME

and then configure via yaim. Then query the resource bdii of the CREAM CE and verify that the relevant attributes of the glue2 ComputingManager object are set.

TOREMOVE Bug #88078 CREAM DB names should be configurable - Not Implemented

Configure from scratch a CREAM CE setting the yaim variables: CREAM_DB_NAME and DELEGATION_DB_NAME, e.g.:

CREAM_DB_NAME=abc
DELEGATION_DB_NAME=xyz

and then configure via yaim.

Then check if the two databases have been created:

# mysql -u xxx -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7176
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| abc                |
| test               |
| xyz                |
+--------------------+
4 rows in set (0.02 sec)

Try also a job submission to verify if everything works properly.

Bug #89489 yaim plugin for CREAM CE does not execute a check function due to name mismatch - Implemented

Configure a CREAM CE via yaim and save the yaim output. It should contain the string:

INFO: Executing function: config_cream_gip_scheduler_plugin_check

TOREMOVE Bug #89664 yaim-cream-ce doesn't manage spaces in CE_OTHERDESCR - Not Implemented - merge with https://savannah.cern.ch/bugs/?95041

Try to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=1"

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=1

Try then to set the yaim variable CE_OTHERDESCR to:

CE_OTHERDESCR="Cores=2, Benchmark=4-HEP-SPEC06"

and reconfigure via yaim.

Perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo

This should also return:

GLUE2EntityOtherInfo: Cores=2

Then perform the following ldap query on the resource bdii:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark 

This should return something like:

# cream-47.pd.infn.it_hep-spec06, cream-47.pd.infn.it, ppp, resource, glue
dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=ppp,GLUE2GroupID=resource,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T17:07:52Z
GLUE2BenchmarkValue: 4
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: ppp_Manager
GLUE2EntityName: Benchmark hep-spec06

Bug #89784 Improve client side description of authorization failure - Not Implemented

Remove the lsc files for your VO from the CE and then try a submission to that CE.

It should return an authorization error.

Then check the glite-ce-cream.log. It should report something like:

13 Jan 2012 18:21:21,270 org.glite.voms.PKIVerifier - Cannot find usable certificates to validate the AC. Check that the voms server host certificate is in your vomsdir directory.
13 Jan 2012 18:21:21,602 org.glite.ce.commonj.authz.gjaf.LocalUserPIP - glexec error: [gLExec]:   LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
13 Jan 2012 18:21:21,603 org.glite.ce.commonj.authz.gjaf.ServiceAuthorizationChain - Failed to get the local user id via glexec: glexec error: [gLExec]:   LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
org.glite.ce.commonj.authz.AuthorizationException: Failed to get the local user id via glexec: glexec error: [gLExec]:   LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.

Bug #91819 glite_cream_load_monitor should read the thresholds from a conf file Implemented

Tested through the limiter test of the Robot based test-suite

TODO Bug #92102 Tomcat attributes in the CREAM CE should be configurable via yaim - Not Implemented

Set in siteinfo.def:

CREAM_JAVA_OPTS_HEAP="-Xms512m -Xmx1024m"

and configure via yaim.

Then check /etc/tomcat5.conf (in EMI2 SL5 X86_64 /etc/tomcat5/tomcat5.conf), where there should be:

JAVA_OPTS="${JAVA_OPTS} -server -Xms512m -Xmx1024m"

TODO Bug #92338 CREAM load limiter should not disable job submissions when there is no swap space - Not Implemented

To test the fix, consider a CREAM CE on a machine without swap.

Verify that the limiter doesn't disable job submissions.

Note:

  • to show the swap partitions: cat /proc/swaps
  • to check swap: top
  • to disable swap: swapoff -a (or swapoff /)
  • to enable swap: swapon -a

The limiter checks the memory usage every 10 minutes.

Bug #93768 There's a bug in logfile handling - Not Implemented

To verify the fix, try the --logfile option with e.g. the glite-ce-job-submit command.

Verify that the log file is created in the specified path.

Fixes provided with CREAM 1.13.4

Bug #95480 CREAM doesn't transfert the output files remotely under well known conditions - see Fix for 1.14.1

Fixes provided with CREAM 1.13.3

Bug #81561 Make JobDBAdminPurger script compliant with CREAM EMI environment. - Implemented

STATUS: Implemented

To test the fix, simply run on the CREAM CE as root the JobDBAdminPurger.sh. E.g.:

# JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml -u <user> -p <passwd> -s DONE-FAILED,0 
START jobAdminPurger

It should work without reporting error messages:

-----------------------------------------------------------
Job CREAM595579358 is going to be purged ...
- Job deleted. JobId = CREAM595579358
CREAM595579358 has been purged!
-----------------------------------------------------------

STOP jobAdminPurger

Starting from EMI2 the command to run is:

JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml  -s DONE-FAILED,0

Bug #83238 Sometimes CREAM does not update the state of a failed job. - Implemented

STATUS: Implemented

To test the fix, try to kill by hand a job.

The status of the job should eventually be:

   Status        = [DONE-FAILED]
   ExitCode      = [N/A]
   FailureReason = [Job has been terminated (got SIGTERM)]

Bug #83749 JobDBAdminPurger cannot purge jobs if configured sandbox dir has changed. - Implemented

STATUS: Not implemented

To test the fix, submit some jobs and then reconfigure the service with a different value of CREAM_SANDBOX_PATH. Then try, with the JobDBAdminPurger.sh script, to purge some jobs submitted before the switch.

It must be verified:

  • that the jobs have been purged from the CREAM DB (i.e. a glite-ce-job-status should not find them anymore)
  • that the relevant CREAM sandbox directories have been deleted

Bug #84374 yaim-cream-ce: GlueForeignKey: GlueCEUniqueID: published using : instead of=. - Implemented

STATUS: Implemented

To test the fix, query the resource bdii of the CREAM-CE:

ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | grep -i foreignkey | grep -i glueceuniqueid

Entries such as:

GlueForeignKey: GlueCEUniqueID=cream-35.pd.infn.it:8443/cream-lsf-creamtest1

i.e.:

GlueForeignKey: GlueCEUniqueID=<CREAM CE ID>

should appear.
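The same check can be automated on the ldapsearch output. A minimal sketch, with a hypothetical helper name (foreign_keys_ok), that requires at least one key in the "=" form and rejects any key that still uses ":" as separator:

```shell
# Hypothetical helper: reads ldapsearch output on stdin.
foreign_keys_ok() {
    # Keep only the GlueForeignKey lines that point to a CE.
    fk_lines=$(grep -Ei '^GlueForeignKey:[[:space:]]*GlueCEUniqueID' || true)
    # At least one key must use the expected '=' separator.
    printf '%s\n' "$fk_lines" | grep -Eiq 'GlueCEUniqueID=' || return 1
    # No key may use ':' right after GlueCEUniqueID.
    if printf '%s\n' "$fk_lines" | grep -Eiq 'GlueCEUniqueID[[:space:]]*:'; then
        return 1
    fi
    return 0
}

# Possible usage:
#   ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | foreign_keys_ok && echo OK
```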

Bug #86191 No info published by the lcg-info-dynamic-scheduler for one VOView - Implemented

STATUS: Implemented

To test the fix, issue the following ldapsearch query towards the resource bdii of the CREAM-CE:

$ ldapsearch -h cream-35 -x -p 2170 -b "o=grid" | grep -i GlueCEStateWaitingJobs | grep -i 444444

It should not find anything.
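For unattended runs, the "no match" expectation can be turned into an exit status. A minimal sketch; the helper name no_placeholder_waiting_jobs is hypothetical, and 444444 is simply the value grepped for above:

```shell
# Hypothetical helper: reads ldapsearch output on stdin, prints any
# offending line and fails if GlueCEStateWaitingJobs is 444444.
no_placeholder_waiting_jobs() {
    if grep -Ei 'GlueCEStateWaitingJobs:[[:space:]]*444444'; then
        return 1
    fi
    return 0
}

# Possible usage:
#   ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | no_placeholder_waiting_jobs
```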

Bug #87361 The attribute cream_concurrency_level should be configurable via yaim. - Implemented

STATUS: Implemented

To test the fix, set the variable CREAM_CONCURRENCY_LEVEL in site-info.def to a certain number (n). After configuration, verify that /etc/glite-ce-cream/cream-config.xml contains:

  • [EMI1]

         cream_concurrency_level="n"

  • [EMI2]

         commandworkerpoolsize="n"
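The check on the configuration file can be sketched as follows. The helper name concurrency_configured is hypothetical; it accepts either attribute name, so the same call can be used on EMI1 and EMI2.

```shell
# Hypothetical helper.
# $1 = path to cream-config.xml, $2 = the value n set in site-info.def
concurrency_configured() {
    conf_file=$1
    n=$2
    grep -Eq "(cream_concurrency_level|commandworkerpoolsize)=\"$n\"" "$conf_file"
}

# Possible usage, assuming CREAM_CONCURRENCY_LEVEL=50 was configured:
#   concurrency_configured /etc/glite-ce-cream/cream-config.xml 50 && echo OK
```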
    

Bug #87492 CREAM doesn't handle correctly the jdl attribute "environment". - Implemented

STATUS: Implemented

To test the fix, submit the following JDL using glite-ce-job-submit:

Environment = {
"GANGA_LCG_VO='camont:/camont/Role=lcgadmin'",
"LFC_HOST='lfc0448.gridpp.rl.ac.uk'",
"GANGA_LOG_HANDLER='WMS'"
}; 
executable="/bin/env";
stdoutput="out.out";
outputsandbox={"out.out"};
outputsandboxbasedesturi="gsiftp://localhost";

When the job is done, retrieve the output and check that in out.out the variables GANGA_LCG_VO, LFC_HOST and GANGA_LOG_HANDLER have exactly the values defined in the JDL.
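The comparison of out.out against the JDL can be sketched with a small helper. The name env_has is hypothetical. Whether the single quotes used in the JDL show up in the /bin/env output is not stated here, so the expected value passed to the helper may need adjusting.

```shell
# Hypothetical helper.
# $1 = file with the /bin/env output, $2 = variable name, $3 = expected value
# Succeeds only if a line is exactly NAME=VALUE (fixed-string, whole-line match).
env_has() {
    grep -Fxq -- "$2=$3" "$1"
}

# Possible usage (expected values assume the quotes are not part of the value):
#   env_has out.out GANGA_LCG_VO 'camont:/camont/Role=lcgadmin'
#   env_has out.out LFC_HOST lfc0448.gridpp.rl.ac.uk
#   env_has out.out GANGA_LOG_HANDLER WMS
```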

gLite-CLUSTER

Fixes provided with glite-CLUSTER v. 2.0.1

Bug #CREAM-96 CLUSTER Yaim generates incorrect Subcluster configuration Not Implemented

#CREAM-96 CLUSTER Yaim generates incorrect Subcluster configuration Not implemented

Bug #CREAM-98 Generated configuration files conflict with CREAM ones Not Implemented

#CREAM-98 Generated configuration files conflict with CREAM ones Not implemented

Bug #CREAM-100 Remove lcg prefix from template Not Implemented

#CREAM-100 Remove lcg prefix from template Not implemented (https://issues.infn.it/jira/browse/CREAM-100?focusedCommentId=29883&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-29883)

Bug #CREAM-97 Wrong GLUE output format in a CREAM+Cluster installation Not Implemented

#CREAM-97 Wrong GLUE output format in a CREAM+Cluster installation Not implemented

Bug #CREAM-95 CE_* YAIM variables not mandatory for GLUE2 in cluster mode Not Implemented

#CREAM-95 CE_* YAIM variables not mandatory for GLUE2 in cluster mode Not implemented

Fixes provided with previous versions

Bug #69318 The cluster publisher needs to publish in GLUE 2 too Not implemented

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager

  • Check if the resource BDII publishes GLUE 2 GLUE2Share objectclasses. There should be one GLUE2Share objectclass for each VOView.

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment

ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #86512 YAIM Cluster Publisher incorrectly configures GlueClusterService and GlueForeignKey for CreamCEs - Not implemented

To test the fix, issue an ldapsearch such as:

ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueClusterService

Then issue a ldapsearch such as:

ldapsearch -h  <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueForeignKey | grep -v Site

Verify that for each returned line, the format is:

<hostname>:8443/cream-<lrms>-<queue>
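The format check can be scripted on the lines returned by the two queries. A minimal sketch with a hypothetical helper (ce_ids_ok): it strips the attribute name and an optional GlueCEUniqueID= prefix, then reports every value that does not match the expected pattern. The character classes used for hostname and LRMS name are assumptions.

```shell
# Hypothetical helper: reads GlueClusterService / GlueForeignKey lines on
# stdin, prints the values that are NOT of the form
# <hostname>:8443/cream-<lrms>-<queue> and fails if there is any.
ce_ids_ok() {
    bad=$(sed -E 's/^[^:]+:[[:space:]]*(GlueCEUniqueID=)?//' |
          grep -Ev '^[A-Za-z0-9.-]+:8443/cream-[A-Za-z0-9]+-.+$' || true)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}

# Possible usage:
#   ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueClusterService | ce_ids_ok
```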

Bug #87691 Not possible to map different queues of the same CE to different clusters - Not implemented

To test this fix, configure a gLite-CLUSTER with at least two different queues mapped to different clusters (use the yaim variables QUEUE_<queue-name>_CLUSTER_UniqueID), e.g.:

QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id

Then query the resource bdii of the gLite-CLUSTER and verify that:

  • for the GlueCluster objectclass with GlueClusterUniqueID equal to cl1id, the attributes GlueClusterService and GlueForeignKey refer to CEIds with creamtest1 as queue
  • for the GlueCluster objectclass with GlueClusterUniqueID equal to cl2id, the attributes GlueClusterService and GlueForeignKey refer to CEIds with creamtest2 as queue

Bug #87799 Add yaim variables to configure the GLUE 2 WorkingArea attributes - Not implemented

Set all (or some) of the following yaim variables:

WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME

and then configure via yaim. Then query the resource bdii of the gLite-CLUSTER and verify that the relevant attributes of the GLUE 2 ComputingManager object are set.

CREAM Torque module

Fixes provided with CREAM TORQUE module 2.1.2

TODO CREAM #119 - Wrong total cpu count from PBS infoprovider Not Implemented

CREAM #119 Wrong total cpu count from PBS infoprovider Not Implemented

Fixes provided with CREAM TORQUE module 2.1.1

CREAM #101 - Wrong time format for MaxWallClockTime Not Implemented

CREAM #101 Wrong time format for MaxWallClockTime Not Implemented

CREAM #107 - Bad timezone format Not Implemented

CREAM #107 Bad timezone format Not Implemented

Fixes provided with CREAM TORQUE module 2.0.1

TODO Bug #95184 Missing real value for GlueCEPolicyMaxSlotsPerJob Not Implemented

  • configure the TORQUE server so that the parameter "resources_max.procct" for a given queue is defined and greater than zero
  • run the script /var/lib/bdii/gip/plugin/glite-info-dynamic-ce
  • verify that the attribute GlueCEPolicyMaxSlotsPerJob for the given queue reports the value of the parameter above

TODO Bug #96636 Time limits for GLUE 2 are different to GLUE 1 Not Implemented

  • run the command
    ldapsearch -x -H ldap://hostname:2170 -b o=glue '(&(objectclass=GLUE2ComputingShare))' [attributeName] 
    where attributeName is one of the following arguments: GLUE2ComputingShareMaxCPUTime, GLUE2ComputingShareMaxWallTime, GLUE2ComputingShareDefaultCPUTime, GLUE2ComputingShareDefaultWallTime
  • verify that each value, if available, is expressed in seconds
  • run the command
    ldapsearch -x -H ldap://hostname:2170 -b o=grid '(&(objectclass=GLUECE))' [attributeName] 
    where attributeName is one of the following arguments: GlueCEPolicyMaxCPUTime, GlueCEPolicyMaxObtainableCPUTime, GlueCEPolicyMaxWallClockTime, GlueCEPolicyMaxObtainableWallClockTime
  • verify that each value, if available, is expressed in minutes
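When the same batch system limit is published in both schemas, the two values can be cross-checked: the GLUE 2 value in seconds should equal the GLUE 1 value in minutes multiplied by 60. This is a sketch under that assumption (suggested by the bug title, but not stated explicitly in the steps above); the helper name is hypothetical.

```shell
# Hypothetical helper.
# $1 = GLUE 1 value in minutes (e.g. GlueCEPolicyMaxWallClockTime)
# $2 = GLUE 2 value in seconds (e.g. GLUE2ComputingShareMaxWallTime)
glue_times_consistent() {
    [ "$(( $1 * 60 ))" -eq "$2" ]
}

# Example: a 48 hour limit is 2880 minutes in GLUE 1 and 172800 seconds in GLUE 2.
#   glue_times_consistent 2880 172800 && echo consistent
```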

TODO Bug #99639 lcg-info-dynamic-scheduler-pbs cannot parse qstat output with spurious lines Not Implemented

  • save the output of the command
    qstat -f
    in a temporary file
  • insert several spurious lines within the block of job data; preferably include some empty lines as well
  • run the command
    lcg-info-dynamic-scheduler-pbs -c [temporary file]
  • verify that the command completes without errors
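The spurious lines can be inserted automatically instead of by hand. A minimal sketch, assuming each job block in the qstat -f output starts with a line beginning with "Job Id:"; the helper name and the inserted text are hypothetical.

```shell
# Hypothetical helper: copies stdin to stdout and, after every "Job Id:"
# line, inserts one junk line and one empty line inside the job block.
add_spurious_lines() {
    awk '{ print }
         /^Job Id:/ { print "spurious line inserted for testing"; print "" }'
}

# Possible usage:
#   qstat -f > /tmp/qstat.out
#   add_spurious_lines < /tmp/qstat.out > /tmp/qstat.spurious
#   lcg-info-dynamic-scheduler-pbs -c /tmp/qstat.spurious
```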

Fixes from previous releases

Bug #17325 Default time limits not taken into account - Not implemented

To test the fix for this bug, consider a PBS installation where for a certain queue both default and max values are specified, e.g.:

resources_max.cput = A
resources_max.walltime = B
resources_default.cput = C
resources_default.walltime = D

Verify that the published value for GlueCEPolicyMaxCPUTime is C and that the published value for GlueCEPolicyMaxWallClockTime is D

Bug #49653 lcg-info-dynamic-pbs should check pcput in addition to cput - Not implemented

To test the fix for this bug, consider a PBS installation where for a certain queue both cput and pcput max values are specified, e.g.:

resources_max.cput = A
resources_max.pcput = B

Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum of A and B.

Then consider a PBS installation where for a certain queue both cput and pcput max and default values are specified, e.g.:

resources_max.cput = C
resources_default.cput = D
resources_max.pcput = E
resources_default.pcput = F

Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum of D and F.
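The expected minimum can be computed from the configured limits. The sketch below assumes the limits are written as SS, MM:SS or HH:MM:SS; both helper names are hypothetical. The result is in seconds, so it still has to be converted to the unit used by the published attribute before comparing.

```shell
# Hypothetical helper: converts SS, MM:SS or HH:MM:SS to seconds.
pbs_to_seconds() {
    printf '%s\n' "$1" |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Hypothetical helper: prints the smaller of two time limits, in seconds.
min_seconds() {
    a=$(pbs_to_seconds "$1")
    b=$(pbs_to_seconds "$2")
    if [ "$a" -le "$b" ]; then
        echo "$a"
    else
        echo "$b"
    fi
}

# Example: with resources_max.cput = 48:00:00 and resources_max.pcput = 24:00:00
#   min_seconds 48:00:00 24:00:00
```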

TOREMOVE Bug #76162 YAIM for APEL parsers to use the BATCH_LOG_DIR for the batch system log location - Not implemented - obsolete

To test the fix for this bug, set the yaim variable BATCH_ACCT_DIR and configure via yaim.

Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify the section:

<Logs searchSubDirs="yes" reprocess="no">
            <Dir>X</Dir>

X should be the value specified for BATCH_ACCT_DIR.

Then reconfigure without setting BATCH_ACCT_DIR.

Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify that the directory name is ${TORQUE_VAR_DIR}/server_priv/accounting

TOREMOVE Bug #77106 PBS info provider doesn't allow - in a queue name - Not implemented - obsolete

To test the fix, configure a CREAM CE in a PBS installation where at least one queue has a - in its name.

Then log in as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Check if the returned information is correct.

CREAM LSF module

Fixes provided with CREAM LSF module 2.0.3

CREAM #114 - Execution error: Uncaught exception Not Implemented

CREAM #114 Execution error: Uncaught exception Not Implemented

Bug #88720 Too many '9' in GlueCEPolicyMaxCPUTime for LSF - Not implemented

To test the fix, query the CREAM CE resource bdii in the following way:

ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEPolicyMaxCPUTime | grep 9999999999

This shouldn't return anything.

Bug #89767 The LSF dynamic infoprovider shouldn't publish GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots - Not implemented

To test the fix, log in as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Among the returned information, there shouldn't be GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots.
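The absence of the two attributes can be checked on the plugin output. A minimal sketch; the helper name no_free_attrs is hypothetical:

```shell
# Hypothetical helper: reads the info provider output on stdin, prints any
# offending line and fails if either attribute is published.
no_free_attrs() {
    if grep -E '^(GlueCEStateFreeCPUs|GlueCEStateFreeJobSlots):'; then
        return 1
    fi
    return 0
}

# Possible usage:
#   /sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce" | no_free_attrs
```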

Bug #89794 LSF info provider doesn't allow - in a queue name - Not implemented

To test the fix, configure a CREAM CE in an LSF installation where at least one queue has a - in its name.

Then log in as root on the CREAM CE and run:

/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"

Check if the returned information is correct.

Bug #90113 missing yaim check for batch system - Not implemented

To test the fix, configure a CREAM CE on a node where LSF is not installed.

The yaim configuration should fail, reporting problems with the LSF installation.

CREAM SLURM module

Fixes provided with CREAM SLURM module 1.0.1

CREAM #116 - Missing CE information from SLURM infoprovider Not Implemented

CREAM #116 Missing CE information from SLURM infoprovider Not Implemented
Topic revision: r111 - 2013-10-25 - CristinaAiftimiei
 
