Regression Test Work Plan
BLAH

Fixes provided with BLAH 1.20.3
Bug #CREAM-112 sge_local_submit_attributes.sh problem with memory requirements. Not Implemented

Fixes provided with BLAH 1.20.2
Bug #CREAM-88 Memory leak in BNotifier. Not Implemented
Bug #CREAM-94 Command not found error in /usr/libexec/sge_cancel.sh. Not Implemented
Bug #CREAM-105 Env vars in blah.config should be exported also in daemons. Not Implemented

Fixes provided with BLAH 1.18.4
Bug #CREAM-94 Command not found error in /usr/libexec/sge_cancel.sh. Not Implemented
Bug #CREAM-105 Env vars in blah.config should be exported also in daemons. Not Implemented
Bug #CREAM-112 sge_local_submit_attributes.sh problem with memory requirements. Not Implemented

Fixes provided with BLAH 1.18.2
Bug #97491 (BUpdaterLSF should not execute any bhist query if all the bhist related conf parameter are set to "no") Hard to test - Not implemented
We are running this patch in production at CERN. The patch is motivated by the fact that bhist calls are very expensive and do not scale. Running the command makes both the CE and the batch master unresponsive and therefore has a severe impact on system performance. While discussing the issue with the CREAM developers we found out that it is possible to obsolete these calls and replace them with less expensive batch system queries. For that we use the btools tool suite, which provides a set of additional LSF batch system queries that return output in machine-readable form.

Pre-requisites:
---------------
Note on btools: btools are shipped as source code with the LSF information provider plugin. Compiling them requires the LSF header files. The binaries depend on the LSF version in use and, for licensing reasons, cannot be shipped or built automatically.

Building Instructions:
- ensure that the LSF headers are installed on your build host
    # yum install gcc rpmbuild automake autoconf info-dynamic-scheduler-lsf-btools-2.2.0-1.noarch.rpm
    # cd /tmp
    # tar -zxvf /usr/src/egi/btools.src.tgz
    # cd btools
    # ./autogen.sh
    # make rpm
- install the resulting rpm

Patch version:
--------------
On EMI1 CEs (our production version) we are using a private build of the patch, glite-ce-blahp-1.16.99-0_201208291258.slc5.
On EMI2 we've been testing a new build, glite-ce-blahp-1.18.1-2.
(Both rpms are private builds we got from Massimo Mezzadri.)

Configuration of the patch
--------------------------
In /etc/blah.config we set:
    # use btools to obsolete bhist calls
    bupdater_use_btools=yes
    bupdater_btools_path=/usr/bin
    #
    bupdater_use_bhist_for_susp=no
    bupdater_use_bhist_for_killed=no
    bupdater_use_bhist_for_idle=no
    bupdater_use_bhist_time_constraint=no
    # CERN add caching for LSF queries
    lsf_batch_caching_enabled=yes
    batch_command_caching_filter=/usr/libexec/runcmd

The runcmd command is shipped with the LSF information providers. You need at least info-dynamic-scheduler-lsf-2.2.0-1. In our configuration we cache all batch system responses and share them using an NFS file system. The cache directory is a convenient way to check whether any of the CEs still makes bhist calls: just look for a cache file. With the above settings there are no such calls any longer.

Bug #95385 (Misleading message when Cream SGE aborts jobs requesting more than one CPU) Not implemented
Fully tested by CERN: the test requires the CERN environment based on SGE.

Fixes provided with BLAH 1.18.1
Bug #94414 (BLParserLSF could crash if a suspend on an idle job is done) Implemented
Try to suspend a job whose status is "IDLE" and verify that the daemon BLParserLSF doesn't crash.

Bug #94519 (Updater for LSF can misidentify killed jobs as finished) Implemented
Verify that the value for bupdater_use_bhist_for_killed is set to yes, submit and cancel a job, and verify that the status of the job reports "jobstatus=3".

Bug #94712 (Due to a timestamp problem bupdater for LSF can leave job in IDLE state) Hard to Reproduce - Not implemented
Not easy to reproduce (unpredictable behaviour).

Bug #95392 Heavy usage of 'bjobsinfo' still hurts LSF @ CERN Cannot be reproduced - Not implemented
This is a cosmetic update required by the CERN team; it can be reproduced only using the tools developed at CERN.

Fixes provided with BLAH 1.18.0
Bug #84261 BNotifier on CREAM CE seems to not restart cleanly Implemented
To test the fix, configure a CREAM CE using the new blparser. Then try a:
    service gLite restart
It shouldn't report the error message:
    Starting BNotifier: /opt/glite/bin/BNotifier: Error creating and binding socket: Address already in use

Bug #86238 blahpd doesn't check the status of its daemons when idling Implemented
To test the fix configure a CREAM CE with the new blparser. Don't use it (i.e. do not submit jobs nor issue any other commands). Kill the bupdater and bnotifier processes.
Wait for 1 minute: you should see that the bupdater and bnotifier have been restarted.

Bug #86918 Request for passing all submit command attributes to the local configuration script. Not implemented
To test this fix, create/edit the /usr/libexec/pbs_local_submit_attributes.sh (for PBS) script adding:
    export gridType x509UserProxyFQAN uniquejobid queue ceid VirtualOrganisation ClientJobId x509UserProxySubject
    env > /tmp/filewithenv
Edit the /etc/blah.config file adding:
    blah_pass_all_submit_attributes=yes
Submit a job. On the CREAM CE the file /tmp/filewithenv should be created and it should contain the settings of some variables, including the ones exported in the /usr/libexec/pbs_local_submit_attributes.sh script.
Then edit the /etc/blah.config file, removing the previously added line, and adding:
    blah_pass_submit_attributes[0]="x509UserProxySubject"
    blah_pass_submit_attributes[1]="x509UserProxyFQAN"
Submit a job. On the CREAM CE the file /tmp/filewithenv should be created and it should contain the settings of some variables, including x509UserProxySubject and x509UserProxyFQAN.
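
The check on /tmp/filewithenv can be scripted. The following is only a minimal sketch: the function name check_env_dump is hypothetical, and it assumes the file holds plain `env` output, one NAME=value pair per line.

```shell
# Hypothetical helper: report which expected variables are missing
# from an `env` dump such as /tmp/filewithenv.
# Usage: check_env_dump FILE VAR...
# Prints "missing: NAME" for each absent variable; returns 1 if any is missing.
check_env_dump() {
    dump=$1
    shift
    missing=0
    for var in "$@"; do
        # env prints NAME=value, so anchor the name at the start of the line
        if ! grep -q "^${var}=" "$dump"; then
            echo "missing: $var"
            missing=1
        fi
    done
    return $missing
}
```

For the second part of the test it could be called as `check_env_dump /tmp/filewithenv x509UserProxySubject x509UserProxyFQAN`.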
Bug #90085 Suspend command doesn't work with old parser Implemented
To test the fix configure a CREAM CE with the old blparser. Then submit a job and after a while suspend it using the glite-ce-job-suspend command.
Check the job status, which should eventually be HELD.
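
Since the status only becomes HELD after some time, the check may need polling. A minimal sketch follows; wait_for_status is a hypothetical helper that runs whatever status command it is given and looks for the expected string in its output, so it makes no assumption about the options of the CREAM CLI.

```shell
# Hypothetical helper: poll a command until its output contains a status string.
# Usage: wait_for_status EXPECTED TRIES DELAY_SECONDS COMMAND [ARG...]
# Returns 0 as soon as EXPECTED appears, 1 if it never does.
wait_for_status() {
    expected=$1
    tries=$2
    delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        # run the caller-supplied command and search stdout and stderr
        if "$@" 2>&1 | grep -q "$expected"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

A possible invocation is `wait_for_status HELD 30 10 glite-ce-job-status <job id>`, where the number of tries and the delay are arbitrary choices.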
Bug #90331 Not implemented
To test the fix submit a job, ban yourself in the CE (check here how to ban a user) and try a glite-ce-job-status. It should throw an authorization fault.

Bug #90927 Problem with init script for blparser Not implemented
To check the fix, try to stop/start the blparser:
    service glite-ce-blparser start / stop
Then verify that the blparser has indeed been started/stopped.

Bug #91318 Request to change functions in blah_common_submit_functions.sh Not implemented
Verify that in /usr/libexec/blah_common_submit_functions.sh there is this piece of code:
    function bls_add_job_wrapper ()
    {
      bls_start_job_wrapper >> $bls_tmp_file
      bls_finish_job_wrapper >> $bls_tmp_file
      bls_test_working_dir
    }

Bug #90101: Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute may cause thread to loop TBD
Bug #92554: BNotifier problem can leave connection in CLOSE_WAIT state TBD
Bug #89504: Repeated notification problem for BLParserLSF TBD
Bug #90082: BUpdaterPBS workaround if tracejob is in infinite loop TBD

Fixes provided with BLAH 1.16.6
See Fixes provided with BLAH 1.18.1

Fixes provided with BLAH 1.16.5
Bug #89527 BLAHP produced -W stage(in/out) directives are incompatible with Torque 2.5.8 Not implemented
To test this fix, configure a CREAM CE with PBS/Torque 2.5.8. If this is not possible and you have another torque version, apply the change documented at: https://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide#5_1_Saving_the_batch_job_submiss to save the submission script. Submit a job and check in /tmp the pbs job submission script.
It should contain something like:
    #PBS -W stagein=\'CREAM610186385_jobWrapper.sh.18757.13699.1328001723@cream-38.pd.infn.it:/var/c\
    ream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role\
    _NULL_Capability_NULL_dteam042/61/CREAM610186385/CREAM610186385_jobWrapper.sh,cre38_610186385.pr\
    oxy@cream-38.pd.infn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Cert\
    ificate_O_INFN_C_IT_dteam_Role_NULL_Capability_NULL_dteam042/proxy/5a34c64e2a8db2569284306e9a472\
    3d2d40045a7_13647008746533\'
    #PBS -W stageout=\'out_cre38_610186385_StandardOutput@cream-38.pd.infn.it:/var/cream_sandbox/dte\
    am/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role_NULL_Capability\
    _NULL_dteam042/61/CREAM610186385/StandardOutput,err_cre38_610186385_StandardError@cream-38.pd.in\
    fn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_I\
    T_dteam_Role_NULL_Capability_NULL_dteam042/61/CREAM610186385/StandardError\'
i.e. a stagein and a stageout directive, with escaped quotes around the whole lists.

Bug #91037 BUpdaterLSF should use bjobs to detect final job state Not implemented
To test the fix, configure a CREAM CE with LSF with the new blparser. Then edit blah.config setting:
    bupdater_debug_level=3
Delete the bupdater log file and restart the blparser. Submit a job, wait for its completion, and wait until a notification with status 4 is logged in the bnotifier log file. Then grep the bupdater log file for the string bhist, which should not be found, apart from something like:
    2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
    2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
    2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
    2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no

Fixes provided with BLAH 1.16.4
Bug #88974 BUpdaterSGE and BNotifier don't start if sge_helperpath var is not fixed Not implemented
Install and configure (via yaim) a CREAM-CE using GE as batch system. Make sure that in /etc/blah.config the variable sge_helperpath is commented out or not there.
Try to restart the blparser:
    /etc/init.d/glite-ce-blahparser restart
It should work without problems. In particular it should not report the following error:
    Starting BNotifier: /usr/bin/BNotifier: sge_helperpath not defined. Exiting [FAILED]
    Starting BUpdaterSGE: /usr/bin/BUpdaterSGE: sge_helperpath not defined. Exiting [FAILED]

Bug #89859 There is a memory leak in the updater for LSF, PBS and Condor Not implemented
Configure a CREAM CE using the new blparser. Submit 1000 jobs using e.g. this JDL:
    [ executable="/bin/sleep"; arguments="100"; ]
Keep monitoring the memory used by the bupdaterxxx process. It should basically not increase. The test should be done for both LSF and Torque/PBS.

Fixes provided with BLAH 1.16.3
Bug #75854 (Problems related to the growth of the blah registry) Not implemented
Configure a CREAM CE using the new BLparser. Verify that in /etc/blah.config there is:
    job_registry_use_mmap=yes
(default scenario).
Submit 5000 jobs on a CREAM CE using the following JDL:
    [ executable="/bin/sleep"; arguments="100"; ]
Monitor the BLAH processes. Verify that none of them uses more than 50 MB.

Bug #77776 (BUpdater should have an option to use cached batch system commands) Not implemented
Add:
    lsf_batch_caching_enabled=yes
    batch_command_caching_filter=/usr/bin/runcmd.pl
in /etc/blah.config.
Create and fill /usr/bin/runcmd.pl with the following content:
    #!/usr/bin/perl
    #---------------------#
    # PROGRAM: argv.pl #
    #---------------------#
    $numArgs = $#ARGV + 1;
    open (MYFILE, '>>/tmp/xyz');
    foreach $argnum (0 .. $#ARGV) {
      print MYFILE "$ARGV[$argnum] ";
    }
    print MYFILE "\n";
    close (MYFILE);
Submit some jobs. Check that in /tmp/xyz the queries to the batch system are recorded. E.g. for LSF something like this should be reported:
    /opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs -u all -l
    /opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs -u all -l
    ...

Bug #80805 (BLAH job registry permissions should be improved) Not implemented
Check permissions and ownership under /var/blah. They should be:
    /var/blah:
    total 12
    -rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bnotifier.pid
    -rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bupdater.pid
    drwxrwx--t 4 tomcat tomcat 4096 Oct 18 07:38 user_blah_job_registry.bjr

    /var/blah/user_blah_job_registry.bjr:
    total 16
    -rw-rw-r-- 1 tomcat tomcat 1712 Oct 18 07:38 registry
    -rw-r--r-- 1 tomcat tomcat  260 Oct 18 07:38 registry.by_blah_index
    -rw-rw-rw- 1 tomcat tomcat    0 Oct 18 07:38 registry.locktest
    drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.npudir
    drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.proxydir
    -rw-rw-r-- 1 tomcat tomcat    0 Oct 18 07:32 registry.subjectlist

    /var/blah/user_blah_job_registry.bjr/registry.npudir:
    total 0

    /var/blah/user_blah_job_registry.bjr/registry.proxydir:
    total 0

Bug #81354 (Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute causes thread to loop) Not implemented
Log on a CREAM CE as user tomcat. Create a proxy of yours and copy it as /tmp/proxy (change the ownership to tomcat.tomcat).
Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).
Submit a job directly via blah (in the following, replace pbs and creamtest2 with the relevant batch system and queue names):
    $ /usr/bin/blahpd
    $GahpVersion: 1.16.2 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
    BLAH_SET_SUDO_ID dteam001
    S Sudo\ mode\ on
    blah_job_submit 1 [cmd="/bin/cp";Args="fstab\ fstab.out";TransferInput="/home/dteam001/dir1/fstab";TransferOutput="fstab.out";TransferOutputRemaps="fstab.out=/home/dteam001/dir1/fstab.out";gridtype="pbs";queue="creamtest2";x509userproxy="/tmp/proxy"]
    S
    results
    S 1
    1 0 No\ error pbs/20111010/304.cream-38.pd.infn.it
Finally check the content of /home/dteam001/dir1/, where you should see both fstab and fstab.out:
    $ ls /home/dteam001/dir1/
    fstab fstab.out

Bug #81824 (yaim-cream-ce should manage the attribute bupdater_loop_interval) Implemented
Set BUPDATER_LOOP_INTERVAL to 30 in siteinfo.def and reconfigure via yaim. Then verify that in blah.config there is:
    bupdater_loop_interval=30

Bug #82281 (blahp.log records should always contain CREAM job ID) Not implemented
Submit a job directly to CREAM using CREAM-CLI. Then submit a job to CREAM through the WMS. In the accounting log file (/var/log/cream/accounting/blahp.log-<date>) in both cases the clientID field should end with the numeric part of the CREAM jobid, e.g.:
    "timestamp=2011-10-10 14:37:38" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=CREAM956286045" "lrmsID=300.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_956286045"
    "timestamp=2011-10-10 14:39:57" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=https://devel19.cnaf.infn.it:9000/dLvm84LvD7w7QXtLZK4L0A" "lrmsID=302.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_315532638"

Bug #82297 (blahp.log rotation period is too short) Not implemented
Check that in /etc/logrotate.d/blahp-logrotate rotate is equal to 365:
    # cat /etc/logrotate.d/blahp-logrotate
    /var/log/cream/accounting/blahp.log {
        copytruncate
        rotate 365
        size = 10M
        missingok
        nomail
    }

Bug #83275 (Problem in updater with very short jobs that can cause no notification to cream) Not implemented
Configure a CREAM CE using the new blparser. Submit a job using the following JDL:
    [ executable="/bin/echo"; arguments="ciao"; ]
Check in the bnotifier log file (/var/log/cream/glite-ce-bnotifier.log) that at least one notification is sent for this job, e.g.:
    2011-11-04 14:11:11 Sent for Cream:[BatchJobId="927.cream-38.pd.infn.it"; JobStatus=4; ChangeTime="2011-11-04 14:08:55"; JwExitCode=0; Reason="reason=0"; ClientJobId="622028514"; BlahJobName="cre38_622028514";]

Bug #83347 (Incorrect special character handling for BLAH Arguments and Environment attributes) Not implemented
Log on a CREAM CE as user tomcat. Create a proxy of yours and copy it as /tmp/proxy (change the ownership to tomcat.tomcat).
Create the file /home/dteam001/dir1/fstab (you can copy /etc/fstab).
Submit a job directly via blah (in the following change pbs and creamtest1 with the relevant batch system and queue names):
    BLAH_JOB_SUBMIT 1 [Cmd="/bin/echo";Args="$HOSTNAME";Out="/tmp/stdout_l15367";In="/dev/null";GridType="pbs";Queue="creamtest1";x509userproxy="/tmp/proxy";Iwd="/tmp";TransferOutput="output_file";TransferOutputRemaps="output_file=/tmp/stdout_l15367";GridResource="blah"]
Verify that in the output file there is the hostname of the WN.

Bug #87419 (blparser_master add some spurious character in the BLParser command line) Not implemented
Configure a CREAM CE using the old blparser. Check the blparser process using ps. It shouldn't show spurious characters:
    root 26485 0.0 0.2 155564 5868 ? Sl 07:36 0:00 /usr/bin/BLParserPBS -d 1 -l /var/log/cream/glite-pbsparser.log -s /var/torque -p 33333 -m 56565

CREAM

Fixes provided with CREAM 1.16.2
Bug #CREAM-113 Enhancement of the CREAM DB API (Sec. Vuln.) Not Implemented
Bug #CREAM-124 Fix a bug related to gridsite (Sec. Vuln.) Not Implemented
Bug #CREAM-125 SOAP Header is not set. Not Implemented

Fixes provided with CREAM 1.16.1
Bug #CREAM-107 Bad timezone format. Not Implemented
Bug #CREAM-103 Wrong symlinks upgrading from EMI-2 to EMI-3. Not Implemented
Bug #CREAM-101 Wrong time format for MaxWallClockTime. Not Implemented
Bug #CREAM-99 Python error on ERT calculation: local variable 'est' referenced before assignment. Not Implemented
Bug #CREAM-82 Check permission for /var/cream_sandbox. Not Implemented
Bug #CREAM-78 Check for /etc/lrms in config_cream_gip_scheduler_plugin. Not Implemented
Bug #CREAM-77 List index error from persistent estimator. Not Implemented
Bug #CREAM-74 Remove trustmanager from the provides list of cream-common. Not Implemented
Bug #CREAM-75 CREAM should avoid to log the error messages by including even the full stack trace (i.e printStackTrace()). Not Implemented
Bug #CREAM-83 CREAM should be able to immediately kill also queued jobs, when the proxy is expiring. Not Implemented
Bug #CREAM-84 The DelegationPurger may cause a java.lang.OutOfMemoryError exception. Not Implemented
Bug #CREAM-89 --leaseId option doesn't work. Not Implemented
Bug #CREAM-90 --help option doesn't work on glite-ce-job-submit and glite-ce-event-query. Not Implemented
Bug #CREAM-91 man page missing for glite-ce-job-lease. Not Implemented
TODO Bug #CREAM-102 cream job-wrapper gets stuck in EMI-3 if perusal is enabled. Not Implemented

Fixes provided with CREAM 1.15.2
Bug #101221 CREAM sends wrong authorization requests to Argus containing attributes with empty values. Not Implemented
Bug #101108 Minor issues from INFN-T1 Not Implemented

Fixes provided with CREAM 1.14.6
TODO Bug #CREAM-127 - Mount info item not parsed if it contains colon - Not Implemented
Bug #CREAM-113 Enhancement of the CREAM DB API (Sec. Vuln.) Not Implemented
Bug #CREAM-124 Fix a bug related to gridsite (Sec. Vuln.) Not Implemented

Fixes provided with CREAM 1.14.5
TODO Bug #CREAM-111 Wrong tokenization for epilogue argument list. Not Implemented
TODO Bug #CREAM-101 Wrong time format for MaxWallClockTime. Not Implemented
TODO Bug #CREAM-99 Python error on ERT calculation: local variable 'est' referenced before assignment. Not Implemented
TODO Bug #CREAM-84 The DelegationPurger may cause a java.lang.OutOfMemoryError exception. Not Implemented
TODO Bug #CREAM-83 CREAM should be able to immediately kill also queued jobs, when the proxy is expiring. Not Implemented
TODO Bug #CREAM-82 Check permission for /var/cream_sandbox. Not Implemented
Bug #CREAM-78 Check for /etc/lrms in config_cream_gip_scheduler_plugin. Not Implemented
TODO Bug #CREAM-77 List index error from persistent estimator. Not Implemented
Bug #CREAM-75 CREAM should avoid to log the error messages by including even the full stack trace (i.e printStackTrace()). Not Implemented

Fixes provided with CREAM 1.14.3
Bug #99740 updateDelegationProxyInfo error: Rollback executed due to Deadlock Not Implemented
TODO Bug #99738 Under stress conditions due to job submissions, the command queue may accumulate thousands of job purging commands Not Implemented
    JobPurger - purging 0 jobs with status REGISTERED <= Wed Jan 16 16:55:55 CET 2013
    JobPurger - purging 0 jobs with status ABORTED <= Tue Jan 08 16:56:55 CET 2013
    JobPurger - purging 0 jobs with status CANCELLED <= Fri Jan 18 16:51:55 CET 2013
    JobPurger - purging 500 jobs with status DONE-OK <= Fri Jan 18 16:51:55 CET 2013
    JobPurger - purging 0 jobs with status DONE-FAILED <= Tue Jan 08 16:56:55 CET 2013
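
When many such JobPurger lines are logged, the per-status counts can be added up with a small filter. This is only a sketch: purged_total is a hypothetical name, and it assumes each log record keeps the "purging N jobs with status S" wording shown above.

```shell
# Hypothetical helper: sum the job counts on "JobPurger - purging N jobs" lines.
# Reads the files given as arguments, or standard input if none is given.
purged_total() {
    awk '/JobPurger - purging [0-9]+ jobs with status/ {
             # the count is the field right after the word "purging"
             for (i = 1; i <= NF; i++)
                 if ($i == "purging") total += $(i + 1)
         }
         END { print total + 0 }' "$@"
}
```

It can be fed from a pipe, for example `grep JobPurger <log file> | purged_total`, with the log file left to the tester.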
Bug #98144 The switching off of the JobSubmissionManager makes the CREAM service not available for the users Not Implemented
<parameter name="JOB_SUBMISSION_MANAGER_ENABLE" value="false" />
service tomcat5 restart
"Received NULL fault; the error is due to another cause: FaultString=[CREAM service not available: configuration failed!] - FaultCode=[SOAP-ENV:Server] - FaultSubCode=[SOAP-ENV:Server]"

TODO Bug #88134 JobWrapper doesn't handle correctly the jdl attribute "PerusalListFileURI" Not Implemented
    [
    Type="Job";
    JobType="Normal";
    Executable = "perusal.sh";
    StdOutput = "stdout.log";
    StdError = "stderr.log";
    InputSandbox = "perusal.sh";
    OutputSandbox = {"stdout.log", "stderr.log", "results.txt"};
    PerusalFilesDestURI="gsiftp://cream-05.pd.infn.it/tmp";
    PerusalFileEnable = true;
    PerusalTimeInterval = 20;
    outputsandboxbasedesturi="gsiftp://localhost";
    PerusalListFileURI="gsiftp://cream-05.pd.infn.it/tmp/filelist.txt"
    ]

perusal.sh:
    #!/bin/sh
    i=0
    while ((i < 10))
    do
        date
        voms-proxy-info --all >&2
        df >> results.txt
        sleep 10
        let "i++"
        echo i = $i
    done

N.B: For this test, the file "gsiftp://cream-05.pd.infn.it/tmp/filelist.txt" must not exist!
Bug #95637 glite-ce-job-submit --help doesn't print out anything Not Implemented
Bug #95738 glite-ce-job-submit: error message to be improved if JDL file is missing Not Implemented
TODO Bug #95041 YAIM could check the format of CE_OTHERDESCR Not Implemented
TODO Bug #98440 Missing revision number in EndpointImplementationVersion Not Implemented
TODO Bug #98850 Empty ACBR list in SHARE variable Not Implemented
Bug #99072 Hard-coded reference to tomcat5.pid Not Implemented
Bug #99085 Improve parsing of my.cnf Not Implemented
TODO Bug #99282 Wrong regular expression for group.conf parsing Not Implemented
TODO Bug #99747 glite-info-dynamic-ce does not update GLUE2ComputingShareServingState Not Implemented
Bug #99823 SHA-1 algorithm for PKCS10 generation in CREAM delegation service Not Implemented

Fixes provided with CREAM 1.14.2
TODO Bug #95328 In cluster mode, YAIM does not set GlueCEInfoHostName for CREAMs Not Implemented
TODO Bug #95973 Missing Glue capability in GLUE2EntityOtherInfo Not Implemented
Bug #96306 Wrong lowercase conversion for VO Tags Implemented
Bug #96310 Wrong lowercase conversion for Glue-1 VO Tags. Not Implemented
Bug #97441 CREAM: Unwanted auto-updating of the field "creationTime" on the creamdb database Implemented
Bug #96512 JobDBAdminPurger can't find commons-logging.jar Implemented
Bug #97106 CREAM JW - fatal_error: command not found. Not Implemented
TODO Bug #94418 The SIGTERM signal should be issued to all the processes belonging to the job. Not Implemented
Bug #98707 Wrong warning message from ArgusPEPClient configuration Not Implemented

Fixes provided with CREAM 1.14.1
TODO Bug #89153 JobDBAdminPurger cannot purge jobs if CREAM DB is on another host - Not Implemented
Bug #95356 Better parsing for static definition files in lcg-info-dynamic-scheduler - Implemented
Bug #95480 CREAM doesn't transfer the output files remotely under well known conditions - Not Implemented
[ Type = "Job"; #VAR1 = "test1"; #VAR2 = "test2"; executable = "/bin/echo"; Arguments = "hello world!!!"; StdOutput="stdout"; StdError="stderr"; OutputSandbox = {"stdout","stderr"}; ]
Bug #95552 Malformed URL from glite-ce-glue2-endpoint-static - Implemented
Run the command
    /usr/libexec/glite-ce-glue2-endpoint-static /etc/glite-ce-glue2/glite-ce-glue2.conf | grep GLUE2EndpointURL
and verify that the URL is correctly defined (it contains ":" between the host name and the port).
Example of the error:
    GLUE2EndpointURL: https://cream-48.pd.infn.it8443/ce-cream/services

TOREMOVE Bug #95593 CREAM cannot insert in the command queue if the length of the localUser field is > 14 chars - Not Implemented
Bug #96055 Wrong DN format in logfiles for accounting - Not Implemented
Bug #93091 Add some resubmission machinery to CREAM - Not Implemented
"org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - submission to BLAH failed [jobId=CREAMXXXXXXXXX; reason=BLAH error: submission command failed (exit code = 143) (stdout:) (stderr: <blah> execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = CREAMXXXXXXXXX); retry count=1/3] org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - sleeping 10 sec... org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - sleeping 10 sec... done"

Fixes provided with CREAM 1.14
Bug #59871 lcg-info-dynamic-software must split tag lines on white space - Implemented
To verify the fix edit a VO.list file under /opt/glite/var/info/cream-38.pd.infn.it/ adding:
    tag1 tag2 tag3
Wait 3 minutes and then query the resource bdii, where you should see:
    ...
    GlueHostApplicationSoftwareRunTimeEnvironment: tag1
    GlueHostApplicationSoftwareRunTimeEnvironment: tag2
    GlueHostApplicationSoftwareRunTimeEnvironment: tag3
    ...

TODO Bug #68968 lcg-info-dynamic-software should protect against duplicate RTE tags - Not Implemented
To verify the fix edit a VO.list file under /opt/glite/var/info/cream-38.pd.infn.it/VO adding:
    tag1 tag2 TAG1 tag1
Then query the resource bdii:
    ldapsearch -h <CE host> -x -p 2170 -b "o=grid" | grep -i tag
This should return:
    GlueHostApplicationSoftwareRunTimeEnvironment: tag1
    GlueHostApplicationSoftwareRunTimeEnvironment: tag2

TODO Bug #69854 CreamCE should publish non-production state when job submission is disabled - Not Implemented
Disable job submission with glite-ce-disable-submission. Wait 3 minutes and then perform the following ldap query:
    # ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus
For each GlueCE this should return:
    GlueCEStateStatus: Draining
Then re-enable the submission. Edit the configuration file /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf to trigger job submission disabling. E.g. change:
    $MemUsage = 95;
with:
    $MemUsage = 1;
Wait 15 minutes and then perform the following ldap query:
    # ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus
For each GlueCE this should return:
    GlueCEStateStatus: Draining

Bug #69857 Job submission to CreamCE is enabled by restart of service even if it was previously disabled - Implemented
To test the fix:
`glite-ce-disable-submission host:port` command (provided by the CREAM CLI package installed on the UI). It can be issued only by a CREAM CE administrator, that is, a person whose DN is listed in the /etc/grid-security/admin-list file of the CE.
Output should be: "Operation for disabling new submissions succeeded"
`glite-ce-enable-submission host:port` command (provided by the CREAM CLI package installed on the UI).
Output should be: "Job submission to this CREAM CE is disabled"
Bug #77791 CREAM installation does not fail if sudo is not installed - Not Implemented
Try to configure via yaim a CREAM-CE where the sudo executable is not installed. The configuration should fail saying:
    ERROR: sudo probably not installed !

TOREMOVE Bug #79362 location of python files provided with lcg-info-dynamic-scheduler-generic-2.3.5-0.sl5 - Not Implemented - obsolete
To verify the fix, do a:
    rpm -ql dynsched-generic
and verify that the files are installed in /usr/lib/python2.4 and no longer in /usr/lib/python.

TOREMOVE Bug #80295 Allow dynamic scheduler to function correctly when login shell is false - Not Implemented - overridden by https://savannah.cern.ch/bugs/?99747
To verify the fix, log on the CREAM CE as user root and run:
    /sbin/runuser -s /bin/sh ldap -c "/usr/libexec/lcg-info-dynamic-scheduler -c /etc/lrms/scheduler.conf"
It should return some information in ldif format.

Bug #80410 CREAM bulk submission CLI is desirable - Not Implemented
To test the fix, specify multiple JDLs in the glite-ce-job-submit command, e.g.:
glite-ce-job-submit --debug -a -r cream-47.pd.infn.it:8443/cream-lsf-creamtest1 jdl1.jdl jdl2.jdl jdl3.jdl
Considering the above example, verify that 3 jobs are submitted and 3 jobids are returned.

Bug #81734 removed conf file retrieve from old path that is not EMI compliant - Not Implemented
To test the fix, create the conf file /etc/glite_cream.conf with the following content:
[ CREAM_URL_PREFIX="abc://"; ]
Then try e.g. the following command:
glite-ce-job-list --debug cream-47.pd.infn.it
It should report that it is trying to contact abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2:
2012-01-13 14:44:39,028 DEBUG - Service address=[abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2]
Move the conf file to /etc/VO/glite_cream.conf and repeat the test, which should give the same result.
Then move the conf file to ~/.glite/VO/glite_cream.conf and repeat the test, which should give the same result.
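The check for Bug #81734 is repeated for three conf file locations, so it may help to automate the comparison of the debug output. The following is a minimal sketch, not part of the test suite: the function name `uses_url_prefix` is hypothetical, and it assumes the debug output contains a `Service address=[...]` line like the one quoted above.

```shell
# Hypothetical helper: read "glite-ce-job-list --debug" output on stdin and
# succeed only if the first "Service address=[...]" value starts with the
# prefix given as the first argument.
uses_url_prefix() {
    prefix=$1
    # Extract the text between the square brackets of the Service address line.
    addr=$(sed -n 's/.*Service address=\[\([^]]*\)\].*/\1/p' | head -n 1)
    case $addr in
        "$prefix"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use, after placing the conf file in each location in turn: `glite-ce-job-list --debug cream-47.pd.infn.it 2>&1 | uses_url_prefix 'abc://' && echo OK`.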

Bug #82206 yaim-cream-ce: BATCH_LOG_DIR missing among the required attributes - Not Implemented
Try to configure a CREAM CE with Torque using yaim without setting BLPARSER_WITH_UPDATER_NOTIFIER and without setting BATCH_LOG_DIR.
It should fail saying:
INFO: Executing function: config_cream_blah_check
ERROR: BATCH_LOG_DIR is not set
ERROR: Error during the execution of function: config_cream_blah_check

Bug #83314 Information about the RTEpublisher service should be available also in glue2 - Not Implemented
Check if the resource BDII publishes glue 2 GLUE2ComputingEndPoint objectclasses with GLUE2EndpointInterfaceName equal to org.glite.ce.ApplicationPublisher. If the CE is configured in no cluster mode there should be one such objectclass. If the CE is configured in cluster mode and the gLite-CLUSTER is deployed on a different node, there shouldn't be any such objectclasses.
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #83338 endpointType (in GLUE2ServiceComplexity) hardwired to 1 in CREAM CE is not always correct - Implemented
Perform the following query on the resource bdii of the CREAM CE:
-p 2170 -b "o=glue" | grep -i endpointtype
endpointtype should be 3 if CEMon is deployed (USE_CEMON is true), 2 otherwise.
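The endpointtype rule for Bug #83338 can be checked mechanically. The sketch below is only an illustration: the function names are hypothetical, and it assumes the published attribute looks like `GLUE2ServiceComplexity: endpointType=3, share=1, resource=1` and that USE_CEMON is given as `true` or `yes` when CEMon is deployed.

```shell
# Expected value per the rule above: 3 when CEMon is deployed, 2 otherwise.
# Accepting "yes" as well as "true" is an assumption of this sketch.
expected_endpointtype() {
    case $1 in
        true|yes) echo 3 ;;
        *) echo 2 ;;
    esac
}

# Read the output of "... | grep -i endpointtype" on stdin. Succeed only if at
# least one endpointType value is found and all of them match the expectation.
check_endpointtype() {
    expected=$(expected_endpointtype "$1")
    awk -v want="$expected" '
        {
            line = tolower($0)
            if (match(line, /endpointtype=[0-9]+/)) {
                n++
                # "endpointtype=" is 13 characters long; keep only the digits.
                v = substr(line, RSTART + 13, RLENGTH - 13)
                if (v != want) bad++
            }
        }
        END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}
```

The query output would be piped into `check_endpointtype true` on a CE with CEMon, or `check_endpointtype false` otherwise.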

Bug #83474 Some problems concerning glue2 publications of CREAM CE configured in cluster mode - Not Implemented
Configure a CREAM CE in cluster mode, with the gLite-CLUSTER configured on a different host.
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.CREAM))"
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #83592 CREAM client doesn't allow the delegation of RFC proxies - Implemented
Create an RFC proxy, e.g.:
voms-proxy-init -voms dteam -rfc
and then use glite-ce-job-submit to submit a job that uses ISB and OSB, e.g.:
[
executable="ssh1.sh";
inputsandbox={"file:///home/sgaravat/JDLExamples/ssh1.sh", "file:///home/sgaravat/a"};
stdoutput="out3.out";
stderror="err2.err";
outputsandbox={"out3.out", "err2.err", "ssh1.sh", "a"};
outputsandboxbasedesturi="gsiftp://localhost";
]
Verify that the final status is DONE-OK.
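Verifying the final status can be scripted by parsing the job status text. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the status is assumed to be printed as `Status = [VALUE]`, the format quoted elsewhere in this plan for a killed job.

```shell
# Hypothetical helper: read job status text on stdin and print the value
# found inside the first "Status = [...]" field.
job_status() {
    sed -n 's/.*Status *= *\[\([^]]*\)\].*/\1/p' | head -n 1
}

# Succeed only if the status read from stdin is DONE-OK.
is_done_ok() {
    [ "$(job_status)" = "DONE-OK" ]
}
```

For example, `glite-ce-job-status "$JOBID" | is_done_ok`, where `$JOBID` stands for the job identifier returned at submission time.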

Bug #83593 Problems limiting RFC proxies in CREAM - Implemented
Consider the same test done for bug #83592.

TODO Bug #84308 Error on glite_cream_load_monitor if cream db is on another host - Not Implemented
Configure a CREAM CE with the database installed on a different host than the CREAM CE. Run:
/usr/bin/glite_cream_load_monitor /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --show
which shouldn't report any error.

Bug #86522 glite-ce-job-submit authorization error message difficoult to understand - Not Implemented
To check this fix, try a submission towards a CREAM CE configured to use ARGUS when you are not authorized. You should see an error message like:
$ glite-ce-job-submit -a -r emi-demo13.cnaf.infn.it:8443/cream-lsf-demo oo.jdl
2012-05-07 20:26:51,649 FATAL - CN=Massimo Sgaravatto,L=Padova,OU=Personal Certificate,O=INFN,C=IT not authorized for {http://www.gridsite.org/namespaces/delegation-2}getProxyReq
and not like the one reported in the savannah bug.

TOREMOVE Bug #86609 yaim variable CE_OTHERDESCR not properly managed for Glue2 - Not Implemented - merge with https://savannah.cern.ch/bugs/?95041
Try to set the yaim variable CE_OTHERDESCR to:
CE_OTHERDESCR="Cores=1"
Perform the following ldap query on the resource bdii:
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
This should also return:
GLUE2EntityOtherInfo: Cores=1
Then try to set the yaim variable CE_OTHERDESCR to:
CE_OTHERDESCR="Cores=1,Benchmark=150-HEP-SPEC06"
and reconfigure via yaim. Perform the following ldap query on the resource bdii:
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
This should also return:
GLUE2EntityOtherInfo: Cores=1
Then perform the following ldap query on the resource bdii:
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark
This should return something like:
dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=cream-47.pd.infn.it_ComputingElement,GLUE2GroupID=resource,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T14:04:48Z
GLUE2BenchmarkValue: 150
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: cream-47.pd.infn.it_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06

TOREMOVE Bug #86694 A different port number than 9091 should be used for LRMS_EVENT_LISTENER - Not Implemented
On a running CREAM CE, perform the following command:
netstat -an | grep -i 9091
This shouldn't return anything. Then perform the following command:
netstat -an | grep -i 49152
This should return:
tcp 0 0 :::49152 :::* LISTEN
[root@cream-47 ~]# netstat -an | grep -i 49153
[root@cream-47 ~]# netstat -an | grep -i 49154
[root@cream-47 ~]# netstat -an | grep -i 9091

Bug #86697 User application's exit code not recorded in the CREAM log file - Not Implemented
Submit a job and wait for its completion. Then check the glite-ce-cream.log file on the CREAM CE. The user exit code should be reported (field exitCode), e.g.:
13 Jan 2012 15:22:52,966 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM124031222 STATUS CHANGED: REALLY-RUNNING => DONE-OK [failureReason=reason=0] [exitCode=23] [localUser=dteam004] [workerNode=prod-wn-001.pn.pd.infn.it] [delegationId=7a52772caaeea96628a1ff9223e67a1f6c6dde9f]

Bug #86737 A different port number than 9909 should be used for CREAM_JOB_SENSOR - Not Implemented
On a running CREAM CE, perform the following command:
netstat -an | grep -i 9909
This shouldn't return anything.

Bug #86773 wrong /etc/glite-ce-cream/cream-config.xml with multiple ARGUS servers set - Not Implemented
To test the fix, set in siteinfo.def:
USE_ARGUS=yes
ARGUS_PEPD_ENDPOINTS="https://cream-46.pd.infn.it:8154/authz https://cream-46-1.pd.infn.it:8154/authz"
CREAM_PEPC_RESOURCEID="http://pd.infn.it/cream-47"
i.e. 2 values for ARGUS_PEPD_ENDPOINTS.
Then configure via yaim.
In /etc/glite-ce-cream/cream-config.xml there should be:
<argus-pep name="pep-client1" resource_id="http://pd.infn.it/cream-47" cert="/etc/grid-security/tomcat-cert.pem" key="/etc/grid-security/tomcat-key.pem" passwd="" mapping_class="org.glite.ce.cream.authz.argus.ActionMapping">
<endpoint url="https://cream-46.pd.infn.it:8154/authz" />
<endpoint url="https://cream-46-1.pd.infn.it:8154/authz" />
</argus-pep>

TOREMOVE Bug #87690 Not possible to map different queues to different clusters for CREAM configured in cluster mode - Not Implemented - just one cluster supported
Configure via yaim a CREAM CE in cluster mode with different queues mapped to different clusters, e.g.:
CREAM_CLUSTER_MODE=yes
CE_HOST_cream_47_pd_infn_it_QUEUES="creamtest1 creamtest2"
QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id
Then query the resource bdii of the CREAM CE, and check the GlueForeignKey attributes of the different glueCEs: they should refer to the specified clusters:
ldapsearch -h cream-47.pd.infn.it -p 2170 -x -b o=grid objectclass=GlueCE GlueForeignKey
# extended LDIF
#
# LDAPv3
# base <o=grid> with scope subtree
# filter: objectclass=GlueCE
# requesting: GlueForeignKey
#
# cream-47.pd.infn.it:8443/cream-lsf-creamtest2, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest2,Mds-Vo-name=resource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl12d
# cream-47.pd.infn.it:8443/cream-lsf-creamtest1, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest1,Mds-Vo-name=resource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl1id

Bug #87799 Add yaim variables to configure the GLUE 2 WorkingArea attributes - Not Implemented
Set all (or some) of the following yaim variables:
WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME
and then configure via yaim. Then query the resource bdii of the CREAM CE and verify that the relevant attributes of the glue2 ComputingManager object are set.

TOREMOVE Bug #88078 CREAM DB names should be configurable - Not Implemented
Configure a CREAM CE from scratch, setting the yaim variables CREAM_DB_NAME and DELEGATION_DB_NAME, e.g.:
CREAM_DB_NAME=abc
DELEGATION_DB_NAME=xyz
and then configure via yaim. Then check if the two databases have been created:
# mysql -u xxx -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7176
Server version: 5.0.77 Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> show databases;
+--------------------+

Bug #89489 yaim plugin for CREAM CE does not execute a check function due to name mismatch - Implemented
Configure a CREAM CE via yaim and save the yaim output. It should contain the string:
INFO: Executing function: config_cream_gip_scheduler_plugin_check

TOREMOVE Bug #89664 yaim-cream-ce doesn't manage spaces in CE_OTHERDESCR - Not Implemented - merge with https://savannah.cern.ch/bugs/?95041
Try to set the yaim variable CE_OTHERDESCR to:
CE_OTHERDESCR="Cores=1"
Perform the following ldap query on the resource bdii:
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
This should also return:
GLUE2EntityOtherInfo: Cores=1
Then try to set the yaim variable CE_OTHERDESCR to:
CE_OTHERDESCR="Cores=2, Benchmark=4-HEP-SPEC06"
and reconfigure via yaim. Perform the following ldap query on the resource bdii:
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
This should also return:
GLUE2EntityOtherInfo: Cores=2
Then perform the following ldap query on the resource bdii:
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark
This should return something like:
# cream-47.pd.infn.it_hep-spec06, cream-47.pd.infn.it, ppp, resource, glue
dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=ppp,GLUE2GroupID=resource,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T17:07:52Z
GLUE2BenchmarkValue: 4
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: ppp_Manager
GLUE2EntityName: Benchmark hep-spec06

Bug #89784 Improve client side description of authorization failure - Not Implemented
Try to remove the lsc files for your VO and try a submission to that CE. It should return an authorization error. Then check the glite-ce-cream.log. It should report something like:
13 Jan 2012 18:21:21,270 org.glite.voms.PKIVerifier - Cannot find usable certificates to validate the AC. Check that the voms server host certificate is in your vomsdir directory.
13 Jan 2012 18:21:21,602 org.glite.ce.commonj.authz.gjaf.LocalUserPIP - glexec error: [gLExec]: LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
13 Jan 2012 18:21:21,603 org.glite.ce.commonj.authz.gjaf.ServiceAuthorizationChain - Failed to get the local user id via glexec: glexec error: [gLExec]: LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
org.glite.ce.commonj.authz.AuthorizationException: Failed to get the local user id via glexec: glexec error: [gLExec]: LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.

Bug #91819 glite_cream_load_monitor should read the thresholds from a conf file - Implemented
Tested through the limiter test of the Robot based test-suite.

TODO Bug #92102 Tomcat attributes in the CREAM CE should be configurable via yaim - Not Implemented
Set in siteinfo.def:
CREAM_JAVA_OPTS_HEAP="-Xms512m -Xmx1024m"
and configure via yaim. Then check /etc/tomcat5.conf (in EMI2 SL5 X86_64 /etc/tomcat5/tomcat5.conf), where there should be:
JAVA_OPTS="${JAVA_OPTS} -server -Xms512m -Xmx1024m"

TODO Bug #92338 CREAM load limiter should not disable job submissions when there is no swap space - Not Implemented
To test the fix, consider a CREAM CE on a machine without swap. Verify that the limiter doesn't disable job submissions.
Note:
to show swap partition: cat /proc/swaps
to check swap: top
to disable swap: swapoff -a (or swapoff /

Bug #93768 There's a bug in logfile handling - Not Implemented
To verify the fix, try the --logfile option with e.g. the glite-ce-job-submit command.
Verify that the log file is created in the specified path.
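The logfile check can give a false positive if a file from an earlier run is still present. The sketch below, with the hypothetical name `check_logfile_created`, removes any stale file, runs the given command and then tests that the file exists. It deliberately ignores the exit status of the command, since only log file creation is being verified here.

```shell
# Hypothetical helper: check_logfile_created LOGFILE COMMAND [ARGS...]
# Removes a stale LOGFILE, runs COMMAND, then succeeds only if LOGFILE exists.
check_logfile_created() {
    logfile=$1
    shift
    rm -f -- "$logfile"
    # The exit status of the command is intentionally not checked.
    "$@" >/dev/null 2>&1
    [ -f "$logfile" ]
}
```

For example: `check_logfile_created /tmp/submit.log glite-ce-job-submit --logfile /tmp/submit.log -a -r <CE id> test.jdl`, where the log path and the JDL file name are placeholders.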

Fixes provided with CREAM 1.13.4

Bug #95480 CREAM doesn't transfert the output files remotely under well known conditions - see Fix for 1.14.1

Fixes provided with CREAM 1.13.3

Bug #81561 Make JobDBAdminPurger script compliant with CREAM EMI environment. - Implemented
STATUS: Implemented
To test the fix, simply run the JobDBAdminPurger.sh on the CREAM CE as root. E.g.:
# JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml -u <user> -p <passwd> -s DONE-FAILED,0
START jobAdminPurger
It should work without reporting error messages:
-----------------------------------------------------------
Job CREAM595579358 is going to be purged ...
- Job deleted. JobId = CREAM595579358
CREAM595579358 has been purged!
-----------------------------------------------------------
STOP jobAdminPurger
Starting from EMI2 the command to run is:
JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml -s DONE-FAILED,0

Bug #83238 Sometimes CREAM does not update the state of a failed job. - Implemented
STATUS: Implemented
To test the fix, try to kill a job by hand. The status of the job should eventually be:
Status = [DONE-FAILED]
ExitCode = [N/A]
FailureReason = [Job has been terminated (got SIGTERM)]

Bug #83749 JobDBAdminPurger cannot purge jobs if configured sandbox dir has changed. - Implemented
STATUS: Not implemented
To test the fix, submit some jobs and then reconfigure the service with a different value of CREAM_SANDBOX_PATH. Then try, with the JobDBAdminPurger.sh script, to purge some jobs submitted before the switch.
It must be verified:

Bug #84374 yaim-cream-ce: GlueForeignKey: GlueCEUniqueID: published using : instead of =. - Implemented
STATUS: Implemented
To test the fix, query the resource bdii of the CREAM-CE:
ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | grep -i foreignkey | grep -i glueceuniqueid
Entries such as:
GlueForeignKey: GlueCEUniqueID=cream-35.pd.infn.it:8443/cream-lsf-creamtest1
i.e.:
GlueForeignKey: GlueCEUniqueID=<CREAM CE ID>
should appear.

Bug #86191 No info published by the lcg-info-dynamic-scheduler for one VOView - Implemented
STATUS: Implemented
To test the fix, issue the following ldapsearch query towards the resource bdii of the CREAM-CE:
$ ldapsearch -h cream-35 -x -p 2170 -b "o=grid" | grep -i GlueCEStateWaitingJobs | grep -i 444444
It should not find anything.

Bug #87361 The attribute cream_concurrency_level should be configurable via yaim. - Implemented
STATUS: Implemented
To test the fix, set in siteinfo.def the variable CREAM_CONCURRENCY_LEVEL to a certain number (n). After configuration verify that in /etc/glite-ce-cream/cream-config.xml there is:
cream_concurrency_level="n"
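The check on cream-config.xml reduces to a text search. A minimal sketch follows; the function name `has_concurrency_level` is hypothetical, and the attribute is assumed to appear exactly as `cream_concurrency_level="n"` with no spaces around the equals sign.

```shell
# Hypothetical helper: has_concurrency_level CONFIG_FILE N
# Succeeds only if CONFIG_FILE contains cream_concurrency_level="N".
# The closing quote in the pattern prevents N=5 from matching "50".
has_concurrency_level() {
    grep -q "cream_concurrency_level=\"$2\"" "$1"
}
```

For example: `has_concurrency_level /etc/glite-ce-cream/cream-config.xml 50 && echo OK`, where 50 stands for the value chosen for CREAM_CONCURRENCY_LEVEL.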

Bug #87492 CREAM doesn't handle correctly the jdl attribute "environment". - Implemented
STATUS: Implemented
To test the fix, submit the following JDL using glite-ce-job-submit:
Environment = { "GANGA_LCG_VO='camont:/camont/Role=lcgadmin'", "LFC_HOST='lfc0448.gridpp.rl.ac.uk'", "GANGA_LOG_HANDLER='WMS'" };
executable="/bin/env";
stdoutput="out.out";
outputsandbox={"out.out"};
outputsandboxbasedesturi="gsiftp://localhost";
When the job is done, retrieve the output and check that in out.out the variables GANGA_LCG_VO, LFC_HOST and GANGA_LOG_HANDLER have exactly the values defined in the JDL.
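The comparison of out.out against the JDL can be done with an exact line match per variable. This is a sketch only: the function name `env_has` is hypothetical, and the expected values are passed in by the caller, since whether the single quotes from the JDL appear in the /bin/env output is not stated here and is left as an assumption to verify.

```shell
# Hypothetical helper: env_has FILE NAME VALUE
# Succeeds only if FILE contains a line that is exactly NAME=VALUE.
# -F treats the pattern as a fixed string, -x requires a whole-line match.
env_has() {
    grep -Fxq -- "$2=$3" "$1"
}
```

For example, assuming the quotes are stripped: `env_has out.out LFC_HOST lfc0448.gridpp.rl.ac.uk && env_has out.out GANGA_LOG_HANDLER WMS`.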

gLite-CLUSTER

Fixes provided with glite-CLUSTER v. 2.0.1

Bug #CREAM-96 CLUSTER Yaim generates incorrect Subcluster configuration - Not implemented
Bug #CREAM-98 Generated configuration files conflict with CREAM ones - Not implemented
Bug #CREAM-100 Remove lcg prefix from template - Not implemented (https://issues.infn.it/jira/browse/CREAM-100?focusedCommentId=29883&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-29883)
Bug #CREAM-97 Wrong GLUE output format in a CREAM+Cluster installation - Not implemented
Bug #CREAM-95 CE_* YAIM variables not mandatory for GLUE2 in cluster mode - Not implemented

Fixes provided with previous versions

Bug #69318 The cluster publisher needs to publish in GLUE 2 too - Not implemented
ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService
ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager
ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share
ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment
ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"

Bug #86512 YAIM CLuster Publisher incorrectly configures GlueClusterService and GlueForeignKey for CreamCEs - Not implemented
To test the fix, issue an ldapsearch such as:
ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueClusterService
Then issue an ldapsearch such as:
ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueForeignKey | grep -v Site
Verify that for each returned line, the format is:
<hostname>:8443/cream-<lrms>-<queue>

Bug #87691 Not possible to map different queues of the same CE to different clusters - Not implemented
To test this fix, configure a gLite-CLUSTER with at least two different queues mapped to different clusters (use the yaim variables QUEUE_ ), e.g.:
QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id
Then query the resource bdii of the gLite-CLUSTER and verify that:

Bug #87799 Add yaim variables to configure the GLUE 2 WorkingArea attributes - Not implemented
Set all (or some) of the following yaim variables:
WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME
and then configure via yaim. Then query the resource bdii of the gLite cluster and verify that the relevant attributes of the glue2 ComputingManager object are set.

CREAM Torque module

Fixes provided with CREAM TORQUE module 2.1.2

TODO CREAM #119 - Wrong total cpu count from PBS infoprovider - Not Implemented

Fixes provided with CREAM TORQUE module 2.1.1

CREAM #101 - Wrong time format for MaxWallClockTime - Not Implemented
CREAM #107 - Bad timezone format - Not Implemented

Fixes provided with CREAM TORQUE module 2.0.1

TODO Bug #95184 Missing real value for GlueCEPolicyMaxSlotsPerJob Not Implemented
TODO Bug #96636 Time limits for GLUE 2 are different to GLUE 1 Not Implemented
TODO Bug #99639 lcg-info-dynamic-scheduler-pbs cannot parse qstat output with spurious lines Not Implemented

Fixes from previous releases

Bug #17325 Default time limits not taken into account - Not implemented
To test the fix for this bug, consider a PBS installation where for a certain queue both default and max values are specified, e.g.:
resources_max.cput = A
resources_max.walltime = B
resources_default.cput = C
resources_default.walltime = D
Verify that the published value for GlueCEPolicyMaxCPUTime is C and that the published value for GlueCEPolicyMaxWallClockTime is D.

Bug #49653 lcg-info-dynamic-pbs should check pcput in addition to cput - Not implemented
To test the fix for this bug, consider a PBS installation where for a certain queue both cput and pcput max values are specified, e.g.:
resources_max.cput = A
resources_max.pcput = B
Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum of A and B. Then consider a PBS installation where for a certain queue both cput and pcput max and default values are specified, e.g.:
resources_max.cput = C
resources_default.cput = D
resources_max.pcput = E
resources_default.pcput = F
Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum of D and F.

TOREMOVE Bug #76162 YAIM for APEL parsers to use the BATCH_LOG_DIR for the batch system log location - Not implemented - obsolete
To test the fix for this bug, set the yaim variable BATCH_ACCT_DIR and configure via yaim.
Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify the section:
<Logs searchSubDirs="yes" reprocess="no">
<Dir>X</Dir>
X should be the value specified for BATCH_ACCT_DIR.
Then reconfigure without setting BATCH_ACCT_DIR .
Check the file /etc/glite-apel-pbs/parser-config-yaim.xml and verify that the directory name is ${TORQUE_VAR_DIR}/server_priv/accounting

TOREMOVE Bug #77106 PBS info provider doesn't allow - in a queue name - Not implemented - obsolete
To test the fix, configure a CREAM CE in a PBS installation where at least one queue has a - in its name. Then log in as root on the CREAM CE and run:
/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"
Check if the returned information is correct.

CREAM LSF module

Fixes provided with CREAM LSF module 2.0.3

CREAM #114 - Execution error: Uncaught exception - Not Implemented

Bug #88720 Too many '9' in GlueCEPolicyMaxCPUTime for LSF - Not implemented
To test the fix, query the CREAM CE resource bdii in the following way:
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEPolicyMaxCPUTime | grep 9999999999
This shouldn't return anything.

Bug #89767 The LSF dynamic infoprovider shouldn't publish GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots - Not implemented
To test the fix, log in as root on the CREAM CE and run:
/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"
Among the returned information, there shouldn't be GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots.

Bug #89794 LSF info provider doesn't allow - in a queue name - Not implemented
To test the fix, configure a CREAM CE in an LSF installation where at least one queue has a - in its name. Then log in as root on the CREAM CE and run:
/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"
Check if the returned information is correct.

Bug #90113 missing yaim check for batch system - Not implemented
To test the fix, configure a CREAM CE without having installed LSF. The yaim installation should fail saying that there were problems with the LSF installation.

CREAM SLURM module

Fixes provided with CREAM SLURM module 1.0.1

CREAM #116 - Missing CE information from SLURM infoprovider - Not Implemented
|