---+!! Regression Test Work Plan

%TOC%

---+ BLAH

#FixesBLAH1182
---++ Fixes provided with BLAH 1.18.2

---+++ [[https://savannah.cern.ch/bugs/?97491][Bug #97491]] (BUpdaterLSF should not execute any bhist query if all the bhist-related conf parameters are set to "no") %NAVY%Hard to test%ENDCOLOR% - %RED%Not implemented%ENDCOLOR%

   * the fix has been fully tested by CERN
   * the following is the report provided by Ulrich:

<verbatim>
We are running this patch in production at CERN.

The patch is motivated by the fact that bhist calls are very expensive and
calls to the command don't scale. Running the command makes both the CE and
the batch master unresponsive and therefore has a severe impact on system
performance. While discussing the issue with the CREAM developers we found
out that it is possible to obsolete these calls and replace them with less
expensive batch system queries. For that we use the btools tool suite, which
provides a set of additional LSF batch system queries that return output in
machine-readable form.

Pre-requisites:
---------------
Note on btools: btools are shipped in source code with the LSF information
provider plugin. Compiling them requires LSF header files. Binaries depend
on the LSF version which is used, therefore they cannot be shipped or
automatically built for licensing reasons.

Building Instructions:
- ensure that the LSF headers are installed on your build host
# yum install gcc rpmbuild automake autoconf info-dynamic-scheduler-lsf-btools-2.2.0-1.noarch.rpm
# cd /tmp
# tar -zxvf /usr/src/egi/btools.src.tgz
# cd btools
# ./autogen.sh
# make rpm
and install the resulting rpm

Patch version:
--------------
On EMI1 CEs (our production version) we are using a private build of the patch
glite-ce-blahp-1.16.99-0_201208291258.slc5

On EMI2 we've been testing a new build
glite-ce-blahp-1.18.1-2

(Both rpms are private builds we got from Massimo Mezzadri.)

Configuration of the patch
--------------------------
in /etc/blah.conf we set:

# use btools to obsolete bhist calls
bupdater_use_btools=yes
bupdater_btools_path=/usr/bin
#
bupdater_use_bhist_for_susp=no
bupdater_use_bhist_for_killed=no
bupdater_use_bhist_for_idle=no
bupdater_use_bhist_time_constraint=no
# CERN add caching for LSF queries
lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/libexec/runcmd

The runcmd command is shipped with the LSF information providers. You need
at least info-dynamic-scheduler-lsf-2.2.0-1.

In our configuration we cache all batch system responses and share them using
an NFS file system. The cache directory is a convenient way to check if any
bhist calls are done by any of the CEs by just checking for a cache file.
With the above settings there are no such calls any longer.
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?95385][Bug #95385]] (Misleading message when Cream SGE aborts jobs requesting more than one CPU) %RED%Not implemented%ENDCOLOR%

Fully tested by CERN: the test requires the CERN environment based on SGE.

#FixesBLAH1181
---++ Fixes provided with BLAH 1.18.1

---+++ [[https://savannah.cern.ch/bugs/?94414][Bug #94414]] (BLParserLSF could crash if a suspend on an idle job is done) %GREEN%Implemented%ENDCOLOR%

Try to suspend a job whose status is "IDLE" and verify that the daemon !BLParserLSF doesn't crash.
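A possible sequence (a sketch; the CE endpoint, JDL file and job ID are placeholders):

<verbatim>
# submit a job and suspend it while its status is still IDLE
glite-ce-job-submit -a -r <CE host>:8443/cream-lsf-creamtest1 sleep.jdl
glite-ce-job-suspend <job ID returned by the submit>
# verify that the parser daemon is still alive
ps -ef | grep BLParserLSF
</verbatim>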
---+++ [[https://savannah.cern.ch/bugs/?94519][Bug #94519]] (Updater for LSF can misidentify killed jobs as finished) %GREEN%Implemented%ENDCOLOR%

Verify that the value for =bupdater_use_bhist_for_killed= is set to yes, submit and cancel a job, and verify that the status of the job reports "jobstatus=3".

---+++ [[https://savannah.cern.ch/bugs/?94712][Bug #94712]] (Due to a timestamp problem bupdater for LSF can leave job in IDLE state) %NAVY%Hard to Reproduce%ENDCOLOR% - %RED%Not implemented%ENDCOLOR%

Not easy to reproduce (unpredictable behaviour).

---+++ [[https://savannah.cern.ch/bugs/?95392][Bug #95392]] Heavy usage of 'bjobsinfo' still hurts LSF @ CERN %NAVY%Cannot be reproduced%ENDCOLOR% - %RED%Not implemented%ENDCOLOR%

This is a cosmetic update required by the CERN team; it can be reproduced only using the tools developed at CERN.

---++ Fixes provided with BLAH 1.18.0

---+++ [[https://savannah.cern.ch/bugs/?84261][Bug #84261]] BNotifier on CREAM CE seems to not restart cleanly %GREEN%Implemented%ENDCOLOR%

To test the fix, configure a CREAM CE using the new blparser. Then try a:

<verbatim>
service gLite restart
</verbatim>

It shouldn't report the error message:

<verbatim>
Starting BNotifier: /opt/glite/bin/BNotifier: Error creating and binding socket: Address already in use
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?86238][Bug #86238]] blahpd doesn't check the status of its daemons when idling %GREEN%Implemented%ENDCOLOR%

To test the fix, configure a CREAM CE with the new blparser. Don't use it (i.e. do not submit jobs nor issue any other commands). Kill the bupdater and bnotifier processes. Wait for 1 minute: you should see that the bupdater and bnotifier have been restarted.

---+++ [[https://savannah.cern.ch/bugs/?86918][Bug #86918]] Request for passing all submit command attributes to the local configuration script %RED%Not implemented%ENDCOLOR%

To test this fix, create/edit the =/usr/libexec/pbs_local_submit_attributes.sh= (for PBS) script adding:

<verbatim>
export gridType x509UserProxyFQAN uniquejobid queue ceid VirtualOrganisation ClientJobId x509UserProxySubject
env > /tmp/filewithenv
</verbatim>

Edit the =/etc/blah.config= file adding:

<verbatim>
blah_pass_all_submit_attributes=yes
</verbatim>

Submit a job. On the CREAM CE the file =/tmp/filewithenv= should be created and it should contain the settings of some variables, including the ones exported in the =/usr/libexec/pbs_local_submit_attributes.sh= script.

Then edit the =/etc/blah.config= file, removing the previously added line and adding:

<verbatim>
blah_pass_submit_attributes[0]="x509UserProxySubject"
blah_pass_submit_attributes[1]="x509UserProxyFQAN"
</verbatim>

Submit a job. On the CREAM CE the file =/tmp/filewithenv= should be created and it should contain the settings of some variables, including =x509UserProxySubject= and =x509UserProxyFQAN=.

---+++ [[https://savannah.cern.ch/bugs/?90085][Bug #90085]] Suspend command doesn't work with old parser %GREEN%Implemented%ENDCOLOR%

To test the fix, configure a CREAM CE with the old blparser. Then submit a job and after a while suspend it using the =glite-ce-job-suspend= command. Check the job status, which eventually should be =HELD=.

---+++ [[https://savannah.cern.ch/bugs/?90331][Bug #90331]] %RED%Not implemented%ENDCOLOR%

To test the fix, submit a job, ban yourself on the CE (check [[https://wiki.italiangrid.it/twiki/bin/view/CREAM/ServiceReferenceCard#How_to_block_ban_a_user][here]] how to ban a user) and try a =glite-ce-job-status=. It should throw an authorization fault.
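For example (a sketch; the job ID is a placeholder and the exact fault text may differ):

<verbatim>
glite-ce-job-status https://<CE host>:8443/CREAM123456789
# expected: an authorization fault is returned instead of the job status
</verbatim>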
---+++ [[https://savannah.cern.ch/bugs/?90927][Bug #90927]] Problem with init script for blparser %RED%Not implemented%ENDCOLOR%

To check the fix, try to stop/start the blparser:

<verbatim>
service glite-ce-blparser start / stop
</verbatim>

Then verify that the blparser has indeed been started/stopped.

---+++ [[https://savannah.cern.ch/bugs/?91318][Bug #91318]] Request to change functions in blah_common_submit_functions.sh %RED%Not implemented%ENDCOLOR%

Verify that in =/usr/libexec/blah_common_submit_functions.sh= there is this piece of code:

<verbatim>
function bls_add_job_wrapper ()
{
  bls_start_job_wrapper >> $bls_tmp_file
  bls_finish_job_wrapper >> $bls_tmp_file
  bls_test_working_dir
}
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?90101][Bug #90101: Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute may cause thread to loop]] %RED% TBD %ENDCOLOR%

---+++ [[https://savannah.cern.ch/bugs/?92554][Bug #92554: BNotifier problem can leave connection in CLOSE_WAIT state]] %RED% TBD %ENDCOLOR%

---+++ [[https://savannah.cern.ch/bugs/?89504][Bug #89504: Repeated notification problem for BLParserLSF]] %RED% TBD %ENDCOLOR%

---+++ [[https://savannah.cern.ch/bugs/?90082][Bug #90082: BUpdaterPBS workaround if tracejob is in infinite loop]] %RED% TBD %ENDCOLOR%

---++ Fixes provided with BLAH 1.16.6

See [[#FixesBLAH1181][Fixes provided with BLAH 1.18.1]]

---++ Fixes provided with BLAH 1.16.5

---+++ [[https://savannah.cern.ch/bugs/?89527][Bug #89527]] BLAHP produced -W stage(in/out) directives are incompatible with Torque 2.5.8 %RED%Not implemented%ENDCOLOR%

To test this fix, configure a CREAM CE with PBS/Torque 2.5.8. If this is not possible and you have another Torque version, apply the change documented at https://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide#5_1_Saving_the_batch_job_submiss to save the submission script. Submit a job and check in /tmp the PBS job submission script. It should contain something like:

<verbatim>
#PBS -W stagein=\'CREAM610186385_jobWrapper.sh.18757.13699.1328001723@cream-38.pd.infn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role_NULL_Capability_NULL_dteam042/61/CREAM610186385/CREAM610186385_jobWrapper.sh,cre38_610186385.proxy@cream-38.pd.infn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role_NULL_Capability_NULL_dteam042/proxy/5a34c64e2a8db2569284306e9a4723d2d40045a7_13647008746533\'
#PBS -W stageout=\'out_cre38_610186385_StandardOutput@cream-38.pd.infn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role_NULL_Capability_NULL_dteam042/61/CREAM610186385/StandardOutput,err_cre38_610186385_StandardError@cream-38.pd.infn.it:/var/cream_sandbox/dteam/CN_Massimo_Sgaravatto_L_Padova_OU_Personal_Certificate_O_INFN_C_IT_dteam_Role_NULL_Capability_NULL_dteam042/61/CREAM610186385/StandardError\'
</verbatim>

i.e. stagein and stageout directives, with escaped quotes around the whole lists.

---+++ [[https://savannah.cern.ch/bugs/?91037][Bug #91037]] BUpdaterLSF should use bjobs to detect final job state %RED%Not implemented%ENDCOLOR%

To test the fix, configure a CREAM CE with LSF and the new blparser. Then edit =blah.config= setting:

<verbatim>
bupdater_debug_level=3
</verbatim>

Delete the bupdater log file and restart the blparser.
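A minimal sketch of this step; the log file path below is an assumption (the actual location is set by =bupdater_debug_logfile= in =blah.config=):

<verbatim>
rm -f /var/log/cream/glite-ce-bupdater.log
service glite-ce-blparser restart
</verbatim>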
Submit a job, wait for its completion, and wait until a notification with status 4 is logged in the bnotifier log file. Then grep the bupdater log file for the =bhist= string, which should not be found, apart from something like:

<verbatim>
2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-09 07:56:15 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
</verbatim>

---++ Fixes provided with BLAH 1.16.4

---+++ [[https://savannah.cern.ch/bugs/?88974][Bug #88974]] BUpdaterSGE and BNotifier don't start if sge_helperpath var is not fixed %RED%Not implemented%ENDCOLOR%

Install and configure (via yaim) a CREAM-CE using GE as batch system. Make sure that in =/etc/blah.config= the variable =sge_helperpath= is commented out or absent. Try to restart the blparser:

<verbatim>
/etc/init.d/glite-ce-blahparser restart
</verbatim>

It should work without problems. In particular it should not report the following error:

<verbatim>
Starting BNotifier: /usr/bin/BNotifier: sge_helperpath not defined. Exiting
[FAILED]
Starting BUpdaterSGE: /usr/bin/BUpdaterSGE: sge_helperpath not defined. Exiting
[FAILED]
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?89859][Bug #89859]] There is a memory leak in the updater for LSF, PBS and Condor %RED%Not implemented%ENDCOLOR%

Configure a CREAM CE using the new blparser. Submit 1000 jobs using e.g. this JDL:

<verbatim>
[
executable="/bin/sleep";
arguments="100";
]
</verbatim>

Keep monitoring the memory used by the bupdaterxxx process. It should basically not increase. The test should be done for both LSF and Torque/PBS.

---++ Fixes provided with BLAH 1.16.3

---+++ [[https://savannah.cern.ch/bugs/?75854][Bug #75854]] (Problems related to the growth of the blah registry) %RED%Not implemented%ENDCOLOR%

Configure a CREAM CE using the new BLparser. Verify that in /etc/blah.config there is =job_registry_use_mmap=yes= (the default scenario). Submit 5000 jobs to the CREAM CE using the following JDL:

<verbatim>
[
executable="/bin/sleep";
arguments="100";
]
</verbatim>

Monitor the BLAH processes. Verify that each of them doesn't use more than 50 MB.

---+++ [[https://savannah.cern.ch/bugs/?77776][Bug #77776]] (BUpdater should have an option to use cached batch system commands) %RED%Not implemented%ENDCOLOR%

Add:

<verbatim>
lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/bin/runcmd.pl
</verbatim>

in =/etc/blah.config=. Create and fill =/usr/bin/runcmd.pl= with the following content:

<verbatim>
#!/usr/bin/perl
#---------------------#
#  PROGRAM:  argv.pl  #
#---------------------#

# append each batch system query (the command line arguments) to /tmp/xyz
$numArgs = $#ARGV + 1;
open (MYFILE, '>>/tmp/xyz');
foreach $argnum (0 .. $#ARGV) {
   print MYFILE "$ARGV[$argnum] ";
}
print MYFILE "\n";
close (MYFILE);
</verbatim>

Submit some jobs. Check that in =/tmp/xyz= the queries to the batch system are recorded. E.g. for LSF something like this should be reported:

<verbatim>
/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs -u all -l
/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/bjobs -u all -l
...
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?80805][Bug #80805]] (BLAH job registry permissions should be improved) %RED% Not implemented %ENDCOLOR%

Check permissions and ownership under =/var/blah=.
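For instance, with a recursive listing (matching the format of the expected output below):

<verbatim>
ls -lR /var/blah
</verbatim>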
They should be:

<verbatim>
/var/blah:
total 12
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bnotifier.pid
-rw-r--r-- 1 tomcat tomcat    5 Oct 18 07:32 blah_bupdater.pid
drwxrwx--t 4 tomcat tomcat 4096 Oct 18 07:38 user_blah_job_registry.bjr

/var/blah/user_blah_job_registry.bjr:
total 16
-rw-rw-r-- 1 tomcat tomcat 1712 Oct 18 07:38 registry
-rw-r--r-- 1 tomcat tomcat  260 Oct 18 07:38 registry.by_blah_index
-rw-rw-rw- 1 tomcat tomcat    0 Oct 18 07:38 registry.locktest
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.npudir
drwxrwx-wt 2 tomcat tomcat 4096 Oct 18 07:38 registry.proxydir
-rw-rw-r-- 1 tomcat tomcat    0 Oct 18 07:32 registry.subjectlist

/var/blah/user_blah_job_registry.bjr/registry.npudir:
total 0

/var/blah/user_blah_job_registry.bjr/registry.proxydir:
total 0
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?81354][Bug #81354]] (Missing 'Iwd' Attribute when transferring files with the 'TransferInput' attribute causes thread to loop) %RED% Not implemented %ENDCOLOR%

Log on a CREAM CE as user tomcat. Create a proxy of yours and copy it as =/tmp/proxy= (change the ownership to tomcat.tomcat). Create the file =/home/dteam001/dir1/fstab= (you can copy /etc/fstab). Submit a job directly via blah (in the following, change pbs and creamtest2 to the relevant batch system and queue names):

<verbatim>
$ /usr/bin/blahpd
$GahpVersion: 1.16.2 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
blah_job_submit 1 [cmd="/bin/cp";Args="fstab\ fstab.out";TransferInput="/home/dteam001/dir1/fstab";TransferOutput="fstab.out";TransferOutputRemaps="fstab.out=/home/dteam001/dir1/fstab.out";gridtype="pbs";queue="creamtest2";x509userproxy="/tmp/proxy"]
S
results
S 1
1 0 No\ error pbs/20111010/304.cream-38.pd.infn.it
</verbatim>

Eventually check the content of =/home/dteam001/dir1/=, where you should see both =fstab= and =fstab.out=:

<verbatim>
$ ls /home/dteam001/dir1/
fstab  fstab.out
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?81824][Bug #81824]] (yaim-cream-ce should manage the attribute bupdater_loop_interval) %GREEN%Implemented%ENDCOLOR%

Set =BUPDATER_LOOP_INTERVAL= to 30 in siteinfo.def and reconfigure via yaim. Then verify that in =blah.config= there is:

<verbatim>
bupdater_loop_interval=30
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?82281][Bug #82281]] (blahp.log records should always contain CREAM job ID) %RED%Not implemented%ENDCOLOR%

Submit a job directly to CREAM using the CREAM CLI. Then submit a job to CREAM through the WMS.
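For example (a sketch; the CE endpoint, queue and JDL file are placeholders, and the WMS submission assumes a UI configured for a WMS):

<verbatim>
# direct submission with the CREAM CLI
glite-ce-job-submit -a -r cream-38.pd.infn.it:8443/cream-pbs-creamtest2 test.jdl
# submission through the WMS
glite-wms-job-submit -a test.jdl
</verbatim>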
In the accounting log file (/var/log/cream/accounting/blahp.log-<date>), in both cases the clientID field should end with the numeric part of the CREAM jobid, e.g.:

<verbatim>
"timestamp=2011-10-10 14:37:38" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=CREAM956286045" "lrmsID=300.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_956286045"
"timestamp=2011-10-10 14:39:57" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-38.pd.infn.it:8443/cream-pbs-creamtest2" "jobID=https://devel19.cnaf.infn.it:9000/dLvm84LvD7w7QXtLZK4L0A" "lrmsID=302.cream-38.pd.infn.it" "localUser=18757" "clientID=cre38_315532638"
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?82297][Bug #82297]] (blahp.log rotation period is too short) %RED%Not implemented%ENDCOLOR%

Check that in =/etc/logrotate.d/blahp-logrotate= rotate is equal to 365:

<verbatim>
# cat /etc/logrotate.d/blahp-logrotate
/var/log/cream/accounting/blahp.log {
    copytruncate
    rotate 365
    size = 10M
    missingok
    nomail
}
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?83275][Bug #83275]] (Problem in updater with very short jobs that can cause no notification to cream) %RED%Not implemented%ENDCOLOR%

Configure a CREAM CE using the new blparser. Submit a job using the following JDL:

<verbatim>
[
executable="/bin/echo";
arguments="ciao";
]
</verbatim>

Check in the bnotifier log file (=/var/log/cream/glite-ce-bnotifier.log=) that at least one notification is sent for this job, e.g.:

<verbatim>
2011-11-04 14:11:11 Sent for Cream:[BatchJobId="927.cream-38.pd.infn.it"; JobStatus=4; ChangeTime="2011-11-04 14:08:55"; JwExitCode=0; Reason="reason=0"; ClientJobId="622028514"; BlahJobName="cre38_622028514";]
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?83347][Bug #83347]] (Incorrect special character handling for BLAH Arguments and Environment attributes) %RED%Not implemented%ENDCOLOR%

Log on a CREAM CE as user tomcat. Create a proxy of yours and copy it as =/tmp/proxy= (change the ownership to tomcat.tomcat). Create the file =/home/dteam001/dir1/fstab= (you can copy /etc/fstab). Submit a job directly via blah (in the following, change pbs and creamtest1 to the relevant batch system and queue names):

<verbatim>
BLAH_JOB_SUBMIT 1 [Cmd="/bin/echo";Args="$HOSTNAME";Out="/tmp/stdout_l15367";In="/dev/null";GridType="pbs";Queue="creamtest1";x509userproxy="/tmp/proxy";Iwd="/tmp";TransferOutput="output_file";TransferOutputRemaps="output_file=/tmp/stdout_l15367";GridResource="blah"]
</verbatim>

Verify that in the output file there is the hostname of the WN.

---+++ [[https://savannah.cern.ch/bugs/?87419][Bug #87419]] (blparser_master adds some spurious characters in the BLParser command line) %RED%Not implemented%ENDCOLOR%

Configure a CREAM CE using the old blparser. Check the blparser process using ps. It shouldn't show spurious characters:

<verbatim>
root     26485  0.0  0.2 155564  5868 ?
Sl   07:36   0:00 /usr/bin/BLParserPBS -d 1 -l /var/log/cream/glite-pbsparser.log -s /var/torque -p 33333 -m 56565
</verbatim>

--------------------------

---++ Fixes provided with CREAM 1.14.3

---+++ [[https://savannah.cern.ch/bugs/?99740][Bug #99740]] updateDelegationProxyInfo error: Rollback executed due to Deadlock %RED%Not Implemented%ENDCOLOR%

   * delegate the proxy credentials by specifying the ID of the delegated proxy:
<verbatim>
> glite-ce-delegate-proxy -e cream-27.pd.infn.it:8443 myproxyid
</verbatim>
   * submit ~10K *long* jobs to cream by forcing the submission to use the previously delegated user credentials, identified by the specified ID:
<verbatim>
> glite-ce-job-submit --delegationId myproxyid -a -r cream-27.pd.infn.it:8443/cream-lsf-grid02 myjob.jdl
</verbatim>
   * during the submission, at ~7000 jobs, execute *in parallel* several proxy renewal commands referring to the previously delegated user credentials:
<verbatim>
> glite-ce-proxy-renew -e cream-27.pd.infn.it:8443 myproxyid
</verbatim>
   * check cream's logs for the error message "updateDelegationProxyInfo error: Rollback executed due to: Deadlock found when trying to get lock; try restarting transaction": such a message should not be present.

---+++ [[https://savannah.cern.ch/bugs/?99738][Bug #99738]] Under stress conditions due to job submissions, the command queue may accumulate thousands of job purging commands %RED%Not Implemented%ENDCOLOR%

   * submit ~5000 *very short* jobs to cream and wait for their terminal state (e.g. DONE-OK)
   * edit cream's configuration file (i.e. /etc/glite-ce-cream/cream-config.xml)
   * change the JOB_PURGE_RATE parameter value to 2 minutes:
<verbatim>
<parameter name="JOB_PURGE_RATE" value="2" />  <!-- minutes -->
</verbatim>
   * change the JOB_PURGE_POLICY parameter value to "ABORTED 2 minutes; CANCELLED 2 minutes; DONE-OK 2 minutes; DONE-FAILED 2 minutes; REGISTERED 2 days;":
<verbatim>
<parameter name="JOB_PURGE_POLICY" value="ABORTED 2 minutes; CANCELLED 2 minutes; DONE-OK 2 minutes; DONE-FAILED 2 minutes; REGISTERED 2 days;" />
</verbatim>
   * restart cream (i.e. tomcat):
<verbatim>
> service tomcat6 restart
</verbatim>
   * submit further jobs to cream
   * meanwhile, during the submission, check the cream log and observe the messages about the JobPurger
   * every 2 minutes the JobPurger activity should be logged:
<verbatim>
JobPurger - purging 0 jobs with status REGISTERED <= Wed Jan 16 16:55:55 CET 2013
JobPurger - purging 0 jobs with status ABORTED <= Tue Jan 08 16:56:55 CET 2013
JobPurger - purging 0 jobs with status CANCELLED <= Fri Jan 18 16:51:55 CET 2013
JobPurger - purging 500 jobs with status DONE-OK <= Fri Jan 18 16:51:55 CET 2013
JobPurger - purging 0 jobs with status DONE-FAILED <= Tue Jan 08 16:56:55 CET 2013
</verbatim>
   * access the cream database
   * execute:
<verbatim>
use creamdb;
select * from command_queue where name="PROXY_RENEW";
</verbatim>
   * the result should always be "Empty set (0.00 sec)"

---+++ [[https://savannah.cern.ch/bugs/?98144][Bug #98144]] The switching off of the JobSubmissionManager makes the CREAM service not available for the users %RED%Not Implemented%ENDCOLOR%

   * Switch off the JobSubmissionManager in the CREAM configuration file (/etc/glite-ce-cream/cream-config.xml):
<verbatim>
<parameter name="JOB_SUBMISSION_MANAGER_ENABLE" value="false" />
</verbatim>
   * Restart tomcat:
<verbatim>
service tomcat5 restart
</verbatim>
   * Submit a job via the CREAM UI.
   * Check that an error message like the following doesn't appear:
<verbatim>
"Received NULL fault; the error is due to another cause: FaultString=[CREAM service not available: configuration failed!] - FaultCode=[SOAP-ENV:Server] - FaultSubCode=[SOAP-ENV:Server]"
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?88134][Bug #88134]] JobWrapper doesn't handle correctly the jdl attribute "PerusalListFileURI" %RED%Not Implemented%ENDCOLOR%

   * Create the following two files:

perusal.jdl
<verbatim>
[
Type="Job";
JobType="Normal";
Executable = "perusal.sh";
StdOutput = "stdout.log";
StdError = "stderr.log";
InputSandbox = "perusal.sh";
OutputSandbox = {"stdout.log", "stderr.log", "results.txt"};
PerusalFilesDestURI="gsiftp://cream-05.pd.infn.it/tmp";
PerusalFileEnable = true;
PerusalTimeInterval = 20;
outputsandboxbasedesturi="gsiftp://localhost";
PerusalListFileURI="gsiftp://cream-05.pd.infn.it/tmp/filelist.txt"
]
</verbatim>

perusal.sh
<verbatim>
#!/bin/sh
i=0
while ((i < 10))
do
  date
  voms-proxy-info --all >&2
  df >> results.txt
  sleep 10
  let "i++"
  echo i = $i
done
</verbatim>

N.B.: For this test, the file "gsiftp://cream-05.pd.infn.it/tmp/filelist.txt" must not exist!

   * Submit the job
   * Check that after about two minutes the job terminates successfully.

---+++ [[https://savannah.cern.ch/bugs/?95637][Bug #95637]] glite-ce-job-submit --help doesn't print out anything %RED%Not Implemented%ENDCOLOR%

   * execute the command:
<verbatim>
glite-ce-job-submit --help
</verbatim>
   * the error message _"JDL file not specified in the command line arguments. Stop."_ should not appear anymore. Instead, the inline help is shown:
<verbatim>
>glite-ce-job-submit --help
CREAM User Interface version 1.2.0

glite-ce-job-submit allows the user to submit a job for execution on a CREAM based CE

Usage: glite-ce-job-submit [options] -r <CEID> <JDLFile>

--resource, -r CEID
    Select the CE to send the JDL to. Format must be <host>[:<port>]/cream-<lrms-system-name>-<queue-name>

<JDLFile>
    Is the file containing the JDL directives for job submission;

Options:
--help, -h
[...]
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?95738][Bug #95738]] glite-ce-job-submit: error message to be improved if JDL file is missing %RED%Not Implemented%ENDCOLOR%

   * execute the command (remark: the _test.jdl_ file doesn't exist):
<verbatim>
glite-ce-job-submit -a -r cream-23.pd.infn.it:8443/cream-lsf-creamtest1 test.jdl
</verbatim>
   * the error message _"Error while processing file [pippo.jdl]: Syntax error. Will not submit this JDL"_ should not appear anymore. Instead, the message _"JDL file [test.jdl] does not exist. Skipping..."_ should be printed:
<verbatim>
>glite-ce-job-submit -a -r cream-23.pd.infn.it:8443/cream-lsf-creamtest1 pippo.jdl
2013-01-21 17:49:51,196 ERROR - JDL file [pippo.jdl] does not exist. Skipping...
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?95041][Bug #95041]] YAIM could check the format of CE_OTHERDESCR %RED%Not Implemented%ENDCOLOR%

   * in the site-info.def define the variable CE_OTHERDESCR="Cores=10.5 , Benchmark=8.0-HEP-SPEC06"; run yaim and verify that no error is reported
   * define the variable CE_OTHERDESCR="Cores=10.5"; run yaim and verify that no error is reported
   * define the variable CE_OTHERDESCR="Cores=10.5,Benchmark=8.0"; run yaim and verify that an error is reported

---+++ [[https://savannah.cern.ch/bugs/?98440][Bug #98440]] Missing revision number in EndpointImplementationVersion %RED%Not Implemented%ENDCOLOR%

   * run the following command:
<verbatim>
ldapsearch -x -H ldap://hostname:2170 -b o=glue '(&(objectclass=GLUE2Endpoint)(GLUE2EndpointImplementationName=CREAM))' GLUE2EndpointImplementationVersion
</verbatim>
and verify that the !GLUE2EndpointImplementationVersion reports the revision number

---+++ [[https://savannah.cern.ch/bugs/?98850][Bug #98850]] Empty ACBR list in SHARE variable %RED%Not Implemented%ENDCOLOR%

   * set the YAIM variable FQANVOVIEWS to "no"
   * set one of the YAIM variables <QUEUE>_GROUP_ENABLE so that it contains more FQANs for the same VO (for example "atlas /atlas/ROLE=lcgadmin /atlas/ROLE=production /atlas/ROLE=pilot")
   * run the YAIM configurator and verify that in the file /etc/glite-ce-glue2/glite-ce-glue2.conf the items SHARE_<QUEUE>_<VO>_ACBRS don't contain any empty list

---+++ [[https://savannah.cern.ch/bugs/?99072][Bug #99072]] Hard-coded reference to tomcat5.pid %RED%Not Implemented%ENDCOLOR%

   * on SL6 run the executable:
<verbatim>
/usr/bin/glite_cream_load_monitor /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --show
</verbatim>
   * verify that the value of "Threshold for tomcat FD" is not zero

---+++ [[https://savannah.cern.ch/bugs/?99085][Bug #99085]] Improve parsing of my.cnf %RED%Not Implemented%ENDCOLOR%

   * in the file /etc/my.cnf specify the following value:
<verbatim>
max_connections = 256
</verbatim>
   * run the YAIM configurator and verify that the function config_cream_db works correctly

---+++ [[https://savannah.cern.ch/bugs/?99282][Bug #99282]] Wrong regular expression for group.conf parsing %RED%Not Implemented%ENDCOLOR%

   * define YAIM variables so that one VO name is the prefix or the suffix of another one (for example VOS=" ops dgops ")
   * run the YAIM configurator and verify that in the file /var/lib/bdii/gip/ldif/ComputingShare.ldif no references to one VO appear as attributes of the Shares or Policies of the other

---+++ [[https://savannah.cern.ch/bugs/?99747][Bug #99747]] glite-info-dynamic-ce does not update !GLUE2ComputingShareServingState %RED%Not Implemented%ENDCOLOR%

   * disable the submissions to the CE
   * run the command:
<verbatim>
/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-cream-glue2" | grep ServingState
</verbatim>
   * verify that the values for each "GLUE2ComputingShareServingState" are set to "draining"

---+++ [[https://savannah.cern.ch/bugs/?99823][Bug #99823]] SHA-1 algorithm for PKCS10 generation in CREAM delegation service %RED%Not Implemented%ENDCOLOR%

   * insert the following line in the file /etc/glite-ce-cream/log4j.properties:
<verbatim>
log4j.logger.org.glite.ce.cream.delegationmanagement.cmdexecutor=debug, fileout
</verbatim>
   * restart tomcat
   * delegate a proxy on the CE using for the client authentication a voms-proxy whose signature algorithm is based on SHA-2, see:
<verbatim>
openssl x509 -noout -text -in [voms-proxy]
</verbatim>
   * verify that the value reported in the log for
"Signature algorithm to be used for pkcs10" is the same of the one used for the voms-proxy ---++ Fixes provided with CREAM 1.14.2 ---+++ [[https://savannah.cern.ch/bugs/?95328][Bug #95328]] In cluster mode, YAIM does not set !GlueCEInfoHostName for CREAMs %RED%Not Implemented%ENDCOLOR% * Configure a CREAM CE in cluster mode, run the command <verbatim>ldapsearch -h cream-36.pd.infn.it -x -p 2170 -b "o=grid" objectclass=GlueCE | grep InfoHostName</verbatim> and verify that one or more items exists. ---+++ [[https://savannah.cern.ch/bugs/?95973][Bug #95973]] Missing Glue capability in !GLUE2EntityOtherInfo %RED%Not Implemented%ENDCOLOR% * Define the YAIM variable CE_CAPABILITY="CPUScalingReferenceSI00=10 SNMPSupport=yes" * Configure with YAIM and run the command <verbatim>ldapsearch -h cream-36.pd.infn.it -x -p 2170 -b "o=glue" | grep GLUE2EntityOtherInfo</verbatim> * Verify that the attributes defined above are reported separately ---+++ [[https://savannah.cern.ch/bugs/?96306][Bug #96306]] Wrong lowercase conversion for VO Tags %GREEN%Implemented%ENDCOLOR% * define a tag with uppercase in the file /opt/glite/var/info/[hostname]/[vo]/[vo].list * run the command <verbatim>ldapsearch -h cream-36.pd.infn.it -x -p 2170 -b "o=glue" | grep TESTTAG</verbatim> * verify that the attributes !GLUE2ApplicationEnvironmentID and !GLUE2ApplicationEnvironmentAppName are uppercase. ---+++ [[https://savannah.cern.ch/bugs/?96310][Bug #96310]] Wrong lowercase conversion for Glue-1 VO Tags. %RED%Not Implemented%ENDCOLOR% * Put case-insensitive duplicated items, for example "RMON3.1 RMon3.1", in the file /opt/glite/var/info/[hostname]/[vo name]/[voname].list and file /opt/edg/var/info/[vo name]/[voname].list * Put a case-insensitive duplicated attributes in /var/lib/bdii/gip/ldif/static-file-Cluster.ldif, for example: <verbatim> GlueHostApplicationSoftwareRunTimeEnvironment: MPIch GlueHostApplicationSoftwareRunTimeEnvironment: MPICH </verbatim> * Run the wrapper script /var/lib/bdii/gip/plugin/glite-info-dynamic-software-wrapper * Verify that no duplicated attributes are printed on the stdout ---+++ [[https://savannah.cern.ch/bugs/?97441][Bug #97441]] CREAM: Unwanted auto-updating of the field "creationTime" on the creamdb database %GREEN%Implemented%ENDCOLOR% * 1) access to the CREAM DB * 2) execute the following SQL command: <verbatim>use creamdb;</verbatim> * 3) execute the following SQL query and notice the result: <verbatim>select startUpTime, creationTime from db_info;</verbatim> * 4) configure CREAM with YAIM * 5) repeat the steps 1, 2, 3 * 6) check the query result at step 4: the "creationTime" value should be the same in both results while the "startUpTime" should be changed. ---+++ [[https://savannah.cern.ch/bugs/?96512][Bug #96512]] JobDBAdminPurger can't find commons-logging.jar %GREEN%Implemented%ENDCOLOR% * on the CREAM node, try to purge a job using the /usr/sbin/JobDBAdminPurger.sh script (none error should be reported) ---+++ [[https://savannah.cern.ch/bugs/?97106][Bug #97106]] CREAM JW - fatal_error: command not found. %RED%Not Implemented%ENDCOLOR% * select an EMI-2 WN and uninstall the "glite-lb-client-progs" rpm in order to remove the /usr/bin/glite-lb-logevent program * submit several jobs to CREAM and check if at least one of them has been executed on the previously modified WN (there is no way to force to submission on a specific WN through CREAM) * open the StandardError file on the job sandbox: it should contain just the "Cannot find lb_logevent command" message. 
---+++ [[https://savannah.cern.ch/bugs/?94418][Bug #94418]] The SIGTERM signal should be issued to all the processes belonging to the job %RED%Not Implemented%ENDCOLOR%

   * Connect to the CE under test with your pool account user
   * Edit the file /tmp/test_bug94418.sh (as your pool account user) and paste into it the following text:
<verbatim>
#!/bin/bash

OUTF="/tmp/sdpgu.out"
MYPID=$$

sleep 3600 &
PID1=$!
sleep 3600 &
PID2=$!

echo "MYPID=${MYPID}, PID1=${PID1}, PID2=${PID2}" > $OUTF
echo "MYPID=${MYPID}, PID1=${PID1}, PID2=${PID2}"

# supposedly this should kill the child processes on SIGTERM.
trap "kill $PID1 $PID2" SIGTERM

wait
</verbatim>
   * On the UI prepare a jdl for executing the above script, for example:
<verbatim>
[
Type = "job";
JobType = "normal";
VirtualOrganisation = "dteam";
executable="test_bug94418.sh";
InputSandbox = {"test_bug94418.sh"};
InputSandboxBaseURI = "gsiftp://cream-XX.pd.infn.it/tmp"
]
</verbatim>
   * submit the jdl
   * wait until the job reaches the REALLY-RUNNING state
   * check the name of the WN on which the job is running
   * on the WN, read the file /tmp/sdpgu.out, which contains the PIDs of three processes, for example:
<verbatim>
MYPID=10551, PID1=10553, PID2=10554
</verbatim>
   * check that all three processes exist
   * cancel the job
   * wait for its terminal state (CANCELLED)
   * check again on the WN whether all three processes exist (they should have disappeared)

---+++ [[https://savannah.cern.ch/bugs/?98707][Bug #98707]] Wrong warning message from !ArgusPEPClient configuration %RED%Not Implemented%ENDCOLOR%

   * Run the CREAM service with the default priority log level (Info)
   * Perform a simple operation, for example a job list
   * Verify that no warning such as "Missing or wrong argument <property name>" appears in the log

---++ Fixes provided with CREAM 1.14.1

---+++ [[https://savannah.cern.ch/bugs/?89153][Bug #89153]] JobDBAdminPurger cannot purge jobs if CREAM DB is on another host - %RED%Not Implemented%ENDCOLOR%

   * on the CREAM node, which MUST be a different machine from the one hosting the creamdb (DB node), edit the cream-config.xml
   * find the "url" field within the datasource_creamdb element (e.g. url="jdbc:mysql://localhost:3306/creamdb?autoReconnect=true")
   * replace "localhost" with the name of the remote machine (DB node) hosting the creamdb (e.g. "jdbc:mysql://cream-47.pd.infn.it:3306/creamdb?autoReconnect=true")
   * find and note the value of the "username" field defined in the same xml element
   * on the CREAM DB node execute the following sql commands as root (NB: substitute "username" with the value previously noted):
<verbatim>
GRANT ALL PRIVILEGES ON creamdb.* to username@'%' IDENTIFIED BY 'username';
FLUSH PRIVILEGES;
</verbatim>
   * on the CREAM node execute the JobDBAdminPurger.sh script (no error should be reported)

---+++ [[https://savannah.cern.ch/bugs/?95356][Bug #95356]] Better parsing for static definition files in lcg-info-dynamic-scheduler - %GREEN% Implemented %ENDCOLOR%

   * Insert in the file /var/lib/bdii/gip/ldif/ComputingShare.ldif an empty or corrupted attribute !GLUE2PolicyRule (i.e. "GLUE2PolicyRule:" or "GLUE2PolicyRule: test"), as sketched below.
   * Verify that in the file /etc/lrms/scheduler.conf the attribute "outputformat" is set to glue2 or both.
   * Run the command:
<verbatim>
/var/lib/bdii/gip/plugin/glite-info-dynamic-scheduler-wrapper
</verbatim>
and verify that no exceptions are raised.
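A quick way to inject the corrupted attribute for the first step (a sketch: back up the file first; appending attaches the attribute to the last LDIF entry, which is enough to exercise the parser):

<verbatim>
cp /var/lib/bdii/gip/ldif/ComputingShare.ldif /tmp/ComputingShare.ldif.bak
echo "GLUE2PolicyRule: test" >> /var/lib/bdii/gip/ldif/ComputingShare.ldif
/var/lib/bdii/gip/plugin/glite-info-dynamic-scheduler-wrapper
</verbatim>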
#FixFor95480
---+++ [[https://savannah.cern.ch/bugs/?95480][Bug #95480]] CREAM doesn't transfer the output files remotely under well known conditions - %RED%Not Implemented%ENDCOLOR%

   * edit the cream-config.xml and set SANDBOX_TRANSFER_METHOD="LRMS"
   * restart the cream service
   * submit the following jdl from a WMS having the URL lexicographically greater than "gsiftp://localhost" (e.g. wms01.grid.hep.ph.ic.ac.uk):
<verbatim>
[
Type = "Job";
#VAR1 = "test1";
#VAR2 = "test2";
executable = "/bin/echo";
Arguments = "hello world!!!";
StdOutput="stdout";
StdError="stderr";
OutputSandbox = {"stdout","stderr"};
]
</verbatim>
   * glite-wms-job-output (retrieve and check the output files)

---+++ [[https://savannah.cern.ch/bugs/?95552][Bug #95552]] Malformed URL from glite-ce-glue2-endpoint-static - %GREEN% Implemented %ENDCOLOR%

Run the command:

<verbatim>
/usr/libexec/glite-ce-glue2-endpoint-static /etc/glite-ce-glue2/glite-ce-glue2.conf | grep GLUE2EndpointURL
</verbatim>

and verify that the URL is correctly defined (contains ":"). Example of the error:

<verbatim>
GLUE2EndpointURL: https://cream-48.pd.infn.it8443/ce-cream/services
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?95593][Bug #95593]] CREAM cannot insert in the command queue if the length of the localUser field is > 14 chars - %RED%Not Implemented%ENDCOLOR%

   * create a new local pool account whose name is more than 14 characters long
   * reconfigure the CE with YAIM (define USE_ARGUS=no if you don't want to configure ARGUS for handling the new local pool account)
   * execute any asynchronous CREAM command (e.g. jobStart, jobCancel, etc.) by using the proper grid credentials which will be mapped to the new local user
   * check the CREAM log file: no error message like "Cannot enqueue the command id=-1: Data truncation: Data too long for column 'commandGroupId' at row 1 (rollback performed)" should be reported

---+++ [[https://savannah.cern.ch/bugs/?96055][Bug #96055]] Wrong DN format in logfiles for accounting - %RED%Not Implemented%ENDCOLOR%

   * submit a job and wait for success
   * verify that in the file /var/log/cream/accounting/blahp.log-yyyymmdd the value of the userDN attribute is published in X500 format (i.e. "/C=../O=...")

---+++ [[https://savannah.cern.ch/bugs/?93091][Bug #93091]] Add some resubmission machinery to CREAM - %RED%Not Implemented%ENDCOLOR%

   * edit the blah script /usr/libexec/lsf_submit.sh
   * add "sleep 10m" at the top of the script
   * submit a job
   * after 200 seconds, the following error should appear in the cream log file:
<verbatim>
"org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - submission to BLAH failed [jobId=CREAMXXXXXXXXX; reason=BLAH error: submission command failed (exit code = 143) (stdout:) (stderr: <blah> execute_cmd: 200 seconds timeout expired, killing child process.-) N/A (jobId = CREAMXXXXXXXXX); retry count=1/3]
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - sleeping 10 sec...
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - sleeping 10 sec... done"
</verbatim>
   * restore the original lsf_submit.sh script
   * after a while the job should be successfully submitted (i.e.
JOB CREAMXXXXXXXXX STATUS CHANGED: PENDING => IDLE)
   * if you don't restore the script, the submission will be tried 3 times, then the job will abort: "JOB CREAMXXXXXXXXX STATUS CHANGED: PENDING => ABORTED [description=submission to BLAH failed [retry count=3]] [failureReason=BLAH error: submission command failed (exit code = 143) (stdout:) (stderr: <blah> execute_cmd: 200 seconds timeout expired, killing child process.-) N/A ..."

---++ Fixes provided with CREAM 1.14

---+++ [[https://savannah.cern.ch/bugs/?59871][Bug #59871]] lcg-info-dynamic-software must split tag lines on white space - %GREEN% Implemented %ENDCOLOR%

To verify the fix, edit a VO.list file under =/opt/glite/var/info/cream-38.pd.infn.it/<vo_name>.list= adding:

<verbatim>
tag1 tag2 tag3
</verbatim>

Wait 3 minutes and then query the resource bdii, where you should see:

<verbatim>
...
GlueHostApplicationSoftwareRunTimeEnvironment: tag1
GlueHostApplicationSoftwareRunTimeEnvironment: tag2
GlueHostApplicationSoftwareRunTimeEnvironment: tag3
...
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?68968][Bug #68968]] lcg-info-dynamic-software should protect against duplicate RTE tags - %RED% Not Implemented %ENDCOLOR%

To verify the fix, edit a VO.list file under =/opt/glite/var/info/cream-38.pd.infn.it/VO= adding:

<verbatim>
tag1 tag2 TAG1 tag1
</verbatim>

Then query the resource bdii:

<verbatim>
ldapsearch -h <CE host> -x -p 2170 -b "o=grid" | grep -i tag
</verbatim>

This should return:

<verbatim>
GlueHostApplicationSoftwareRunTimeEnvironment: tag1
GlueHostApplicationSoftwareRunTimeEnvironment: tag2
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?69854][Bug #69854]] CreamCE should publish non-production state when job submission is disabled - %RED% Not Implemented %ENDCOLOR%

Disable job submission with =glite-ce-disable-submission=. Wait 3 minutes and then perform the following ldap query:

<verbatim>
# ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus
</verbatim>

For each GlueCE this should return:

<verbatim>
GlueCEStateStatus: Draining
</verbatim>

Then re-enable the submission. Edit the configuration file =/etc/glite-ce-cream-utils/glite_cream_load_monitor.conf= to trigger job submission disabling. E.g. change:

<verbatim>
$MemUsage = 95;
</verbatim>

to:

<verbatim>
$MemUsage = 1;
</verbatim>

Wait 15 minutes and then perform the following ldap query:

<verbatim>
# ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEStateStatus
</verbatim>

For each GlueCE this should return:

<verbatim>
GlueCEStateStatus: Draining
</verbatim>

---+++ [[http://savannah.cern.ch/bugs/?69857][Bug #69857]] Job submission to CreamCE is enabled by restart of service even if it was previously disabled - %GREEN% Implemented %ENDCOLOR%

To test the fix:

   * disable the submission on the CE. This can be achieved via the =glite-ce-disable-submission host:port= command (provided by the CREAM CLI package installed on the UI), which can be issued only by a CREAM CE administrator, i.e. a person whose DN is listed in the =/etc/grid-security/admin-list= file of the CE. Output should be: "Operation for disabling new submissions succeeded"
   * restart tomcat on the CREAM CE (service tomcat restart - on the CE)
   * verify that the submission is disabled. This can be achieved via the =glite-ce-allowed-submission host:port= command (provided by the CREAM CLI package installed on the UI).
Output should be: "Job submission to this CREAM CE is disabled"

---+++ [[https://savannah.cern.ch/bugs/?77791][Bug #77791]] CREAM installation does not fail if sudo is not installed - %RED% Not Implemented %ENDCOLOR%

Try to configure via yaim a CREAM-CE where the sudo executable is not installed. The configuration should fail saying:

<verbatim>
ERROR: sudo probably not installed !
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?79362][Bug #79362]] location of python files provided with lcg-info-dynamic-scheduler-generic-2.3.5-0.sl5 - %RED% Not Implemented %ENDCOLOR%

To verify the fix, do a:

<verbatim>
rpm -ql dynsched-generic
</verbatim>

and verify that the files are installed in =/usr/lib/python2.4= and no longer in =/usr/lib/python=.

---+++ [[https://savannah.cern.ch/bugs/?80295][Bug #80295]] Allow dynamic scheduler to function correctly when login shell is false - %RED% Not Implemented %ENDCOLOR%

To verify the fix, log on the CREAM CE as user root and run:

<verbatim>
/sbin/runuser -s /bin/sh ldap -c "/usr/libexec/lcg-info-dynamic-scheduler -c /etc/lrms/scheduler.conf"
</verbatim>

It should return some information in ldif format.

---+++ [[https://savannah.cern.ch/bugs/?80410][Bug #80410]] CREAM bulk submission CLI is desirable - %RED% Not Implemented %ENDCOLOR%

To test the fix, specify multiple JDLs in the =glite-ce-job-submit= command, e.g.:

<verbatim>
glite-ce-job-submit --debug -a -r cream-47.pd.infn.it:8443/cream-lsf-creamtest1 jdl1.jdl jdl2.jdl jdl3.jdl
</verbatim>

Considering the above example, verify that 3 jobs are submitted and 3 jobids are returned.

---+++ [[https://savannah.cern.ch/bugs/?81734][Bug #81734]] removed conf file retrieve from old path that is not EMI compliant - %RED% Not Implemented %ENDCOLOR%

To test the fix, create the conf file =/etc/glite_cream.conf= with the following content:

<verbatim>
[
CREAM_URL_PREFIX="abc://";
]
</verbatim>

Try then e.g. the following command:

<verbatim>
glite-ce-job-list --debug cream-47.pd.infn.it
</verbatim>

It should report that it is trying to contact =abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2=:

<verbatim>
2012-01-13 14:44:39,028 DEBUG - Service address=[abc://cream-47.pd.infn.it:8443//ce-cream/services/CREAM2]
</verbatim>

Move the conf file to =/etc/VO/glite_cream.conf= and repeat the test, which should give the same result. Then move the conf file to =~/.glite/VO/glite_cream.conf= and repeat the test, which should give the same result.

---+++ [[https://savannah.cern.ch/bugs/?82206][Bug #82206]] yaim-cream-ce: BATCH_LOG_DIR missing among the required attributes - %RED% Not Implemented %ENDCOLOR%

Try to configure a CREAM CE with Torque using yaim without setting =BLPARSER_WITH_UPDATER_NOTIFIER= and without setting =BATCH_LOG_DIR=. It should fail saying:

<verbatim>
INFO: Executing function: config_cream_blah_check
ERROR: BATCH_LOG_DIR is not set
ERROR: Error during the execution of function: config_cream_blah_check
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?83314][Bug #83314]] Information about the RTEpublisher service should be available also in glue2 - %RED% Not Implemented %ENDCOLOR%

Check whether the resource BDII publishes glue 2 GLUE2ComputingEndPoint objectclasses with GLUE2EndpointInterfaceName equal to org.glite.ce.ApplicationPublisher. If the CE is configured in no-cluster mode there should be one such objectclass. If the CE is configured in cluster mode and the gLite-CLUSTER is deployed on a different node, there shouldn't be any such objectclass.
<verbatim>
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?83338][Bug #83338]] endpointType (in GLUE2ServiceComplexity) hardwired to 1 in CREAM CE is not always correct - %GREEN% Implemented %ENDCOLOR%

Perform the following query on the resource bdii of the CREAM CE:

<verbatim>
ldapsearch -x -h <CREAM CE hostname> -p 2170 -b "o=glue" | grep -i endpointtype
</verbatim>

endpointtype should be 3 if CEMon is deployed (=USE_CEMON= is true), 2 otherwise.

---+++ [[https://savannah.cern.ch/bugs/?83474][Bug #83474]] Some problems concerning glue2 publications of CREAM CE configured in cluster mode - %RED% Not Implemented %ENDCOLOR%

Configure a CREAM CE in cluster mode, with the gLite-CLUSTER configured on a different host.

   * Check if the resource BDII publishes glue 2 GLUE2ComputingService objectclasses. There should be one GLUE2ComputingService objectclass:
<verbatim>
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService
</verbatim>
   * Check if the resource BDII publishes glue 2 GLUE2ComputingEndPoint objectclasses with GLUE2EndpointInterfaceName equal to org.glite.ce.CREAM. There should be one such objectclass:
<verbatim>
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.CREAM))"
</verbatim>
   * Check if the resource BDII publishes glue 2 GLUE2Manager objectclasses. There shouldn't be any GLUE2Manager objectclass:
<verbatim>
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager
</verbatim>
   * Check if the resource BDII publishes glue 2 GLUE2Share objectclasses. There shouldn't be any GLUE2Share objectclass:
<verbatim>
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share
</verbatim>
   * Check if the resource BDII publishes glue 2 GLUE2ExecutionEnvironment objectclasses. There shouldn't be any GLUE2ExecutionEnvironment objectclass:
<verbatim>
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment
</verbatim>
   * Check if the resource BDII publishes glue 2 GLUE2ComputingEndPoint objectclasses with GLUE2EndpointInterfaceName equal to org.glite.ce.ApplicationPublisher. There shouldn't be any such objectclass:
<verbatim>
ldapsearch -h <CREAM CE hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))"
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/index.php?83592][Bug #83592]] CREAM client doesn't allow the delegation of RFC proxies - %GREEN% Implemented %ENDCOLOR%

Create an RFC proxy, e.g.:

<verbatim>
voms-proxy-init -voms dteam -rfc
</verbatim>

and then submit using =glite-ce-job-submit= a job using ISB and OSB, e.g.:

<verbatim>
[
executable="ssh1.sh";
inputsandbox={"file:///home/sgaravat/JDLExamples/ssh1.sh", "file:///home/sgaravat/a"};
stdoutput="out3.out";
stderror="err2.err";
outputsandbox={"out3.out", "err2.err", "ssh1.sh", "a"};
outputsandboxbasedesturi="gsiftp://localhost";
]
</verbatim>

Verify that the final status is =DONE-OK=

---+++ [[https://savannah.cern.ch/bugs/index.php?83593][Bug #83593]] Problems limiting RFC proxies in CREAM - %GREEN% Implemented %ENDCOLOR%

Consider the same test done for bug #83592

---+++ [[https://savannah.cern.ch/bugs/?84308][Bug #84308]] Error on glite_cream_load_monitor if cream db is on another host - %RED% Not Implemented %ENDCOLOR%

Configure a CREAM CE with the database installed on a different host than the CREAM CE. Run:

<verbatim>
/usr/bin/glite_cream_load_monitor /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --show
</verbatim>

which shouldn't report any error.

---+++ [[https://savannah.cern.ch/bugs/?86522][Bug #86522]] glite-ce-job-submit authorization error message difficult to understand - %RED% Not Implemented %ENDCOLOR%

To check this fix, try a submission towards a CREAM CE configured to use ARGUS when you are not authorized. You should see an error message like:

<verbatim>
$ glite-ce-job-submit -a -r emi-demo13.cnaf.infn.it:8443/cream-lsf-demo oo.jdl
2012-05-07 20:26:51,649 FATAL - CN=Massimo Sgaravatto,L=Padova,OU=Personal Certificate,O=INFN,C=IT not authorized for {http://www.gridsite.org/namespaces/delegation-2}getProxyReq
</verbatim>

and not like the one reported in the savannah bug.

---+++ [[https://savannah.cern.ch/bugs/?86609][Bug #86609]] yaim variable CE_OTHERDESCR not properly managed for Glue2 - %RED% Not Implemented %ENDCOLOR%

Try to set the yaim variable =CE_OTHERDESCR= to:

<verbatim>
CE_OTHERDESCR="Cores=1"
</verbatim>

Perform the following ldap query on the resource bdii:

<verbatim>
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
</verbatim>

This should also return:

<verbatim>
GLUE2EntityOtherInfo: Cores=1
</verbatim>

Try then to set the yaim variable =CE_OTHERDESCR= to:

<verbatim>
CE_OTHERDESCR="Cores=1,Benchmark=150-HEP-SPEC06"
</verbatim>

and reconfigure via yaim.
Perform the following ldap query on the resource bdii:

<verbatim>
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
</verbatim>

This should also return:

<verbatim>
GLUE2EntityOtherInfo: Cores=1
</verbatim>

Then perform the following ldap query on the resource bdii:

<verbatim>
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark
</verbatim>

This should return something like:

<verbatim>
dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=cream-47.pd.infn.it_ComputingElement,GLUE2GroupID=resource,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T14:04:48Z
GLUE2BenchmarkValue: 150
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: cream-47.pd.infn.it_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?86694][Bug #86694]] A different port number than 9091 should be used for LRMS_EVENT_LISTENER - %RED% Not Implemented %ENDCOLOR%

On a running CREAM CE, perform the following command:

<verbatim>
netstat -an | grep -i 9091
</verbatim>

This shouldn't return anything. Then perform the following command:

<verbatim>
netstat -an | grep -i 49152
</verbatim>

This should return:

<verbatim>
tcp        0      0 :::49152                    :::*                        LISTEN
</verbatim>

Further checks, which should return nothing:

<verbatim>
[root@cream-47 ~]# netstat -an | grep -i 49153
[root@cream-47 ~]# netstat -an | grep -i 49154
[root@cream-47 ~]# netstat -an | grep -i 9091
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?86697][Bug #86697]] User application's exit code not recorded in the CREAM log file - %RED% Not Implemented %ENDCOLOR%

Submit a job and wait for its completion. Then check the glite-ce-cream.log file on the CREAM CE. The user exit code should be reported (field =exitCode=), e.g.:

<verbatim>
13 Jan 2012 15:22:52,966 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM124031222 STATUS CHANGED: REALLY-RUNNING => DONE-OK [failureReason=reason=0] [exitCode=23] [localUser=dteam004] [workerNode=prod-wn-001.pn.pd.infn.it] [delegationId=7a52772caaeea96628a1ff9223e67a1f6c6dde9f]
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?86737][Bug #86737]] A different port number than 9909 should be used for CREAM_JOB_SENSOR - %RED% Not Implemented %ENDCOLOR%

On a running CREAM CE, perform the following command:

<verbatim>
netstat -an | grep -i 9909
</verbatim>

This shouldn't return anything.

---+++ [[https://savannah.cern.ch/bugs/?86773][Bug #86773]] wrong /etc/glite-ce-cream/cream-config.xml with multiple ARGUS servers set - %RED% Not Implemented %ENDCOLOR%

To test the fix, set in the siteinfo.def:

<verbatim>
USE_ARGUS=yes
ARGUS_PEPD_ENDPOINTS="https://cream-46.pd.infn.it:8154/authz https://cream-46-1.pd.infn.it:8154/authz"
CREAM_PEPC_RESOURCEID="http://pd.infn.it/cream-47"
</verbatim>

i.e. 2 values for =ARGUS_PEPD_ENDPOINTS=. Then configure via yaim.
In =/etc/glite-ce-cream/cream-config.xml= there should be:

<verbatim>
<argus-pep name="pep-client1"
           resource_id="http://pd.infn.it/cream-47"
           cert="/etc/grid-security/tomcat-cert.pem"
           key="/etc/grid-security/tomcat-key.pem"
           passwd=""
           mapping_class="org.glite.ce.cream.authz.argus.ActionMapping">
  <endpoint url="https://cream-46.pd.infn.it:8154/authz" />
  <endpoint url="https://cream-46-1.pd.infn.it:8154/authz" />
</argus-pep>
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?87690][Bug #87690]] Not possible to map different queues to different clusters for CREAM configured in cluster mode - %RED% Not Implemented %ENDCOLOR%

Configure via yaim a CREAM CE in cluster mode with different queues mapped to different clusters, e.g.:

<verbatim>
CREAM_CLUSTER_MODE=yes
CE_HOST_cream_47_pd_infn_it_QUEUES="creamtest1 creamtest2"
QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id
</verbatim>

Then query the resource bdii of the CREAM CE, and check the =GlueForeignKey= attributes of the different glueCEs: they should refer to the specified clusters:

<verbatim>
ldapsearch -h cream-47.pd.infn.it -p 2170 -x -b o=grid objectclass=GlueCE GlueForeignKey

# extended LDIF
#
# LDAPv3
# base <o=grid> with scope subtree
# filter: objectclass=GlueCE
# requesting: GlueForeignKey
#

# cream-47.pd.infn.it:8443/cream-lsf-creamtest2, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest2,Mds-Vo-name=resource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl2id

# cream-47.pd.infn.it:8443/cream-lsf-creamtest1, resource, grid
dn: GlueCEUniqueID=cream-47.pd.infn.it:8443/cream-lsf-creamtest1,Mds-Vo-name=resource,o=grid
GlueForeignKey: GlueClusterUniqueID=cl1id
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?87799][Bug #87799]] Add yaim variables to configure the GLUE 2 WorkingArea attributes - %RED% Not Implemented %ENDCOLOR%

Set all (or some) of the following yaim variables:

<verbatim>
WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME
</verbatim>

and then configure via yaim. Then query the resource bdii of the CREAM CE and verify that the relevant attributes of the glue2 ComputingManager object are set.

---+++ [[https://savannah.cern.ch/bugs/?88078][Bug #88078]] CREAM DB names should be configurable - %RED% Not Implemented %ENDCOLOR%

Configure from scratch a CREAM CE setting the yaim variables =CREAM_DB_NAME= and =DELEGATION_DB_NAME=, e.g.:

<verbatim>
CREAM_DB_NAME=abc
DELEGATION_DB_NAME=xyz
</verbatim>

and then configure via yaim. Then check if the two databases have been created:

<verbatim>
# mysql -u xxx -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7176
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| abc                |
| test               |
| xyz                |
+--------------------+
4 rows in set (0.02 sec)
</verbatim>

Try also a job submission to verify that everything works properly.

---+++ [[https://savannah.cern.ch/bugs/?89489][Bug #89489]] yaim plugin for CREAM CE does not execute a check function due to name mismatch - %GREEN% Implemented %ENDCOLOR%

Configure a CREAM CE via yaim and save the yaim output.
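For example (the invocation below is the usual yaim one for a CREAM CE; adjust the site-info.def path and node type to your installation):

<verbatim>
/opt/glite/yaim/bin/yaim -c -s /root/siteinfo/site-info.def -n creamCE 2>&1 | tee /tmp/yaim-cream.log
grep config_cream_gip_scheduler_plugin_check /tmp/yaim-cream.log
</verbatim>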
It should contain the string:
<verbatim>
INFO: Executing function: config_cream_gip_scheduler_plugin_check
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?89664][Bug #89664]] yaim-cream-ce doesn't manage spaces in CE_OTHERDESCR - %RED% Not Implemented %ENDCOLOR%

Set the yaim variable =CE_OTHERDESCR= to:
<verbatim>
CE_OTHERDESCR="Cores=1"
</verbatim>
Perform the following ldap query on the resource bdii:
<verbatim>
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
</verbatim>
This should also return:
<verbatim>
GLUE2EntityOtherInfo: Cores=1
</verbatim>
Then set the yaim variable =CE_OTHERDESCR= to:
<verbatim>
CE_OTHERDESCR="Cores=2, Benchmark=4-HEP-SPEC06"
</verbatim>
and reconfigure via yaim. Perform the following ldap query on the resource bdii:
<verbatim>
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment GLUE2EntityOtherInfo
</verbatim>
This should also return:
<verbatim>
GLUE2EntityOtherInfo: Cores=2
</verbatim>
Then perform the following ldap query on the resource bdii:
<verbatim>
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=glue" objectclass=Glue2Benchmark
</verbatim>
This should return something like:
<verbatim>
# cream-47.pd.infn.it_hep-spec06, cream-47.pd.infn.it, ppp, resource, glue
dn: GLUE2BenchmarkID=cream-47.pd.infn.it_hep-spec06,GLUE2ResourceID=cream-47.pd.infn.it,GLUE2ServiceID=ppp,GLUE2GroupID=resource,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: cream-47.pd.infn.it
GLUE2BenchmarkID: cream-47.pd.infn.it_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2EntityCreationTime: 2012-01-13T17:07:52Z
GLUE2BenchmarkValue: 4
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.0
GLUE2EntityOtherInfo: InfoProviderHost=cream-47.pd.infn.it
GLUE2BenchmarkComputingManagerForeignKey: ppp_Manager
GLUE2EntityName: Benchmark hep-spec06
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?89784][Bug #89784]] Improve client side description of authorization failure - %RED% Not Implemented %ENDCOLOR%

Remove the lsc files for your VO and try a submission to that CE. It should return an authorization error. Then check the glite-ce-cream.log. It should report something like:
<verbatim>
13 Jan 2012 18:21:21,270 org.glite.voms.PKIVerifier - Cannot find usable certificates to validate the AC. Check that the voms server host certificate is in your vomsdir directory.
13 Jan 2012 18:21:21,602 org.glite.ce.commonj.authz.gjaf.LocalUserPIP - glexec error: [gLExec]: LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
13 Jan 2012 18:21:21,603 org.glite.ce.commonj.authz.gjaf.ServiceAuthorizationChain - Failed to get the local user id via glexec: glexec error: [gLExec]: LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
org.glite.ce.commonj.authz.AuthorizationException: Failed to get the local user id via glexec: glexec error: [gLExec]: LCAS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info.
</verbatim>
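A quick way to look for these entries is a grep on the CREAM log (a sketch; the log path assumes the usual EMI location and may differ on other installations):
<verbatim>
# search the CREAM log for the glexec authorization failure (log path may vary)
grep "Failed to get the local user id via glexec" /var/log/cream/glite-ce-cream.log
</verbatim>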
---+++ [[https://savannah.cern.ch/bugs/?91819][Bug #91819]] glite_cream_load_monitor should read the thresholds from a conf file - %GREEN% Implemented %ENDCOLOR%

Tested through the =limiter= test of the Robot-based test suite.

---+++ [[https://savannah.cern.ch/bugs/?92102][Bug #92102]] Tomcat attributes in the CREAM CE should be configurable via yaim - %RED% Not Implemented %ENDCOLOR%

Set in siteinfo.def:
<verbatim>
CREAM_JAVA_OPTS_HEAP="-Xms512m -Xmx1024m"
</verbatim>
and configure via yaim. Then check =/etc/tomcat5.conf= (on EMI2 SL5 x86_64: =/etc/tomcat5/tomcat5.conf=), where there should be:
<verbatim>
JAVA_OPTS="${JAVA_OPTS} -server -Xms512m -Xmx1024m"
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?92338][Bug #92338]] CREAM load limiter should not disable job submissions when there is no swap space - %RED% Not Implemented %ENDCOLOR%

To test the fix, consider a CREAM CE on a machine without swap. Verify that the limiter doesn't disable job submissions. Note that the limiter checks the memory usage every 10 minutes. Useful commands:
   * to show the swap partitions: =cat /proc/swaps=
   * to check the swap usage: =top=
   * to disable swap: =swapoff -a= (or =swapoff /<swap_partition>=)
   * to enable swap: =swapon -a=

---+++ [[https://savannah.cern.ch/bugs/?93768][Bug #93768]] There's a bug in logfile handling - %RED% Not Implemented %ENDCOLOR%

To verify the fix, try the =--logfile= option with e.g. the =glite-ce-job-submit= command. Verify that the log file is created in the specified path.

---++ Fixes provided with CREAM 1.13.4

---+++ [[https://savannah.cern.ch/bugs/?95480][Bug #95480]] CREAM doesn't transfert the output files remotely under well known conditions - see [[#FixFor95480][Fix for 1.14.1]]

---++ Fixes provided with CREAM 1.13.3

---+++ [[http://savannah.cern.ch/bugs/?81561][Bug #81561]] Make JobDBAdminPurger script compliant with CREAM EMI environment. - %GREEN% Implemented %ENDCOLOR%

To test the fix, simply run the JobDBAdminPurger.sh script on the CREAM CE as root, e.g.:
<verbatim>
# JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml -u <user> -p <passwd> -s DONE-FAILED,0
START jobAdminPurger
</verbatim>
It should work without reporting error messages:
<verbatim>
-----------------------------------------------------------
Job CREAM595579358 is going to be purged ...
- Job deleted. JobId = CREAM595579358
CREAM595579358 has been purged!
-----------------------------------------------------------
STOP jobAdminPurger
</verbatim>
Starting from EMI2 the command to run is:
<verbatim>
JobDBAdminPurger.sh -c /etc/glite-ce-cream/cream-config.xml -s DONE-FAILED,0
</verbatim>

---+++ [[http://savannah.cern.ch/bugs/?83238][Bug #83238]] Sometimes CREAM does not update the state of a failed job. - %GREEN% Implemented %ENDCOLOR%

To test the fix, try to kill a job by hand. The status of the job should eventually be:
<verbatim>
Status        = [DONE-FAILED]
ExitCode      = [N/A]
FailureReason = [Job has been terminated (got SIGTERM)]
</verbatim>

---+++ [[http://savannah.cern.ch/bugs/?83749][Bug #83749]] JobDBAdminPurger cannot purge jobs if configured sandbox dir has changed. - %RED% Not implemented %ENDCOLOR%

To test the fix, submit some jobs and then reconfigure the service with a different value of =CREAM_SANDBOX_PATH=. Then try, with the =JobDBAdminPurger.sh= script, to purge some jobs submitted before the switch. It must be verified:
   * that the jobs have been purged from the CREAM DB (i.e. a =glite-ce-job-status= should not find them anymore, as shown in the sketch below)
   * that the relevant CREAM sandbox directories have been deleted
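A minimal sketch for the first check, reusing the job id from the purger output above (host is a placeholder):
<verbatim>
# a purged job should no longer be known to CREAM
glite-ce-job-status https://<CREAM CE host>:8443/CREAM595579358
</verbatim>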
---+++ [[http://savannah.cern.ch/bugs/?84374][Bug #84374]] yaim-cream-ce: GlueForeignKey: GlueCEUniqueID: published using ':' instead of '='. - %GREEN% Implemented %ENDCOLOR%

To test the fix, query the resource bdii of the CREAM-CE:
<verbatim>
ldapsearch -h <CREAM CE host> -x -p 2170 -b "o=grid" | grep -i foreignkey | grep -i glueceuniqueid
</verbatim>
Entries such as:
<verbatim>
GlueForeignKey: GlueCEUniqueID=cream-35.pd.infn.it:8443/cream-lsf-creamtest1
</verbatim>
i.e.:
<verbatim>
GlueForeignKey: GlueCEUniqueID=<CREAM CE ID>
</verbatim>
should appear.

---+++ [[http://savannah.cern.ch/bugs/?86191][Bug #86191]] No info published by the lcg-info-dynamic-scheduler for one VOView - %GREEN% Implemented %ENDCOLOR%

To test the fix, issue the following ldapsearch query towards the resource bdii of the CREAM-CE:
<verbatim>
$ ldapsearch -h cream-35 -x -p 2170 -b "o=grid" | grep -i GlueCEStateWaitingJobs | grep -i 444444
</verbatim>
It should not find anything.

---+++ [[http://savannah.cern.ch/bugs/?87361][Bug #87361]] The attribute cream_concurrency_level should be configurable via yaim. - %GREEN% Implemented %ENDCOLOR%

To test the fix, set in =siteinfo.def= the variable =CREAM_CONCURRENCY_LEVEL= to a certain number (n). After configuration verify that in =/etc/glite-ce-cream/cream-config.xml= there is:
   * [EMI1] <verbatim>cream_concurrency_level="n"</verbatim>
   * [EMI2] <verbatim>commandworkerpoolsize="n"</verbatim>

---+++ [[http://savannah.cern.ch/bugs/?87492][Bug #87492]] CREAM doesn't handle correctly the jdl attribute "environment". - %GREEN% Implemented %ENDCOLOR%

To test the fix, submit the following JDL using =glite-ce-job-submit=:
<verbatim>
Environment = {
  "GANGA_LCG_VO='camont:/camont/Role=lcgadmin'",
  "LFC_HOST='lfc0448.gridpp.rl.ac.uk'",
  "GANGA_LOG_HANDLER='WMS'"
};
executable="/bin/env";
stdoutput="out.out";
outputsandbox={"out.out"};
outputsandboxbasedesturi="gsiftp://localhost";
</verbatim>
When the job is done, retrieve the output and check that in =out.out= the variables =GANGA_LCG_VO=, =LFC_HOST= and =GANGA_LOG_HANDLER= have exactly the values defined in the JDL.

---+ gLite-CLUSTER

---++ Fixes provided with glite-CLUSTER v. 2.0.1

---+++ [[https://savannah.cern.ch/bugs/?99750][Bug #99750]] Generated configuration files conflict with CREAM ones - %RED%Not implemented%ENDCOLOR%

   * Install emi-cream-ce and emi-cluster on the same machine
   * Run the YAIM configurator specifying the two node types: <verbatim>-n creamCE -n glite-CLUSTER</verbatim>
   * Verify that the attributes !GLUE2EndpointHealthState and !GLUE2EndpointHealthStateInfo for the CREAM GLUE2 Computing Endpoint are correctly published

---+++ [[https://savannah.cern.ch/bugs/?99824][Bug #99824]] Remove lcg prefix from template - %RED%Not implemented%ENDCOLOR%

Verify that the file /opt/glite/yaim/examples/siteinfo/services/glite-cluster contains the following comment (see also the grep sketch below):
<verbatim>
# The name of the job manager used by the gatekeeper
# This variable has been renamed in the new infosys configuration.
# The old variable name was: JOB_MANAGER
# Please, define: pbs, lfs, sge or condor
#CE_HOST_<hostname>_CE_InfoJobManager=my_job_manager
</verbatim>
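A quick check for the comment above (a sketch):
<verbatim>
# the renamed-variable comment should be present in the shipped template
grep -B 2 "CE_InfoJobManager" /opt/glite/yaim/examples/siteinfo/services/glite-cluster
</verbatim>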
---+++ [[https://savannah.cern.ch/bugs/?100061][Bug #100061]] Wrong GLUE output format in a CREAM+Cluster installation - %RED%Not implemented%ENDCOLOR%

   * Install emi-cream-ce and emi-cluster on the same machine
   * Run the YAIM configurator specifying the two node types: <verbatim>-n creamCE -n glite-CLUSTER</verbatim>
   * Submit some jobs to the CREAM CE
   * Verify that the !Glue1 attributes !GlueCEStateFreeJobSlots, !GlueCEStateRunningJobs and !GlueCEStateWaitingJobs are correctly published
   * Verify that the !Glue2 attributes !GLUE2ComputingShareFreeSlots, !GLUE2ComputingShareWaitingJobs and !GLUE2ComputingShareRunningJobs are correctly published
   * Install and configure emi-cream-ce and emi-cluster on different machines
   * Submit some jobs to the CREAM CE
   * Verify that the !Glue1 attributes !GlueCEStateFreeJobSlots, !GlueCEStateRunningJobs and !GlueCEStateWaitingJobs are correctly published by the resource BDII on the CREAM node
   * Verify that the !Glue2 attributes !GLUE2ComputingShareFreeSlots, !GLUE2ComputingShareWaitingJobs and !GLUE2ComputingShareRunningJobs are correctly published by the resource BDII on the cluster node

---+++ [[https://savannah.cern.ch/bugs/?100395][Bug #100395]] CE_* YAIM variables not mandatory for GLUE2 in cluster mode - %RED%Not implemented%ENDCOLOR%

   * Install and configure a CREAM node in cluster mode, removing from the site-info.def all the CE_* variables except CE_SMPSIZE (use the SUBCLUSTER_<subclusterid>_HOST_* variables instead)
   * Verify that the YAIM configurator reports no errors

---++ Fixes provided with previous versions

---+++ [[https://savannah.cern.ch/bugs/?69318][Bug #69318]] The cluster publisher needs to publish in GLUE 2 too - %RED% Not implemented %ENDCOLOR%

   * Check if the resource BDII publishes GLUE 2 GLUE2ComputingService objectclasses. There should be one GLUE2ComputingService objectclass: <verbatim> ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService </verbatim>
   * Check if the resource BDII publishes GLUE 2 GLUE2Manager objectclasses. There should be one GLUE2Manager objectclass: <verbatim> ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Manager </verbatim>
   * Check if the resource BDII publishes GLUE 2 GLUE2Share objectclasses. There should be one GLUE2Share objectclass per VOView: <verbatim> ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2Share </verbatim>
   * Check if the resource BDII publishes GLUE 2 GLUE2ExecutionEnvironment objectclasses. There should be at least one GLUE2ExecutionEnvironment objectclass: <verbatim> ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ExecutionEnvironment </verbatim>
   * Check if the resource BDII publishes GLUE 2 GLUE2ComputingEndPoint objectclasses with GLUE2EndpointInterfaceName equal to org.glite.ce.ApplicationPublisher. There should be one such objectclass: <verbatim> ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" "(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.ApplicationPublisher))" </verbatim>
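Since each of these queries is expected to return a known number of entries, counting the =dn:= lines of the LDIF output is a convenient shortcut (a sketch, assuming the standard ldapsearch output format where each entry starts with a =dn:= line):
<verbatim>
# e.g. exactly one GLUE2ComputingService entry is expected, so this should print 1
ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingService | grep -c "^dn:"
</verbatim>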
---+++ [[https://savannah.cern.ch/bugs/?86512][Bug #86512]] YAIM Cluster Publisher incorrectly configures GlueClusterService and GlueForeignKey for CreamCEs - %RED% Not implemented %ENDCOLOR%

To test the fix, issue an ldapsearch such as:
<verbatim>
ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueClusterService
</verbatim>
Then issue an ldapsearch such as:
<verbatim>
ldapsearch -h <gLite-CLUSTER> -x -p 2170 -b "o=grid" | grep GlueForeignKey | grep -v Site
</verbatim>
Verify that for each returned line, the format is:
<verbatim>
<hostname>:8443/cream-<lrms>-<queue>
</verbatim>

---+++ [[https://savannah.cern.ch/bugs/?87691][Bug #87691]] Not possible to map different queues of the same CE to different clusters - %RED% Not implemented %ENDCOLOR%

To test this fix, configure a gLite-CLUSTER with at least two different queues mapped to different clusters (use the yaim variables =QUEUE_<queue>_CLUSTER_UniqueID=), e.g.:
<verbatim>
QUEUE_CREAMTEST1_CLUSTER_UniqueID=cl1id
QUEUE_CREAMTEST2_CLUSTER_UniqueID=cl2id
</verbatim>
Then query the resource bdii of the gLite-CLUSTER and verify that:
   * for the GlueCluster objectclass with =GlueClusterUniqueID= equal to =cl1id=, the attributes =GlueClusterService= and =GlueForeignKey= refer to CEIds with =creamtest1= as queue
   * for the GlueCluster objectclass with =GlueClusterUniqueID= equal to =cl2id=, the attributes =GlueClusterService= and =GlueForeignKey= refer to CEIds with =creamtest2= as queue

---+++ [[https://savannah.cern.ch/bugs/?87799][Bug #87799]] Add yaim variables to configure the GLUE 2 WorkingArea attributes - %RED% Not implemented %ENDCOLOR%

Set all (or some) of the following yaim variables:
<verbatim>
WORKING_AREA_SHARED
WORKING_AREA_GUARANTEED
WORKING_AREA_TOTAL
WORKING_AREA_FREE
WORKING_AREA_LIFETIME
WORKING_AREA_MULTISLOT_TOTAL
WORKING_AREA_MULTISLOT_FREE
WORKING_AREA_MULTISLOT_LIFETIME
</verbatim>
and then configure via yaim. Then query the resource bdii of the gLite-CLUSTER and verify that the relevant attributes of the glue2 ComputingManager object are set, as in the sketch below.
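A possible query for this check (a sketch; it assumes the attributes follow the GLUE2ComputingManagerWorkingArea* naming of the GLUE 2 schema):
<verbatim>
# list the WorkingArea-related attributes of the GLUE 2 ComputingManager object
ldapsearch -h <gLite-CLUSTER hostname> -x -p 2170 -b "o=glue" objectclass=GLUE2ComputingManager | grep -i WorkingArea
</verbatim>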
---+ CREAM Torque module

---++ Fixes provided with CREAM TORQUE module 2.0.1

---+++ [[https://savannah.cern.ch/bugs/?95184][Bug #95184]] Missing real value for !GlueCEPolicyMaxSlotsPerJob - %RED%Not Implemented%ENDCOLOR%

   * configure the TORQUE server so that the parameter "resources_max.procct" for a given queue is defined and greater than zero
   * run the script /var/lib/bdii/gip/plugin/glite-info-dynamic-ce
   * verify that the attribute !GlueCEPolicyMaxSlotsPerJob for the given queue reports the value of the parameter above

---+++ [[https://savannah.cern.ch/bugs/?96636][Bug #96636]] Time limits for GLUE 2 are different to GLUE 1 - %RED%Not Implemented%ENDCOLOR%

   * run the command <verbatim>ldapsearch -x -H ldap://hostname:2170 -b o=glue '(&(objectclass=GLUE2ComputingShare))' [attributeName]</verbatim> where attributeName is one of the following: !GLUE2ComputingShareMaxCPUTime, !GLUE2ComputingShareMaxWallTime, !GLUE2ComputingShareDefaultCPUTime, !GLUE2ComputingShareDefaultWallTime
   * verify that each value, if available, is expressed in seconds
   * run the command <verbatim>ldapsearch -x -H ldap://hostname:2170 -b o=grid '(&(objectclass=GlueCE))' [attributeName]</verbatim> where attributeName is one of the following: !GlueCEPolicyMaxCPUTime, !GlueCEPolicyMaxObtainableCPUTime, !GlueCEPolicyMaxWallClockTime, !GlueCEPolicyMaxObtainableWallClockTime
   * verify that each value, if available, is expressed in minutes

---+++ [[https://savannah.cern.ch/bugs/?99639][Bug #99639]] lcg-info-dynamic-scheduler-pbs cannot parse qstat output with spurious lines - %RED%Not Implemented%ENDCOLOR%

   * save the output of the command <verbatim>qstat -f</verbatim> in a temporary file
   * insert several spurious lines within the block of job data; it is better to also insert some empty lines
   * run the command <verbatim>lcg-info-dynamic-scheduler-pbs -c [temporary file]</verbatim>
   * verify that the execution works fine

---++ Fixes from previous releases

---+++ [[https://savannah.cern.ch/bugs/?17325][Bug #17325]] Default time limits not taken into account - %RED% Not implemented %ENDCOLOR%

To test the fix for this bug, consider a PBS installation where for a certain queue both default and max values are specified, e.g.:
<verbatim>
resources_max.cput = A
resources_max.walltime = B
resources_default.cput = C
resources_default.walltime = D
</verbatim>
Verify that the published value for GlueCEPolicyMaxCPUTime is C and that the published value for GlueCEPolicyMaxWallClockTime is D.

---+++ [[https://savannah.cern.ch/bugs/?49653][Bug #49653]] lcg-info-dynamic-pbs should check pcput in addition to cput - %RED% Not implemented %ENDCOLOR%

To test the fix for this bug, consider a PBS installation where for a certain queue both cput and pcput max values are specified, e.g.:
<verbatim>
resources_max.cput = A
resources_max.pcput = B
</verbatim>
Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum between A and B. Then consider a PBS installation where for a certain queue both cput and pcput max and default values are specified, e.g.:
<verbatim>
resources_max.cput = C
resources_default.cput = D
resources_max.pcput = E
resources_default.pcput = F
</verbatim>
Verify that the published value for GlueCEPolicyMaxCPUTime is the minimum between D and F.
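As a worked example of the expected minimum computation (the values below are illustrative, not taken from a real site; GLUE 1 times are expressed in minutes, as noted for Bug #96636 above):
<verbatim>
resources_max.cput  = 72:00:00   -> 4320 minutes
resources_max.pcput = 48:00:00   -> 2880 minutes
published GlueCEPolicyMaxCPUTime = 2880 (the minimum of the two, in minutes)
</verbatim>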
---+++ [[https://savannah.cern.ch/bugs/?76162][Bug #76162]] YAIM for APEL parsers to use the BATCH_LOG_DIR for the batch system log location - %RED% Not implemented %ENDCOLOR%

To test the fix for this bug, set the yaim variable =BATCH_ACCT_DIR= and configure via yaim. Check the file =/etc/glite-apel-pbs/parser-config-yaim.xml= and verify the section:
<verbatim>
<Logs searchSubDirs="yes" reprocess="no">
  <Dir>X</Dir>
</verbatim>
X should be the value specified for =BATCH_ACCT_DIR=. Then reconfigure without setting =BATCH_ACCT_DIR=. Check the file =/etc/glite-apel-pbs/parser-config-yaim.xml= and verify that the directory name is =${TORQUE_VAR_DIR}/server_priv/accounting=.

---+++ [[https://savannah.cern.ch/bugs/?77106][Bug #77106]] PBS info provider doesn't allow - in a queue name - %RED% Not implemented %ENDCOLOR%

To test the fix, configure a CREAM CE in a PBS installation where at least one queue has a '-' in its name. Then log in as root on the CREAM CE and run:
<verbatim>
/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"
</verbatim>
Check if the returned information is correct.

---+ CREAM LSF module

---++ [[https://savannah.cern.ch/bugs/?88720][Bug #88720]] Too many '9' in GlueCEPolicyMaxCPUTime for LSF - %RED% Not implemented %ENDCOLOR%

To test the fix, query the CREAM CE resource bdii in the following way:
<verbatim>
ldapsearch -h <CREAM CE node> -x -p 2170 -b "o=grid" | grep GlueCEPolicyMaxCPUTime | grep 9999999999
</verbatim>
This shouldn't return anything.

---++ [[https://savannah.cern.ch/bugs/?89767][Bug #89767]] The LSF dynamic infoprovider shouldn't publish GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots - %RED% Not implemented %ENDCOLOR%

To test the fix, log in as root on the CREAM CE and run:
<verbatim>
/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"
</verbatim>
Among the returned information, there shouldn't be GlueCEStateFreeCPUs and GlueCEStateFreeJobSlots.

---++ [[https://savannah.cern.ch/bugs/?89794][Bug #89794]] LSF info provider doesn't allow - in a queue name - %RED% Not implemented %ENDCOLOR%

To test the fix, configure a CREAM CE in an LSF installation where at least one queue has a '-' in its name. Then log in as root on the CREAM CE and run:
<verbatim>
/sbin/runuser -s /bin/sh ldap -c "/var/lib/bdii/gip/plugin/glite-info-dynamic-ce"
</verbatim>
Check if the returned information is correct.

---++ [[https://savannah.cern.ch/bugs/index.php?90113][Bug #90113]] missing yaim check for batch system - %RED% Not implemented %ENDCOLOR%

To test the fix, configure a CREAM CE without having also installed LSF. The yaim configuration should fail, reporting that there are problems with the LSF installation.