PATCH 3179
Automatic tests:
- report #1:
- CREAM UI version: 1.12.1; CREAM testsuite version: 1.0.7
- Used event query for monitoring and BUpdater/BNotifier for status change detection
- Batch system: LSF
- All the tests completed successfully (see the reports)
- report #2:
- CREAM UI version: 1.11.1; CREAM testsuite version: 1.0.6
- Used direct polling for monitoring and BLParser for status change detection
- Batch system: TORQUE
- All the tests completed successfully (see the reports)
Since the current version of the CREAM CE does not enable CEMonitor for a standard installation, all the tests that make use of the notification mechanism have not been taken into account
Checked bugs:
- Bug #37430
: BLParser should properly filter it's log output FIXED
- Not too clear what the fix is supposed to be
- According to the developer (M. Mezzadri) the command received by the old blparser from CREAM should be reported in the blparser log file without an extra new-line
- Verified in the old blparser log file
- Bug #45364
: BLAH_JOB_CANCEL should report failure reason FIXED
- submit a job to CREAM and then cancel it using the LRMS command (e.g. qdel). Before the blparser (and therefore CREAM) realizes that the job was cancelled, issue a glite-ce-job-cancel.
- Issue a glite-ce-job-status -L 2. For the cancel command a failure (along with its reason) should be reported, such as:
*** Command Name = [JOB_CANCEL]
Command Category = [JOB_MANAGEMENT]
Command Status = [ERROR]
Command Fail Reason = [qdel: Unknown Job Id 45299.cream-38.pd.infn.it]
Creation Time = [Fri 26 Feb 2010 18:43:27] (1267206207)
Start Scheduling Time = [Fri 26 Feb 2010 18:43:27] (1267206207)
Start Processing Time = [Fri 26 Feb 2010 18:43:27] (1267206207)
Execution Completed Time = [Fri 26 Feb 2010 18:43:30] (1267206210)
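The check on the status output above can be automated with a small parser. This is only a sketch: it assumes the "Name = [value]" layout shown in the sample, and the function names are hypothetical helpers, not part of the CREAM CLI or of the testsuite.

```python
import re

# Matches lines such as "Command Status = [ERROR]"; a leading "*** " is tolerated.
FIELD_RE = re.compile(r"^\W*(?P<key>[A-Za-z ]+?)\s*=\s*\[(?P<value>.*?)\]")


def parse_command_block(text):
    """Return a dict of field name -> bracketed value for one command block."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group("key").strip()] = match.group("value")
    return fields


def cancel_failed_with_reason(text):
    """True if the block is a JOB_CANCEL that ended in ERROR with a non-empty reason."""
    fields = parse_command_block(text)
    return (
        fields.get("Command Name") == "JOB_CANCEL"
        and fields.get("Command Status") == "ERROR"
        and bool(fields.get("Command Fail Reason"))
    )
```

The text passed in would be the part of the glite-ce-job-status -L 2 output that describes the cancel command.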
- Bug #46419
: CREAM sandbox area should be scratched when the CREAM DB is scratched FIXED
- Submit at least one job to the CE and wait for its termination, so that the sandbox area is not empty
- Increment the value of the parameter creamdb_database_version in the file /opt/glite/etc/glite-ce-cream/cream-config.xml.template
- reconfigure the node with yaim and check whether the sandbox area is empty
- Bug #47070
: [ yaim-cream ] yaim cream module should support remote mysql setup NOT TESTED
- Bug #47254
: Possible problems if the proxy used to talk with CREAM is shorter than 10 minutes FIXED
- create a voms-proxy whose lifetime is shorter than 10 minutes
- submit a simple job whose lifetime is shorter than the voms-proxy one and verify its correct termination
- Bug #47804
: Possible problems configuring blah in CREAM-CE for LSF NOT TESTED
- Bug #48786
: Load should be one of the parameter of DISABLE_SUBMISSION_POLICY in CREAM FIXED
- specify a low load level in the file /opt/glite/bin/glite_cream_load_monitor
- try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R <ceID> --sotimeout 60 --vo dteam --valid 02:00 and verify that with high load the submissions are rejected
- Bug #49497
: user proxies on CREAM do not get cleaned up FIXED
- delegate a proxy whose lifetime is shorter than the parameter delegation_purge_rate of the CREAM configuration file
- wait for the new proxy cleanup run (at least twice the delegation_purge_rate) and verify that the proxy file has been removed from the directory
- Bug #50226
: yaim-cream-ce should use config_secure_tomcat FIXED
- install the CE node from scratch
- verify the state of the trustmanager accessing the URL: https://ce-host:8443/ce-cream/services
- Bug #50723
: CREAM: check for the jobtype is not case insensitive FIXED
- submit a job specifying the parameter "jobtype=Normal" in the JDL and verify the correct execution of the job
- Bug #50875
: CREAM: reason for cancelled jobs should be reported FIXED
- submit and cancel a job using the CREAM CLI command and verify that the reason reports "Cancelled by user"
- submit and cancel a job using the LRMS command (e.g. qdel) and verify that the reason reports "Cancelled by CE admin"
- Bug #50876
: CREAM reports that the proxy expired even when the problem is in detecting the lifetime of the proxy FIXED
- force a failure of the command grid-proxy-init in the jobwrapper, for example by delegating a proxy on the CE, manually renaming the corresponding delegated proxy in the sandbox area and then submitting a job using the given delegation ID.
- verify that the failure reason reported by the job status contains the message: Problem to detect the lifetime of the proxy
- Bug #51046
: CREAM: DelegProxyInfo info sometimes is wrong FIXED
- submit a job, wait for its termination and verify the correct lifetime of the proxy in the glite-ce-job-status output
- Bug #51118
: config_cream_glexec doesn't set glexec permissions right FIXED
- install a CE node from scratch and verify the permissions for /opt/glite/sbin/glexec (6555) and /opt/glite/etc/glexec.conf (640)
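The permission check above could be scripted roughly as follows. This is a sketch and was not part of the certification; the helper name is hypothetical, while the paths and modes are the ones quoted in the check.

```python
import os
import stat

# Expected modes taken from the check above.
EXPECTED_MODES = {
    "/opt/glite/sbin/glexec": 0o6555,
    "/opt/glite/etc/glexec.conf": 0o640,
}


def mode_mismatches(expected):
    """Return {path: actual_mode} for every path whose mode differs from the expected one.

    Missing files are reported with an actual mode of None.
    """
    mismatches = {}
    for path, wanted in expected.items():
        try:
            # S_IMODE keeps the permission bits, including setuid and setgid.
            actual = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            mismatches[path] = None
            continue
        if actual != wanted:
            mismatches[path] = actual
    return mismatches
```

On a correctly configured node, mode_mismatches(EXPECTED_MODES) should return an empty dict.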
- Bug #51124
: catalina.out is clogged with grid-proxy-init warnings FIXED
- submit a job and check the catalina.out file
- Bug #51128
: lcas-suexec.db on CREAM CE should be named lcas-glexec.db for consistency FIXED
- install a CE node from scratch
- verify the existence of the files: /opt/glite/etc/lcas/lcas-glexec.db and /opt/glite/etc/lcmaps/lcmaps-glexec.db
- Bug #51249
: [ yaim-cream-ce ] refactor config_cream_db FIXED
- Install the node from scratch and verify all the basic operations of the CREAM service
- Bug #51310
: Wrong event timestamp FIXED
- run the consumer server (glite-ce-monitor-consumer) on the client machine
- create a subscription for the topic CREAM_JOBS on the CE specifying the URL of the consumer server above
- submit a job and verify the validity of the field TIMESTAMP of any event
- Bug #51311
: Wrong event timestamp generated by the CREAM Job Sensor FIXED
- Bug #51313
: CEMon must not notify the expired events CANNOT REPRODUCE
- Bug #51705
: glexec-wrapper.sh should be removed from CREAM RPM FIXED
- check the content of glite-ce-cream rpm
- Bug #51706
: yaim-cream-ce: remove "lcg" prefix from JOB_MANAGER FIXED
- change the value of JOB_MANAGER in the siteinfo.def e.g. from lsf to lcglsf
- configure the node with YAIM and verify that in the resource BDII the string lsf (and not lcglsf) appears in the GlueCEUniqueIDs
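The expected mapping can be written down as a one-line rule. The function below is only an illustration of that rule, with a hypothetical name; it is not the code used by yaim-cream-ce.

```python
def normalize_job_manager(job_manager):
    """Drop a leading "lcg" from a JOB_MANAGER value, e.g. "lcglsf" -> "lsf"."""
    prefix = "lcg"
    if job_manager.startswith(prefix):
        return job_manager[len(prefix):]
    return job_manager
```

Values without the prefix, such as lsf, are returned unchanged.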
- Bug #51892
: Exception when using java.text.DateFormat.parse FIXED
- try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R <ceID> --sotimeout 60 --vo dteam --valid 02:00 and verify that with high load the submissions are rejected
- verify the log of the CREAM service
- Bug #51928
: BLAH crashes if the cerequirements classad attribute is malformed FIXED
- submit a job specifying a malformed cerequirements parameter
- verify that the job is executed and the parameter is ignored
- Bug #51978
: CREAM can be slow to start FIXED
- submit a big bunch of long-lived jobs, for example using cream-test-monitored-submit -r 30 -n 2000 -m 2000 -C 100 --sotimeout 60 -j long.jdl -R <ce_id> where long.jdl is "[executable="/bin/sleep";arguments="3600";]"
- when all the jobs have been submitted restart the service and verify the startup time.
- verify in the CREAM and BLAHP logs that the jobs are checked one by one at startup, instead of polling all jobs from a given timestamp
- Bug #51993
: Proxy renewal not very efficient for multiple jobs having the same delegationid FIXED
- stress the renewal mechanism with a single short delegated proxy, for example with the following test: cream-test-monitored-submit -r 30 -n 2000 -m 2000 -C 50 --sotimeout 60 -j long.jdl -R <ce_id> --vo <vo_name> --valid <00:20>
- Bug #52020
: [ yaim-cream-ce ] Support use of file (besides syslog) for glexec logging NOT TESTED
- Bug #52050
: misleading error message "The problem seems to be related to glexec" FIXED
- The CREAM service does not make use of glexec anymore, and therefore this error message can't appear anymore
- Bug #52051
: CEMon must remove all expired subscriptions on start-up FIXED
- create a subscription for the topic CREAM_JOBS on the CE with a short lifetime
- shut down the service and wait for the expiration of the subscription
- restart the service and verify that the subscription does not exist anymore in the directory /opt/glite/var/cemonitor/subscription
- Bug #52052
: Sometimes the getInfo() operation does not report the right list of topics FIXED
- enable or disable the CE sensor removing or adding the corresponding tag in the file /opt/glite/etc/glite-ce-monitor/cemonitor-config.xml
- wait for cemonitor to reload the configuration (usually 10 minutes)
- verify the availability of the topic using the command glite-ce-monitor-gettopics
- Bug #52268
: BLAH leaves files in /tmp when CErequirements is set FIXED
- submit a job specifying a simple CE requirement (e.g. cerequirements="other.GlueHostMainMemoryRAMSize > 2000")
- verify that, after the execution of the job, in the tmp directory no files ce-req-file-* are left
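The leftover-file check above is easy to script. A minimal sketch, assuming the files land directly in /tmp as the bug title says; the helper name is hypothetical.

```python
import glob
import os


def leftover_ce_req_files(tmp_dir="/tmp"):
    """Return the sorted list of ce-req-file-* entries still present in tmp_dir."""
    return sorted(glob.glob(os.path.join(tmp_dir, "ce-req-file-*")))
```

An empty list after the job has finished means nothing was left behind.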
- Bug #52577
: [ yaim-cream-ce ] create CREAM_GLEXEC_USER_HOME variable NOT TESTED
- Bug #52651
: CREAM file descriptor overuse FIXED
- try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R <ceID> --sotimeout 60 --vo dteam --valid 02:00 and verify that with high load the submissions are rejected
- seek "too many open files" in the CREAM log
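Searching the log for the error string can be done with a few lines of Python. This is a sketch: the helper name is hypothetical and the location of the CREAM log is left to the caller, since it is not stated here.

```python
def lines_matching(log_path, needle):
    """Return (line_number, line) pairs whose text contains needle, case-insensitively."""
    needle = needle.lower()
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if needle in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits
```

For this bug the call would be lines_matching(cream_log, "too many open files"), where cream_log is the path of the CREAM log on the CE; an empty result is the expected outcome.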
- Bug #52719
: Blah doesn't set the 'executable' flag if a local jobwrapper is found FIXED
- Submitted a job to a CREAM CE
- Checked the BLAH wrapper: the chmod u+x of the CREAM JobWrapper is done in all cases (even if the job is going to be run on the WN via a local jobwrapper)
- Bug #52942
: Missing description for ISB/OSB error in jobwrapper FIXED
- submit a job with an unreachable host in the inputsandbox or in the outputsandboxbasedesturi parameter
- verify that the output of the glite-ce-job-status contains the full description of the failure
- Bug #53459
: [CREAM] Provide method to improve the detection of job status changes by ICE FIXED
- run the "monitored" part of the testsuite; the latest version of the testsuite makes use of the "event query" mechanism for keeping track of the job status.
- Bug #53499
: CREAM job wrapper template should be put outside the jar FIXED
- check whether the file /opt/glite/share/webapps/ce-cream.war contains the file WEB-INF/jobwrapper.tpl
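Since a .war file is a zip archive, the check above can be done with the standard zipfile module. A sketch with a hypothetical helper name; given the bug title, the result after the fix is presumably False.

```python
import zipfile


def war_contains(war_path, member="WEB-INF/jobwrapper.tpl"):
    """Return True if the archive at war_path has an entry named member."""
    with zipfile.ZipFile(war_path) as war:
        return member in war.namelist()
```

Example call: war_contains("/opt/glite/share/webapps/ce-cream.war").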
- Bug #54812
: lsf_submit.sh job requirement FIXED
- Created (and chmoded +x) the file /opt/glite/bin/lsf_local_submit_attributes.sh on the CREAM CE with the following content:
#!/bin/sh
echo "BSUB -n 2"
- Submitted a job to that CE, without specifying in the JDL the cerequirements attribute
- Checked (via bjobs -l) that the -n 2 directive was used (which means that the lsf_local_submit_attributes.sh was run)
- Bug #54900
: [ glite-yaim-cream-ce ] config_cream_tomcat_user should not add tomcat to VO FIXED
- check the membership of any VO group
- Bug #54949
: Some job can remain in running state when BLParser is restarted for both lsf and pbs HOPEFULLY FIXED
- Not easy to reproduce
- Submitted several jobs (logged in different batch system log files) to a CREAM CE configured with the old blparser
- Restarted CREAM
- Didn't notice problems in getting the status of these jobs
- Bug #55078
: Possible final state not considered in BLParserPBS and BUpdaterPBS CANNOT REPRODUCE
- To test the fix it would be necessary to have a scenario for which in the Torque log file for a certain job the event "Job Run..." is followed by the event "dequeuing from"
- Not able to reproduce such a scenario
- Bug #55420
: Allow admin to purge CREAM jobs in a non terminal status FIXED
- temporarily disconnect any WN from the CE, e.g. shutting down the mom server in a TORQUE installation
- submit a job
- on the CE with administrator privileges run the command: /opt/glite/sbin/JobDBAdminPurger.sh -u -p -s 2 as described in the wiki page
- verify with glite-ce-job-list that the job has been purged from the database
- verify that the sandbox directory of that job has been removed from /opt/glite/var/cream_sandbox
- manually remove the job from the batch system and reconnect all the WNs
- Bug #55438
: BUpdater problems in updating job state with AssignFinalState for all batch system FIXED
- Submitted 3 jobs lasting 2 hours to a CREAM CE with only 2 job slots.
- For all the jobs the right events were logged by the bnotifier (i.e. it didn't log status=4 with failurereason=999)
- Bug #55531
: BUpdaterPBS should consider lines like "unable to run job" FIXED
- Bug #55565
: BLAH configuration attribute blah_disable_wn_proxy_renewal fails to disable proxy renewal. FIXED
- Verified by issuing a BLAH_JOB_REFRESH proxy for a running job
- Moreover the BLAH proxy renewal operation is not used anymore (the proxy on the CE is renewed by CREAM and no more by BLAH)
- Bug #56075
: Job failure reasons missing in the CREAM log file FIXED
- submit a job with an unreachable host in the inputsandbox or in the outputsandboxbasedesturi parameter
- verify that the following message appears in the log file: failureReason=Cannot move ISB (): error: globus_xio: Unable to connect to xxxx:2811 globus_xio: globus_libc_getaddrinfo failed.globus_common: Name or service not known
- Bug #56339
: [blah] "service glite-ce-blparser restart" does not always work FIXED
- try the command /opt/glite/etc/init.d/glite-ce-blparser restart and verify the correct behaviour of the script
- Bug #56367
: CREAM RPM depends on C libs FIXED
- check whether the glite-ce-cream package contains any ELF executable
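One way to look for ELF executables is to test the first four bytes of each file against the ELF magic number. The sketch below assumes that the content of the package has already been unpacked into a directory; how it is unpacked and the helper name are assumptions, not the procedure used for this certification.

```python
import os

ELF_MAGIC = b"\x7fELF"


def elf_files(root):
    """Return the sorted paths of regular files under root that start with the ELF magic."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                if handle.read(4) == ELF_MAGIC:
                    found.append(path)
    return sorted(found)
```

An empty list means that no ELF binary was found in the unpacked tree.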
- Bug #56518
: BLAH blparser doesn't start after boot of the machine FIXED
- install the CE node from scratch specifying the parameter BLPARSER_WITH_UPDATER_NOTIFIER=false in the yaim configuration for creamCE
- reboot the machine and verify that the blparser_master is running
- Bug #56697
: CREAM logging must be improved when CREAM register operation fails FIXED
- force the service to fail a register operation, e.g. by temporarily renaming the sandbox directory
- verify that the log reports at least the JobID and the reason of the failure
- Bug #57210
: BLAH condor_submit script doesn't recognize certain options. CANNOT REPRODUCE
- Not possible to test the fix since we don't have CREAM based CEs with Condor as batch system
- Bug #57307
: condor_submit.sh does not support the handling of "local" attributes CANNOT REPRODUCE
- Not possible to test the fix since we don't have CREAM based CEs with Condor as batch system
- Bug #57820
: [yaim-cream-ce] CREAM-CE publishes GlueServiceDataValue incomplete FIXED
- run the infoprovider: /opt/glite/etc/gip/provider/glite-info-provider-service-cream-wrapper | grep GlueServiceDataValue
- verify that 3 different values are returned for the GlueServiceDataValue: the version, the DN and the host name of the CE
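Counting the distinct values can be scripted once the provider output has been captured as text. A sketch, assuming the usual "attribute: value" line layout; the helper name is hypothetical.

```python
def glue_service_data_values(ldif_text):
    """Return the distinct GlueServiceDataValue values, in order of first appearance."""
    values = []
    for line in ldif_text.splitlines():
        # Split on the first colon only, so values containing colons stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "glueservicedatavalue":
            value = value.strip()
            if value not in values:
                values.append(value)
    return values
```

The check passes when the returned list has three entries.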
- Bug #58103
: Cream database Query performance FIXED
- Internal improvement
- run a set of stress-tests and verify the performance
- Bug #58109
: Wrong value for the "service version" property FIXED
- verify the property using the command glite-ce-service-info
- Bug #58119
: CREAM CE: publish Production instead of Special as default value for GlueCEStateStatus FIXED
- verify with /opt/glite/libexec/glite-info-wrapper | grep -i gluecestatestatus
- Bug #58423
: RFE: support for ISB/OSB transfers from/to gridftp servers running using credentials NOT TESTED
- Bug #58659
: NullPointerException from getStatus FIXED
- try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R --sotimeout 60 --vo dteam --valid 02:00 and verify that the log of the testsuite does not report any NullPointerException
- Bug #58792
: JobRegister fails, because cream_sandbox directory doesn't exist FIXED
- temporarily rename the directory /opt/glite/var/cream_sandbox without turning off the service
- submit a job and verify that the failure reports "cannot create the job's working directory!"
- Bug #58941
: [yaim-cream-ce] lcmaps confs for glexec and gridftp are not fully synchronized NOT TESTED
- Bug #59005
: Possible problem with hold/resumed jobs in BUpdaterLSF FIXED
- Verified as reported here
- Bug #59329
: Proxy symlinks left in the registry area until purged FIXED
- submit a job and verify the existence of the related symlink in the directory /opt/glite/var/blah/user_blah_job_registry.bjr/registry.proxydir
- when the job terminates verify that the symlink has been removed by blah.
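Listing the symlinks before and after the job makes the comparison explicit. A sketch with a hypothetical helper name; it only looks at entries directly inside the given directory.

```python
import os


def symlinks_in(directory):
    """Return the sorted names of symbolic links found directly inside directory."""
    return sorted(
        entry.name for entry in os.scandir(directory) if entry.is_symlink()
    )
```

Calling it on the registry.proxydir directory while the job is running and again after termination should show the job's symlink only in the first listing.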
- Bug #59686
: Possible crash of BUpdarePBS due to wrong malloc FIXED
- Define the parameter pbs_spoolpath in the file /opt/glite/etc/blah.config
- run the BUpdaterPBS daemon and verify its liveness
- Bug #59862
: [ yaim-cream-ce ] broken -v functionality NOT TESTED
- Bug #59962
: Sometimes the CREAM initialization fails with "UserId = ADMINISTRATOR is not enable for that operation" CANNOT REPRODUCE
- Bug #60831
: Error log message: "CREAM_JOB_SENSOR_HOST parameter not specified" FIXED
- verify that the parameter "CREAM_JOB_SENSOR_HOST" is not defined in the file /opt/glite/etc/glite-ce-cream/cream-config.xml
- submit several jobs
- verify that the log of the CREAM service does not report the error above
- Bug #61322
: CREAM jw doesn't set GLITE_WMS_RB_BROKERINFO FIXED
- submit a job and verify that the jobwrapper script, contained in the sandbox area for that job, correctly defines the __brokerinfo variable
- Bug #61401
: config_cream_blah and config_cream_clean don't take into account GLITE_LOCATION_LOG NOT TESTED
- Bug #61402
: [yaim-cream-ce] does not use GLITE_LOCATION_VAR/LOG is some cases. NOT TESTED
- Bug #61407
: Set CE_ID in the cream jw FIXED
- submit a job and verify that the jobwrapper script, contained in the sandbox area for that job, correctly defines the CE_ID variable
- Bug #61493
: [ yaim-cream-ce ] glexec_get_account policy order is wrong NOT TESTED
- Bug #61604
: yaim-cream-ce should not install config_gip_software_plugin FIXED
- verify that the glite-yaim-cream-ce package does not contain the file config_gip_software_plugin but it contains config_cream_gip_software_plugin instead
- Bug #61730
: CREAM jw: GLITE_WMS_LOG_DESTINATION should always be set with the FQDN FIXED
- submit a job and verify that the jobwrapper script, contained in the sandbox area for that job, defines a FQDN in the __ce_hostname variable
- Bug #61761
: CEMon must guarantee the notification rate FIXED
- enable the "CE Sensor" plugin
- create a subscription for the topic published by the sensor above with a running consumer: glite-ce-monitor-subscribe --cert <user_proxy> --key <user_proxy> --topic CE_MONITOR --dialects ISM_CLASSAD_GLUE_1.2 --consumer-url <consumer_url> --rate 10 --duration 600 <cemonitor_url>
- create one or more subscriptions to a non-existing consumer URL or to a fake blocking one (e.g. using nc -l -p <consumer port>) specifying the same rate as above
- verify that the notification rate for the first consumer is correct
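The rate check can be made quantitative by recording the arrival time of each notification on the first consumer and comparing the gaps with the subscribed rate. A sketch: how the arrival times are collected, the helper name and the one-second tolerance are all assumptions.

```python
def rate_violations(arrival_times, rate, tolerance=1.0):
    """Return the gaps (in seconds) between consecutive notifications that
    deviate from rate by more than tolerance.

    arrival_times is a list of epoch seconds, in arrival order.
    """
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    return [gap for gap in gaps if abs(gap - rate) > tolerance]
```

With --rate 10 as in the subscription above, an empty result from rate_violations(times, 10) means that the blocking consumers did not delay the first one.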
- Bug #61790
: Problems in CREAM CE when there are "strange" characters in the subject certificate FIXED
- Verified submitting a job to a Torque CREAM CE with a proxy with subject: /DC=gov/DC=fnal/O=Fermilab/OU=Robots/CN=lcgcaf/CN=cdf/CN=Donatella Lucchesi/CN=UID:lucchesi
- With the same proxy there were problems before (see https://gus.fzk.de/ws/ticket_info.php?ticket=54767)
- Bug #62070
: Possible problem with notification time in BNotifier HOPEFULLY FIXED
- Not possible to reproduce it according to the developer (M. Mezzadri)
- Bug #62207
: [ yaim-cream ] Enable Glue 2.0 publishing FIXED
- send the following query to the CE BDII: ldapsearch -x -h $(hostname) -p 2170 -b o=glue and verify that it returns GLUE2 schema and information
- Bug #62436
: Possible problem with updater if job remain queued too long FIXED
- Fixed as reported here: 3 jobs lasting 2 hours were submitted to a CREAM CE with only 2 job slots. For the third one the BNotifier logged the right events (i.e. it didn't log status=4 with failurereason=999)
- Bug #62565
: yaim-cream-ce requires BLPARSER_HOST even if the new blparser has to be configured NOT TESTED
- Bug #62776
: Yaim config for CREAM CE erroneously requires tomcat in glexec group FIXED
- Bug #62893
: Possible proxy renewal problem in the CREAM jw FIXED
- try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R <ceID> --sotimeout 60 --vo dteam --valid 00:30
- verify that no proxy related issues occur
- Bug #63398
: CREAM jw: removal of token should be retried in case of failure FIXED
- submit the following jdl:
[
environment= {"__token_file=gsiftp://host/path"};
executable="/bin/sleep";
arguments="30";
]
specifying an existing host and path first and verify that the job terminates successfully; the owner of the token must be the mapped user.
- submit the jdl above but specifying a fake host and/or path and verify that the job status reports 3 different failed attempts for taking the token:
"/opt/edg/libexec/edg-gridftp-base-rm: error globus_ftp_client: the server responded with an error 500 500-Command failed : System error in unlink: No such file or directory 500-A system call failed: No such file or directory 500 End"
- Bug #63874
: CREAM sandbox dir creation program should not attempt creation of parent directories. NOT TESTED
--
AlessioGianelle - 2010-02-05