PATCH 3179

Automatic tests:

Checked bugs:

  • Bug #17949: BLAH should operate with no access whatsoever to the batch system logs NOT TESTED

  • Bug #37430: BLParser should properly filter its log output NOT TESTED
    • Not too clear what the fix is supposed to be
    • According to the developer (M. Mezzadri) the command received by the old blparser from CREAM should be reported in the blparser log file without an extra new-line
    • Verified in the old blparser log file

  • Bug #45364: BLAH_JOB_CANCEL should report failure reason FIXED
    • submit and cancel a job using the LRMS command (e.g. qdel) and verify that the reason reports "Cancelled by CE admin"

  • Bug #46419: CREAM sandbox area should be scratched when the CREAM DB is scratched FIXED
    • Submit at least one job to the CE and wait for its termination, so that the sandbox area is not empty
    • Increment the value of the parameters creamdb_database_version and/or delegationdb_database_version in the file /opt/glite/etc/glite-ce-cream/cream-config.xml.template
    • reconfigure the node with yaim and check whether the sandbox area is empty

  • Bug #47070: [ yaim-cream ] yaim cream module should support remote mysql setup NOT TESTED

  • Bug #47254: Possible problems if the proxy used to talk with CREAM is shorter than 10 minutes FIXED
    • create a voms-proxy whose lifetime is shorter than 10 minutes
    • submit a simple job whose lifetime is shorter than the voms-proxy one and verify its correct termination

  • Bug #47804: Possible problems configuring blah in CREAM-CE for LSF NOT TESTED

  • Bug #48786: Load should be one of the parameter of DISABLE_SUBMISSION_POLICY in CREAM FIXED
    • specify a low load level in the file /opt/glite/bin/glite_cream_load_monitor
    • try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R --sotimeout 60 --vo dteam --valid 02:00 and verify that with high load the submissions are rejected

  • Bug #49497: user proxies on CREAM do not get cleaned up FIXED
    • delegate a proxy whose lifetime is shorter than the parameter delegation_purge_rate of the CREAM configuration file
    • wait for the new proxy cleanup run (at least twice the delegation_purge_rate) and verify that the proxy file has been removed from the directory
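
The waiting step above can be automated with a small polling helper. This is a minimal sketch: the function name `wait_until_removed` is hypothetical, and the caller is assumed to convert twice the delegation_purge_rate into seconds, since the unit of that parameter is not stated here.

```python
import os
import time


def wait_until_removed(path, timeout_s, poll_s=1.0):
    """Poll until `path` no longer exists.

    Returns True as soon as the file is gone, or False if it still
    exists once `timeout_s` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A False result after the chosen timeout would indicate that the proxy file was not cleaned up.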

  • Bug #50226: yaim-cream-ce should use config_secure_tomcat FIXED
    • install the CE node from scratch
    • verify the state of the trustmanager by accessing the URL: https://ce-host:8443/ce-cream/services

  • Bug #50723: CREAM: check for the jobtype is not case insensitive FIXED
    • submit a job specifying the parameter "jobtype=Normal" in the JDL and verify the correct execution of the job

  • Bug #50875: CREAM: reason for cancelled jobs should be reported FIXED
    • submit and cancel a job using the CREAM CLI command and verify that the reason reports "Cancelled by user"
    • submit and cancel a job using the LRMS command (e.g. qdel) and verify that the reason reports "Cancelled by CE admin"

  • Bug #50876: CREAM reports that the proxy expired even when the problem is in detecting the lifetime of the proxy FIXED
    • force a failure for the command grid-proxy-init in the jobwrapper, for example by delegating a proxy on the CE, manually renaming the corresponding delegated proxy in the sandbox area and then submitting a job using the given delegation ID.
    • verify that the failure reason reported by the job status contains the message: Problem to detect the lifetime of the proxy

  • Bug #51046: CREAM: DelegProxyInfo info sometimes is wrong FIXED
    • submit a job, wait for its termination and verify the correct lifetime of the proxy

  • Bug #51118: config_cream_glexec doesn't set glexec permissions right FIXED
    • install a CE node from scratch and verify the permissions for /opt/glite/sbin/glexec (6555) and /opt/glite/etc/glexec.conf (640)
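
The permission check above can be scripted. This is a minimal sketch, assuming the values quoted in the test (6555 and 640) are octal mode bits; the helper name `has_mode` is hypothetical.

```python
import os
import stat


def has_mode(path, expected_octal):
    """Return True if the permission bits of `path` equal `expected_octal`.

    `expected_octal` is a string in octal notation such as "6555" or "640";
    setuid, setgid and sticky bits are included in the comparison.
    """
    actual = stat.S_IMODE(os.stat(path).st_mode)
    return actual == int(expected_octal, 8)
```

For this test it would be called as `has_mode("/opt/glite/sbin/glexec", "6555")` and `has_mode("/opt/glite/etc/glexec.conf", "640")`.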

  • Bug #51124: catalina.out is clogged with grid-proxy-init warnings FIXED
    • submit a job and check the catalina.out file

  • Bug #51128: lcas-suexec.db on CREAM CE should be named lcas-glexec.db for consistency FIXED
    • install a CE node from scratch
    • verify the existence of the files: /opt/glite/etc/lcas/lcas-glexec.db and /opt/glite/etc/lcmaps/lcmaps-glexec.db

  • Bug #51249: [ yaim-cream-ce ] refactor config_cream_db FIXED
    • Install the node from scratch and verify all the basic operations of the CREAM service

  • Bug #51310: Wrong event timestamp FIXED
    • run the consumer server (glite-ce-monitor-consumer) on the client machine
    • create a subscription for the topic CREAM_JOBS on the CE specifying the URL of the consumer server above
    • submit a job and verify the validity of the field TIMESTAMP of any event

  • Bug #51311: Wrong event timestamp generated by the CREAM Job Sensor FIXED

  • Bug #51313: CEMon must not notify the expired events CANNOT REPRODUCE

  • Bug #51705: glexec-wrapper.sh should be removed from CREAM RPM FIXED
    • check the content of glite-ce-cream rpm

  • Bug #51706: yaim-cream-ce: remove "lcg" prefix from JOB_MANAGER FIXED
    • change the value of JOB_MANAGER in the siteinfo.def
    • configure the node with YAIM and verify that the resource BDII publishes this new value in the GlueCeUniqueids

  • Bug #51892: Exception when using java.text.DateFormat.parse FIXED
    • try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R --sotimeout 60 --vo dteam --valid 02:00 and verify that with high load the submissions are rejected
    • verify the log of the CREAM service

  • Bug #51928: BLAH crashes if the cerequirements classad attribute is malformed FIXED
    • submit a job specifying a malformed cerequirements parameter
    • verify that the job is executed and the parameter is ignored

  • Bug #51978: CREAM can be slow to start FIXED
    • submit a big bunch of long-lived jobs, for example using cream-test-monitored-submit -r 30 -n 2000 -m 2000 -C 100 --sotimeout 60 -j long.jdl -R <ce_id> where long.jdl is "[executable="/bin/sleep";arguments="3600";]"
    • when all the jobs have been submitted restart the service and verify the startup time.
    • verify in the CREAM and BLAHP logs that the jobs are checked one by one at startup, instead of polling all jobs from a given timestamp

  • Bug #51993: Proxy renewal not very efficient for multiple jobs having the same delegationid NOT TESTED

  • Bug #52020: [ yaim-cream-ce ] Support use of file (besides syslog) for glexec logging NOT TESTED

  • Bug #52050: misleading error message "The problem seems to be related to glexec" INVALID
    • The CREAM service does not make use of glexec anymore

  • Bug #52051: CEMon must remove all expired subscriptions on start-up FIXED
    • create a subscription for the topic CREAM_JOBS on the CE with a short lifetime
    • shutdown the service and wait for the expiration of the subscription
    • restart the service and verify that the subscription does not exist anymore in the directory /opt/glite/var/cemonitor/subscription

  • Bug #52052: Sometimes the getInfo() operation does not report the right list of topics FIXED
    • enable or disable the CE sensor by removing or adding the corresponding tag in the file /opt/glite/etc/glite-ce-monitor/cemonitor-config.xml
    • wait for cemonitor to reload the configuration (usually 10m)
    • verify the availability of the topic using the command glite-ce-monitor-gettopics

  • Bug #52268: BLAH leaves files in /tmp when CErequirements is set FIXED
    • submit a job specifying a simple CE requirements (e.g. cerequirements="other.GlueHostMainMemoryRAMSize > 2000")
    • verify that, after the execution of the job, in the tmp directory no files ce-req-file-* are left
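
The leftover-file check above amounts to a glob over the temporary directory. This is a minimal sketch: the helper name is hypothetical, and /tmp is assumed to be the directory meant by "the tmp directory".

```python
import glob
import os


def leftover_ce_req_files(tmp_dir="/tmp"):
    """Return the sorted paths of any ce-req-file-* entries in `tmp_dir`."""
    pattern = os.path.join(glob.escape(tmp_dir), "ce-req-file-*")
    return sorted(glob.glob(pattern))
```

An empty list after the job has finished means no files were left behind.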

  • Bug #52577: [ yaim-cream-ce ] create CREAM_GLEXEC_USER_HOME variable NOT TESTED

  • Bug #52651: CREAM file descriptor overuse FIXED
    • try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R --sotimeout 60 --vo dteam --valid 02:00 and verify that with high load the submissions are rejected
    • seek "too many open files" in the CREAM log
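
Searching the log for an error string, as in the step above, can be done with a small scanner. This is a minimal sketch: the helper name is hypothetical, and the location of the CREAM log file is not given here, so the caller supplies it.

```python
def find_in_log(log_path, needle):
    """Return (line_number, line) pairs for lines containing `needle`.

    The match is case-insensitive; undecodable bytes are replaced so that
    a partly binary log does not abort the scan.
    """
    wanted = needle.lower()
    hits = []
    with open(log_path, errors="replace") as fh:
        for number, line in enumerate(fh, 1):
            if wanted in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits
```

The same helper could serve the other log checks in this list, for example searching for "NullPointerException".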

  • Bug #52719: Blah doesn't set the 'executable' flag if a local jobwrapper is found NOT TESTED

  • Bug #52942: Missing description for ISB/OSB error in jobwrapper FIXED
    • submit a job with an unreachable host in the inputsandbox or in the outputsandboxbasedesturi parameter
    • verify that the output of the glite-ce-job-status contains the full description of the failure

  • Bug #53124: blparser_master could crash if some variable in blparser.conf are not set NOT TESTED

  • Bug #53459: [CREAM] Provide method to improve the detection of job status changes by ICE NOT TESTED

  • Bug #53499: CREAM job wrapper template should be put outside the jar FIXED
    • check whether the file /opt/glite/share/webapps/ce-cream.war contains the file WEB-INF/jobwrapper.tpl
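
Since a .war file is a zip archive, the check above can be done without unpacking it. This is a minimal sketch; the helper name is hypothetical. The test step does not state the expected result, although the bug title suggests the template should no longer be inside the archive.

```python
import zipfile


def archive_contains(archive_path, member):
    """Return True if the zip-format archive lists an entry named `member`."""
    with zipfile.ZipFile(archive_path) as zf:
        return member in zf.namelist()
```

Here it would be called as `archive_contains("/opt/glite/share/webapps/ce-cream.war", "WEB-INF/jobwrapper.tpl")`.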

  • Bug #54812: lsf_submit.sh job requirement NOT TESTED

  • Bug #54900: [ glite-yaim-cream-ce ] config_cream_tomcat_user should not add tomcat to VO FIXED
    • check the membership of any VO group
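
The membership check above can be sketched as a parse of the group database. The assumptions here are that the VO groups are ordinary Unix groups in the /etc/group format (name:password:gid:members) and that the account in question is called tomcat; the helper name is hypothetical.

```python
def groups_listing_user(group_file_text, user):
    """Return the names of groups whose member list includes `user`.

    Only supplementary membership (the fourth field) is inspected; the
    user's primary group is not considered.
    """
    groups = []
    for line in group_file_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        members = [m for m in fields[3].split(",") if m]
        if user in members:
            groups.append(fields[0])
    return groups
```

Fed with the contents of /etc/group, the result for "tomcat" should not contain any VO group.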

  • Bug #54949: Some job can remain in running state when BLParser is restarted for both lsf and pbs NOT TESTED

  • Bug #55078: Possible final state not considered in BLParserPBS and BUpdaterPBS NOT TESTED

  • Bug #55420: Allow admin to purge CREAM jobs in a non terminal status FIXED
    • temporarily disconnect any WN from the CE, e.g. by shutting down the mom server in a TORQUE installation
    • submit a job
    • on the CE with administrator privileges run the command: /opt/glite/sbin/JobDBAdminPurger.sh -u -p -s 2 as described in the wiki page
    • verify with glite-ce-job-list that the job has been purged from the database
    • verify that the sandbox directory of that job has been removed from /opt/glite/var/cream_sandbox
    • manually remove the job from the batch system and reconnect all the WNs

  • Bug #55438: BUpdater problems in updating job state with AssignFinalState for all batch system NOT TESTED

  • Bug #55565: BLAH configuration attribute blah_disable_wn_proxy_renewal fails to disable proxy renewal. FIXED
    • Verified issuing a BLAH_JOB_REFRESH proxy for a running job
    • Moreover, the BLAH proxy renewal operation is not used anymore (the proxy on the CE is renewed by CREAM and no longer by BLAH)

  • Bug #56075: Job failure reasons missing in the CREAM log file FIXED
    • submit a job with an unreachable host in the inputsandbox or in the outputsandboxbasedesturi parameter
    • verify that the following message appears in the log file: failureReason=Cannot move ISB (): error: globus_xio: Unable to connect to xxxx:2811 globus_xio: globus_libc_getaddrinfo failed.globus_common: Name or service not known

  • Bug #56339: [blah] "service glite-ce-blparser restart" does not always work FIXED
    • try the command /opt/glite/etc/init.d/glite-ce-blparser restart and verify the correct behaviour of the script

  • Bug #56367: CREAM RPM depends on C libs FIXED
    • check if the package of glite-ce-cream contains any elf executable
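
One way to look for ELF executables is to test each file for the 4-byte ELF magic number. This is a minimal sketch, assuming the package contents have already been extracted into a directory; the helper names are hypothetical.

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path):
    """Return True if the file at `path` starts with the ELF magic number."""
    with open(path, "rb") as fh:
        return fh.read(4) == ELF_MAGIC


def elf_files_under(root):
    """Return the sorted paths of all ELF files found below `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and is_elf(full):
                found.append(full)
    return sorted(found)
```

An empty result for the extracted glite-ce-cream package would be consistent with the fix.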

  • Bug #56518: BLAH blparser doesn't start after boot of the machine FIXED
    • install the CE node from scratch specifying the parameter BLPARSER_WITH_UPDATER_NOTIFIER=false in the yaim configuration for creamCE
    • reboot the machine and verify that the blparser_master is running

  • Bug #56697: CREAM logging must be improved when CREAM register operation fails FIXED
    • force the service to fail a register operation, e.g. by temporarily renaming the sandbox directory
    • verify that the log reports at least the JobID and the reason of the failure

  • Bug #57210: BLAH condor_submit script doesn't recognize certain options. CANNOT REPRODUCE
    • Not possible to test the fix since we don't have CREAM based CEs with Condor as batch system

  • Bug #57307: condor_submit.sh does not support the handling of "local" attributes CANNOT REPRODUCE
    • Not possible to test the fix since we don't have CREAM based CEs with Condor as batch system

  • Bug #57820: [yaim-cream-ce] CREAM-CE publishes GlueServiceDataValue incomplete NOT TESTED

  • Bug #58103: Cream database Query performance FIXED
    • Internal improvement
    • run a set of stress-tests and verify the performance

  • Bug #58109: Wrong value for the "service version" property FIXED
    • verify the property using the command glite-ce-service-info

  • Bug #58119: CREAM CE: publish Production instead of Special as default value for GlueCEStateStatus NOT TESTED

  • Bug #59423: RFE: support for ISB/OSB transfers from/to gridftp servers running using credentials NOT TESTED

  • Bug #58659: NullPointerException from getStatus FIXED
    • try to overload the node with the testsuite: cream-test-monitored-submit -r 30 -n 20000 -m 2000 -C 50 -l log4py.conf -j test.jdl -R --sotimeout 60 --vo dteam --valid 02:00 and verify that the log of the testsuite does not report any NullPointerException

  • Bug #58792: JobRegister fails, because cream_sandbox directory doesn't exist FIXED
    • temporarily rename the directory /opt/glite/var/cream_sandbox without turning off the service
    • submit a job and verify that the failure reports "cannot create the job's working directory!"

  • Bug #58941: [yaim-cream-ce] lcmaps confs for glexec and gridftp are not fully synchronized NOT TESTED

  • Bug #59005: Possible problem with hold/resumed jobs in BUpdaterLSF FIXED
    • Verified as reported here

  • Bug #59329: Proxy symlinks left in the registry area until purged FIXED
    • submit a job and verify the existence of the related symlink in the directory /opt/glite/var/blah/user_blah_job_registry.bjr/registry.proxydir
    • when the job terminates verify that the symlink has been removed by blah.
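
The symlink check in the two steps above can be sketched as a directory listing filtered for symbolic links. The helper name is hypothetical, and how a symlink name maps to a particular job is not described here, so the sketch simply lists all symlinks present.

```python
import os


def symlinks_in(directory):
    """Return the sorted names of the symbolic links directly inside `directory`."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.islink(os.path.join(directory, name))
    )
```

Comparing the listing taken while the job runs with the one taken after it terminates shows whether the job's symlink was removed.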

  • Bug #59686: Possible crash of BUpdarePBS due to wrong malloc FIXED
    • Define the parameter pbs_spoolpath in the file /opt/glite/etc/blah.config
    • run the BUpdaterPBS daemon and verify its liveness

  • Bug #59962: Sometimes the CREAM initialization fails with "UserId = ADMINISTRATOR is not enable for that operation" CANNOT REPRODUCE

  • Bug #60831: Error log message: "CREAM_JOB_SENSOR_HOST parameter not specified" FIXED
    • verify that the parameter "CREAM_JOB_SENSOR_HOST" is not defined in the file /opt/glite/etc/glite-ce-cream/cream-config.xml
    • submit several jobs
    • verify that the log of the CREAM service does not report the error above

  • Bug #61322: CREAM jw doesn't set GLITE_WMS_RB_BROKERINFO FIXED
    • submit a job and verify that the jobwrapper script, contained in the sandbox area for that job, correctly defines the __brokerinfo variable

  • Bug #61407: Set CE_ID in the cream jw FIXED
    • submit a job and verify that the jobwrapper script, contained in the sandbox area for that job, correctly defines the CE_ID variable

  • Bug #61493: [ yaim-cream-ce ] glexec_get_account policy order is wrong NOT TESTED

  • Bug #61604: yaim-cream-ce should not install config_gip_software_plugin FIXED
    • verify that the glite-yaim-cream-ce package does not contain the file config_gip_software_plugin but it contains config_cream_gip_software_plugin instead

  • Bug #61730: CREAM jw: GLITE_WMS_LOG_DESTINATION should always be set with the FQDN FIXED
    • submit a job and verify that the jobwrapper script, contained in the sandbox area for that job, defines a FQDN in the __ce_hostname variable
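
The FQDN check above can be sketched as a scan of the jobwrapper text. The assumption, not confirmed here, is that the script sets the variable with a plain shell assignment of the form `__ce_hostname=value`, optionally quoted or exported; the helper names are hypothetical, and the FQDN test is a simple syntactic one (at least two dot-separated labels).

```python
import re

# Assumed form of the assignment inside the jobwrapper script.
_ASSIGN = re.compile(
    r'^\s*(?:export\s+)?__ce_hostname=["\']?([^"\'\s;]*)', re.MULTILINE
)
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")


def looks_like_fqdn(host):
    """Return True if `host` has at least two syntactically valid DNS labels."""
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def ce_hostname_is_fqdn(script_text):
    """Return True if the script assigns an FQDN-like value to __ce_hostname."""
    match = _ASSIGN.search(script_text)
    return bool(match) and looks_like_fqdn(match.group(1))
```

A short host name such as "ce01" is rejected, while "ce01.example.org" is accepted.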

  • Bug #61761: CEMon must guarantee the notification rate FIXED
    • enable the "CE Sensor" plugin
    • create a subscription for the topic published by the sensor above with a running consumer: glite-ce-monitor-subscribe --cert <user_proxy> --key <user_proxy> --topic CE_MONITOR --dialects ISM_CLASSAD_GLUE_1.2 --consumer-url <consumer_url> --rate 10 --duration 600 <cemonitor_url>
    • create one or more subscriptions to a non-existing consumer URL or to a fake blocking one (e.g. using nc -l -p <consumer port>) specifying the same rate as above
    • verify that the notification rate for the first consumer is correct
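
Whether the first consumer is notified at the requested rate can be judged from the arrival times of its notifications. This is a minimal sketch: the helper name and the tolerance parameter are hypothetical, and the arrival times are assumed to be recorded in the same unit as the --rate value, whose unit is not stated here.

```python
def rate_respected(arrival_times, rate, tolerance):
    """Return True if every gap between consecutive arrivals is within
    `tolerance` of `rate`.

    At least two arrivals are needed; otherwise there is no gap to
    measure and the result is False.
    """
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    return bool(gaps) and all(abs(gap - rate) <= tolerance for gap in gaps)
```

With --rate 10, arrivals at 0, 10, 20.5 and 30 pass with a tolerance of 1, whereas a gap of 35 caused by a blocked consumer would fail.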

  • Bug #61790: Problems in CREAM CE when there are "strange" characters in the subject certificate FIXED
    • Verified submitting a job to a Torque CREAM CE with a proxy with subject: /DC=gov/DC=fnal/O=Fermilab/OU=Robots/CN=lcgcaf/CN=cdf/CN=Donatella Lucchesi/CN=UID:lucchesi
    • With the same proxy there were problems before (see https://gus.fzk.de/ws/ticket_info.php?ticket=54767)

  • Bug #62070: Possible problem with notification time in BNotifier HOPEFULLY FIXED
    • Not possible to reproduce it according to the developer (M. Mezzadri)

  • Bug #62207: [ yaim-cream ] Enable Glue 2.0 publishing NOT TESTED

  • Bug #62436: Possible problem with updater if job remain queued too long FIXED
    • Fixed as reported here: 3 jobs lasting 2 hours were submitted to a CREAM CE with only 2 job slots. For the third one the BNotifier logged the right events (i.e. it didn't log status=4 with failurereason=999)

  • Bug #62776: Yaim config for CREAM CE erroneously requires tomcat in glexec group NOT TESTED

  • Bug #62893: Possible proxy renewal problem in the CREAM jw NOT TESTED

  • Bug #63398: CREAM jw: removal of token should be retried in case of failure NOT TESTED

-- AlessioGianelle - 2010-02-05


Topic revision: r25 - 2010-02-26 - MassimoSgaravatto
 