Testing report: IGIRTC-82

Summary

  • Product: BLAH 1.16.6-3
  • Release Task: Task #30327
  • ETICS Subsystem Configuration Name: emi-blahp_R_1_16_6_3, emi-cream-ce_R_1_13_9_3
  • VCS Tag: glite-ce-blahp_R_1_16_6_3
  • EMI Major Release: EMI 1 (Kebnekaise)
  • Platform: SL5 epel
  • Author: Sergio Traldi
  • Testing report: Testing Report file
  • Certification report: Certification Report file
  • Outcome: Certified

Deployment tests

Clean Installation

Upgrade Installation

LSF CE

Unit Tests

Not Available.

System tests

Functionality tests

BLParser test

Old BLParser
Tests.Check Notifications For Normally Finished Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_normally_finished_jobs.html
.......
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpLGXrZc.cream_testing/
==============================================================================
Check Notifications For Normally Finished Jobs :: Test that notifications a...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_normally_finished_jobs                        | PASS |
------------------------------------------------------------------------------
Check Notifications For Normally Finished Jobs :: Test that notifi... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Tests.Check Notifications For Cancelled Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_cancelled_jobs.html 
....

Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpPZ9vT7.cream_testing/
==============================================================================
Check Notifications For Cancelled Jobs :: Test that notifications are sent ...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_cancelled_jobs                                | PASS |
------------------------------------------------------------------------------
Check Notifications For Cancelled Jobs :: Test that notifications ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Tests.Check Notifications For Suspended Resumed Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_suspended_resumed_jobs.html 
...
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpdDqobY.cream_testing/
==============================================================================
Check Notifications For Suspended Resumed Jobs :: Test that notifications a...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_suspended_resumed_jobs                        | FAIL |
_error: Expected status should be in ['HELD'] for job https://cream-29.pd.infn.it:8443/CREAM168494002 was actually DONE-OK
------------------------------------------------------------------------------
Check Notifications For Suspended Resumed Jobs :: Test that notifi... | FAIL |
2 critical tests, 1 passed, 1 failed
2 tests total, 1 passed, 1 failed
==============================================================================

  • Job which is suspended and then resumed

New BLParser

Tests.Check Notifications For Normally Finished Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_normally_finished_jobs.html
.......
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpLwQyCp.cream_testing/
==============================================================================
Check Notifications For Normally Finished Jobs :: Test that notifications a...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_normally_finished_jobs                        | PASS |
------------------------------------------------------------------------------
Check Notifications For Normally Finished Jobs :: Test that notifi... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Tests.Check Notifications For Cancelled Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_cancelled_jobs.html 
....
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpddI5WS.cream_testing/
==============================================================================
Check Notifications For Cancelled Jobs :: Test that notifications are sent ...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_cancelled_jobs                                | PASS |
------------------------------------------------------------------------------
Check Notifications For Cancelled Jobs :: Test that notifications ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Tests.Check Notifications For Suspended Resumed Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_suspended_resumed_jobs.html 
...
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpq63eyr.cream_testing/
==============================================================================
Check Notifications For Suspended Resumed Jobs :: Test that notifications a...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_suspended_resumed_jobs                        | PASS |
------------------------------------------------------------------------------
Check Notifications For Suspended Resumed Jobs :: Test that notifi... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
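
The pybot runs above can also be compared mechanically. The following is a minimal sketch, assuming the console layout shown in these transcripts (result rows ending in `| PASS |` or `| FAIL |` and a `N tests total, ...` line). The function name `summarize_pybot_output` is hypothetical and not part of the test suite.

```python
import re

# Result rows as printed on the console, e.g.
# "check_notifications_for_cancelled_jobs      | PASS |"
RESULT_ROW = re.compile(r"^(?P<name>.+?)\s*\|\s*(?P<status>PASS|FAIL)\s*\|\s*$")
# Totals line, e.g. "2 tests total, 1 passed, 1 failed"
TOTALS = re.compile(r"^(\d+) tests total, (\d+) passed, (\d+) failed$")


def summarize_pybot_output(text):
    """Return (failed_row_names, totals) parsed from pybot console output.

    totals is a (total, passed, failed) tuple, or None if no totals line
    was found in the text.
    """
    failed = []
    totals = None
    for line in text.splitlines():
        line = line.strip()
        row = RESULT_ROW.match(line)
        if row and row.group("status") == "FAIL":
            failed.append(row.group("name"))
        found = TOTALS.match(line)
        if found:
            totals = tuple(int(value) for value in found.groups())
    return failed, totals
```

Feeding it the captured output of the old and the new BLParser runs gives a quick way to see that only the suspended/resumed check differs between the two.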

Regression tests

Verification attached bugs

Bug #94712: Due to a timestamp problem bupdater for LSF can leave job in IDLE state FIXED not certified

Bug #94414: BLParserLSF could crash if a suspend on an idle job is done FIXED

[root@cream-29 ~]# date
Fri Jun 29 15:39:23 CEST 2012

[root@cream-29 ~]# /etc/init.d/glite-ce-blahparser status
BNotifier (pid 12162) is running...
BUpdaterLSF (pid 12167) is running...

[root@cream-29 ~]# date
Fri Jun 29 15:40:06 CEST 2012

[root@cream-29 ~]# /etc/init.d/glite-ce-blahparser status
BNotifier (pid 12162) is running...
BUpdaterLSF (pid 12167) is running...

  • On the UI side, submit 10 jobs and suspend one while it is still idle, so that it goes to HELD:
[traldi@cert-25 ~]$ for ((i=0;i<23;i++)) do glite-ce-job-submit -d -r cream-29.pd.infn.it:8443/cream-lsf-cert -a sleep.jdl; done 

[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632]
        Status        = [IDLE]


[traldi@cert-25 ~]$ glite-ce-job-suspend https://cream-29.pd.infn.it:8443/CREAM867401632

Are you sure you want to suspend specified job(s) [y/n]: y
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632]
        Status        = [IDLE]


[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632]
        Status        = [HELD]


[traldi@cert-25 ~]$ date
Fri Jun 29 15:39:30 CEST 2012

[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632]
        Status        = [HELD]


[traldi@cert-25 ~]$ glite-ce-job-resume https://cream-29.pd.infn.it:8443/CREAM867401632

Are you sure you want to resume specified job(s) [y/n]: y

[traldi@cert-25 ~]$ date
Fri Jun 29 15:39:51 CEST 2012

[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632]
        Status        = [DONE-OK]
        ExitCode      = [0]
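
The two `glite-ce-blahparser status` calls bracket the suspend/resume sequence in time, so identical pids before and after indicate that the daemons did not crash or restart. A minimal sketch of that comparison follows, assuming the status line format shown above. The helper names `running_daemons` and `daemons_survived` are hypothetical.

```python
import re

# Status lines as printed by the init script, e.g.
# "BNotifier (pid 12162) is running..."
STATUS_LINE = re.compile(r"^(?P<name>\S+) \(pid (?P<pid>\d+)\) is running\.\.\.$")


def running_daemons(status_output):
    """Map daemon name -> pid for every 'is running' line in the output."""
    daemons = {}
    for line in status_output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            daemons[match.group("name")] = int(match.group("pid"))
    return daemons


def daemons_survived(before, after):
    """True if every daemon seen before is still running with the same pid."""
    pids_before = running_daemons(before)
    pids_after = running_daemons(after)
    if not pids_before:
        # Nothing was running at the start, so there is nothing to confirm.
        return False
    return all(pids_after.get(name) == pid for name, pid in pids_before.items())
```

A changed pid would mean the daemon was restarted in between, and a missing entry would mean it is no longer running. Either case makes `daemons_survived` return False.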

Bug #94519: Updater for LSF can misidentify killed jobs as finished FIXED not certified

Bug #95392: Heavy usage of 'bjobsinfo' still hurts LSF FIXED not certified

Verification old bugs

Bug #91037: BUpdaterLSF should use bjobs to detect final job state FIXED

  • Change debug_level and restart the services:
[root@cream-29 ~]# sed -i 's/bupdater_debug_level=2/bupdater_debug_level=3/' /etc/blah.config 
[root@cream-29 ~]# mv /var/log/cream/glite-ce-bupdater.log /var/log/cream/glite-ce-bupdater.log.old
[root@cream-29 ~]# service gLite restart
STOPPING SERVICES
*** glite-ce-blahparser:
Shutting down BNotifier:                                   [  OK  ]
Shutting down BUpdaterLSF:                                 [  OK  ]

*** glite-lb-locallogger:
Stopping glite-lb-logd ... done
Stopping glite-lb-interlogd ... done

*** tomcat5:
Stopping tomcat5:                                          [  OK  ]

STARTING SERVICES
*** tomcat5:
Starting tomcat5:                                          [  OK  ]

*** glite-lb-locallogger:
Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
 done
Starting glite-lb-interlogd ...
Message from syslogd@ at Mon Jul  2 09:51:23 2012 ...
cream-29 syslog[29043]: FATAL    CONTROL - Failed to get GSI credentials. Exiting.  
 done

*** glite-ce-blahparser:
Starting BNotifier: /usr/bin/BNotifier: Error creating and binding socket: Address already in use
                                                           [FAILED]
Starting BUpdaterLSF:                                      [  OK  ]

  • Submit a job and wait for its completion:
[traldi@cert-25 ~]$ glite-ce-job-submit -r cream-29.pd.infn.it:8443/cream-lsf-cert -a sleep.jdl
https://cream-29.pd.infn.it:8443/CREAM299057094
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM299057094

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM299057094]
        Status        = [REALLY-RUNNING]


[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM299057094

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM299057094]
        Status        = [DONE-OK]
        ExitCode      = [0]

[root@cream-29 ~]# grep 299057094 /var/log/cream/glite-ce-bnotifier.log
2012-07-02 09:51:49 Sent for Cream:[BatchJobId="668220"; JobStatus=1; ChangeTime="2012-07-02 09:51:46"; ClientJobId="299057094"; BlahJobName="cre29_299057094";]
2012-07-02 09:51:54 Sent for Cream:[BatchJobId="668220"; JobStatus=2; ChangeTime="2012-07-02 09:51:48"; WorkerNode="prod-wn-001"; ClientJobId="299057094"; BlahJobName="cre29_299057094";]
2012-07-02 09:52:09 Sent for Cream:[BatchJobId="668220"; JobStatus=4; ChangeTime="2012-07-02 09:51:58"; JwExitCode=0; Reason="reason=0"; ClientJobId="299057094"; BlahJobName="cre29_299057094";]
  • Verify if bhist has been called:
[root@cream-29 ~]# grep bhist /var/log/cream/glite-ce-bupdater.log
2012-07-02 09:50:49 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-07-02 09:50:49 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:yes
2012-07-02 09:51:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-07-02 09:51:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:yes
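
The BNotifier log lines in this check can be reduced to the sequence of states reported for one job. The sketch below assumes the `Sent for Cream:[key=value; ...]` layout seen in the log excerpt. The mapping of numeric `JobStatus` codes to names is an assumption based on the usual BLAH convention, not something stated in this report, and the function names are hypothetical.

```python
import re

# Assumption: conventional BLAH job status codes.
STATUS_NAMES = {1: "IDLE", 2: "RUNNING", 3: "REMOVED", 4: "COMPLETED", 5: "HELD"}

# One "key=value;" pair; the value is either double-quoted or a bare token.
FIELD = re.compile(r'(\w+)=("([^"]*)"|[^;\s]+);')


def parse_notification(line):
    """Return the key/value pairs found between '[' and ']' in a log line."""
    body = line[line.index("[") + 1:line.rindex("]")]
    fields = {}
    for key, raw, quoted in FIELD.findall(body):
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def status_sequence(log_lines, client_job_id):
    """List the states notified for one ClientJobId, in log order."""
    sequence = []
    for line in log_lines:
        if "Sent for Cream:[" not in line:
            continue
        fields = parse_notification(line)
        if fields.get("ClientJobId") == client_job_id:
            code = int(fields["JobStatus"])
            # Unknown codes are kept as their number rather than guessed.
            sequence.append(STATUS_NAMES.get(code, str(code)))
    return sequence
```

Under that assumed mapping, the three notifications logged for job 299057094 correspond to an idle, running, completed progression.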

Bug #92281: Purge of registry can cause registry corruption FIXED not certified

The problem cannot be reproduced systematically.

Bug #92774: BLParserLSF could crash searching in old logs FIXED not certified

The old LSF BLParser usually crashed in this situation, so the fact that the functional tests pass is a good indication that the bug has been fixed.

Bug #89859: There is a memory leak in the updater for LSF, PBS and Condor FIXED

Submit 1000 jobs, one every 3 seconds, while monitoring the resident set size (RSS) of the /usr/bin/BUpdaterLSF process:

DatiRSS.png

Test PASSED
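
The monitoring described for this test could be scripted along the following lines. This is a minimal sketch, not the procedure actually used for the plot. It assumes a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line, and the sampling interval, function names, and use of a least-squares slope as a leak indicator are all illustrative choices.

```python
import time


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def sample_rss_kb(pid, count, interval_s=3.0):
    """Read the RSS of a process `count` times, waiting interval_s in between."""
    samples = []
    for index in range(count):
        with open("/proc/%d/status" % pid) as handle:
            samples.append(parse_vmrss_kb(handle.read()))
        if index < count - 1:
            time.sleep(interval_s)
    return samples


def growth_per_sample(samples):
    """Least-squares slope of the samples, in kB per sampling step."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / float(n)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator
```

A slope that stays close to zero over the whole submission run is consistent with a fixed leak, while a steadily positive slope would suggest the memory is still growing. The pid of BUpdaterLSF can be taken from the `glite-ce-blahparser status` output.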

-- SergioTraldi - 2012-06-28

Topic revision: r7 - 2012-07-03 - SergioTraldi
 