Difference: Task30327 (1 vs. 7)

Revision 7 - 2012-07-03 - SergioTraldi

Line: 1 to 1
 
META TOPICPARENT name="TestingBlah"

Testing report: IGIRTC-82

Line: 8 to 8
 
Changed:
<
<
  • ETICS Subsystem Configuration Name: emi-blahp_R_1_16_6_3
>
>
  • ETICS Subsystem Configuration Name: emi-blahp_R_1_16_6_3, emi-cream-ce_R_1_13_9_3
 
  • VCS Tag: glite-ce-blahp_R_1_16_6_3
  • EMI Major Release: EMI 1 (Kebnekaise)
  • Platform: SL5 epel
Line: 360 to 360
 
META FILEATTACHMENT attachment="lsf_update_confold.txt" attr="h" comment="" date="1340974181" name="lsf_update_confold.txt" path="lsf_update_confold.txt" size="44806" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_NormallyFinishedOld.txt" attr="h" comment="" date="1340974678" name="logFile_NormallyFinishedOld.txt" path="logFile_NormallyFinishedOld.txt" size="164969" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="DatiRSS.png" attr="h" comment="" date="1341239980" name="DatiRSS.png" path="DatiRSS.png" size="33327" user="SergioTraldi" version="1"
Changed:
<
<
META FILEATTACHMENT attachment="BLAH-EMI1Update_Certification_Report_Task30327.txt" attr="h" comment="" date="1341301840" name="BLAH-EMI1Update_Certification_Report_Task30327.txt" path="BLAH-EMI1Update_Certification_Report_Task30327.txt" size="2768" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="BLAH-EMI1Update_Test_Report_Task30327.txt" attr="h" comment="" date="1341301840" name="BLAH-EMI1Update_Test_Report_Task30327.txt" path="BLAH-EMI1Update_Test_Report_Task30327.txt" size="4415" user="SergioTraldi" version="1"
>
>
META FILEATTACHMENT attachment="BLAH-EMI1Update_Certification_Report_Task30327.txt" attr="h" comment="" date="1341307131" name="BLAH-EMI1Update_Certification_Report_Task30327.txt" path="BLAH-EMI1Update_Certification_Report_Task30327.txt" size="2794" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="BLAH-EMI1Update_Test_Report_Task30327.txt" attr="h" comment="" date="1341307131" name="BLAH-EMI1Update_Test_Report_Task30327.txt" path="BLAH-EMI1Update_Test_Report_Task30327.txt" size="4441" user="SergioTraldi" version="1"

Revision 6 - 2012-07-03 - SergioTraldi

Line: 1 to 1
 
META TOPICPARENT name="TestingBlah"

Testing report: IGIRTC-82

Line: 8 to 8
 
Changed:
<
<
  • ETICS Subsystem Configuration Name: emi-cream-ce_R_1_13_9_3
  • VCS Tag: -, -
>
>
  • ETICS Subsystem Configuration Name: emi-blahp_R_1_16_6_3
  • VCS Tag: glite-ce-blahp_R_1_16_6_3
 
  • EMI Major Release: EMI 1 (Kebnekaise)
  • Platform: SL5 epel
  • Author: Sergio Traldi
Line: 360 to 360
 
META FILEATTACHMENT attachment="lsf_update_confold.txt" attr="h" comment="" date="1340974181" name="lsf_update_confold.txt" path="lsf_update_confold.txt" size="44806" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_NormallyFinishedOld.txt" attr="h" comment="" date="1340974678" name="logFile_NormallyFinishedOld.txt" path="logFile_NormallyFinishedOld.txt" size="164969" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="DatiRSS.png" attr="h" comment="" date="1341239980" name="DatiRSS.png" path="DatiRSS.png" size="33327" user="SergioTraldi" version="1"
Changed:
<
<
META FILEATTACHMENT attachment="BLAH-EMI1Update_Certification_Report_Task30327.txt" attr="h" comment="" date="1341241318" name="BLAH-EMI1Update_Certification_Report_Task30327.txt" path="BLAH-EMI1Update_Certification_Report_Task30327.txt" size="2796" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="BLAH-EMI1Update_Test_Report_Task30327.txt" attr="h" comment="" date="1341241318" name="BLAH-EMI1Update_Test_Report_Task30327.txt" path="BLAH-EMI1Update_Test_Report_Task30327.txt" size="4375" user="SergioTraldi" version="1"
>
>
META FILEATTACHMENT attachment="BLAH-EMI1Update_Certification_Report_Task30327.txt" attr="h" comment="" date="1341301840" name="BLAH-EMI1Update_Certification_Report_Task30327.txt" path="BLAH-EMI1Update_Certification_Report_Task30327.txt" size="2768" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="BLAH-EMI1Update_Test_Report_Task30327.txt" attr="h" comment="" date="1341301840" name="BLAH-EMI1Update_Test_Report_Task30327.txt" path="BLAH-EMI1Update_Test_Report_Task30327.txt" size="4415" user="SergioTraldi" version="1"

Revision 5 - 2012-07-02 - SergioTraldi

Line: 1 to 1
 
META TOPICPARENT name="TestingBlah"

Testing report: IGIRTC-82

Line: 13 to 13
 
  • EMI Major Release: EMI 1 (Kebnekaise)
  • Platform: SL5 epel
  • Author: Sergio Traldi
Changed:
<
<
  • Testing report:
  • Certification report:
>
>
 
  • Outcome: Certified

Deployment tests

Line: 337 to 337
  The old LSF BLParser usually crashed in this situation, so the fact that the functional tests pass is a good sign that the bug has been fixed.

Added:
>
>
Bug #89859: There is a memory leak in the updater for LSF, PBS and Condor FIXED

Submit 1000 jobs, one every 3 seconds, while monitoring the used RSS memory of the /usr/bin/BUpdaterLSF process:

DatiRSS.png

Test PASSED
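
The monitoring step above could be scripted along the following lines. This is a minimal sketch, not the procedure actually used for DatiRSS.png: it assumes a Linux /proc filesystem, and the helper names (parse_vm_rss_kb, read_rss_kb, monitor, rss_growth_kb) as well as the quarter-based comparison are hypothetical choices made for illustration.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) found in /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process from /proc (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_vm_rss_kb(handle.read())


def monitor(pid, count, interval_seconds=3):
    """Collect `count` RSS samples, one every `interval_seconds` seconds."""
    samples = []
    for _ in range(count):
        samples.append(read_rss_kb(pid))
        time.sleep(interval_seconds)
    return samples


def rss_growth_kb(samples):
    """Mean of the last quarter of the samples minus mean of the first quarter.

    A value that stays near zero over a long run suggests no steady growth;
    what counts as "near zero" is a judgment call and is not defined here.
    """
    if len(samples) < 4:
        raise ValueError("need at least four samples")
    quarter = len(samples) // 4
    head = samples[:quarter]
    tail = samples[-quarter:]
    return sum(tail) / len(tail) - sum(head) / len(head)
```

For example, monitor(pid, 1000) run alongside the job submissions, with pid set to the process ID of /usr/bin/BUpdaterLSF, would give one sample per submitted job, and rss_growth_kb would summarise the trend as a single number.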

 

-- SergioTraldi - 2012-06-28

Line: 352 to 359
 
META FILEATTACHMENT attachment="lsf_update_confnew.txt" attr="h" comment="" date="1340974181" name="lsf_update_confnew.txt" path="lsf_update_confnew.txt" size="44841" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="lsf_update_confold.txt" attr="h" comment="" date="1340974181" name="lsf_update_confold.txt" path="lsf_update_confold.txt" size="44806" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_NormallyFinishedOld.txt" attr="h" comment="" date="1340974678" name="logFile_NormallyFinishedOld.txt" path="logFile_NormallyFinishedOld.txt" size="164969" user="SergioTraldi" version="1"
Added:
>
>
META FILEATTACHMENT attachment="DatiRSS.png" attr="h" comment="" date="1341239980" name="DatiRSS.png" path="DatiRSS.png" size="33327" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="BLAH-EMI1Update_Certification_Report_Task30327.txt" attr="h" comment="" date="1341241318" name="BLAH-EMI1Update_Certification_Report_Task30327.txt" path="BLAH-EMI1Update_Certification_Report_Task30327.txt" size="2796" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="BLAH-EMI1Update_Test_Report_Task30327.txt" attr="h" comment="" date="1341241318" name="BLAH-EMI1Update_Test_Report_Task30327.txt" path="BLAH-EMI1Update_Test_Report_Task30327.txt" size="4375" user="SergioTraldi" version="1"

Revision 4 - 2012-07-02 - SergioTraldi

Line: 1 to 1
 
META TOPICPARENT name="TestingBlah"

Testing report: IGIRTC-82

Line: 260 to 260
 
Bug #95392: Heavy usage of 'bjobsinfo' still hurts LSF FIXED not certified
Added:
>
>

Verification old bugs

Bug #91037: BUpdaterLSF should use bjobs to detect final job state FIXED

  • Change debug_level and restart the services:
[root@cream-29 ~]# sed -i 's/bupdater_debug_level=2/bupdater_debug_level=3/' /etc/blah.config 
[root@cream-29 ~]# mv /var/log/cream/glite-ce-bupdater.log /var/log/cream/glite-ce-bupdater.log.old
[root@cream-29 ~]# service gLite restart
STOPPING SERVICES
*** glite-ce-blahparser:
Shutting down BNotifier:                                   [  OK  ]
Shutting down BUpdaterLSF:                                 [  OK  ]

*** glite-lb-locallogger:
Stopping glite-lb-logd ... done
Stopping glite-lb-interlogd ... done

*** tomcat5:
Stopping tomcat5:                                          [  OK  ]

STARTING SERVICES
*** tomcat5:
Starting tomcat5:                                          [  OK  ]

*** glite-lb-locallogger:
Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
 done
Starting glite-lb-interlogd ...
Message from syslogd@ at Mon Jul  2 09:51:23 2012 ...
cream-29 syslog[29043]: FATAL    CONTROL - Failed to get GSI credentials. Exiting.  
 done

*** glite-ce-blahparser:
Starting BNotifier: /usr/bin/BNotifier: Error creating and binding socket: Address already in use
                                                           [FAILED]
Starting BUpdaterLSF:                                      [  OK  ]

  • Submit a job and wait for its completion:
[traldi@cert-25 ~]$ glite-ce-job-submit -r cream-29.pd.infn.it:8443/cream-lsf-cert -a sleep.jdl
https://cream-29.pd.infn.it:8443/CREAM299057094
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM299057094

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM299057094]
        Status        = [REALLY-RUNNING]


[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM299057094

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM299057094]
        Status        = [DONE-OK]
        ExitCode      = [0]

[root@cream-29 ~]# grep 299057094 /var/log/cream/glite-ce-bnotifier.log
2012-07-02 09:51:49 Sent for Cream:[BatchJobId="668220"; JobStatus=1; ChangeTime="2012-07-02 09:51:46"; ClientJobId="299057094"; BlahJobName="cre29_299057094";]
2012-07-02 09:51:54 Sent for Cream:[BatchJobId="668220"; JobStatus=2; ChangeTime="2012-07-02 09:51:48"; WorkerNode="prod-wn-001"; ClientJobId="299057094"; BlahJobName="cre29_299057094";]
2012-07-02 09:52:09 Sent for Cream:[BatchJobId="668220"; JobStatus=4; ChangeTime="2012-07-02 09:51:58"; JwExitCode=0; Reason="reason=0"; ClientJobId="299057094"; BlahJobName="cre29_299057094";]
  • Verify if bhist has been called:
[root@cream-29 ~]# grep bhist /var/log/cream/glite-ce-bupdater.log
2012-07-02 09:50:49 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-07-02 09:50:49 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:yes
2012-07-02 09:51:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-07-02 09:51:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:yes
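
The grep output above contains only configuration notices, so it helps to separate those from any other line that mentions bhist. The following is a minimal sketch of that filtering. It assumes, without confirmation from the log format shown here, that a real bhist call would be logged as a line mentioning bhist that is not a "key ... not found" notice; the function name split_bhist_mentions is hypothetical.

```python
import re

# Matches notices such as:
#   ... key bupdater_use_bhist_for_killed not found - using the default:yes
CONFIG_NOTICE = re.compile(r"key (\S+) not found - using the default:(\S+)")


def split_bhist_mentions(log_text):
    """Split log lines mentioning bhist into config notices and everything else.

    Returns (notices, others): `notices` maps each configuration key to the
    default value reported for it, `others` lists the remaining bhist lines.
    """
    notices = {}
    others = []
    for line in log_text.splitlines():
        if "bhist" not in line:
            continue
        match = CONFIG_NOTICE.search(line)
        if match:
            notices[match.group(1)] = match.group(2)
        else:
            others.append(line)
    return notices, others


SAMPLE = (
    "2012-07-02 09:50:49 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint"
    " not found - using the default:no\n"
    "2012-07-02 09:50:49 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed"
    " not found - using the default:yes\n"
)

notices, others = split_bhist_mentions(SAMPLE)
print(notices)
print(others)
```

Under that assumption, an empty second list means the log holds no sign of bhist being run during the test.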

Bug #92281: Purge of registry can cause registry corruption FIXED not certified

The problem cannot be reproduced systematically.

Bug #92774: BLParserLSF could crash searching in old logs FIXED not certified

The old LSF BLParser usually crashed in this situation, so the fact that the functional tests pass is a good sign that the bug has been fixed.

  -- SergioTraldi - 2012-06-28

Revision 3 - 2012-06-29 - SergioTraldi

Line: 1 to 1
 
META TOPICPARENT name="TestingBlah"

Testing report: IGIRTC-82

Line: 41 to 41
 

Functionality tests

Deleted:
<
<

Test submission

  • Test result for LSF is available here PASSED
 

BLParser test

Old BLParser
Added:
>
>
Tests.Check Notifications For Normally Finished Jobs
 
Deleted:
<
<
 
Changed:
<
<
Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_cancelled_jobs | PASS |
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
>
>
[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_normally_finished_jobs.html
.......
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpLGXrZc.cream_testing/
 ==========================================================================
Changed:
<
<
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
>
>
Check Notifications For Normally Finished Jobs :: Test that notifications a...
 ==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_normally_finished_jobs | PASS |
Changed:
<
<
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
>
>
Check Notifications For Normally Finished Jobs :: Test that notifi... | PASS |
 2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==========================================================================
Deleted:
<
<
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_suspended_resumed_jobs | FAIL |
_error: Expected status should be in ['HELD'] for job https://cream-29.pd.infn.it:8443/CREAM705156955 was actually IDLE
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | FAIL |
2 critical tests, 1 passed, 1 failed
2 tests total, 1 passed, 1 failed
==========================================================================
Tests | FAIL |
6 critical tests, 5 passed, 1 failed
6 tests total, 5 passed, 1 failed
==========================================================================
Output: /home/ale/blah/italiangrid-cream_blah_testsuites-09156c3_ver2/output.xml
Log: /home/ale/blah/italiangrid-cream_blah_testsuites-09156c3_ver2/log.html
Report: /home/ale/blah/italiangrid-cream_blah_testsuites-09156c3_ver2/report.html
 
Changed:
<
<

>
>
Tests.Check Notifications For Cancelled Jobs
 
Changed:
<
<
Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
>
>
[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_cancelled_jobs.html
....
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpPZ9vT7.cream_testing/
==========================================================================
Check Notifications For Cancelled Jobs :: Test that notifications are sent ...

 ==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_cancelled_jobs | PASS |
Changed:
<
<
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
>
>
Check Notifications For Cancelled Jobs :: Test that notifications ... | PASS |
 2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==========================================================================
Changed:
<
<
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_normally_finished_jobs | PASS |
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
>
>

Tests.Check Notifications For Suspended Resumed Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_suspended_resumed_jobs.html 
...
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpdDqobY.cream_testing/
 ==========================================================================
Changed:
<
<
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
>
>
Check Notifications For Suspended Resumed Jobs :: Test that notifications a...
 ==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_suspended_resumed_jobs | FAIL |
Changed:
<
<
_error: Expected status should be in ['HELD'] for job https://cream-41.pd.infn.it:8443/CREAM324312963 was actually DONE-OK
>
>
_error: Expected status should be in ['HELD'] for job https://cream-29.pd.infn.it:8443/CREAM168494002 was actually DONE-OK
 
Changed:
<
<
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | FAIL |
>
>
Check Notifications For Suspended Resumed Jobs :: Test that notifi... | FAIL |
 2 critical tests, 1 passed, 1 failed
2 tests total, 1 passed, 1 failed
==========================================================================
Deleted:
<
<
Tests | FAIL |
6 critical tests, 5 passed, 1 failed
6 tests total, 5 passed, 1 failed
==========================================================================
 
Added:
>
>
 
  • Job which is suspended and then resumed
Deleted:
<
<
 

New BLParser
Changed:
<
<
>
>
Tests.Check Notifications For Normally Finished Jobs
 
Changed:
<
<
Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
>
>
[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_normally_finished_jobs.html
.......
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpLwQyCp.cream_testing/
 ==========================================================================
Changed:
<
<
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_cancelled_jobs | PASS |
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==========================================================================
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
>
>
Check Notifications For Normally Finished Jobs :: Test that notifications a...
 ==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_normally_finished_jobs | PASS |
Changed:
<
<
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==========================================================================
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_suspended_resumed_jobs | PASS |
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | PASS |
>
>
Check Notifications For Normally Finished Jobs :: Test that notifi... | PASS |
 2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==========================================================================
Deleted:
<
<
Tests | PASS |
6 critical tests, 6 passed, 0 failed
6 tests total, 6 passed, 0 failed
==========================================================================
 
Changed:
<
<
>
>
Tests.Check Notifications For Cancelled Jobs
 
Changed:
<
<
Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
>
>
[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_cancelled_jobs.html
....
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpddI5WS.cream_testing/
==========================================================================
Check Notifications For Cancelled Jobs :: Test that notifications are sent ...
 ==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_cancelled_jobs | PASS |
Changed:
<
<
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
>
>
Check Notifications For Cancelled Jobs :: Test that notifications ... | PASS |
 2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==========================================================================
Changed:
<
<
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_normally_finished_jobs | PASS |
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
>
>

Tests.Check Notifications For Suspended Resumed Jobs

[traldi@cert-25 blah_testing]$ pybot tests/check_notifications_for_suspended_resumed_jobs.html 
...
Command's output printed.
 dn = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi/CN=proxy
The files of this testsuite will be stored under: /tmp/tmpq63eyr.cream_testing/
 ==========================================================================
Changed:
<
<
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
>
>
Check Notifications For Suspended Resumed Jobs :: Test that notifications a...
 ==========================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
check_notifications_for_suspended_resumed_jobs | PASS |
Changed:
<
<
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | PASS |
>
>
Check Notifications For Suspended Resumed Jobs :: Test that notifi... | PASS |
 2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==========================================================================
Deleted:
<
<
Tests | PASS |
6 critical tests, 6 passed, 0 failed
6 tests total, 6 passed, 0 failed
==========================================================================
 
Added:
>
>
 

Regression tests

Verification attached bugs

Changed:
<
<
Bug #89527: BLAHP produced -W stage(in/out) directives are incompatible with Torque 2.5.8 FIXED

Content of file to check here.

Bug #91037: BUpdaterLSF should use bjobs to detect final job state FIXED

  • Change debug_level and restart the services:
[root@cream-29 ~]# sed -i 's/bupdater_debug_level=2/bupdater_debug_level=3/' /etc/blah.config 
[root@cream-29 ~]# mv /var/log/cream/glite-ce-bupdater.log /var/log/cream/glite-ce-bupdater.log.old
[root@cream-29 ~]# service gLite restart
STOPPING SERVICES
*** glite-ce-blahparser:
Shutting down BNotifier:                                   [  OK  ]
Shutting down BUpdaterLSF:                                 [  OK  ]

*** glite-lb-locallogger:
Stopping glite-lb-logd ... done
Stopping glite-lb-interlogd ... done

*** tomcat5:
Stopping tomcat5:                                          [  OK  ]

STARTING SERVICES
*** tomcat5:
Starting tomcat5:                                          [  OK  ]

*** glite-lb-locallogger:
Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
 done
Starting glite-lb-interlogd ... done

*** glite-ce-blahparser:
Starting BNotifier: /usr/bin/BNotifier: Error creating and binding socket: Address already in use
                                                           [FAILED]
Starting BUpdaterLSF:                                      [  OK  ]
  • Submit a job and wait for its completion:
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-29.pd.infn.it:8443/cream-lsf-cert cream.jdl
https://cream-29.pd.infn.it:8443/CREAM239301025
[ale@cream-12 UI]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM239301025

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM239301025]
   Status        = [DONE-OK]
   ExitCode      = [0]

[root@cream-29 ~]# grep 239301025 /var/log/cream/glite-ce-bnotifier.log
2012-03-22 17:05:36 Sent for Cream:[BatchJobId="622199"; JobStatus=4; ChangeTime="2012-03-22 17:05:22"; JwExitCode=0; Reason="reason=0"; ClientJobId="239301025"; BlahJobName="cre29_239301025";]
  • Verify if bhist has been called:
[root@cream-29 ~]# grep bhist /var/log/cream/glite-ce-bupdater.log
2012-03-22 17:00:43 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:00:43 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-22 17:01:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:01:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-22 17:03:54 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:03:54 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-22 17:04:51 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:04:51 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no

Bug #92281: Purge of registry can cause registry corruption FIXED not certified

The problem cannot be reproduced systematically.

Bug #92774: BLParserLSF could crash searching in old logs FIXED not certified

The old LSF BLParser usually crashed in this situation, so the fact that the functional tests pass is a good sign that the bug has been fixed.

Verification old bugs

Submitted 5000 jobs to a CREAM CE configured using the new blparser, and with job_registry_use_mmap=yes.

Monitored the used RSS of the blahpd processes. At the end, the maximum value across all the processes was 4560.

Test PASSED

>
>
Bug #94712: Due to a timestamp problem bupdater for LSF can leave job in IDLE state FIXED NOT CERTIFIED
 
Deleted:
<
<
 
Changed:
<
<
Configure /etc/blah.config:
[root@cream-29 ~]# tail -4 /etc/blah.config
# Verify fix for bug #77776
lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/bin/runcmd.pl

Where runcmd.pl is:

#!/usr/bin/perl
#---------------------#
#  PROGRAM:  argv.pl  #
#---------------------#
use strict;
use warnings;

# Append the arguments received for this invocation to /tmp/xyz, one line per call.
open(my $fh, '>>', '/tmp/xyz') or die "Cannot open /tmp/xyz: $!";
print $fh join(' ', @ARGV), "\n";
close($fh);

Restart the services and submit 10 jobs to the CE.

>
>
Bug #94414: BLParserLSF could crash if a suspend on an idle job is done FIXED
 
Added:
>
>
 
Changed:
<
<
[root@cream-29 cream]# cat /tmp/xyz
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10

Test PASSED

>
>
[root@cream-29 ~]# date
Fri Jun 29 15:39:23 CEST 2012
 
Added:
>
>
[root@cream-29 ~]# /etc/init.d/glite-ce-blahparser status
BNotifier (pid 12162) is running...
BUpdaterLSF (pid 12167) is running...
 
Changed:
<
<
>
>
[root@cream-29 ~]# date
Fri Jun 29 15:40:06 CEST 2012
 
Changed:
<
<
[root@cream-29 cream]# ls -l /var/blah
total 8
-rw-r--r-- 1 tomcat tomcat    0 Mar 23 15:49 blah_bnotifier.pid
-rw-r--r-- 1 tomcat tomcat    4 Mar 23 15:53 blah_bupdater.pid
drwxrwx--t 4 tomcat tomcat 4096 Mar 23 16:07 user_blah_job_registry.bjr
[root@cream-29 cream]#  ls -l /var/blah/user_blah_job_registry.bjr/
total 14388
-rw-rw-r-- 1 tomcat tomcat 11377096 Mar 23 17:04 registry
-rw-r--r-- 1 tomcat tomcat  3066960 Mar 23 16:07 registry.by_blah_index
-rw-rw-rw- 1 tomcat tomcat        0 Mar 23 17:04 registry.locktest
drwxrwx-wt 2 tomcat tomcat     4096 Mar 23 17:04 registry.npudir
drwxrwx-wt 2 tomcat tomcat   253952 Mar 23 17:04 registry.proxydir
-rw-r--r-- 1 tomcat tomcat       99 Mar 23 15:49 registry.subjectlist
[root@cream-29 cream]# ls -l /var/blah/user_blah_job_registry.bjr/registry.npudir
total 8
-rw-rw-r-- 1 dteam017 dteam 856 Mar 23 17:04 npu_d4X1Ao
-rw-rw-r-- 1 dteam017 dteam 856 Mar 23 17:04 npu_jZXJGF
[root@cream-29 cream]# ls -l /var/blah/user_blah_job_registry.bjr/registry.proxydir/ | head -10
total 2448
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637551_RUGJ92 -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/f9099a3af228b82c323e26c1f9f494aefdd1b43910396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637552_LrF5tC -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/23a5e3de27a7d82748cabae2d5dccec329853fc610396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637553_WvebYm -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/751c2ef4ec29986f4bf63a602fbe626d7d1b3cab10396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637554_4vdiJs -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/4966974580d533d4d42a4635158382566ec6c2f610396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637555_QX5Bi6 -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/785cbd414da9ee5e79b633477b59a86071dcf50f10396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637556_jTHBm1 -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/df2c50573906838f8894b361c2a14860c3e4f16210396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637557_DbWTbB -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/6fd59e85358187345a4635505c9af7fe32ec61a310396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637558_hhAHKX -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/ced55ce67190ef6b2d8c56907926aed2f1a4ddfd10396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637559_WjeEfP -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/1d197ae0e6d8681258517525f8fc568b56bdf17810396930026260
>
>
[root@cream-29 ~]# /etc/init.d/glite-ce-blahparser status
BNotifier (pid 12162) is running...
BUpdaterLSF (pid 12167) is running...
 
Changed:
<
<
Test PASSED

>
>
  • On the UI side, submit 10 jobs and suspend one so that it goes to HELD:
 
Changed:
<
<
[root@cream-29 cream]# su - tomcat
-sh-3.2$ chown tomcat.tomcat /tmp/proxy
-sh-3.2$ ls -l /tmp/proxy
-rw------- 1 tomcat tomcat 6501 Mar 26 10:52 /tmp/proxy

-sh-3.2$ /usr/bin/blahpd
$GahpVersion: 1.16.5 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
blah_job_submit 1 [cmd="/bin/cp";Args="fstab\ fstab.out";TransferInput="/home/dteam001/dir1/fstab";TransferOutput="fstab.out";TransferOutputRemaps="fstab.out=/home/dteam001/dir1/fstab.out";gridtype="lsf";queue="cert";x509userproxy="/tmp/proxy"]
S
results
S 1
1 0 No\ error lsf/20120326/642806
Connection closed by remote host

[root@cream-29 cream]# ls -l /home/dteam001/dir1/
total 8
-rw-r--r-- 1 dteam001 dteam 618 Mar 23 17:11 fstab
-rw-r--r-- 1 dteam001 dteam 618 Mar 26 10:54 fstab.out

Test PASSED

>
>
[traldi@cert-25 ~]$ for ((i=0;i<23;i++)) do glite-ce-job-submit -d -r cream-29.pd.infn.it:8443/cream-lsf-cert -a sleep.jdl; done
 
Changed:
<
<
>
>
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632
 
Added:
>
>
******  JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632]
        Status        = [IDLE]
 
Deleted:
<
<
Content of report file here.
==============================================================================
Bug 81824 :: Regression test of bug #81824 yaim-cream-ce should manage the ...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
bug_81824                                                             | PASS |
------------------------------------------------------------------------------
Bug 81824 :: Regression test of bug #81824 yaim-cream-ce should ma... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Test PASSED

 
Added:
>
>
[traldi@cert-25 ~]$ glite-ce-job-suspend https://cream-29.pd.infn.it:8443/CREAM867401632
 
Changed:
<
<
>
>
Are you sure you want to suspend specified job(s) [y/n]: y
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632
 
Changed:
<
<
Try first with direct submission:
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-29.pd.infn.it:8443/cream-lsf-creamtest1 cream.jdl
https://cream-29.pd.infn.it:8443/CREAM499441859
>
>
**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [IDLE]
 
Deleted:
<
<
[root@cream-29 cream]# grep CREAM499441859 /var/log/cream/accounting/blahp.log-20120326
"timestamp=2012-03-26 09:00:36" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-29.pd.infn.it:8443/cream-lsf-creamtest1" "jobID=CREAM499441859" "lrmsID=642810" "localUser=18239" "clientID=cre29_499441859"
 
Changed:
<
<
Try submission through a WMS:
[ale@cream-12 UI]$ glite-wms-job-submit -a -e https://emi-demo11.cnaf.infn.it:7443/glite_wms_wmproxy_server -r cream-29.pd.infn.it:8443/cream-lsf-creamtest1 test.jdl 
>
>
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632
 
Changed:
<
<
Connecting to the service https://emi-demo11.cnaf.infn.it:7443/glite_wms_wmproxy_server
>
>
**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [HELD]
 
Changed:
<
<
================== glite-wms-job-submit Success ==================
>
>
[traldi@cert-25 ~]$ date
Fri Jun 29 15:39:30 CEST 2012
 
Changed:
<
<
The job has been successfully submitted to the WMProxy
Your job identifier is:
>
>
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632
 
Changed:
<
<
https://emi-demo11.cnaf.infn.it:9000/kKD1V4dLlSdALEzPA_1qxg
>
>
**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [HELD]
 
Deleted:
<
<
======================================================================
 
Added:
>
>
[traldi@cert-25 ~]$ glite-ce-job-resume https://cream-29.pd.infn.it:8443/CREAM867401632
 
Added:
>
>
Are you sure you want to resume specified job(s) [y/n]: y
 
Changed:
<
<
[ale@cream-12 UI]$ glite-wms-job-logging-info -v 2 --event Transfer https://emi-demo11.cnaf.infn.it:9000/kKD1V4dLlSdALEzPA_1qxg | grep "Dest jobid"
- Dest jobid = unavailable
- Dest jobid = https://cream-29.pd.infn.it:8443/CREAM581241790
>
>
[traldi@cert-25 ~]$ date
Fri Jun 29 15:39:51 CEST 2012
 
Added:
>
>
[traldi@cert-25 ~]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM867401632
 
Changed:
<
<
[root@cream-29 cream]# grep kKD1V4dLlSdALEzPA_1qxg /var/log/cream/accounting/blahp.log-20120326
"timestamp=2012-03-26 09:05:37" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-29.pd.infn.it:8443/cream-lsf-creamtest1" "jobID=https://emi-demo11.cnaf.infn.it:9000/kKD1V4dLlSdALEzPA_1qxg" "lrmsID=642811" "localUser=18239" "clientID=cre29_581241790"

Test PASSED
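
The accounting record above is a sequence of double-quoted key=value fields, with userFQAN appearing more than once. A minimal Python sketch of how such a record could be split for checking is given here; the function name is hypothetical and the record is the one returned by the grep above. Repeated keys are kept as lists so that no FQAN is lost.

```python
import re
from collections import defaultdict

# Record copied from the grep output of the WMS submission test.
RECORD = (
    '"timestamp=2012-03-26 09:05:37" '
    '"userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle" '
    '"userFQAN=/dteam/Role=NULL/Capability=NULL" '
    '"userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" '
    '"ceID=cream-29.pd.infn.it:8443/cream-lsf-creamtest1" '
    '"jobID=https://emi-demo11.cnaf.infn.it:9000/kKD1V4dLlSdALEzPA_1qxg" '
    '"lrmsID=642811" "localUser=18239" "clientID=cre29_581241790"'
)


def parse_accounting_record(line):
    """Split a record into {key: [values]} (hypothetical helper).

    Each field is a quoted "key=value" string; only the first '=' separates
    key from value, because values such as userDN contain '=' themselves.
    """
    fields = defaultdict(list)
    for item in re.findall(r'"([^"]*)"', line):
        key, _, value = item.partition("=")
        fields[key].append(value)
    return dict(fields)


record = parse_accounting_record(RECORD)
print(record["jobID"][0], record["lrmsID"][0], len(record["userFQAN"]))
```

For a job submitted through the WMS, the jobID field carries the WMS job identifier, while the CREAM identifier can be recovered from clientID.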

[root@cream-29 cream]#  cat /etc/logrotate.d/blahp-logrotate | grep rotate
        rotate 365

Test PASSED

[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-29.pd.infn.it:8443/cream-lsf-creamtest1 cream.jdl
https://cream-29.pd.infn.it:8443/CREAM670118061

[root@cream-29 cream]# grep 670118061  /var/log/cream/glite-ce-bnotifier.log
2012-03-26 11:12:00 Sent for Cream:[BatchJobId="642813"; JobStatus=2; ChangeTime="2012-03-26 11:11:57"; WorkerNode="prod-wn-001"; ClientJobId="670118061"; BlahJobName="cre29_670118061";]
2012-03-26 11:12:20 Sent for Cream:[BatchJobId="642813"; JobStatus=4; ChangeTime="2012-03-26 11:12:08"; JwExitCode=0; Reason="reason=0"; ClientJobId="670118061"; BlahJobName="cre29_670118061";]

Test PASSED
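
The BNotifier log lines above carry the notification as a bracketed list of attribute=value pairs terminated by semicolons. The following Python sketch shows one way to extract them for a check like this one; the helper name is hypothetical, the sample line is copied from the log above, and the numeric JobStatus codes are not decoded here.

```python
import re

# Second notification line from the grep output above.
LINE = (
    '2012-03-26 11:12:20 Sent for Cream:[BatchJobId="642813"; JobStatus=4; '
    'ChangeTime="2012-03-26 11:12:08"; JwExitCode=0; Reason="reason=0"; '
    'ClientJobId="670118061"; BlahJobName="cre29_670118061";]'
)

# An attribute is NAME=value; where value is either a quoted string or a bare token.
ATTR = re.compile(r'(\w+)=("[^"]*"|[^;\]]+);')


def parse_notification(line):
    """Return the attributes found between '[' and the last ']' as a dict."""
    body = line[line.index("[") + 1: line.rindex("]")]
    return {name: value.strip('"') for name, value in ATTR.findall(body)}


attrs = parse_notification(LINE)
print(attrs["ClientJobId"], attrs["JobStatus"], attrs["JwExitCode"])
```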

[root@cream-29 cream]# su - tomcat
-sh-3.2$  /usr/bin/blahpd
$GahpVersion: 1.16.5 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
BLAH_JOB_SUBMIT 1 [Cmd="/bin/echo";Args="$HOSTNAME";Out="/tmp/stdout_l15367";In="/dev/null";GridType="lsf";Queue="creamtest1";x509userproxy="/tmp/proxy";Iwd="/tmp";TransferOutput="output_file";TransferOutputRemaps="output_file=/tmp/stdout_l15367";GridResource="blah"]
S
results
S 1
1 0 No\ error lsf/20120326/642815
Connection closed by remote host
-sh-3.2$  cat /tmp/stdout_l15367 
prod-wn-001
-sh-3.2$ logout

[root@cream-29 cream]# bhist -w -l 642815 | grep Dispatched
Mon Mar 26 11:18:42: Dispatched to <prod-wn-001>;

Test PASSED

[root@cream-29 ~]# ps ax | grep BLParserLSF
  754 pts/1    S+     0:00 grep BLParserLSF
31468 ?        Sl     0:00 /usr/bin/BLParserLSF -d 1 -l /var/log/cream/glite-lsfparser.log -b /opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin -c /opt/lsf/conf -p 33333 -m 56565
>
>
**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [DONE-OK] ExitCode = [0]
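
The suspend/resume check relies on the status reported by successive glite-ce-job-status calls. A small Python sketch of how the observed sequence could be verified is shown here; the status lines are the ones reported in this test, while the function names are hypothetical and nothing in the sketch calls the CREAM CLI itself.

```python
import re

STATUS_RE = re.compile(r"Status\s*=\s*\[([^\]]*)\]")

# Status lines in the order they were observed during the test.
OBSERVED = [
    "**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [IDLE]",
    "**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [IDLE]",
    "**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [HELD]",
    "**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [HELD]",
    "**** JobID=[https://cream-29.pd.infn.it:8443/CREAM867401632] Status = [DONE-OK] ExitCode = [0]",
]


def extract_status(output):
    """Return the text inside 'Status = [...]', or None if it is absent."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def collapse(statuses):
    """Drop consecutive repetitions so only the state transitions remain."""
    result = []
    for status in statuses:
        if not result or result[-1] != status:
            result.append(status)
    return result


transitions = collapse([extract_status(line) for line in OBSERVED])
print(transitions)
```

The collapsed list makes it easy to compare the run against the expected IDLE, HELD, DONE-OK progression.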
 
Added:
>
>
Bug #94519: Updater for LSF can misidentify killed jobs as finished FIXED not certified
 
Changed:
<
<
Test PASSED

Submit 1000 jobs, one every 3 seconds, while monitoring the used RSS memory of the /usr/bin/BUpdaterLSF process:

lsf.png

Submit 1000 jobs, one every 3 seconds, while monitoring the used RSS memory of the /usr/bin/BUpdaterPBS process:

pbs.png

Test PASSED

>
>
Bug #95392: Heavy usage of 'bjobsinfo' still hurts LSF FIXED not certified
 

-- SergioTraldi - 2012-06-28

Added:
>
>
META FILEATTACHMENT attachment="clean_conf_log.txt" attr="h" comment="" date="1340974181" name="clean_conf_log.txt" path="clean_conf_log.txt" size="69607" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="clean_install_log.txt" attr="h" comment="" date="1340974181" name="clean_install_log.txt" path="clean_install_log.txt" size="142428" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_CancelledNew.html" attr="h" comment="" date="1340974181" name="logFile_CancelledNew.html" path="logFile_CancelledNew.html" size="164921" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_CancelledOld.html" attr="h" comment="" date="1340974181" name="logFile_CancelledOld.html" path="logFile_CancelledOld.html" size="164921" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_NormallyFinishedNew.html" attr="h" comment="" date="1340974181" name="logFile_NormallyFinishedNew.html" path="logFile_NormallyFinishedNew.html" size="164969" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_NormallyFinishedOld.htlm" attr="h" comment="" date="1340974181" name="logFile_NormallyFinishedOld.htlm" path="logFile_NormallyFinishedOld.htlm" size="164969" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_SuspendedResumedNew.html" attr="h" comment="" date="1340974181" name="logFile_SuspendedResumedNew.html" path="logFile_SuspendedResumedNew.html" size="165137" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="lsf_update.txt" attr="h" comment="" date="1340974181" name="lsf_update.txt" path="lsf_update.txt" size="62684" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="lsf_update_confnew.txt" attr="h" comment="" date="1340974181" name="lsf_update_confnew.txt" path="lsf_update_confnew.txt" size="44841" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="lsf_update_confold.txt" attr="h" comment="" date="1340974181" name="lsf_update_confold.txt" path="lsf_update_confold.txt" size="44806" user="SergioTraldi" version="1"
META FILEATTACHMENT attachment="logFile_NormallyFinishedOld.txt" attr="h" comment="" date="1340974678" name="logFile_NormallyFinishedOld.txt" path="logFile_NormallyFinishedOld.txt" size="164969" user="SergioTraldi" version="1"

Revision 2 2012-06-29 - SergioTraldi

Line: 1 to 1
 
META TOPICPARENT name="TestingBlah"
Deleted:
<
<
 

Testing report: IGIRTC-82

Line: 23 to 22
 

Clean Installation

Changed:
<
<
  • Configuration log file: the first configuration attempt fails because the PBS server does not start; it must be started manually and the configuration re-run.
>
>
 

Upgrade Installation

Line: 33 to 32
 
Deleted:
<
<

PBS CE

(S)GE CE

 

Unit Tests

Changed:
<
<
Not Available. The plan is to provide some unit tests starting with EMI-2.
>
>
Not Available.
 

System tests

Line: 57 to 44
 

Test submission

  • Test result for LSF is available here PASSED
Changed:
<
<
  • Test result for PBS is available here PASSED
  • Test result for SGE is available here PASSED with WARNING: the Job Cancel test failed. Analysis of the problem: there is a delay in state detection by bnotifier; a submitted job that sleeps for 10 minutes finishes in DONE-OK:
cat test.jdl
[
executable="/bin/sleep" ;
arguments = "600";
stdoutput="out3.out";
stderror="err3.err";
]

$ glite-ce-job-submit -a -r emitestbed21.cnaf.infn.it:8443/cream-sge-emitesters  -a /home/bertocco/gridka.jdl
https://emitestbed21.cnaf.infn.it:8443/CREAM499803862

$ glite-ce-job-cancel https://emitestbed21.cnaf.infn.it:8443/CREAM499803862

Are you sure you want to cancel specified job(s) [y/n]: y

$ glite-ce-job-status https://emitestbed21.cnaf.infn.it:8443/CREAM499803862

******  JobID=[https://emitestbed21.cnaf.infn.it:8443/CREAM499803862]
   Status        = [CANCELLED]
   ExitCode      = []
   Description   = [Cancelled by user]

Checking the batch system reveals another oddity: after cancellation the job appears to remain in the batch system and finishes normally after 10 minutes:

# qstat -u '*'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
    105 0.55500 cream_4998 tst22        r     03/27/2012 17:34:06 emitesters@emitestbed13.cnaf.i     1        

# qstat -j 105
==============================================================
job_number:                 105
exec_file:                  job_scripts/105
submission_time:            Tue Mar 27 17:33:57 2012
owner:                      tst22
uid:                        61022
group:                      testers
gid:                        6100
sge_o_home:                 /home/tst22
sge_o_log_name:             tst22
sge_o_path:                 /opt/SGE/bin/linux-x64:/usr/bin:/bin
sge_o_shell:                /bin/sh
sge_o_workdir:              /var/tmp
sge_o_host:                 emitestbed21
account:                    sge
mail_list:                  tst22@emitestbed21.cnaf.infn.it
notify:                     FALSE
job_name:                   cream_499803862
jobshare:                   0
hard_queue_list:            emitesters
shell_list:                 NONE:/bin/bash
env_list:                   SGE_stagein=CREAM499803862_jobWrapper.sh.61022.6563.1332862437@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/49/CREAM499803862/CREAM499803862_jobWrapper.sh@@@cream_499803862.proxy@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/proxy/eef209cdc40518c452fc0ad35b03b9a2ba1d519b11141130366494,SGE_stageout=out_cream_499803862_StandardOutput@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/49/CREAM499803862/StandardOutput@@@err_cream_499803862_StandardError@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/49/CREAM499803862/StandardError
script_file:                /tmp/cream_499803862
usage    1:                 cpu=00:00:00, mem=0.00448 GBs, io=0.00110, vmem=404.660M, maxvmem=404.660M
scheduling info:            queue instance "emitesters@emitestbed13.cnaf.infn.it" dropped because it is full

# qacct -j 105
==============================================================
qname        emitesters          
hostname     emitestbed13.cnaf.infn.it
group        testers             
owner        tst22               
project      NONE                
department   defaultdepartment   
jobname      cream_499803862     
jobnumber    105                 
taskid       undefined
account      sge                 
priority     0                   
qsub_time    Tue Mar 27 17:33:57 2012
start_time   Tue Mar 27 17:34:07 2012
end_time     Tue Mar 27 17:44:08 2012
granted_pe   NONE                
slots        1                   
failed       0    
exit_status  0                   
ru_wallclock 601          
ru_utime     0.615        
ru_stime     0.578        
ru_maxrss    2788                
ru_ixrss     0                   
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    43062               
ru_majflt    0                   
ru_nswap     0                   
ru_inblock   0                   
ru_oublock   0                   
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     270                 
ru_nivcsw    344                 
cpu          1.193        
mem          0.004             
io           0.001             
iow          0.000             
maxvmem      404.660M
arid         undefined
>
>
 

BLParser test

Revision 1 2012-06-28 - SergioTraldi

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="TestingBlah"

Testing report: IGIRTC-82

Summary

  • Product: BLAH 1.16.6-3
  • Release Task: Task #30327
  • ETICS Subsystem Configuration Name: emi-cream-ce_R_1_13_9_3
  • VCS Tag: -, -
  • EMI Major Release: EMI 1 (Kebnekaise)
  • Platform: SL5 epel
  • Author: Sergio Traldi
  • Testing report:
  • Certification report:
  • Outcome: Certified

Deployment tests

Clean Installation

Upgrade Installation

LSF CE

PBS CE

(S)GE CE

Unit Tests

Not Available. The plan is to provide some unit tests starting with EMI-2.

System tests

Functionality tests

Test submission

  • Test result for LSF is available here PASSED
  • Test result for PBS is available here PASSED
  • Test result for SGE is available here PASSED with WARNING: the Job Cancel test failed. Analysis of the problem: there is a delay in state detection by bnotifier; a submitted job that sleeps for 10 minutes finishes in DONE-OK:
cat test.jdl
[
executable="/bin/sleep" ;
arguments = "600";
stdoutput="out3.out";
stderror="err3.err";
]

$ glite-ce-job-submit -a -r emitestbed21.cnaf.infn.it:8443/cream-sge-emitesters  -a /home/bertocco/gridka.jdl
https://emitestbed21.cnaf.infn.it:8443/CREAM499803862

$ glite-ce-job-cancel https://emitestbed21.cnaf.infn.it:8443/CREAM499803862

Are you sure you want to cancel specified job(s) [y/n]: y

$ glite-ce-job-status https://emitestbed21.cnaf.infn.it:8443/CREAM499803862

******  JobID=[https://emitestbed21.cnaf.infn.it:8443/CREAM499803862]
   Status        = [CANCELLED]
   ExitCode      = []
   Description   = [Cancelled by user]

Checking the batch system reveals another oddity: after cancellation the job appears to remain in the batch system and finishes normally after 10 minutes:

# qstat -u '*'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
    105 0.55500 cream_4998 tst22        r     03/27/2012 17:34:06 emitesters@emitestbed13.cnaf.i     1        

# qstat -j 105
==============================================================
job_number:                 105
exec_file:                  job_scripts/105
submission_time:            Tue Mar 27 17:33:57 2012
owner:                      tst22
uid:                        61022
group:                      testers
gid:                        6100
sge_o_home:                 /home/tst22
sge_o_log_name:             tst22
sge_o_path:                 /opt/SGE/bin/linux-x64:/usr/bin:/bin
sge_o_shell:                /bin/sh
sge_o_workdir:              /var/tmp
sge_o_host:                 emitestbed21
account:                    sge
mail_list:                  tst22@emitestbed21.cnaf.infn.it
notify:                     FALSE
job_name:                   cream_499803862
jobshare:                   0
hard_queue_list:            emitesters
shell_list:                 NONE:/bin/bash
env_list:                   SGE_stagein=CREAM499803862_jobWrapper.sh.61022.6563.1332862437@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/49/CREAM499803862/CREAM499803862_jobWrapper.sh@@@cream_499803862.proxy@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/proxy/eef209cdc40518c452fc0ad35b03b9a2ba1d519b11141130366494,SGE_stageout=out_cream_499803862_StandardOutput@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/49/CREAM499803862/StandardOutput@@@err_cream_499803862_StandardError@emitestbed21.cnaf.infn.it:/var/cream_sandbox/testers/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sara_Bertocco_testers_eu_emi_eu_Role_NULL_Capability_NULL_tst22/49/CREAM499803862/StandardError
script_file:                /tmp/cream_499803862
usage    1:                 cpu=00:00:00, mem=0.00448 GBs, io=0.00110, vmem=404.660M, maxvmem=404.660M
scheduling info:            queue instance "emitesters@emitestbed13.cnaf.infn.it" dropped because it is full

# qacct -j 105
==============================================================
qname        emitesters          
hostname     emitestbed13.cnaf.infn.it
group        testers             
owner        tst22               
project      NONE                
department   defaultdepartment   
jobname      cream_499803862     
jobnumber    105                 
taskid       undefined
account      sge                 
priority     0                   
qsub_time    Tue Mar 27 17:33:57 2012
start_time   Tue Mar 27 17:34:07 2012
end_time     Tue Mar 27 17:44:08 2012
granted_pe   NONE                
slots        1                   
failed       0    
exit_status  0                   
ru_wallclock 601          
ru_utime     0.615        
ru_stime     0.578        
ru_maxrss    2788                
ru_ixrss     0                   
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    43062               
ru_majflt    0                   
ru_nswap     0                   
ru_inblock   0                   
ru_oublock   0                   
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     270                 
ru_nivcsw    344                 
cpu          1.193        
mem          0.004             
io           0.001             
iow          0.000             
maxvmem      404.660M
arid         undefined
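
The qacct output above can be used to confirm that the cancelled job ran for its full duration. The Python sketch below parses the relevant fields (copied from the output above) and computes the elapsed time between start_time and end_time; the helper names are hypothetical, and the date parsing assumes an English (C) locale for day and month names.

```python
from datetime import datetime

# Relevant fields copied from the "qacct -j 105" output above.
QACCT = """\
qsub_time    Tue Mar 27 17:33:57 2012
start_time   Tue Mar 27 17:34:07 2012
end_time     Tue Mar 27 17:44:08 2012
exit_status  0
ru_wallclock 601
"""


def parse_qacct(text):
    """Turn 'name   value' lines into a dict, ignoring lines without a value."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def run_seconds(fields):
    """Elapsed seconds between start_time and end_time."""
    fmt = "%a %b %d %H:%M:%S %Y"
    start = datetime.strptime(fields["start_time"], fmt)
    end = datetime.strptime(fields["end_time"], fmt)
    return int((end - start).total_seconds())


fields = parse_qacct(QACCT)
print(run_seconds(fields), fields["ru_wallclock"], fields["exit_status"])
```

The computed interval agrees with ru_wallclock (601 seconds), i.e. the 600 second sleep completed with exit status 0 even though CREAM reported the job as CANCELLED.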

BLParser test

Old BLParser

Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_cancelled_jobs                                | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_normally_finished_jobs                        | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_suspended_resumed_jobs                        | FAIL |
_error: Expected status should be in ['HELD'] for job https://cream-29.pd.infn.it:8443/CREAM705156955 was actually IDLE
------------------------------------------------------------------------------
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | FAIL |
2 critical tests, 1 passed, 1 failed
2 tests total, 1 passed, 1 failed
==============================================================================
Tests                                                                 | FAIL |
6 critical tests, 5 passed, 1 failed
6 tests total, 5 passed, 1 failed
==============================================================================
Output:  /home/ale/blah/italiangrid-cream_blah_testsuites-09156c3_ver2/output.xml
Log:     /home/ale/blah/italiangrid-cream_blah_testsuites-09156c3_ver2/log.html
Report:  /home/ale/blah/italiangrid-cream_blah_testsuites-09156c3_ver2/report.html

Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_cancelled_jobs                                | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_normally_finished_jobs                        | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_suspended_resumed_jobs                        | FAIL |
_error: Expected status should be in ['HELD'] for job https://cream-41.pd.infn.it:8443/CREAM324312963 was actually DONE-OK
------------------------------------------------------------------------------
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | FAIL |
2 critical tests, 1 passed, 1 failed
2 tests total, 1 passed, 1 failed
==============================================================================
Tests                                                                 | FAIL |
6 critical tests, 5 passed, 1 failed
6 tests total, 5 passed, 1 failed
==============================================================================
  • Job which is suspended and then resumed

New BLParser

Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_cancelled_jobs                                | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_normally_finished_jobs                        | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_suspended_resumed_jobs                        | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests                                                                 | PASS |
6 critical tests, 6 passed, 0 failed
6 tests total, 6 passed, 0 failed
==============================================================================

Tests.Check Notifications For Cancelled Jobs :: Test that notifications are...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_cancelled_jobs                                | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Cancelled Jobs :: Test that notifica... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Normally Finished Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_normally_finished_jobs                        | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Normally Finished Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests.Check Notifications For Suspended Resumed Jobs :: Test that notificat...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
check_notifications_for_suspended_resumed_jobs                        | PASS |
------------------------------------------------------------------------------
Tests.Check Notifications For Suspended Resumed Jobs :: Test that ... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Tests                                                                 | PASS |
6 critical tests, 6 passed, 0 failed
6 tests total, 6 passed, 0 failed
==============================================================================

Regression tests

Verification attached bugs

Bug #89527: BLAHP produced -W stage(in/out) directives are incompatible with Torque 2.5.8 FIXED

Content of file to check here.

Bug #91037: BUpdaterLSF should use bjobs to detect final job state FIXED

  • Change debug_level and restart the services:
[root@cream-29 ~]# sed -i 's/bupdater_debug_level=2/bupdater_debug_level=3/' /etc/blah.config 
[root@cream-29 ~]# mv /var/log/cream/glite-ce-bupdater.log /var/log/cream/glite-ce-bupdater.log.old
[root@cream-29 ~]# service gLite restart
STOPPING SERVICES
*** glite-ce-blahparser:
Shutting down BNotifier:                                   [  OK  ]
Shutting down BUpdaterLSF:                                 [  OK  ]

*** glite-lb-locallogger:
Stopping glite-lb-logd ... done
Stopping glite-lb-interlogd ... done

*** tomcat5:
Stopping tomcat5:                                          [  OK  ]

STARTING SERVICES
*** tomcat5:
Starting tomcat5:                                          [  OK  ]

*** glite-lb-locallogger:
Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
 done
Starting glite-lb-interlogd ... done

*** glite-ce-blahparser:
Starting BNotifier: /usr/bin/BNotifier: Error creating and binding socket: Address already in use
                                                           [FAILED]
Starting BUpdaterLSF:                                      [  OK  ]
  • Submit a job and wait for its completion:
[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-29.pd.infn.it:8443/cream-lsf-cert cream.jdl
https://cream-29.pd.infn.it:8443/CREAM239301025
[ale@cream-12 UI]$ glite-ce-job-status https://cream-29.pd.infn.it:8443/CREAM239301025

******  JobID=[https://cream-29.pd.infn.it:8443/CREAM239301025]
   Status        = [DONE-OK]
   ExitCode      = [0]

[root@cream-29 ~]# grep 239301025 /var/log/cream/glite-ce-bnotifier.log
2012-03-22 17:05:36 Sent for Cream:[BatchJobId="622199"; JobStatus=4; ChangeTime="2012-03-22 17:05:22"; JwExitCode=0; Reason="reason=0"; ClientJobId="239301025"; BlahJobName="cre29_239301025";]
  • Verify if bhist has been called:
[root@cream-29 ~]# grep bhist /var/log/cream/glite-ce-bupdater.log
2012-03-22 17:00:43 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:00:43 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-22 17:01:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:01:26 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-22 17:03:54 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:03:54 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
2012-03-22 17:04:51 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no
2012-03-22 17:04:51 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no
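
The grep output above can be sorted mechanically into configuration messages and everything else. The following is a minimal sketch, assuming that any line containing "bhist" which is not a "key ... not found" message deserves a closer look; the function name `split_bhist_lines` is hypothetical and not part of BLAH.

```python
import re

# Matches the "key ... not found" configuration messages shown in the log above.
DEFAULT_RE = re.compile(r"key (\S+) not found - using the default:(\S+)")

def split_bhist_lines(log_lines):
    """Separate configuration-default messages from other lines mentioning bhist."""
    defaults, others = {}, []
    for line in log_lines:
        if "bhist" not in line:
            continue
        match = DEFAULT_RE.search(line)
        if match:
            # Remember which key fell back to which default value.
            defaults[match.group(1)] = match.group(2)
        else:
            others.append(line)
    return defaults, others

# Two lines copied from the transcript above.
sample = [
    "2012-03-22 17:00:43 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_time_constraint not found - using the default:no",
    "2012-03-22 17:00:43 /usr/bin/BUpdaterLSF: key bupdater_use_bhist_for_killed not found - using the default:no",
]
defaults, others = split_bhist_lines(sample)
print(defaults)
print(len(others), "other line(s) mention bhist")
```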

Bug #92281: Purge of registry can cause registry corruption FIXED not certified

It is not possible to reproduce the problem systematically.

Bug #92774: BLParserLSF could crash searching in old logs FIXED not certified

The old LSF BLParser usually crashed in this situation, so the fact that the functional tests passed is a good indication that the bug has been fixed.

Verification of old bugs

Submitted 5000 jobs to a CREAM CE configured using the new blparser, and with job_registry_use_mmap=yes.

Monitored the RSS used by the blahpd processes. At the end, the maximum value across all processes was 4560.

Test PASSED
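
The report does not say how the RSS values were collected. As a rough sketch, assuming the samples come from the text output of `ps -eo pid,rss,comm`, the maximum across all blahpd processes could be computed as follows. The helper name `max_rss` and the numbers in the sample are illustrative only and are not taken from the test.

```python
def max_rss(ps_output, command="blahpd"):
    """Return the largest RSS value among processes whose name equals `command`."""
    values = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[2].strip() == command:
            values.append(int(fields[1]))
    return max(values) if values else None

# Illustrative sample in the layout of `ps -eo pid,rss,comm`; values are made up.
sample = """\
  PID   RSS COMMAND
 1201  3100 blahpd
 1202  4200 blahpd
 1300  9000 java
"""
print(max_rss(sample))
```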

Configure /etc/blah.config:

[root@cream-29 ~]# tail -4 /etc/blah.config
# Verify fix for bug #77776
lsf_batch_caching_enabled=yes
batch_command_caching_filter=/usr/bin/runcmd.pl

Where runcmd.pl is:

#!/usr/bin/perl
#---------------------#
#  PROGRAM:  argv.pl  #
#---------------------#
# Appends the arguments it is called with to /tmp/xyz, one invocation per line.
use strict;
use warnings;

open(my $fh, '>>', '/tmp/xyz') or die "Cannot open /tmp/xyz: $!";
foreach my $arg (@ARGV) {
    print $fh "$arg ";
}
print $fh "\n";
close($fh);

Restart the services and submit 10 jobs to the CE.

[root@cream-29 cream]# cat /tmp/xyz 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a 
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10 

Test PASSED
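
To summarise which batch commands were passed through the filter, the lines written to /tmp/xyz can be counted per command name. This is a minimal sketch; the helper name `count_commands` is hypothetical, and the sample reuses the two command lines shown above.

```python
import os
from collections import Counter

def count_commands(lines):
    """Count logged invocations by the basename of the first word on each line."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:
            counts[os.path.basename(parts[0])] += 1
    return counts

# The two distinct lines from /tmp/xyz above, each logged four times.
bjobs = "/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bjobs -u all -l -a "
bhist = "/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/bhist -u all -d -l -n 10 "
sample = [bjobs, bhist] * 4

counts = count_commands(sample)
print(dict(counts))
```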

[root@cream-29 cream]# ls -l /var/blah
total 8
-rw-r--r-- 1 tomcat tomcat    0 Mar 23 15:49 blah_bnotifier.pid
-rw-r--r-- 1 tomcat tomcat    4 Mar 23 15:53 blah_bupdater.pid
drwxrwx--t 4 tomcat tomcat 4096 Mar 23 16:07 user_blah_job_registry.bjr
[root@cream-29 cream]#  ls -l /var/blah/user_blah_job_registry.bjr/
total 14388
-rw-rw-r-- 1 tomcat tomcat 11377096 Mar 23 17:04 registry
-rw-r--r-- 1 tomcat tomcat  3066960 Mar 23 16:07 registry.by_blah_index
-rw-rw-rw- 1 tomcat tomcat        0 Mar 23 17:04 registry.locktest
drwxrwx-wt 2 tomcat tomcat     4096 Mar 23 17:04 registry.npudir
drwxrwx-wt 2 tomcat tomcat   253952 Mar 23 17:04 registry.proxydir
-rw-r--r-- 1 tomcat tomcat       99 Mar 23 15:49 registry.subjectlist
[root@cream-29 cream]# ls -l /var/blah/user_blah_job_registry.bjr/registry.npudir
total 8
-rw-rw-r-- 1 dteam017 dteam 856 Mar 23 17:04 npu_d4X1Ao
-rw-rw-r-- 1 dteam017 dteam 856 Mar 23 17:04 npu_jZXJGF
[root@cream-29 cream]# ls -l /var/blah/user_blah_job_registry.bjr/registry.proxydir/ | head -10
total 2448
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637551_RUGJ92 -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/f9099a3af228b82c323e26c1f9f494aefdd1b43910396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637552_LrF5tC -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/23a5e3de27a7d82748cabae2d5dccec329853fc610396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637553_WvebYm -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/751c2ef4ec29986f4bf63a602fbe626d7d1b3cab10396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637554_4vdiJs -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/4966974580d533d4d42a4635158382566ec6c2f610396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637555_QX5Bi6 -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/785cbd414da9ee5e79b633477b59a86071dcf50f10396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637556_jTHBm1 -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/df2c50573906838f8894b361c2a14860c3e4f16210396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637557_DbWTbB -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/6fd59e85358187345a4635505c9af7fe32ec61a310396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637558_hhAHKX -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/ced55ce67190ef6b2d8c56907926aed2f1a4ddfd10396930026260
lrwxrwxrwx 1 dteam017 dteam 198 Mar 23 16:41 proxy_637559_WjeEfP -> /var/glite/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Alessio_Gianelle_dteam_Role_NULL_Capability_NULL_dteam017/proxy/1d197ae0e6d8681258517525f8fc568b56bdf17810396930026260

Test PASSED

[root@cream-29 cream]# su - tomcat
-sh-3.2$ chown tomcat.tomcat /tmp/proxy 
-sh-3.2$ ls -l /tmp/proxy 
-rw------- 1 tomcat tomcat 6501 Mar 26 10:52 /tmp/proxy
        
-sh-3.2$ /usr/bin/blahpd
$GahpVersion: 1.16.5 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
blah_job_submit 1 [cmd="/bin/cp";Args="fstab\ fstab.out";TransferInput="/home/dteam001/dir1/fstab";TransferOutput="fstab.out";TransferOutputRemaps="fstab.out=/home/dteam001/dir1/fstab.out";gridtype="lsf";queue="cert";x509userproxy="/tmp/proxy"]
S
results
S 1
1 0 No\ error lsf/20120326/642806
Connection closed by remote host


[root@cream-29 cream]# ls -l /home/dteam001/dir1/ 
total 8
-rw-r--r-- 1 dteam001 dteam 618 Mar 23 17:11 fstab
-rw-r--r-- 1 dteam001 dteam 618 Mar 26 10:54 fstab.out

Test PASSED
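
The result line printed by blahpd above (`1 0 No\ error lsf/20120326/642806`) uses backslash-escaped spaces. The sketch below splits such a line into fields. The field names (`request_id`, `status`, `message`, `job_id`) are an interpretation of the transcript, not names defined by BLAH.

```python
import re

def parse_result(line):
    """Split a blahpd result line on unescaped spaces and unescape each field."""
    # A space preceded by a backslash belongs to the field, so do not split there.
    raw_fields = re.split(r"(?<!\\) ", line.strip())
    fields = [field.replace("\\ ", " ") for field in raw_fields]
    return {
        "request_id": int(fields[0]),
        "status": int(fields[1]),
        "message": fields[2],
        "job_id": fields[3] if len(fields) > 3 else None,
    }

# Result line copied from the transcript above.
result = parse_result(r"1 0 No\ error lsf/20120326/642806")
print(result)
```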

Content of report file here.

==============================================================================
Bug 81824 :: Regression test of bug #81824 yaim-cream-ce should manage the ...
==============================================================================
Set Log Level :: Set the log level used for the test suite. This c... | PASS |
------------------------------------------------------------------------------
bug_81824                                                             | PASS |
------------------------------------------------------------------------------
Bug 81824 :: Regression test of bug #81824 yaim-cream-ce should ma... | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Test PASSED

Try first with direct submission:

[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-29.pd.infn.it:8443/cream-lsf-creamtest1 cream.jdl
https://cream-29.pd.infn.it:8443/CREAM499441859

[root@cream-29 cream]# grep CREAM499441859  /var/log/cream/accounting/blahp.log-20120326 
"timestamp=2012-03-26 09:00:36" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-29.pd.infn.it:8443/cream-lsf-creamtest1" "jobID=CREAM499441859" "lrmsID=642810" "localUser=18239" "clientID=cre29_499441859"

Try submission through a WMS:

[ale@cream-12 UI]$ glite-wms-job-submit -a -e https://emi-demo11.cnaf.infn.it:7443/glite_wms_wmproxy_server -r cream-29.pd.infn.it:8443/cream-lsf-creamtest1 test.jdl 

Connecting to the service https://emi-demo11.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://emi-demo11.cnaf.infn.it:9000/kKD1V4dLlSdALEzPA_1qxg

==========================================================================



[ale@cream-12 UI]$ glite-wms-job-logging-info -v 2 --event Transfer  https://emi-demo11.cnaf.infn.it:9000/kKD1V4dLlSdALEzPA_1qxg  | grep "Dest jobid"
- Dest jobid                 =    unavailable
- Dest jobid                 =    https://cream-29.pd.infn.it:8443/CREAM581241790


[root@cream-29 cream]# grep kKD1V4dLlSdALEzPA_1qxg  /var/log/cream/accounting/blahp.log-20120326 
"timestamp=2012-03-26 09:05:37" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle" "userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" "ceID=cream-29.pd.infn.it:8443/cream-lsf-creamtest1" "jobID=https://emi-demo11.cnaf.infn.it:9000/kKD1V4dLlSdALEzPA_1qxg" "lrmsID=642811" "localUser=18239" "clientID=cre29_581241790"

Test PASSED
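
The accounting records shown above are sequences of quoted `key=value` tokens, and a key such as `userFQAN` can appear more than once. A minimal sketch for reading one record, assuming this quoting holds for every line (the helper name `parse_accounting` is hypothetical):

```python
import shlex

def parse_accounting(line):
    """Parse one blahp accounting line into {key: [values, ...]}."""
    record = {}
    for token in shlex.split(line):
        # Split on the first "=" only: values such as the DN contain "=" themselves.
        key, _, value = token.partition("=")
        record.setdefault(key, []).append(value)
    return record

# Record copied from the direct-submission check above.
sample = (
    '"timestamp=2012-03-26 09:00:36" '
    '"userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle" '
    '"userFQAN=/dteam/Role=NULL/Capability=NULL" '
    '"userFQAN=/dteam/NGI_IT/Role=NULL/Capability=NULL" '
    '"ceID=cream-29.pd.infn.it:8443/cream-lsf-creamtest1" '
    '"jobID=CREAM499441859" "lrmsID=642810" "localUser=18239" '
    '"clientID=cre29_499441859"'
)
record = parse_accounting(sample)
print(record["jobID"], record["lrmsID"], record["clientID"])
```

Keeping a list per key preserves both `userFQAN` entries instead of silently overwriting the first one.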

[root@cream-29 cream]#  cat /etc/logrotate.d/blahp-logrotate | grep rotate
        rotate 365

Test PASSED
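
A plain `grep rotate` also matches directives such as `postrotate`. The sketch below extracts only the numeric `rotate` directive from a logrotate configuration. Apart from the `rotate 365` line, the stanza in the sample is hypothetical, and so is the helper name `rotate_counts`.

```python
def rotate_counts(config_text):
    """Return the values of all `rotate N` directives found in the text."""
    counts = []
    for line in config_text.splitlines():
        parts = line.split()
        # Accept only the exact two-word form "rotate <number>".
        if len(parts) == 2 and parts[0] == "rotate" and parts[1].isdigit():
            counts.append(int(parts[1]))
    return counts

# Hypothetical stanza; only the "rotate 365" line comes from the check above.
sample = """\
/path/to/some.log {
        rotate 365
        postrotate
        endscript
}
"""
print(rotate_counts(sample))
```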

[ale@cream-12 UI]$ glite-ce-job-submit -a -r cream-29.pd.infn.it:8443/cream-lsf-creamtest1 cream.jdl
https://cream-29.pd.infn.it:8443/CREAM670118061

[root@cream-29 cream]# grep 670118061  /var/log/cream/glite-ce-bnotifier.log
2012-03-26 11:12:00 Sent for Cream:[BatchJobId="642813"; JobStatus=2; ChangeTime="2012-03-26 11:11:57"; WorkerNode="prod-wn-001"; ClientJobId="670118061"; BlahJobName="cre29_670118061";]
2012-03-26 11:12:20 Sent for Cream:[BatchJobId="642813"; JobStatus=4; ChangeTime="2012-03-26 11:12:08"; JwExitCode=0; Reason="reason=0"; ClientJobId="670118061"; BlahJobName="cre29_670118061";]

Test PASSED
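
The BNotifier log lines above carry the notification as a bracketed list of `Key=value;` pairs. A small sketch for turning one such line into a dictionary, assuming the format seen in these two lines is representative (the helper name `parse_notification` is hypothetical):

```python
import re

# A pair is Key="quoted value"; or Key=bare_value;
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^;\]]+));')

def parse_notification(line):
    """Extract the key/value pairs between '[' and ']' of a BNotifier log line."""
    start, end = line.find("["), line.rfind("]")
    if start == -1 or end == -1:
        return {}
    pairs = {}
    for match in PAIR_RE.finditer(line[start + 1:end]):
        key, quoted, bare = match.groups()
        pairs[key] = quoted if quoted is not None else bare
    return pairs

# Second notification copied from the log above.
sample = (
    '2012-03-26 11:12:20 Sent for Cream:[BatchJobId="642813"; JobStatus=4; '
    'ChangeTime="2012-03-26 11:12:08"; JwExitCode=0; Reason="reason=0"; '
    'ClientJobId="670118061"; BlahJobName="cre29_670118061";]'
)
notification = parse_notification(sample)
print(notification)
```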

[root@cream-29 cream]# su - tomcat
-sh-3.2$  /usr/bin/blahpd
$GahpVersion: 1.16.5 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
BLAH_SET_SUDO_ID dteam001
S Sudo\ mode\ on
BLAH_JOB_SUBMIT 1 [Cmd="/bin/echo";Args="$HOSTNAME";Out="/tmp/stdout_l15367";In="/dev/null";GridType="lsf";Queue="creamtest1";x509userproxy="/tmp/proxy";Iwd="/tmp";TransferOutput="output_file";TransferOutputRemaps="output_file=/tmp/stdout_l15367";GridResource="blah"]
S
results
S 1
1 0 No\ error lsf/20120326/642815
Connection closed by remote host
-sh-3.2$  cat /tmp/stdout_l15367 
prod-wn-001
-sh-3.2$ logout

[root@cream-29 cream]# bhist -w -l 642815 | grep Dispatched
Mon Mar 26 11:18:42: Dispatched to <prod-wn-001>;

Test PASSED
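
The check above compares the host name written by the job with the host reported by `bhist`. As a sketch, the same comparison can be automated by extracting the host from the "Dispatched to" line; the helper name `dispatched_host` is hypothetical.

```python
import re

def dispatched_host(bhist_output):
    """Return the host named in a 'Dispatched to <host>' line, or None."""
    match = re.search(r"Dispatched to <([^>]+)>", bhist_output)
    return match.group(1) if match else None

# Values copied from the transcript above.
bhist_line = "Mon Mar 26 11:18:42: Dispatched to <prod-wn-001>;"
job_stdout = "prod-wn-001\n"

host = dispatched_host(bhist_line)
print(host == job_stdout.strip())
```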

[root@cream-29 ~]# ps ax | grep BLParserLSF
  754 pts/1    S+     0:00 grep BLParserLSF
31468 ?        Sl     0:00 /usr/bin/BLParserLSF -d 1 -l /var/log/cream/glite-lsfparser.log -b /opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin -c /opt/lsf/conf -p 33333 -m 56565

Test PASSED
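
The `ps ax | grep` output above also lists the grep process itself. A minimal sketch that keeps only real BLParserLSF processes, assuming the five-column `ps ax` layout (PID, TTY, STAT, TIME, COMMAND); the helper name `find_pids` is hypothetical.

```python
import os

def find_pids(ps_lines, program="BLParserLSF"):
    """Return PIDs whose executable basename equals `program`."""
    pids = []
    for line in ps_lines:
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        # First word of the COMMAND column is the executable path.
        executable = fields[4].split()[0]
        if os.path.basename(executable) == program:
            pids.append(int(fields[0]))
    return pids

# Lines copied from the transcript above.
sample = [
    "  754 pts/1    S+     0:00 grep BLParserLSF",
    "31468 ?        Sl     0:00 /usr/bin/BLParserLSF -d 1 -l /var/log/cream/glite-lsfparser.log "
    "-b /opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin -c /opt/lsf/conf -p 33333 -m 56565",
]
print(find_pids(sample))
```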

Submit 1000 jobs, one every 3 seconds, while monitoring the RSS memory used by the /usr/bin/BUpdaterLSF process:

lsf.png

Submit 1000 jobs, one every 3 seconds, while monitoring the RSS memory used by the /usr/bin/BUpdaterPBS process:

pbs.png

Test PASSED

-- SergioTraldi - 2012-06-28

 