---++ Pre-certification of WMS 3.3.6

---+++ https://savannah.cern.ch/task/?27731

---++++ Repository

http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/4d429a92-e1ae-42e9-95e4-b2d5338349a5/sl5_x86_64_gcc412EPEL

Installed by updating from a WMS 3.3.5.

Tests:

---++++ SVG #4073

OK

---++++ SVG #4039

OK

---++++ BUG https://savannah.cern.ch/bugs/?92657

The pre-certification consists simply of submitting a job to the WMS and scanning the syslog file /var/log/messages to check that the WMProxy and the Workload Manager log the information required by this bug.

Log in as root on the WMS machine and execute the command:

<verbatim>
tail -f /var/log/messages|egrep "wmproxy|manager"
</verbatim>

then log into a UI and submit a job (any JDL will do) to the WMS. After a few seconds, two log lines should appear in the console running the tail command:

<verbatim>
May 18 14:23:07 devel11 glite_wms_wmproxy_server[32565]: submission from lxgrid05.pd.infn.it, DN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alvise Dorigo, FQAN=/dteam/Role=NULL/Capability=NULL, userid=18702 for jobid=https://devel07.cnaf.infn.it:9000/rkYSfEe5IqsDc17_UpPu3Q
May 18 14:23:10 devel11 glite-wms-workload_manager: jobid https://devel07.cnaf.infn.it:9000/rkYSfEe5IqsDc17_UpPu3Q was matched to destination creamce.gina.sara.nl:8443/cream-pbs-infra
</verbatim>

Note in particular the DN, FQAN and JobID information, and the UI's hostname.

---++++ BUG https://savannah.cern.ch/bugs/?92922

The pre-certification was done in two phases: first, verification of the bug on an EMI1 installation; then, installation of the new RPM (which will be in the next EMI1 update, update 17) and verification that the bug disappeared.

To reproduce the bug it is sufficient to use this JDL:

<verbatim>
[
  Executable = "/bin/touch" ;
  Arguments = "/foo" ;
  Retrycount = 2;
  usertags = [ exe = "touch" ];
  VirtualOrganisation="dteam";
  requirements = ! RegExp("cream.*", other.GlueCEUniqueID);
]
</verbatim>

and submit it to an EMI1 WMS that does not have the fix.
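Verifying this bug comes down to comparing the "Exit code:" line printed by =glite-wms-job-status= with the exit code the job really produces (1 for =/bin/touch /foo=, which fails with permission denied). The shell sketch below performs that comparison; =reported_exit_code= and =check_exit_code= are hypothetical helpers (not part of the gLite tools), and the parsing assumes the single-job status layout shown on this page.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the gLite client tools.
# They parse the text printed by glite-wms-job-status, assuming the
# "Exit code: N" line layout shown on this page.

# Print the first "Exit code:" value found on stdin (empty if none).
reported_exit_code() {
    awk -F': *' '/^[ \t]*Exit code:/ {
        value = $2
        sub(/[ \t]+$/, "", value)   # drop trailing blanks
        print value
        exit
    }'
}

# Compare the reported exit code (read from stdin) with the expected one.
# Returns 0 on match, 1 otherwise (including when no exit code is printed).
check_exit_code() {
    local expected="$1" reported
    reported="$(reported_exit_code)"
    if [ "$reported" = "$expected" ]; then
        echo "OK: reported exit code $reported"
        return 0
    fi
    echo "MISMATCH: expected $expected, got '${reported:-none}'"
    return 1
}
```

For example, with JOBID holding the job identifier, =glite-wms-job-status "$JOBID" | check_exit_code 1= would be expected to report a mismatch on an unpatched EMI1 WMS (where 0 is reported) and OK once the fix is installed.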
Note that this bug occurs when the job lands on a non-CREAM CE (this is why the JDL specifies the requirements attribute). The result should be:

<verbatim>
glite-wms-job-status https://devel09.cnaf.infn.it:9000/U...

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/U...
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce01.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
Submitted: Wed May 16 16:19:38 2012 CEST
==========================================================================
</verbatim>

Even though the job should return exit code 1 (it cannot create a file in /: permission denied), the exit code reported by the status is 0, as shown above.

After applying the fix to this EMI1 WMS, the same JDL should produce the expected exit code (1):

<verbatim>
glite-wms-job-status https://devel07.cnaf.infn.it:9000/f...

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/f...
Current Status: Done(Exit Code !=0)
Exit code: 1
Status Reason: Warning: job exit code != 0
Submitted: Wed May 16 16:54:25 2012 CEST
==========================================================================
</verbatim>

---++++ BUG https://savannah.cern.ch/bugs/?93673

The bug is "hopefully fixed"; triggering the problem is very difficult.
---++++ Submission to ARC: BUGS https://savannah.cern.ch/bugs/?92742 and https://savannah.cern.ch/bugs/?92924

- Use a CMS proxy and submit to korundi.grid.helsinki.fi *ONLY*:

<verbatim>
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel11.cnaf.infn.it:9000/pLl1nyepSih7NYrivm8T1A
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: korundi.grid.helsinki.fi:2811/nordugrid-GE-mgrid
Submitted: Tue Jun 5 22:37:22 2012 CEST
==========================================================================
</verbatim>

- Check that the gridmanager stays alive for reasonable amounts of time (occasional crashes are "normal" on WMS nodes); each restart leaves a STARTING UP line in its log:

<verbatim>
[root@devel11 mcecchi]# grep STARTING /var/local/condor/log/GridmanagerLog.glite |tail
05/31 14:59:45 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 15:09:57 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 15:10:08 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:13:20 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:13:30 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:23:42 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:24:00 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
06/01 15:58:27 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
06/05 13:06:53 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
06/05 22:19:07 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
</verbatim>

- Also check that the grid_monitor.sh script is actually being read (=ls -lu= lists the last access time):

<verbatim>
[root@devel11 mcecchi]# locate grid_monitor.sh|xargs ls -lu
-rwxr-xr-x 1 root root 42728 Jun 1 13:20 /opt/condor-7.4.2/libexec/glite/grid_monitor.sh
-rwxr-xr-x 1 root root 38151 Jun 5 23:07 /opt/condor-7.4.2/sbin/grid_monitor.sh
</verbatim>

---++++ Generic test of job submission

Submission of three jobs (single, collection, DAG) and one cancellation, all with a non-CREAM CE destination (use 'requirements = ! RegExp("cream.*", other.GlueCEUniqueID);' in the JDL).

---+++++ Single job

<verbatim>
$ cat ~/JDLs/WMS/wms_submission_non_cream_true.jdl
[
  Executable = "/bin/true";
  Arguments = "";
  myproxyserver="";
  requirements = ! RegExp("cream.*", other.GlueCEUniqueID);
  RetryCount = 0;
  ShallowRetryCount = 1;
]

$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/WMS/wms_submission_non_cream_true.jdl

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/mq4J3d8lBxgBj_TA34Aa8g

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/mq4J3d8lBxgBj_TA34Aa8g

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/mq4J3d8lBxgBj_TA34Aa8g
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: dangus.itpa.lt:2119/jobmanager-lcgpbs-short
Submitted: Fri May 25 09:29:43 2012 CEST
==========================================================================
</verbatim>

---+++++ Job cancellation

<verbatim>
$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/WMS/wms_sottomissione_non_cream_true.jdl

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: egee02.grid.hku.hk:2119/jobmanager-lcgpbs-dteam
Submitted: Fri May 25 09:50:50 2012 CEST
==========================================================================

$ glite-wms-job-cancel https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

Are you sure you want to remove specified job(s) [y/n]y : y

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server

============================= glite-wms-job-cancel Success =============================

The cancellation request has been successfully submitted for the following job(s):

- https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

========================================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw
Current Status: Cancelled
Destination: egee02.grid.hku.hk:2119/jobmanager-lcgpbs-dteam
Submitted: Fri May 25 09:50:50 2012 CEST
==========================================================================
</verbatim>

---+++++ Collection

<verbatim>
$ cat /home/dorigoa/JDLs/WMS/coll1.jdl
[
  type = "collection";
  nodes = {
    [ file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; ],
    [ file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; ],
    [ file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; ],
    [ file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; ],
    [ file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; ]
  };
]

$ cat /home/dorigoa/JDLs/WMS/coll/job.jdl
[
  Executable = "/bin/ls" ;
  Arguments = "/tmp" ;
  RetryCount = 2 ;
  Stdoutput = "std.out" ;
  StdError = "std.err" ;
  OutputSandbox = { "std.out" ,"std.err"} ;
  InputSandbox = { "data/pippo" };
  rank = 1 ;
  ShallowRetryCount = 2;
  usertags = [ exe = "ls" ];
  requirements = !RegExp("cream.*", other.GlueCEUniqueID);
]

$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/Alessio/UI/jdl/coll1.jdl

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/NpR759bu84k_72RM5_v-kw

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/NpR759bu84k_72RM5_v-kw

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/NpR759bu84k_72RM5_v-kw
Current Status: Done(Success)
Exit code: 0
Submitted: Fri May 25 11:14:54 2012 CEST
==========================================================================

- Nodes information for:
Status info for the Job : https://devel07.cnaf.infn.it:9000/FRKaSvQ-fX0Q5Qd-BFuIGg
Current Status: Done(Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce1.grid.lebedev.ru:2119/jobmanager-lcgpbs-dteam
Submitted: Fri May 25 11:14:54 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/SfrHHxw4E3-KZKRx4-hZEQ
Current Status: Done(Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce01.athena.hellasgrid.gr:2119/jobmanager-pbs-dteam
Submitted: Fri May 25 11:14:54 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/fDTaOZAeRkvB0Rgt6ahu5Q
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce.reef.man.poznan.pl:2119/jobmanager-pbs-dteam
Submitted: Fri May 25 11:14:54 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/ly9bq3Y764NQPh73HhjA-Q
Current Status: Done(Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce01.kallisto.hellasgrid.gr:2119/jobmanager-pbs-dteam
Submitted: Fri May 25 11:14:54 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/qy5ih7me0ABi2AzO2RZzqA
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ceprod03.grid.hep.ph.ic.ac.uk:2119/jobmanager-sge-long
Submitted: Fri May 25 11:14:54 2012 CEST
==========================================================================
</verbatim>

---+++++ DAG

<verbatim>
$ cat ~/JDLs/WMS/dag10nodi.jdl
[
  SignificantAttributes = { "Requirements", "Rank" };
  type = "dag";
  requirements = !RegExp("cream.*", other.GlueCEUniqueID);
  nodes = [
    nodeA = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/env.jdl" ; ];
    nodeB = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/sleep.jdl" ; ];
    nodeC = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/touch.jdl" ; ];
    nodeD = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/ls.jdl" ; ];
    nodeE = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/ls.jdl" ; ];
    nodeF = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/env.jdl" ; ];
    nodeG = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/echo.jdl" ; ];
    nodeH = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/random.jdl" ; ];
    nodeI = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/ls.jdl" ; ];
    nodeL = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/cat.jdl" ; ];
    nodeM = [ node_type = "edg-jdl"; file ="/home/dorigoa/JDLs/WMS/cat.jdl" ; ];
    dependencies = {
      { nodeA, { nodeB, nodeC, nodeD, nodeE } },
      { nodeD, nodeF },
      { nodeD, nodeG },
      { nodeE, nodeH },
      { nodeF, nodeI },
      { nodeF, nodeL },
      { nodeL, nodeM }
    }
  ];
]

$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/WMS/dag10nodi.jdl

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/zSoSo9jMN-MgInLb9e4G0Q

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/zSoSo9jMN-MgInLb9e4G0Q

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/zSoSo9jMN-MgInLb9e4G0Q
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: dagman
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

- Nodes information for:
Status info for the Job : https://devel07.cnaf.infn.it:9000/2DZeXp0v5_3zGNueXqFeyw
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-long
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/7Jk1daotQC_uXmv33bbCgA
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-long
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/9m_oRvKFp1texXeSR6UUbg
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-cert
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/QCBl5XHvyAOJPdkzdNO_VQ
Current Status: Done(Success)
Logged Reason(s):
- epilogue failed with error 1 Fri May 25 15:16:18 CEST 2012: Taken token gsiftp://devel11.cnaf.infn.it/var/SandboxDir/QC/https_3a_2f_2fdevel07.cnaf.infn.it_3a9000_2fQCBl5XHvyAOJPdkzdNO_5fVQ/token.txt_0
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-verylong
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/_wtIrUOhjHyHCAiwR3Vcxw
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-medium
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/epUEnBpmwZhaxLX1fWZYGA
Current Status: Done(Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: argoce01.na.infn.it:2119/jobmanager-lcgpbs-cert
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/jeG1XmOzuRIxW-PAIyIG_w
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: boalice3.bo.infn.it:2119/jobmanager-lcgpbs-cert
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/mlrvfleUcbWtRODiOFCxgQ
Current Status: Done(Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: boalice3.bo.infn.it:2119/jobmanager-lcgpbs-cert
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/qXXOVm_X7KvNf53g_2_vdQ
Current Status: Done(Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-edu.grid.acad.bg:2119/jobmanager-pbs-dteam
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/vTOKXPI8oMnUmpJZajgS7w
Current Status: Done(Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-atlas.ipb.ac.rs:2119/jobmanager-pbs-dteam
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================

Status info for the Job : https://devel07.cnaf.infn.it:9000/wb7RKljxzfNjv0cxbYxjtA
Current Status: Done(Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-edu.grid.acad.bg:2119/jobmanager-pbs-dteam
Submitted: Fri May 25 13:43:44 2012 CEST
==========================================================================
</verbatim>

-- Main.MarcoCecchi - 2012-04-26
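The collection and DAG checks on this page come down to verifying that the parent job and every node report Done(Success) with exit code 0. The shell sketch below performs that check on the text printed by =glite-wms-job-status=; =all_nodes_succeeded= is a hypothetical helper (not part of the gLite tools), and the parsing assumes the "Current Status:" / "Exit code:" layout shown here, treating "Done (Success)" (as printed for the ARC job) the same as "Done(Success)".

```shell
#!/bin/bash
# Hypothetical helper, not part of the gLite client tools.
# Reads glite-wms-job-status output (parent job plus "Nodes information"
# section) on stdin, prints a one-line summary and returns 0 only if every
# "Current Status:" is Done(Success) and every "Exit code:" is 0.
# Jobs without an "Exit code:" line are judged on their status alone.
all_nodes_succeeded() {
    awk '
        /^[ \t]*Current Status:/ {
            total++
            status = $0
            sub(/^[ \t]*Current Status:[ \t]*/, "", status)
            gsub(/[ \t]/, "", status)      # "Done (Success)" -> "Done(Success)"
            if (status != "Done(Success)") problems++
        }
        /^[ \t]*Exit code:/ {
            code = $0
            sub(/^[ \t]*Exit code:[ \t]*/, "", code)
            gsub(/[ \t]/, "", code)
            if (code != "0") problems++
        }
        END {
            printf "%d job(s) checked, %d problem(s)\n", total, problems
            if (total > 0 && problems == 0) exit 0
            exit 1
        }'
}
```

With JOBID holding the collection or DAG identifier, =glite-wms-job-status "$JOBID" | all_nodes_succeeded= prints a summary such as "12 job(s) checked, 0 problem(s)" and returns a non-zero status if any job did not end in Done(Success) with exit code 0.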
Topic revision: r16 - 2012-07-19 - MarcoCecchi
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.