Pre-certification of WMS 3.3.6

https://savannah.cern.ch/task/?27731

Repository

http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/4d429a92-e1ae-42e9-95e4-b2d5338349a5/sl5_x86_64_gcc412EPEL

Update from a WMS 3.3.5.

Tests:

SVG #4073

OK

SVG #4039

OK

BUG https://savannah.cern.ch/bugs/?92657

The pre-certification consists simply of submitting a job to the WMS and scanning the syslog file /var/log/messages to check that WMProxy and the Workload Manager logged the information required by this bug. Log in as root on the WMS machine and run:

tail -f /var/log/messages|egrep "wmproxy|manager"

Then log in to a UI and submit a job (any JDL will do) to the WMS. Two log lines should appear within a few seconds in the console running the tail command:

May 18 14:23:07 devel11 glite_wms_wmproxy_server[32565]: submission from lxgrid05.pd.infn.it, DN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alvise Dorigo, FQAN=/dteam/Role=NULL/Capability=NULL, userid=18702 for jobid=https://devel07.cnaf.infn.it:9000/rkYSfEe5IqsDc17_UpPu3Q

May 18 14:23:10 devel11 glite-wms-workload_manager: jobid https://devel07.cnaf.infn.it:9000/rkYSfEe5IqsDc17_UpPu3Q was matched to destination creamce.gina.sara.nl:8443/cream-pbs-infra

Note in particular the DN, FQAN and job ID information, and the UI's hostname.
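The manual check above can also be scripted. This is a minimal sketch, assuming bash and exactly the two log line formats shown above; check_submission_logged is a hypothetical helper, not part of the WMS distribution. It succeeds only if both the WMProxy line (carrying DN and FQAN) and the Workload Manager match line are present for the given job ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the WMS): succeed only if both the
# WMProxy and the Workload Manager logged the given job ID in LOGFILE.
# Usage: check_submission_logged LOGFILE JOBID
check_submission_logged() {
  local logfile=$1 jobid=$2
  # WMProxy line: "... submission from <UI host>, DN=..., FQAN=..., userid=... for jobid=<id>"
  grep -F "glite_wms_wmproxy_server" "$logfile" |
    grep -F "for jobid=$jobid" |
    grep -q "DN=.*FQAN=" || return 1
  # Workload Manager line: "... jobid <id> was matched to destination <CE>"
  grep -F "glite-wms-workload_manager" "$logfile" |
    grep -qF "jobid $jobid was matched to destination" || return 1
  return 0
}
```

For example: check_submission_logged /var/log/messages https://devel07.cnaf.infn.it:9000/rkYSfEe5IqsDc17_UpPu3Q && echo OK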

BUG https://savannah.cern.ch/bugs/?92922

The pre-certification was done in two phases: first, reproducing the bug on an EMI1 installation; then, installing the new RPM (which will be included in the next EMI1 update, Update 17) and verifying that the bug has disappeared.

To reproduce the bug, it is sufficient to use this JDL:

[ 
Executable = "/bin/touch" ; 
Arguments = "/foo" ; 
Retrycount = 2; 
usertags = [ exe = "touch" ]; 
VirtualOrganisation="dteam"; 
requirements = ! RegExp("cream.*", other.GlueCEUniqueID); 
] 

Submit it to an EMI1 WMS that does not have the fix. Note that the bug only occurs when the job lands on a non-CREAM CE (hence the requirements expression in the JDL). The result should be:

glite-wms-job-status https://devel09.cnaf.infn.it:9000/U... 
======================= glite-wms-job-status Success ===================== 
BOOKKEEPING INFORMATION: 
Status info for the Job : https://devel09.cnaf.infn.it:9000/U... 
Current Status: Done(Success) 
Exit code: 0 
Status Reason: Job terminated successfully 
Destination: ce01.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d 
Submitted: Wed May 16 16:19:38 2012 CEST 
========================================================================== 

Although the job should return exit code 1 (touch cannot create a file in /: permission denied), the exit code reported by the status is 0, as shown above. After applying the fix to this EMI1 WMS, the same JDL produces the expected exit code (1):

glite-wms-job-status https://devel07.cnaf.infn.it:9000/f... 
======================= glite-wms-job-status Success ===================== 
BOOKKEEPING INFORMATION: 
Status info for the Job : https://devel07.cnaf.infn.it:9000/f... 
Current Status: Done(Exit Code !=0) 
Exit code: 1 
Status Reason: Warning: job exit code != 0 
Submitted: Wed May 16 16:54:25 2012 CEST 
========================================================================== 
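To compare the reported exit code with the expected value (1) without reading the output by eye, the value can be pulled out of the status output with awk. A minimal sketch; status_exit_code is a hypothetical helper and assumes the "Exit code:" line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "Exit code:" value from glite-wms-job-status
# output read on stdin. Only the first occurrence is printed, i.e. the parent
# job when node information is also listed.
status_exit_code() {
  awk -F':' '/^[[:space:]]*Exit code:/ { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}
```

Usage, with JOBID holding the job identifier returned by glite-wms-job-submit: [ "$(glite-wms-job-status "$JOBID" | status_exit_code)" = 1 ] && echo "fix OK"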

BUG https://savannah.cern.ch/bugs/?93673

The bug is marked "Hopefully fixed": the problem is very difficult to trigger.

Submission to ARC: BUGS https://savannah.cern.ch/bugs/?92742 and https://savannah.cern.ch/bugs/?92924

- Use a CMS proxy and submit to korundi.grid.helsinki.fi ONLY:

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel11.cnaf.infn.it:9000/pLl1nyepSih7NYrivm8T1A
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        korundi.grid.helsinki.fi:2811/nordugrid-GE-mgrid
Submitted:          Tue Jun  5 22:37:22 2012 CEST
==========================================================================

- Check that the gridmanager stays alive for reasonable periods of time (occasional crashes are "normal" on WMS nodes).

[root@devel11 mcecchi]# grep STARTING /var/local/condor/log/GridmanagerLog.glite |tail
05/31 14:59:45 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 15:09:57 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 15:10:08 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:13:20 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:13:30 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:23:42 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
05/31 16:24:00 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
06/01 15:58:27 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
06/05 13:06:53 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
06/05 22:19:07 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
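The restart history is easier to judge as a per-day count. A minimal sketch, assuming the GridmanagerLog line format shown above; gridmanager_starts_per_day is a hypothetical name, and deciding what counts as a "reasonable" number of restarts is still left to the tester.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count condor_gridmanager start-ups per day.
# Reads GridmanagerLog lines on stdin; field 1 is the MM/DD date.
gridmanager_starts_per_day() {
  grep -F 'condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP' |
    awk '{ n[$1]++ } END { for (d in n) print d, n[d] }' |
    LC_ALL=C sort
}
```

Usage: gridmanager_starts_per_day < /var/local/condor/log/GridmanagerLog.glite. On the excerpt above it reports 7 start-ups on 05/31, 1 on 06/01 and 2 on 06/05.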

- Also check that the grid_monitor.sh script is actually being read (ls -lu shows the last access time):

[root@devel11 mcecchi]# locate grid_monitor.sh|xargs  ls -lu
-rwxr-xr-x 1 root root 42728 Jun  1 13:20 /opt/condor-7.4.2/libexec/glite/grid_monitor.sh
-rwxr-xr-x 1 root root 38151 Jun  5 23:07 /opt/condor-7.4.2/sbin/grid_monitor.sh 
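The ls -lu check can be turned into a pass/fail test on the access time. A minimal sketch, assuming GNU stat (%X prints the access time in seconds since the epoch); accessed_within and the one-hour threshold in the usage line are hypothetical choices. Filesystems mounted with noatime or relatime may not update the access time on every read, which limits how much this check can prove.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE was last accessed less than MAX_AGE
# seconds ago. Returns 2 if stat fails (e.g. missing file).
# Relies on GNU stat (%X = atime as epoch seconds).
accessed_within() {
  local file=$1 max_age=$2 atime now
  atime=$(stat -c %X "$file") || return 2
  now=$(date +%s)
  [ $(( now - atime )) -lt "$max_age" ]
}
```

Usage: for f in $(locate grid_monitor.sh); do accessed_within "$f" 3600 && echo "$f read in the last hour"; done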

Generic test of job submission

Submission of three jobs (single, collection, DAG) and one cancellation, all targeting a non-CREAM CE (use 'requirements = ! RegExp("cream.*", other.GlueCEUniqueID);' in the JDL).
Single job
$ cat ~/JDLs/WMS/wms_submission_non_cream_true.jdl
[
Executable = "/bin/true";
Arguments = "";
myproxyserver="";
requirements = ! RegExp("cream.*", other.GlueCEUniqueID);
RetryCount = 0;
ShallowRetryCount = 1;
]

$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/WMS/wms_submission_non_cream_true.jdl 

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/mq4J3d8lBxgBj_TA34Aa8g

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/mq4J3d8lBxgBj_TA34Aa8g


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/mq4J3d8lBxgBj_TA34Aa8g
Current Status:     Done(Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        dangus.itpa.lt:2119/jobmanager-lcgpbs-short
Submitted:          Fri May 25 09:29:43 2012 CEST
==========================================================================

Job cancellation
$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/WMS/wms_sottomissione_non_cream_true.jdl

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw
Current Status:     Scheduled
Status Reason:      Job successfully submitted to Globus
Destination:        egee02.grid.hku.hk:2119/jobmanager-lcgpbs-dteam
Submitted:          Fri May 25 09:50:50 2012 CEST
==========================================================================



$ glite-wms-job-cancel https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

Are you sure you want to remove specified job(s) [y/n]y : y

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server


============================= glite-wms-job-cancel Success =============================

The cancellation request has been successfully submitted for the following job(s):

- https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw

========================================================================================



$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw
Current Status:     Cancelled
Destination:        egee02.grid.hku.hk:2119/jobmanager-lcgpbs-dteam
Submitted:          Fri May 25 09:50:50 2012 CEST
==========================================================================
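The single-job and cancellation runs above poll glite-wms-job-status by hand until the job reaches a final state. A minimal sketch of that loop, assuming the "Current Status:" line format shown above and treating Done*, Aborted, Cancelled and Cleared as final; current_status, is_final_status and wait_for_final_status are hypothetical helpers, and the default interval and retry count are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first "Current Status:" value from
# glite-wms-job-status output on stdin (the parent job for collections/DAGs).
current_status() {
  sed -n 's/^[[:space:]]*Current Status:[[:space:]]*//p' | head -n 1 | sed 's/[[:space:]]*$//'
}

# Hypothetical helper: succeed if the given status is a final one.
is_final_status() {
  case $1 in
    Done*|Aborted|Cancelled|Cleared) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll JOBID every INTERVAL seconds, at most TRIES times;
# print the final status and succeed, or fail if it is never reached.
wait_for_final_status() {
  local jobid=$1 interval=${2:-60} tries=${3:-120} st
  while [ "$tries" -gt 0 ]; do
    st=$(glite-wms-job-status "$jobid" | current_status)
    if is_final_status "$st"; then
      printf '%s\n' "$st"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}
```

Usage: wait_for_final_status https://devel07.cnaf.infn.it:9000/CFUZC1XPny7j5Dd596torw 30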

Collection
$ cat /home/dorigoa/JDLs/WMS/coll1.jdl 
  [
    type = "collection";
    nodes = {
      [
      file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; 
      ],
      [
      file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; 
      ],
      [
      file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; 
      ],
      [
      file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; 
      ],
      [
      file ="/home/dorigoa/JDLs/WMS/coll/job.jdl" ; 
      ]
      };
  ]


$ cat /home/dorigoa/JDLs/WMS/coll/job.jdl
[
Executable = "/bin/ls" ;
Arguments = "/tmp" ;
RetryCount = 2 ;
Stdoutput = "std.out" ;
StdError =  "std.err" ;
OutputSandbox = { "std.out" ,"std.err"} ;
InputSandbox = { "data/pippo" };
rank = 1 ;
ShallowRetryCount = 2;
usertags = [ exe = "ls" ];
requirements = !RegExp("cream.*", other.GlueCEUniqueID);
]
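For collections of identical nodes, such as coll1.jdl above, the collection JDL can be generated rather than written by hand. A minimal sketch that reproduces the same structure (type = "collection" and a nodes list of [ file = ... ] entries); make_collection_jdl is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a collection JDL with N nodes, each pointing to
# the same node JDL file, in the same layout as coll1.jdl.
# Usage: make_collection_jdl NODE_JDL N
make_collection_jdl() {
  local node_jdl=$1 n=$2 i=1
  printf '[\n  type = "collection";\n  nodes = {\n'
  while [ "$i" -le "$n" ]; do
    printf '    [ file = "%s"; ]' "$node_jdl"
    # Nodes are comma-separated; no comma after the last one.
    if [ "$i" -lt "$n" ]; then printf ',\n'; else printf '\n'; fi
    i=$((i + 1))
  done
  printf '  };\n]\n'
}
```

Usage: make_collection_jdl /home/dorigoa/JDLs/WMS/coll/job.jdl 5 > coll5.jdl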


$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/Alessio/UI/jdl/coll1.jdl

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/NpR759bu84k_72RM5_v-kw

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/NpR759bu84k_72RM5_v-kw


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/NpR759bu84k_72RM5_v-kw
Current Status:     Done(Success)
Exit code:          0
Submitted:          Fri May 25 11:14:54 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel07.cnaf.infn.it:9000/FRKaSvQ-fX0Q5Qd-BFuIGg
    Current Status:     Done(Success)
    Logged Reason(s):
        - 
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce1.grid.lebedev.ru:2119/jobmanager-lcgpbs-dteam
    Submitted:          Fri May 25 11:14:54 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/SfrHHxw4E3-KZKRx4-hZEQ
    Current Status:     Done(Success)
    Logged Reason(s):
        - 
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce01.athena.hellasgrid.gr:2119/jobmanager-pbs-dteam
    Submitted:          Fri May 25 11:14:54 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/fDTaOZAeRkvB0Rgt6ahu5Q
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce.reef.man.poznan.pl:2119/jobmanager-pbs-dteam
    Submitted:          Fri May 25 11:14:54 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/ly9bq3Y764NQPh73HhjA-Q
    Current Status:     Done(Success)
    Logged Reason(s):
        - 
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce01.kallisto.hellasgrid.gr:2119/jobmanager-pbs-dteam
    Submitted:          Fri May 25 11:14:54 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/qy5ih7me0ABi2AzO2RZzqA
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ceprod03.grid.hep.ph.ic.ac.uk:2119/jobmanager-sge-long
    Submitted:          Fri May 25 11:14:54 2012 CEST
==========================================================================
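With many nodes, a tally of node states is quicker to check than the full listing. A minimal sketch, assuming the "- Nodes information for:" marker and indented "Current Status:" lines shown above; node_status_summary is a hypothetical helper, and the parent job's status (printed before the marker) is deliberately not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally the "Current Status:" values of the nodes listed
# after "- Nodes information for:" in glite-wms-job-status output (stdin).
# Prints "<count> <status>" lines sorted by status.
node_status_summary() {
  awk '
    /^- Nodes information for:/ { in_nodes = 1; next }
    in_nodes && /Current Status:/ {
      sub(/^[[:space:]]*Current Status:[[:space:]]*/, "")
      sub(/[[:space:]]+$/, "")
      count[$0]++
    }
    END { for (s in count) print count[s], s }
  ' | LC_ALL=C sort -k2
}
```

Usage: glite-wms-job-status https://devel07.cnaf.infn.it:9000/NpR759bu84k_72RM5_v-kw | node_status_summary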

DAG

$ cat ~/JDLs/WMS/dag10nodi.jdl
  [
    SignificantAttributes = { "Requirements", "Rank" };
    type = "dag";
    requirements = !RegExp("cream.*", other.GlueCEUniqueID);   

   nodes = [
         nodeA = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/env.jdl" ; 
      ];
       nodeB = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/sleep.jdl" ; 
      ];
       nodeC = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/touch.jdl" ; 
      ];
         nodeD = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/ls.jdl" ;
      ];
         nodeE = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/ls.jdl" ;
      ];
         nodeF = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/env.jdl" ;
      ];
         nodeG = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/echo.jdl" ;
      ];
         nodeH = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/random.jdl" ;
      ];
         nodeI = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/ls.jdl" ;
      ];
         nodeL = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/cat.jdl" ;
      ];
         nodeM = [
        node_type = "edg-jdl";
        file ="/home/dorigoa/JDLs/WMS/cat.jdl" ;
      ];
         dependencies = {
        { nodeA, { nodeB, nodeC, nodeD, nodeE } },
            { nodeD, nodeF },
            { nodeD, nodeG }, 
            { nodeE, nodeH },
            { nodeF, nodeI },
            { nodeF, nodeL },  
            { nodeL, nodeM }
      }
    ];
  ]
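The dependencies attribute of dag10nodi.jdl can be sanity-checked with tsort(1), which prints one execution order consistent with all parent/child pairs (or reports a cycle). This is a minimal sketch: the pairs are transcribed by hand from the JDL above, with { nodeA, { nodeB, nodeC, nodeD, nodeE } } expanded to one pair per child, and dag_order is a hypothetical name. DAGMan may run independent nodes in parallel; tsort only yields one valid serialization.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one valid execution order of the DAG nodes.
# Each line is "parent child", hand-transcribed from the dependencies attribute.
dag_order() {
  tsort <<'EOF'
nodeA nodeB
nodeA nodeC
nodeA nodeD
nodeA nodeE
nodeD nodeF
nodeD nodeG
nodeE nodeH
nodeF nodeI
nodeF nodeL
nodeL nodeM
EOF
}
```

Since nodeA is the only node without a parent, it always comes first; nodeM, the child of nodeL, always comes after nodeL.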


$ glite-wms-job-submit -a -c ~/JDLs/WMS/wmp_devel11.conf ~/JDLs/WMS/dag10nodi.jdl

Connecting to the service https://devel11.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/zSoSo9jMN-MgInLb9e4G0Q

==========================================================================

$ glite-wms-job-status https://devel07.cnaf.infn.it:9000/zSoSo9jMN-MgInLb9e4G0Q


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/zSoSo9jMN-MgInLb9e4G0Q
Current Status:     Done(Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        dagman
Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel07.cnaf.infn.it:9000/2DZeXp0v5_3zGNueXqFeyw
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-long
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/7Jk1daotQC_uXmv33bbCgA
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-long
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/9m_oRvKFp1texXeSR6UUbg
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-cert
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/QCBl5XHvyAOJPdkzdNO_VQ
    Current Status:     Done(Success)
    Logged Reason(s):
        - epilogue failed with error 1
Fri May 25 15:16:18 CEST 2012: Taken token gsiftp://devel11.cnaf.infn.it/var/SandboxDir/QC/https_3a_2f_2fdevel07.cnaf.infn.it_3a9000_2fQCBl5XHvyAOJPdkzdNO_5fVQ/token.txt_0
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-verylong
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/_wtIrUOhjHyHCAiwR3Vcxw
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-medium
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/epUEnBpmwZhaxLX1fWZYGA
    Current Status:     Done(Success)
    Logged Reason(s):
        - 
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        argoce01.na.infn.it:2119/jobmanager-lcgpbs-cert
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/jeG1XmOzuRIxW-PAIyIG_w
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        boalice3.bo.infn.it:2119/jobmanager-lcgpbs-cert
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/mlrvfleUcbWtRODiOFCxgQ
    Current Status:     Done(Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        boalice3.bo.infn.it:2119/jobmanager-lcgpbs-cert
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/qXXOVm_X7KvNf53g_2_vdQ
    Current Status:     Done(Success)
    Logged Reason(s):
        - 
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-edu.grid.acad.bg:2119/jobmanager-pbs-dteam
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/vTOKXPI8oMnUmpJZajgS7w
    Current Status:     Done(Success)
    Logged Reason(s):
        - 
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-atlas.ipb.ac.rs:2119/jobmanager-pbs-dteam
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/wb7RKljxzfNjv0cxbYxjtA
    Current Status:     Done(Success)
    Logged Reason(s):
        - 
        - Job terminated successfully
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce-edu.grid.acad.bg:2119/jobmanager-pbs-dteam
    Submitted:          Fri May 25 13:43:44 2012 CEST
==========================================================================
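One DAG node above (QCBl5XHvyAOJPdkzdNO_VQ) ended in Done(Success) although its logged reasons include "epilogue failed with error 1". Such nodes can be flagged automatically. A minimal sketch, assuming the "Status info for the Job :" and "Logged Reason(s):" layout shown above and using the word "failed" as the (hypothetical) trigger; nodes_with_failed_reasons is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the job IDs of nodes whose "Logged Reason(s)"
# contain the word "failed", whatever their final status.
# Reads glite-wms-job-status output on stdin.
nodes_with_failed_reasons() {
  awk '
    /Status info for the Job :/ { id = $NF; in_reasons = 0; next }
    /Logged Reason\(s\):/        { in_reasons = 1; next }
    # The reasons list ends where the exit code or status reason begins.
    /^[[:space:]]*Exit code:/ || /Status Reason:/ { in_reasons = 0 }
    in_reasons && /failed/ && !(id in seen) { seen[id] = 1; print id }
  '
}
```

Usage: glite-wms-job-status https://devel07.cnaf.infn.it:9000/zSoSo9jMN-MgInLb9e4G0Q | nodes_with_failed_reasons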
    

-- MarcoCecchi - 2012-04-26

Topic revision: r16 - 2012-07-19 - MarcoCecchi
 
Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.