WMS 3.4 EMI2 precertification report

Testing server on: devel09.cnaf.infn.it (SL5) and devel08.cnaf.infn.it (SL6)

17/07/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/0cbf8e3b-7082-40d4-8667-906d62de8742//sl6_x86_64_gcc412EPEL

31/07/2012 Tested list-match, job-submit, delegation, job-info installing from testing repository

09/07/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/c3998ef5-2745-43b8-b7c7-7a1cc8e069b2/sl5_x86_64_gcc412EPEL

12/06/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/c3998ef5-2745-43b8-b7c7-7a1cc8e069b2/sl5_x86_64_gcc412EPEL

28/03/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/d0831cfb-9bcf-4588-8a6d-edf8fbd7d3b3/sl5_x86_64_gcc412EPEL

23/02/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/bbf07458-a777-4062-94e3-409a2b481cac/sl5_x86_64_gcc412EPEL

21/03/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/73c67e19-8837-4ced-ab48-c7cadefdf589/sl5_x86_64_gcc412EPEL

Installation

Clean install reports are attached at the bottom of this document.

Testing client on: devel15.cnaf.infn.it (SL5)

23/04/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/1651ef71-bd32-48bb-9eb7-692297c56ad1/sl5_x86_64_gcc412EPEL

18/04/2012 Using repository http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/9bca1e56-da1c-4b92-8a96-28bbe62a21b9/sl5_x86_64_gcc412EPEL

18/04/2012 UI Installation report (excerpts)

Upgraded from EMI-1 UI:

[root@devel15 mcecchi]# head -30 /etc/yum.repos.d/ui_emi2.repo 
# ETICS Automatic Repomd (Yum) Repository
#
# Submission ID: 9bca1e56-da1c-4b92-8a96-28bbe62a21b9
# Platform: sl5_x86_64_gcc412EPEL
# Date: 18/04/2012 11:28:33
#
# Project Name: emi
# Configuration Name: emi-wms-ui_B_3_4
# Configuration Version: 3.4.99-0
#
# Build Reports: http://etics-repository.cern.ch/repository/reports/id/9bca1e56-da1c-4b92-8a96-28bbe62a21b9/sl5_x86_64_gcc412EPEL/-/reports/index.html
#
# Author: CN=Paolo Andreetto, L=Padova, OU=Personal Certificate, O=INFN, C=IT

[ETICS-volatile-build-9bca1e56-da1c-4b92-8a96-28bbe62a21b9-sl5_x86_64_gcc412EPEL]
name=ETICS build of emi-wms-ui_B_3_4 on sl5_x86_64_gcc412EPEL
baseurl=http://etics-repository.cern.ch/repository/pm/volatile/repomd/id/9bca1e56-da1c-4b92-8a96-28bbe62a21b9/sl5_x86_64_gcc412EPEL
protect=0
enabled=1
gpgcheck=0
priority=40

# 31 packages available in this repository:
#
# emi-delegation-interface (2.0.3-1.sl5)
# http://etics-repository.cern.ch/repository/uuid/volatile/c8fcf5d9-5667-4e34-b574-eaeea26fc503/emi-delegation-interface-2.0.3-1.sl5.noarch.rpm
#
# emi-delegation-java (2.2.0-2.sl5)

[root@devel15 mcecchi]# head -30 /etc/yum.repos.d/emi-2-rc4-sl5.repo 
[core]
name=SL 5 base
baseurl=http://linuxsoft.cern.ch/scientific/5x/$basearch/SL
   http://ftp.scientificlinux.org/linux/scientific/5x/$basearch/SL
        http://ftp1.scientificlinux.org/linux/scientific/5x/$basearch/SL
        http://ftp2.scientificlinux.org/linux/scientific/5x/$basearch/SL
protect=0

[extras]
name=epel
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-5&arch=$basearch
protect=0

[EGI-trustanchors]
name=EGI-trustanchors
baseurl=http://repository.egi.eu/sw/production/cas/1/current/
gpgkey=http://repository.egi.eu/sw/production/cas/1/GPG-KEY-EUGridPMA-RPM-3
gpgcheck=1
enabled=1

[EMI-2-RC4-base]
name=EMI 2 RC4 Base Repository
baseurl=http://emisoft.web.cern.ch/emisoft/dist/EMI/2/RC4/sl5/$basearch/base
gpgkey=http://emisoft.web.cern.ch/emisoft/dist/EMI/2/RPM-GPG-KEY-emi
priority=45
protect=0
enabled=1
gpgcheck=0

[EMI-2-RC4-third-party]

[root@devel15 mcecchi]# rpm -qa | grep glite-wms
glite-wms-utils-exception-3.3.0-2.sl5
glite-wms-ui-api-python-3.4.99-0.sl5
glite-wms-ui-commands-3.4.99-0.sl5
glite-wms-brokerinfo-access-lib-3.4.99-0.sl5
glite-wms-wmproxy-api-java-3.4.99-0.sl5
glite-wms-wmproxy-api-python-3.4.99-0.sl5
glite-wms-utils-classad-3.3.0-2.sl5
glite-wms-wmproxy-api-cpp-3.4.99-0.sl5
glite-wms-brokerinfo-access-3.4.99-0.sl5
[root@devel15 mcecchi]# 

Deployment/Configuration/Installation notes (for SERVER)

LCMAPS needs configuration (FIXED)

In /etc/lcmaps/lcmaps.db.gridftp, /etc/lcmaps/lcmaps.db replace:

path = /usr/lib64/modules ---> path = /usr/lib64/lcmaps
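The replacement above can be scripted; this is a minimal shell sketch (the helper name is ours; the file names are the ones from this report and the edit would be run as root on the WMS node). It is demonstrated on a scratch copy so it can be run anywhere:

```shell
# Fix the LCMAPS module path in a given config file, keeping a .bak backup.
fix_lcmaps_path() {
    sed -i.bak 's|path = /usr/lib64/modules|path = /usr/lib64/lcmaps|g' "$1"
}

# On a real WMS node:
#   fix_lcmaps_path /etc/lcmaps/lcmaps.db
#   fix_lcmaps_path /etc/lcmaps/lcmaps.db.gridftp
# Demo on a scratch file:
tmp=$(mktemp)
echo 'path = /usr/lib64/modules' > "$tmp"
fix_lcmaps_path "$tmp"
grep '^path' "$tmp"    # prints: path = /usr/lib64/lcmaps
```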

There were also several misconfiguration errors in yaim-core (the YAIM component for lcas-lcmaps-gt4-interface) that caused:

Mar 21 12:56:10 devel09 glite_wms_wmproxy_server: lcmaps: (null): LCMAPS
initialization failure
Mar 21 12:56:48 devel09 glite_wms_wmproxy_server: lcmaps:
/etc/lcmaps/lcmaps.db:53: [error] variable 'good' already defined at
line 9;
Mar 21 12:56:48 devel09 glite_wms_wmproxy_server: lcmaps:
/etc/lcmaps/lcmaps.db:53: [error] pervious value: 'lcmaps_dummy_good.mod'.
Mar 21 12:56:48 devel09 glite_wms_wmproxy_server: lcmaps:
/etc/lcmaps/lcmaps.db:56: [error] variable 'localaccount' already
defined at line 11;
Mar 21 12:56:48 devel09 glite_wms_wmproxy_server: lcmaps:
/etc/lcmaps/lcmaps.db:56: [error] pervious value:
'lcmaps_localaccount.mod -gridmapfile /etc/grid-security/grid-mapfile'.
Mar 21 12:56:48 devel09 glite_wms_wmproxy_server: lcmaps:
/etc/lcmaps/lcmaps.db:61: [error] variable 'poolaccount' already defined
at line 14;

As of June 15th, 2012, a fresh install with these artefacts:

[root@devel09 SandboxDir]# rpm -qa | grep lcmaps
lcmaps-1.5.5-1.el5
lcas-lcmaps-gt4-interface-0.2.1-4.el5
lcmaps-without-gsi-1.5.5-1.el5
lcmaps-plugins-basic-1.5.0-3.el5
lcmaps-plugins-voms-1.5.3-1.el5

seems to have the proper configuration

LB

mode=both: PASS (used throughout in the tests)

mode=server: PASS

mcecchi 17/05/2012. Tried with devel07 as EMI-2 LB server.

[mcecchi@ui ~]$ glite-wms-job-submit -a --endpoint https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server ls.jdl 

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel07.cnaf.infn.it:9000/YSm5s0kWCCZLjPjF2cp4vw

==========================================================================


[mcecchi@ui ~]$ glite-wms-job-logging-info https://devel07.cnaf.infn.it:9000/YSm5s0kWCCZLjPjF2cp4vw

===================== glite-job-logging-info Success =====================

LOGGING INFORMATION:

Printing info for the Job : https://devel07.cnaf.infn.it:9000/YSm5s0kWCCZLjPjF2cp4vw
 
   ---
Event: RegJob
- Source                     =    NetworkServer
- Timestamp                  =    Thu May 17 17:38:24 2012 CEST
   ---
Event: Accepted
- Source                     =    NetworkServer
- Timestamp                  =    Thu May 17 17:38:24 2012 CEST
   ---
Event: EnQueued
- Result                     =    START
- Source                     =    NetworkServer
- Timestamp                  =    Thu May 17 17:38:24 2012 CEST
   ---
Event: EnQueued
- Result                     =    OK
- Source                     =    NetworkServer
- Timestamp                  =    Thu May 17 17:38:24 2012 CEST
   ---
Event: DeQueued
- Source                     =    WorkloadManager
- Timestamp                  =    Thu May 17 17:38:25 2012 CEST
   ---
Event: Pending
- Source                     =    WorkloadManager
- Timestamp                  =    Thu May 17 17:38:26 2012 CEST
==========================================================================

Major changes (only for SERVER)

* GLUE2 purchasing and match-making

- There is one and only one ISM. User requirements are appended to WmsRequirements, which includes authentication checks and various constraints on the resource side. Depending on which type of purchaser is selected, the user and WMS requirements expressions must be adapted accordingly. They can also be crafted so as to manage the co-existence of the GLUE 1.3 and 2.0 purchasers. The selection is done with these attributes:

EnableIsmIiGlue13Purchasing = true | false; EnableIsmIiGlue20Purchasing = true | false;

The default is c) EnableIsmIiGlue13Purchasing = true & EnableIsmIiGlue20Purchasing = false:

The GLUE 1.3 purchaser is always executed before the GLUE 2.0 one. In this way, when the latter starts, it can find one of two situations:

c1) a G13Ad with id = G2Ad.id already exists in the ISM: in this case a simple G13Ad.Update(G2Ad) is performed (in effect a merge). The resulting resource will be matchable by either GLUE 2.0 or GLUE 1.3 requirements; moreover, in this way the storage part also keeps working, provided that any storage-related attributes in the JDL are expressed in GLUE 1.3.

c2) the ISM does not contain an Ad with the same G2Ad.id: in this case no update is performed, just an insert of the Ad, which will be matchable only by GLUE 2.0-compliant requirements in the JDL.

For example, the authZ check would become:

if(isdefined(other.GlueCEAccessControlBaseRule) , member(CertificateSubject,other.GlueCEAccessControlBaseRule), member(CertificateSubject,other.GLUE2.Computing.Endpoint.Policy));

* Argus authZ for access control

For the moment, only the old OpenSSL DN format is accepted.

* dagmanless DAGs

Nodes are re-evaluated in the WM every MatchRetryPeriod seconds.

* Condor 7.8.0

It fully enables submission to GRAM5 CEs. The feature requires testing submission to LCG-CE, ARC CE, and OSG GRAM2 and GRAM5 CEs.

* refactoring of authN/Z in wmp

Requires testing authN/Z at large, not only for Argus.

* support for RFC proxies (https://savannah.cern.ch/bugs/?88128)

The LCG-CE is not expected to support them; OSG, ARC and CREAM CEs will.

Conf. Changes

- removed DagmanLogLevel

- removed DagmanLogRotate

- removed DagmanMaxPre

- removed MaxDAGRunningNodes

- removed wmp tools

- removed par. bulkMM

- removed par. filelist

- removed par. locallogger in wmp

- removed asynch purchasing

- added IiGlueLib (= "libglite_wms_ism_ii_purchaser.so.0";)

- added EnableIsmIiGlue20Purchasing and EnableIsmIiGlue13Purchasing

- added LDAP filters for g2 ii purchasers, IsmIiG2LDAPCEFilterExt & IsmIiG2LDAPSEFilterExt

- added attributes for .so in dlopen (helper, purchasers)

- par. MatchRetryPeriod reduced from 600 to 300. It now also indicates the time interval at which DAGs are evaluated by the WM engine

- SbRetryDifferentProtocols = true by default in WM conf

- WmsRequirements now also includes the queue requirements, which were previously hard-coded
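To summarize the configuration changes, a hypothetical excerpt of the WorkloadManager section of glite_wms.conf might look as follows (parameter names are taken from the list above; the LDAP filter values and anything not marked as a default are purely illustrative):

```
WorkloadManager = [
    // GLUE purchaser selection (defaults shown)
    EnableIsmIiGlue13Purchasing = true;
    EnableIsmIiGlue20Purchasing = false;
    IiGlueLib = "libglite_wms_ism_ii_purchaser.so.0";

    // site-specific LDAP filters for the GLUE 2.0 II purchasers (illustrative values)
    IsmIiG2LDAPCEFilterExt = "(objectclass=GLUE2ComputingService)";
    IsmIiG2LDAPSEFilterExt = "(objectclass=GLUE2StorageService)";

    MatchRetryPeriod = 300;             // was 600; also the DAG evaluation interval
    SbRetryDifferentProtocols = true;   // new default
];
```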

LIST OF BUGS

Server

Vulnerability bug in ICE's proxy renewal (Advisory-SVG-2012-4073) Yes / Done
Vulnerability bug in ICE's proxy renewal (Advisory-SVG-2012-4039) Yes / Done
glite-wms-ice-proxy-renew can block undefinitely (https://savannah.cern.ch/bugs/?95584) Yes / Done
yaim-wms: set ldap query filter expression for GLUE2 in WMS configuration (https://savannah.cern.ch/bugs/?91563) Yes / Done
JobController logfile name is misspelled (https://savannah.cern.ch/bugs/?32611) Yes / Done
glite-wms-job-submit doesn't always pick up other WMProxy endpoints if load on WMS is high (https://savannah.cern.ch/bugs/?40370) Yes / Done
[wms] GlueServiceStatusInfo content is ugly (https://savannah.cern.ch/bugs/?48068) Yes / Done
[ yaim-wms ] CeForwardParameters should include several more parameters (https://savannah.cern.ch/bugs/?61315) Yes / Done
WMS needs cron job to kill stale GridFTP processes (https://savannah.cern.ch/bugs/?67489) Yes / Done
WMProxy code requires FQANs (https://savannah.cern.ch/bugs/?72169) Yes / Done
WMProxy limiter should log more at info level (https://savannah.cern.ch/bugs/?72280) Yes / Done
There's an un-catched out_of_range exception in the ICE component (https://savannah.cern.ch/bugs/?75099) Yes / Done
ICE jobdir issue - 1 bad CE can block all jobs (https://savannah.cern.ch/bugs/?80751) Yes / Done
Cancellation of a dag's node doesn't work (https://savannah.cern.ch/bugs/?81651) Yes / Done
Deregistration of a proxy (2) (https://savannah.cern.ch/bugs/?83453) Yes / Done
Last LB event logged by ICE when job aborted for proxy expired should be ABORTED (https://savannah.cern.ch/bugs/?84839) Yes / Done
queryDb has 2 bugs handling user's options (see ggus ticket for more info) (https://savannah.cern.ch/bugs/?86267) Yes / Done
WMproxy GACLs do not support wildcards (as they used to do) (https://savannah.cern.ch/bugs/?87261) Yes / Done
Submission with rfc proxy doesn't work (https://savannah.cern.ch/bugs/?88128) Yes / Done
Semi-automated service backends configuration for WMS (task #23845, EMI Development Tracker, Done) Yes / Done
GlueServiceStatusInfo: ?? (https://savannah.cern.ch/bugs/?89435) Yes / Done
EMI WMS wmproxy rpm doesn't set execution permissions as it used to do in gLite (https://savannah.cern.ch/bugs/?89506) Yes / Done
EMI WMS WM might abort resubmitted jobs (https://savannah.cern.ch/bugs/?89508) Yes / Done
EMI WMS wmproxy init.d script stop/start problems (https://savannah.cern.ch/bugs/?89577) Yes / Done
glite-wms-check-daemons.sh should not restart daemons under the admin's nose (https://savannah.cern.ch/bugs/?89674) Yes / Done
Wrong location for PID file (https://savannah.cern.ch/bugs/?89857) Yes / Done
WMS logs should keep track of the last 90 days (https://savannah.cern.ch/bugs/?89871) Yes / Done
LB failover mechanism in WMproxy needs to be reviewed (https://savannah.cern.ch/bugs/?90034) Yes / Done
yaim-wms creates wms.proxy in wrong path (https://savannah.cern.ch/bugs/?90129) Yes / Done
cron job deletes /var/proxycache (https://savannah.cern.ch/bugs/?90640) Yes / Done
yaim-wms changes for Argus based authZ (https://savannah.cern.ch/bugs/?90760) Yes / Done
ICE should use env vars in its configuration (https://savannah.cern.ch/bugs/?90830) Yes / Done
ICE log verbosity should be reduced to 300 (https://savannah.cern.ch/bugs/?91078) Yes / Done
Make some WMS init scripts System V compatible (https://savannah.cern.ch/bugs/?91115) Yes / Done
move lcmaps.log from /var/log/glite to WMS_LOCATION_LOG (https://savannah.cern.ch/bugs/?91484) Yes / Done
WMS: use logrotate uniformly in ice, lm, jc, wm, wmp (https://savannah.cern.ch/bugs/?91486) Yes / Done
remove several dismissed parameters from the WMS configuration (https://savannah.cern.ch/bugs/?91488) Yes / Done
Pid file of ICE and WM has glite ownership (https://savannah.cern.ch/bugs/?91834) Yes / Done
The job replanner should be configurable (https://savannah.cern.ch/bugs/?91941) Yes / Done
some sensible information should be logged on syslog (https://savannah.cern.ch/bugs/?92657) Yes / Done
EMI-1 WMS does not propagate user job exit code (https://savannah.cern.ch/bugs/?92922) Yes / Done
glite_wms_wmproxy_server segfaults after job registration failure (https://savannah.cern.ch/bugs/?94845) Yes / Done

UI

glite-wms-job-status needs a better handling of purged-related error code. (https://savannah.cern.ch/bugs/?85063) Yes / Done
WMS UI depends on a buggy libtar (on SL5 at least) (https://savannah.cern.ch/bugs/?89443) Yes / Done. mcecchi 18/04/12 WARNING: [mcecchi@devel15 ~]$ ldd /usr/bin/glite-wms-job-submit | grep libtar gives: libtar.so.1 => /usr/lib64/libtar.so.1 (0x000000350c000000); the dependency still exists
getaddrinfo() sorts results according to RFC3484, but random ordering is lost (https://savannah.cern.ch/bugs/?82779) Yes / Done
glite-wms-job-status needs a json-compliant format (https://savannah.cern.ch/bugs/?82995) Yes / Done
Files specified with absolute paths shouldn't be used with inputsandboxbaseuri (https://savannah.cern.ch/bugs/?74832) Yes / Done
Too much flexibility in JDL syntax (https://savannah.cern.ch/bugs/?75802) Yes / Done
glite-wms-job-list-match --help show an un-implemented (and useless) option "--default-jdl" (https://savannah.cern.ch/bugs/?87444) Yes / Done
WMS-UI: update "KNOWN PROBLEMS AND CAVEATS" section of WMPROXY guide (https://savannah.cern.ch/bugs/?90003) Yes / Done
WMS UI emi-wmproxy-api-cpp and emi-wms-ui-api-python still use use gethostbyaddr/gethostbyname (https://savannah.cern.ch/bugs/?89668) Yes / Done
pkg-config info for wmproxy-api-cpp should be enriched (https://savannah.cern.ch/bugs/?85799) Yes / Done

UI BASIC FUNCTIONALITY TESTS

23/04/2012:

[mcecchi@devel15 ~]$ glite-wms-job-logging-info --debug --logfile log.txt --output output.txt https://emitb1.ics.muni.cz:9000/q6tmTtMPfuXAN1xY8YPXGA 

VirtualOrganisation value :dteam
####
Configuration file loaded: //etc/glite_wmsui_cmd_var.conf 
 [
 ]
#### Mon Apr 23 12:33:07 2012 Debug Message ####
Selected Virtual Organisation name (from proxy certificate extension): dteam
VOMS configuration file successfully loaded:

 [
 ]
#### End Debug ####

**** Error: UI_GENERIC_ERROR_ON_JOB_ID ****  
Error retrieving information on JobID "https://emitb1.ics.muni.cz:9000/q6tmTtMPfuXAN1xY8YPXGA". 
Error description: Unable to retrieve the Job Events for: https://emitb1.ics.muni.cz:9000/q6tmTtMPfuXAN1xY8YPXGA
glite.lb.Exception: edg_wll_JobLog: No such file or directory: no matching events found
   at glite::lb::Job::log[./src/Job.cpp:123]



                           *** Log file created ***
Possible Errors and Debug messages have been printed in the following file:
/home/mcecchi/log.txt

[mcecchi@devel15 ~]$ glite-wms-job-status --debug https://emitb1.ics.muni.cz:9000/0GK5XFRiKmSEs8jBY_2Alg 

VirtualOrganisation value :dteam
####
Configuration file loaded: //etc/glite_wmsui_cmd_var.conf 
 [
 ]
#### Mon Apr 23 12:33:18 2012 Debug Message ####
Selected Virtual Organisation name (from proxy certificate extension): dteam
VOMS configuration file successfully loaded:

 [
 ]
#### End Debug ####

#### Mon Apr 23 12:33:18 2012 Debug API ####
The function 'Job::getStatus' has been called with the following parameter(s):
>> https://emitb1.ics.muni.cz:9000/0GK5XFRiKmSEs8jBY_2Alg
>> 1
#### End Debug ####

**** Error: API_NATIVE_ERROR ****  
Error while calling the "Job:getStatus" native api 
Unable to retrieve the status for: https://emitb1.ics.muni.cz:9000/0GK5XFRiKmSEs8jBY_2Alg
glite.lb.Exception: edg_wll_JobStatus: Operation not permitted: matching jobs found but authorization failed
   at glite::lb::Job::status[./src/Job.cpp:87]




                           *** Log file created ***
Possible Errors and Debug messages have been printed in the following file:
/tmp/glite-wms-job-status_505_7423_1335177198.log

18/04/2012:


[mcecchi@devel15 ~]$ voms-proxy-info --all
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi/CN=proxy
issuer    : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi
identity  : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u505
timeleft  : 46:41:06
key usage : Digital Signature, Key Encipherment, Data Encipherment
=== VO dteam extension information ===
VO        : dteam
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi
issuer    : /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr
attribute : /dteam/Role=NULL/Capability=NULL
attribute : /dteam/NGI_IT/Role=NULL/Capability=NULL
timeleft  : 22:41:06
uri       : voms.hellasgrid.gr:15004
[mcecchi@devel15 ~]$ 
[mcecchi@devel15 ~]$ glite-wms-job-delegate-proxy -d mcecchi --endpoint https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


================== glite-wms-job-delegate-proxy Success ==================

Your proxy has been successfully delegated to the WMProxy(s):
https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server
with the delegation identifier: mcecchi

==========================================================================

[mcecchi@devel15 ~]$ glite-wms-job-info -d mcecchi --endpoint https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


======================= glite-wms-job-info Success =======================

Your proxy delegated to the endpoint https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server
with delegationID mcecchi: 

Subject     : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi/CN=proxy/CN=proxy
Issuer      : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi/CN=proxy
Identity    : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi/CN=proxy
Type        : proxy
Strength    : 512
StartDate   : 18 Apr 2012 - 14:38:44
Expiration  : 20 Apr 2012 - 13:03:44
Timeleft    : 1 days 22 hours 19 min 12 sec 
=== VO dteam extension information ===
VO          : dteam
Subject     : /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi
Issuer      : /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr
URI         : voms.hellasgrid.gr:15004
Attribute   : /dteam/Role=NULL/Capability=NULL
Attribute   : /dteam/NGI_IT/Role=NULL/Capability=NULL
StartTime   : 18 Apr 2012 - 13:04:36
Expiration  : 19 Apr 2012 - 13:04:36
Timeleft    : 22 hours 20 min 04 sec 

==========================================================================


[mcecchi@devel15 ~]$ cat zipped_isb.jdl 
[
Executable = "/bin/echo";
EnableZIppedISB=true;
a=[b=23];
Arguments = "Hello";
StdOutput = "out.log";
StdError = "err.log";
InputSandbox = {"Test.sh"};
OutputSandbox = {"out.log", "err.log"};
requirements = !RegExp("cream.*", other.GlueCEUniqueID);;
AllowZippedISB = true;
rank=a.b*3;
myproxyserver="";
#myproxyserver="myproxy.cnaf.infn.it";
RetryCount = 0;
ShallowRetryCount = -1;
]

[mcecchi@devel15 ~]$ glite-wms-job-submit --version

WMS User Interface version  3.3.3
Copyright (C) 2008 by ElsagDatamat SpA


Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


WMProxy Version: 3.3.1

[mcecchi@devel15 ~]$ glite-wms-job-submit -d mcecchi --endpoint https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server zipped_isb.jdl 

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA

==========================================================================

[mcecchi@devel15 ~]$ glite-wms-job-status --json https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA

{ "result": "success" , "https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA": { "Current Status": "Done(Success)", "Exit code": "0", "Status Reason": "Job terminated successfully", "Destination": "ceprod03.grid.hep.ph.ic.ac.uk:2119/jobmanager-sge-long", "Submitted": "Wed Apr 18 14:26:51 2012 CEST", "Done": "1334752083"}   }
[mcecchi@devel15 ~]$ glite-wms-job-status https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA
Current Status:     Done(Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        ceprod03.grid.hep.ph.ic.ac.uk:2119/jobmanager-sge-long
Submitted:          Wed Apr 18 14:26:51 2012 CEST
==========================================================================
[mcecchi@devel15 ~]$ glite-wms-job-logging-info https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA

===================== glite-wms-job-logging-info Success =====================

LOGGING INFORMATION:

Printing info for the Job : https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA
 
   ---
Event: RegJob
- Source                     =    NetworkServer
- Timestamp                  =    Wed Apr 18 14:26:51 2012 CEST
   ---
Event: Accepted
- Source                     =    NetworkServer
- Timestamp                  =    Wed Apr 18 14:26:52 2012 CEST
   ---
Event: EnQueued
- Result                     =    START
- Source                     =    NetworkServer
- Timestamp                  =    Wed Apr 18 14:26:52 2012 CEST
   ---
Event: EnQueued
- Result                     =    OK
- Source                     =    NetworkServer
- Timestamp                  =    Wed Apr 18 14:26:52 2012 CEST
   ---
Event: DeQueued
- Source                     =    WorkloadManager
- Timestamp                  =    Wed Apr 18 14:26:53 2012 CEST
   ---
Event: Match
- Dest id                    =    ceprod03.grid.hep.ph.ic.ac.uk:2119/jobmanager-sge-long
- Source                     =    WorkloadManager
- Timestamp                  =    Wed Apr 18 14:26:53 2012 CEST
   ---
Event: UserTag
- Source                     =    WorkloadManager
- Timestamp                  =    Wed Apr 18 14:26:53 2012 CEST
   ---
Event: EnQueued
- Result                     =    START
- Source                     =    WorkloadManager
- Timestamp                  =    Wed Apr 18 14:26:53 2012 CEST
   ---
Event: EnQueued
- Result                     =    OK
- Source                     =    WorkloadManager
- Timestamp                  =    Wed Apr 18 14:26:53 2012 CEST
   ---
Event: DeQueued
- Source                     =    JobController
- Timestamp                  =    Wed Apr 18 14:26:54 2012 CEST
   ---
Event: Transfer
- Destination                =    LogMonitor
- Result                     =    START
- Source                     =    JobController
- Timestamp                  =    Wed Apr 18 14:26:54 2012 CEST
   ---
Event: Transfer
- Destination                =    LogMonitor
- Result                     =    OK
- Source                     =    JobController
- Timestamp                  =    Wed Apr 18 14:26:54 2012 CEST
   ---
Event: Accepted
- Source                     =    LogMonitor
- Timestamp                  =    Wed Apr 18 14:26:57 2012 CEST
   ---
Event: Transfer
- Destination                =    LRMS
- Result                     =    OK
- Source                     =    LogMonitor
- Timestamp                  =    Wed Apr 18 14:27:15 2012 CEST
   ---
Event: Running
- Source                     =    LogMonitor
- Timestamp                  =    Wed Apr 18 14:27:39 2012 CEST
   ---
Event: ReallyRunning
- Source                     =    LogMonitor
- Timestamp                  =    Wed Apr 18 14:28:03 2012 CEST
   ---
Event: Done
- Source                     =    LogMonitor
- Timestamp                  =    Wed Apr 18 14:28:03 2012 CEST
==========================================================================

[mcecchi@devel15 ~]$ glite-wms-job-output https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


================================================================================

         JOB GET OUTPUT OUTCOME

Output sandbox files for the job:
https://wms013.cnaf.infn.it:9000/hEQOXd73TkRGYlkCQDxOmA
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/mcecchi_hEQOXd73TkRGYlkCQDxOmA

================================================================================


[mcecchi@devel15 ~]$ cat /tmp/jobOutput/mcecchi_hEQOXd73TkRGYlkCQDxOmA/
err.log  out.log  
[mcecchi@devel15 ~]$ cat /tmp/jobOutput/mcecchi_hEQOXd73TkRGYlkCQDxOmA/out.log 
Hello

[mcecchi@devel15 ~]$ glite-wms-job-list-match -a --endpoint https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server ls.jdl |wc -l
179

[mcecchi@devel15 ~]$ glite-wms-job-submit -a --endpoint https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server coll_1.jdl 

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://wms013.cnaf.infn.it:9000/ZPQkLewFTjI0eaJTwsFwJw

==========================================================================


[mcecchi@devel15 ~]$ glite-wms-job-submit -a --endpoint https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server coll_1.jdl 

[mcecchi@devel15 ~]$ glite-wms-job-status https://wms013.cnaf.infn.it:9000/ZPQkLewFTjI0eaJTwsFwJw


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://wms013.cnaf.infn.it:9000/ZPQkLewFTjI0eaJTwsFwJw
Current Status:     Waiting
Submitted:          Wed Apr 18 17:01:56 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://wms013.cnaf.infn.it:9000/CS3gAQD4M3clf3DFdMpOew
    Current Status:     Scheduled
    Status Reason:      unavailable
    Destination:        cccreamceli08.in2p3.fr:8443/cream-sge-long
    Submitted:          Wed Apr 18 17:01:56 2012 CEST
==========================================================================
    

[mcecchi@devel15 ~]$ cat perusal.jdl 
Executable = "testperusal.sh";
StdOutput = "stdout";
StdError = "stderr";
InputSandbox = {"testperusal.sh"};
OutputSandbox = {"stdout", "stderr", "test"};
PerusalTimeInterval = 15;
PerusalFileEnable = true;
Requirements = true;


[mcecchi@devel15 ~]$ cat testperusal.sh 
#!/bin/sh
for i in `seq 1 100`; do
   sleep 10
   echo prova >> test
   err
done

[mcecchi@devel15 ~]$ glite-wms-job-perusal --get -f test --dir . https://wms013.cnaf.infn.it:9000/orrIQnq8aek3-jHFRxn5kA

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-perusal Success ======================

No files to be retrieved for the job:
https://wms013.cnaf.infn.it:9000/orrIQnq8aek3-jHFRxn5kA

==========================================================================


[mcecchi@devel15 ~]$ date
Thu Apr 19 10:36:48 CEST 2012
[mcecchi@devel15 ~]$ date
Thu Apr 19 10:53:39 CEST 2012
[mcecchi@devel15 ~]$ glite-wms-job-perusal --get -f test --dir . https://wms013.cnaf.infn.it:9000/orrIQnq8aek3-jHFRxn5kA

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-perusal Success ======================

The retrieved files have been successfully stored in:
/home/mcecchi

==========================================================================


--------------------------------------------------------------------------
file 1/1: test-20120419095258_1-20120419095258_1
--------------------------------------------------------------------------

SERVER BASIC FUNCTIONALITY TESTS

- GLUE2.0 purchasing PASS

set

EnableIsmIiGlue13Purchasing = false; EnableIsmIiGlue20Purchasing = true;

in the WM conf, and create a file (the name doesn't matter) like this:

echo "[ command = \"ism_dump\"; ]"> jobdir/new/a

Start the WM and check that no crash occurs over 3-4 purchasing cycles (decrease the purchasing rate if needed to speed this up). The dump file will report, for example, entries like this:
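The same dump request can be wrapped in a small helper (the helper name and the scratch-directory demo are ours; on a real WMS point it at the WM's actual jobdir):

```shell
# Drop an ism_dump command ClassAd into the WM jobdir; the file name
# under new/ does not matter, only the ClassAd content does.
request_ism_dump() {
    jobdir="$1"
    mkdir -p "$jobdir/new"
    echo '[ command = "ism_dump"; ]' > "$jobdir/new/dump_request_$$"
}

# Demo on a scratch directory so the sketch is runnable anywhere:
dir=$(mktemp -d)
request_ism_dump "$dir"
cat "$dir"/new/dump_request_*    # prints: [ command = "ism_dump"; ]
```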

    [
        expiry_time = 900;
        update_time = 1340106474;
        id = "ce.csl.ee.upatras.gr:8443/cream-pbs-dteam/";
        info =
            [
                GlueCEAccessControlBaseRule = GLUE2.Computing.Endpoint.Policy;
                GlobusResourceContactString = CEid;
                QueueName = GLUE2.Computing.Share.MappingQueue;
                GlueHostApplicationSoftwareRunTimeEnvironment = GLUE2.ApplicationEnvironment.AppName;
                GlueCEInfoHostName = GLUE2.Computing.Share.OtherInfo.InfoProviderHost;
                GlueCEUniqueID = CEid;
                LRMSType = GLUE2.Computing.Manager.ProductName;
                CEid = "ce.csl.ee.upatras.gr:8443/cream-pbs-dteam";
                GLUE2 =
                    [
                        Computing =
                            [
                                Endpoint =
                                    [
                                        Semantics =
                                           {
                                              "http://wiki.italiangrid.org/twiki/bin/view/CREAM/UserGuide"
                                           };
                                        Implementor = "gLite";
                                        WSDL =
                                           {
                                              "https://ce.csl.ee.upatras.gr:8443/ce-cream/services/CREAM2?wsdl"
                                           };
                                        ServingState = "production";
                                        Name = "ce.csl.ee.upatras.gr_org.glite.ce.CREAM";
                                        ID = "ce.csl.ee.upatras.gr_org.glite.ce.CREAM";
                                        HealthState = "ok";
                                        SupportedProfile =
                                           {
                                              "http://www.ws-i.org/Profiles/BasicProfile-1.0.html"
                                           };
                                        Staging = "staginginout";
                                        Capability =
                                           {
                                              "executionmanagement.jobexecution"
                                           };
                                        QualityLevel = "production";
                                        StartTime = "2012-06-16T16:30:03Z";
                                        TrustedCA =
                                           {
                                              "IGTF"
                                           };
                                        ImplementationName = "CREAM";
                                        DowntimeInfo = "See the GOC DB for downtimes: https://goc.egi.eu/";
                                        InterfaceName = "org.glite.ce.CREAM";
                                        OtherInfo =
                                            [
                                                HostDN = "/C=GR/O=HellasGrid/OU=upatras.gr/CN=ce.csl.ee.upatras.gr";
                                                MiddlewareVersion = "2.0.0-1";
                                                MiddlewareName = "EMI"
                                            ];
                                        Technology = "webservice";
                                        IssuerCA = "/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006";
                                        Policy =
                                           {
                                              "VO:ops",
                                              "VO:dteam",
                                              "VO:see"
                                           };
                                        URL = "https://ce.csl.ee.upatras.gr8443/ce-cream/services";
                                        HealthStateInfo = "/etc/init.d/tomcat5 is already running (25631)";
                                        ImplementationVersion = "1.14";
                                        InterfaceVersion = "2.1";
                                        JobDescription =
                                           {
                                              "glite:jdl"
                                           }
                                    ];
                                Manager =
                                    [
                                        Name = "Computing Manager on ce.csl.ee.upatras.gr";
                                        ID = "ce.csl.ee.upatras.gr_ComputingElement_Manager";
                                        ProductVersion = "2.5.7";
                                        ProductName = "torque"
                                    ];
                                Share =
                                    [
                                        DefaultCPUTime = 2880;
                                        MaxWallTime = 4320;
                                        RunningJobs = 0;
                                        WaitingJobs = 444444;
                                        ServingState = "production";
                                        MaxCPUTime = 2880;
                                        FreeSlots = 0;
                                        EstimatedAverageWaitingTime = 2146660842;
                                        ID = "ops_ops_ce.csl.ee.upatras.gr_ComputingElement";
                                        MappingQueue = "dteam";
                                        TotalJobs = 0;
                                        DefaultWallTime = 60;
                                        Description = "Share of dteam for ops";
                                        OtherInfo =
                                            [
                                                InfoProviderName = "glite-ce-glue2-share-static";
                                                InfoProviderHost = "ce.csl.ee.upatras.gr";
                                                CREAMCEId = "ce.csl.ee.upatras.gr:8443/cream-pbs-dteam";
                                                InfoProviderVersion = "1.0"
                                            ];
                                        Policy =
                                           {
                                              ""
                                           };
                                        EstimatedWorstWaitingTime = 2146660842;
                                        MaxRunningJobs = 999999999
                                    ];
                                Service =
                                    [
                                        Name = "Computing Service ce.csl.ee.upatras.gr_ComputingElement";
                                        Type = "org.glite.ce.CREAM";
                                        ID = "ce.csl.ee.upatras.gr_ComputingElement";
                                        Capability =
                                           {
                                              "executionmanagement.jobexecution"
                                           };
                                        QualityLevel = "production";
                                        Complexity = "endpointType=2, share=2, resource=1"
                                    ]
                       Benchmark =
                           {

                                  [
                                      Name = "Benchmark HEP-SPEC06";
                                      Type = "HEP-SPEC06"; 
                                      ID = "ce.csl.ee.upatras.gr_HEP-SPEC06";
                                      Value = "8.35-HEP-SPEC06"
                                  ],

                                  [
                                      Name = "Benchmark specfp2000";
                                      Type = "specfp2000"; 
                                      ID = "ce.csl.ee.upatras.gr_specfp2000";
                                      Value = 1.763000000000000E+03
                                  ],

                                  [
                                      Name = "Benchmark specint2000";
                                      Type = "specint2000"; 
                                      ID = "ce.csl.ee.upatras.gr_specint2000";
                                      Value = 2.088000000000000E+03
                                  ]
                           }; 
                        ApplicationEnvironment =
                            [
                                AppName =
                                   {
                                      "GLITE-3_0_0",
                                      "GLITE-3_1_0",
                                      "GLITE-3_2_0",
                                      "LCG-2",
                                      "LCG-2_1_0",
                                      "LCG-2_1_1",
                                      "LCG-2_2_0",
                                      "LCG-2_3_0",
                                      "LCG-2_3_1",
                                      "LCG-2_4_0",
                                      "LCG-2_5_0",
                                      "LCG-2_6_0",
                                      "LCG-2_7_0",
                                      "R-GMA",
                                      "vo-see-fortran-4.1.2"
                                   }
                            ]
                    ]; 
                PurchasedBy = "ism_ii_g2_purchaser"
            ]
    ]

Note the attribute PurchasedBy = "ism_ii_g2_purchaser"
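The check above can be scripted. The sketch below is illustrative only: the sample dump lines are inlined with a here-doc (on a real WMS the dump is the file produced by the ism_dump command), and the path /tmp/ism_dump_sample is made up.

```shell
# Create a stand-in for the ISM dump with a couple of lines copied from
# the real one (assumption: this mimics the actual dump format).
cat > /tmp/ism_dump_sample <<'EOF'
        id = "ce.csl.ee.upatras.gr:8443/cream-pbs-dteam/";
                PurchasedBy = "ism_ii_g2_purchaser"
EOF

# Count entries loaded by the GLUE 2.0 purchaser; a non-zero count
# confirms GLUE 2.0 purchasing is active.
grep -c 'PurchasedBy = "ism_ii_g2_purchaser"' /tmp/ism_dump_sample
```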

- GLUE2.0 match-making PASS

After enabling GLUE2.0 purchasing in the WM, as in the above test, create a requirements expression like this in your JDL:

requirements = other.GLUE2.Computing.Endpoint.Name == "ce.csl.ee.upatras.gr_org.glite.ce.CREAM";

where the proper values have been taken from the ISM dump. Then, in the WM section, comment out the WmsRequirements expression and replace it with an expression referring to valid GLUE2.0 attributes (WmsRequirements = true is a good starting point). Restart the WMProxy. Now submit a list-match request, making sure that no default is picked up from some inherited configuration.
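The WM configuration change described above might look like the following fragment (a sketch: the attribute names are the ones referenced in this report, the rest of the WorkloadManager section is elided):

```
WorkloadManager = [
    EnableIsmIiGlue13Purchasing = false;
    EnableIsmIiGlue20Purchasing = true;
    // WmsRequirements = <original GLUE 1.3 expression>;  // commented out
    WmsRequirements = true;  // permissive starting point for GLUE 2.0 tests
    ...
];
```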

[mcecchi@ui ~]$ glite-wms-job-list-match -a -c glite_wmsui.conf  --endpoint https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server ls_g2.jdl 

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server

==========================================================================

           COMPUTING ELEMENT IDs LIST 
 The following CE(s) matching your job requirements have been found:

   *CEId*
 - ce.csl.ee.upatras.gr:8443/cream-pbs-dteam
 - ce.csl.ee.upatras.gr:8443/cream-pbs-see

==========================================================================
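The Endpoint name used in the requirements expression above is taken by hand from the ISM dump. A hedged helper for that step is sketched below; the two sample lines stand in for a real dump, and the path /tmp/ism_dump_sample2 is made up.

```shell
# Stand-in for an ISM dump (assumption: same layout as the real dump).
cat > /tmp/ism_dump_sample2 <<'EOF'
    Name = "ce.csl.ee.upatras.gr_org.glite.ce.CREAM";
    Name = "Computing Manager on ce.csl.ee.upatras.gr";
EOF

# Extract candidate GLUE 2.0 Endpoint names for use in
# other.GLUE2.Computing.Endpoint.Name requirements.
grep -o 'Name = "[^"]*_org\.glite\.ce\.CREAM"' /tmp/ism_dump_sample2 | sort -u
```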

- Submit single jobs PASS

CREAM:

[dorigoa@cream-12 ~]$ glite-wms-job-status https://devel09.cnaf.infn.it:9000/Ydl1Nc77DozScQ5PO-cUOg

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/Ydl1Nc77DozScQ5PO-cUOg
Current Status:     Done (Success)
Logged Reason(s):
    - job completed
    - Job Terminated Successfully
Exit code:          0
Status Reason:      Job Terminated Successfully
Destination:        cream-23.pd.infn.it:8443/cream-lsf-creamtest2
Submitted:          Fri Jun 15 10:55:32 2012 CEST
==========================================================================

LCG_CE: PASS

ARC on nordugrid: PASS

NOTE: the job failure shown below is expected (the requested queue does not exist on the target CE).

[mcecchi@devel15 ~]$ cat arc.jdl 
[
Executable = "/bin/echo";
Arguments = "Hello";
StdOutput = "out.log";
StdError = "err.log";
InputSandbox = {"Test.sh"};
OutputSandbox = {"out.log", "err.log"};
requirements = RegExp("krokar.ijs.si:2811/nordugrid-torque-arc", other.GlueCEUniqueID);
//requirements = RegExp("arc.univ.kiev.ua:2811/nordugrid-torque-arc", other.GlueCEUniqueID) || RegExp("krokar.ijs.si:2811/nordugrid-torque-arc", other.GlueCEUniqueID);
AllowZippedISB = true;
rank=0;
myproxyserver="";
RetryCount = 0;
ShallowRetryCount = -1;
]

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/_s_DOLlFGpbmYkntMQAWcQ
Current Status:     Done(Failed)
Logged Reason(s):
    - Got a job held event, reason: globus_ftp_client: the server responded with an error 451 Requested queue arc does not match any of available queues.
    - Got a job held event, reason: globus_ftp_client: the server responded with an error 451 Requested queue arc does not match any of available queues.
Exit code:          0
Status Reason:      Got a job held event, reason: globus_ftp_client: the server responded with an error 451 Requested queue arc does not match any of available queues.
Destination:        krokar.ijs.si:2811/nordugrid-torque-arc
Submitted:          Fri Jun 15 16:52:08 2012 CEST
==========================================================================

- Cancel a single job PASS

[mcecchi@devel15 ~]$ glite-wms-job-cancel https://wms013.cnaf.infn.it:9000/3ZIiK2Blh5T428odOzes6A

Are you sure you want to remove specified job(s) [y/n]y : y

Connecting to the service https://wms013.cnaf.infn.it:7443/glite_wms_wmproxy_server


============================= glite-wms-job-cancel Success =============================

The cancellation request has been successfully submitted for the following job(s):

- https://wms013.cnaf.infn.it:9000/3ZIiK2Blh5T428odOzes6A

========================================================================================

[mcecchi@devel15 ~]$ glite-wms-job-status https://wms013.cnaf.infn.it:9000/3ZIiK2Blh5T428odOzes6A


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://wms013.cnaf.infn.it:9000/3ZIiK2Blh5T428odOzes6A
Current Status:     Cancelled
Logged Reason(s):
    - Aborted by user
Destination:        spacina-ce.scope.unina.it:2119/jobmanager-lcgpbs-cert
Submitted:          Wed Apr 18 14:38:59 2012 CEST
==========================================================================

- Resubmission PASS

[mcecchi@ui ~]$ glite-wms-job-submit -a --endpoint https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server resubmit.jdl 

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel09.cnaf.infn.it:9000/DWYRiSW5V_kBGieT_w3uIA

==========================================================================

[mcecchi@ui ~]$ glite-wms-job-status https://devel09.cnaf.infn.it:9000/DWYRiSW5V_kBGieT_w3uIA


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/DWYRiSW5V_kBGieT_w3uIA
Current Status:     Scheduled 
Status Reason:      unavailable
Destination:        lcgce07.gridpp.rl.ac.uk:8443/cream-pbs-grid700M
Submitted:          Fri Jul 20 11:40:34 2012 CEST
==========================================================================

[mcecchi@ui ~]$ glite-wms-job-status https://devel09.cnaf.infn.it:9000/DWYRiSW5V_kBGieT_w3uIA


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/DWYRiSW5V_kBGieT_w3uIA
Current Status:     Aborted 
Logged Reason(s):
    - Prologue failed with error 1
    - Prologue failed with error 1; reason=1
    - Prologue failed with error 1
    - Prologue failed with error 1
    - Prologue failed with error 1
    - Prologue failed with error 1
    - pbs_reason=1; Prologue failed with error 1
Status Reason:      hit job shallow retry count (3)
Destination:        grid-ce.physik.rwth-aachen.de:8443/cream-pbs-dteam
Submitted:          Fri Jul 20 11:40:34 2012 CEST
==========================================================================

18 May, 16:35:47 -I: [Info] operator()(/home/mcecchi/wms34/emi.wms.wms-manager/src/dispatcher_utils.cpp:227): new jobresubmit for https://devel07.cnaf.infn.it:9000/53qnhHmZ4TxQdrwtPUBgeA
18 May, 16:35:47 -D: [Debug] schedule_at(/home/mcecchi/wms34/emi.wms.wms-manager/src/events.cpp:156): timed event scheduled at 1337351748 with priority 20
18 May, 16:35:47 -D: [Debug] operator()(/home/mcecchi/wms34/emi.wms.wms-manager/src/submit_request.cpp:280): considering (re)submit of https://devel07.cnaf.infn.it:9000/53qnhHmZ4TxQdrwtPUBgeA
18 May, 16:35:47 -D: [Debug] operator()(/home/mcecchi/wms34/emi.wms.wms-manager/src/submit_request.cpp:672): found token number 0 for job https://devel07.cnaf.infn.it:9000/53qnhHmZ4TxQdrwtPUBgeA
18 May, 16:35:47 -I: [Info] checkRequirement(/home/mcecchi/wms34/emi.wms.wms-matchmaking/src/matchmakerISMImpl.cpp:105): MM for job: https://devel07.cnaf.infn.it:9000/53qnhHmZ4TxQdrwtPUBgeA (845/1145 [0] )
18 May, 16:35:47 -I: [Info] operator()(/home/mcecchi/wms34/emi.wms.wms-manager/src/submit_request.cpp:773): https://devel07.cnaf.infn.it:9000/53qnhHmZ4TxQdrwtPUBgeA delivered

- Zipped ISB

[mcecchi@devel15 ~]$ cat ls.jdl 
[
Executable = "/bin/echo";
EnableZIppedISB=true;
Arguments = "Hello";
StdOutput = "out.log";
StdError = "err.log";
InputSandbox = {"Test.sh"};
OutputSandbox = {"out.log", "err.log"};
requirements = !RegExp("cream.*", other.GlueCEUniqueID);;
AllowZippedISB = true;
myproxyserver="";
RetryCount = 0;
ShallowRetryCount = -1;
]

[mcecchi@devel15 ~]$ glite-wms-job-submit -a --endpoint https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server ls.jdl 

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel09.cnaf.infn.it:9000/v5UveUGejZu0a7M8OtFKYQ

=====================================================================

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/v5UveUGejZu0a7M8OtFKYQ
Current Status:     Done(Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        ce01.grid.auth.gr:2119/jobmanager-pbs-dteam
Submitted:          Fri Jun 15 16:59:38 2012 CEST
==========================================================================

[mcecchi@devel15 ~]$ glite-wms-job-logging-info -v 3 https://devel09.cnaf.infn.it:9000/v5UveUGejZu0a7M8OtFKYQ|grep -i zipped
        AllowZippedISB = true; 
        ZippedISB = { "ISBfiles_WdwexlgUtW6sCsCZruA1Xg_0.tar.gz" }; 
        EnableZIppedISB = true;
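
As an illustration (not the WMS code path): the ZippedISB attribute names a tarball holding the input sandbox files, packed before transfer. Packing and inspecting one looks like this; the file names below are made up.

```shell
# Build a throwaway sandbox with one input file, as in ls.jdl.
workdir=$(mktemp -d)
echo 'echo Hello' > "$workdir/Test.sh"

# Pack it the way a zipped ISB is shipped: a single .tar.gz archive.
tar -C "$workdir" -czf "$workdir/ISBfiles_demo_0.tar.gz" Test.sh

# List the archive contents to confirm the sandbox file is inside.
tar -tzf "$workdir/ISBfiles_demo_0.tar.gz"
```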

- Submit a collection PASS

[mcecchi@ui ~]$ cat coll.jdl 
[
   type = "collection";
   VirtualOrganisation = "dteam";
   nodes = {
      [file ="ls.jdl";],
    [file ="ls.jdl";],
    [file ="ls.jdl";],
    [file ="ls.jdl";],
    [file ="ls.jdl";],
    [file ="ls.jdl";]
  };
] 
[mcecchi@ui ~]$ glite-wms-job-submit -a --endpoint https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server coll.jdl 

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel09.cnaf.infn.it:9000/X7Es4Uwvw2NfKWVrHYpnmA

==========================================================================

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/X7Es4Uwvw2NfKWVrHYpnmA
Current Status:     Running 
Submitted:          Fri Mar 23 13:32:12 2012 CET
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel09.cnaf.infn.it:9000/CbMUUPjOSvy6HEv5O5Ef6A
    Current Status:     Running 
    Status Reason:      Job successfully submitted to Globus
    Destination:        atlasce01.na.infn.it:2119/jobmanager-lcgpbs-cert
    Submitted:          Fri Mar 23 13:32:12 2012 CET
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/F2YEUJYYQKUF0kMY5htIAQ
    Current Status:     Running 
    Status Reason:      Job successfully submitted to Globus
    Destination:        ce-atlas.ipb.ac.rs:2119/jobmanager-pbs-dteam
    Submitted:          Fri Mar 23 13:32:12 2012 CET
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/FNvTOawm0gZVI68_PzlayQ
    Current Status:     Running 
    Status Reason:      Job successfully submitted to Globus
    Destination:        ce01.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q2d
    Submitted:          Fri Mar 23 13:32:12 2012 CET
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/Hka_ksPeEcS6zMvydbqMbw
    Current Status:     Running 
    Status Reason:      Job successfully submitted to Globus
    Destination:        ce-grid.obspm.fr:2119/jobmanager-pbs-dteam
    Submitted:          Fri Mar 23 13:32:12 2012 CET
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/M8t3s7nJCFJkztWElfFozA
    Current Status:     Running 
    Status Reason:      Job successfully submitted to Globus
    Destination:        egee.irb.hr:2119/jobmanager-lcgpbs-mon
    Submitted:          Fri Mar 23 13:32:12 2012 CET
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/yf13CQ8UvIUJzgIkI5t-kQ
    Current Status:     Running 
    Status Reason:      Job successfully submitted to Globus
    Destination:        ce-enmr.chemie.uni-frankfurt.de:2119/jobmanager-lcgpbs-long
    Submitted:          Fri Mar 23 13:32:12 2012 CET
==========================================================================

- Submit DAG jobs: IMPORTANT PASS

[ type = "dag"; VirtualOrganisation = "dteam"; nodes = [ nodeA = [file ="ls.jdl";]; nodeB = [file ="ls.jdl";]; Dependencies = { {nodeA, nodeB} } ]; ]

Dependency fulfilled:

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/mAf1L_q5stx4YlYnzt4Dtw
Current Status:     Running 
Destination:        dagman
Submitted:          Fri May 18 23:54:54 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel07.cnaf.infn.it:9000/q2dBjsinbryF_H9R7glxEw
    Current Status:     Scheduled 
    Status Reason:      unavailable
    Destination:        phoebe.htc.biggrid.nl:8443/cream-pbs-medium
    Submitted:          Fri May 18 23:54:54 2012 CEST
==========================================================================
    Status info for the Job : https://devel07.cnaf.infn.it:9000/tGmm0h0EA0chivy52RSINg
    Current Status:     Submitted 
    Submitted:          Fri May 18 23:54:54 2012 CEST
==========================================================================

- Status of a DAG job PASS Also, DAG jobs are finally Aborted when one node Aborts!

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel07.cnaf.infn.it:9000/ilhOCkYZ3mXIlMf4pSTjHg
Current Status:     Aborted 
Status Reason:      DAG completed with failed jobs
Destination:        dagman
Submitted:          Sat May 19 00:07:53 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel07.cnaf.infn.it:9000/5-m_QV3ELj6VI6_5gSzl2A
    Current Status:     Aborted 
    Status Reason:      parents have aborted
    Submitted:          Sat May 19 00:07:53 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel07.cnaf.infn.it:9000/b_0fxArwL63pzeoqxLn98Q
    Current Status:     Aborted 
    Logged Reason(s):
        - Cannot move ISB (retry_copy ${globus_transfer_cmd} gsiftp://devel09.cnaf.infn.it:2811/var/SandboxDir/b_/https_3a_2f_2fdevel07.cnaf.infn.it_3a9000_2fb_5f0fxArwL63pzeoqxLn98Q/input/Test.sh file:///pool/4644914.lcgbatch02.gridpp.rl.ac.uk/CREAM092148144/Test.sh): 
error: globus_ftp_client: the server responded with an error
500 500-Command failed. : globus_l_gfs_file_open failed.
500-globus_xio: Unable to open file /var/SandboxDir/b_/https_3a_2f_2fdevel07.cnaf.infn.it_3a9000_2fb_5f0fxArwL63pzeoqxLn98Q/input/Test.sh
500-globus_xio: System error in open: Permission denied
500-globus_xio: A system call failed: Permission denied
500 End.
        - pbs_reason=1
    Status Reason:       failed (LB query failed)
    Destination:        lcgce09.gridpp.rl.ac.uk:8443/cream-pbs-grid6000M
    Submitted:          Sat May 19 00:07:53 2012 CEST
==========================================================================

Cancelling a node also causes termination of its child jobs:

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/Gfrjk_6LnE-jYhR3yXgzlg
Current Status:     Aborted 
Status Reason:      DAG completed with failed jobs
Destination:        dagman
Submitted:          Mon Jun 18 11:42:28 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel09.cnaf.infn.it:9000/8vWeCAcV2ODQmaxzF1e0dA
    Current Status:     Cancelled 
    Destination:        cream02.dur.scotgrid.ac.uk:8443/cream-pbs-q7d
    Submitted:          Mon Jun 18 11:42:28 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/j2NFnbjC-3bK01fK8EE6uw
    Current Status:     Aborted 
    Status Reason:      parents have aborted
    Submitted:          Mon Jun 18 11:42:28 2012 CEST
==========================================================================

- Cancel PASS (overall status is Aborted because some nodes were aborted)

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/2ELy83Xoo99hGI-CHwRwMw
Current Status:     Aborted 
Logged Reason(s):
    - Aborted by user
Status Reason:      X509 proxy not found or I/O error (/var/SandboxDir/2E/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2f2ELy83Xoo99hGI-CHwRwMw/user.proxy)
Destination:        dagman
Submitted:          Tue May 22 10:52:58 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel09.cnaf.infn.it:9000/AGrC529w-NEfkr3A1P5XgA
    Current Status:     Cancelled 
    Logged Reason(s):
        - Cancelled by user
    Destination:        creamce01.ge.infn.it:8443/cream-lsf-cert
    Submitted:          Tue May 22 10:52:58 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/fEyWrMB0N8nT55iHH2zpDg
    Current Status:     Cancelled 
    Submitted:          Tue May 22 10:52:58 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/k19fxZAabsvBWnP_P1RUsA
    Current Status:     Aborted 
    Logged Reason(s):
        - Job got an error while in the CondorG queue.
    Status Reason:       failed (LB query failed)
    Destination:        ce3.itep.ru:2119/jobmanager-lcgpbs-dteam
    Submitted:          Tue May 22 10:52:58 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/kShzxpqr91qHCXCmHIeKgg
    Current Status:     Aborted 
    Logged Reason(s):
        - Transfer to CREAM failed due to exception: CREAM Register raised std::exception N5glite2ce16cream_client_api16cream_exceptions30JobSubmissionDisabledExceptionE
    Status Reason:       failed (LB query failed)
    Destination:        cccreamceli07.in2p3.fr:8443/cream-sge-medium
    Submitted:          Tue May 22 10:52:58 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/spKsWwVHtshg3IMYN5Y5Wg
    Current Status:     Cancelled 
    Logged Reason(s):
        - Cancelled by user
    Destination:        egice.polito.it:8443/cream-pbs-cert
    Submitted:          Tue May 22 10:52:58 2012 CEST
==========================================================================
    
    Status info for the Job : https://devel09.cnaf.infn.it:9000/tRcMUgad4ww1d-YjoFXqdQ
    Current Status:     Cancelled 
    Submitted:          Tue May 22 10:52:58 2012 CEST
==========================================================================

- Job with access to catalogues (mcecchi 25/5/12)

[mcecchi@ui ~]$ cat catalogue_access.jdl 
[
DataAccessProtocol = "gsiftp";
RetryCount = 1;
ShallowRetryCount = 2; 
Executable = "/bin/echo";
Arguments = "1000";
StdOutput = "std.out";
StdError = "std.err";
FuzzyRank = true;
InputSandbox = {"calc-pi.sh", "fileA", "prologue.sh"};
OutputSandbox = {"std.out", "std.err","out-PI.txt","out-e.txt"};
requirements = true;
DataRequirements = {
[
DataCatalogType = "DLI";
DataCatalog ="http://lfc.gridpp.rl.ac.uk:8085/"; 
InputData = { "lfn:/grid/t2k.org/nd280/raw/ND280/ND280/00005000_00005999/nd280_00005000_0002.daq.mid.gz" };
]
};
];

[mcecchi@ui ~]$ glite-wms-job-status https://devel09.cnaf.infn.it:9000/mB4YVNRzrseG_cRc-V8Ncw


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/mB4YVNRzrseG_cRc-V8Ncw
Current Status:     Running 
Status Reason:      unavailable
Destination:        ceprod07.grid.hep.ph.ic.ac.uk:8443/cream-sge-grid.q
Submitted:          Fri May 25 16:48:50 2012 CEST
==========================================================================

- Enable GLUE2 purchasers and check an ISM dump and MM FAIL

[root@devel09 SandboxDir]# tail -f /var/log/wms/workload_manager_events.log

The workload manager log shows the following backtrace (as captured):

/usr/bin/glite-wms-workload_manager(_ZN5boost3_bi5list1INS0_5valueINS_10shared_ptrIN5glite3wms3ism9purchaser13ism_purchaserEEEE
/usr/bin/glite-wms-workload_manager(_ZN5boost3_bi6bind_tIvNS_4_mfi3mf0IvN5glite3wms3ism9purchaser13ism_purchaserEEENS0_5list1IN
/usr/bin/glite-wms-workload_manager(_ZN5boost6detail8function26void_function_obj_invoker0INS_3_bi6bind_tIvNS_4_mfi3mf0IvN5glite
boost::function0<void, std::allocator >::operator()() const
/usr/bin/glite-wms-workload_manager
/usr/bin/glite-wms-workload_manager
boost::function0<void, std::allocator >::operator()() const
glite::wms::manager::server::Events::run()
boost::_mfi::mf0<void, glite::wms::manager::server::Events>::operator()(glite::wms::manager::server::Events*) const

- Enable Argus authZ and check results PASS

In site_info.def:

USE_ARGUS=yes
ARGUS_PEPD_ENDPOINTS="https://argus01.lcg.cscs.ch:8154/authz https://argus02.lcg.cscs.ch:8154/authz https://argus03.lcg.cscs.ch:8154/authz"

22/03/2012 argus-gsi-pep-callout missing from the MP; error while using gridftp. FIXED

22/05/2012 New test:


22 May 2012, 16:38:56 -I- PID: 30454 (Debug) - Calling the WMProxy jobRegister service

Warning - Unable to register the job to the service: https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server Argus denied authorization on jobRegister by ,C=IT,O=INFN,OU=Personal Certificate,L=CNAF,CN=Marco Cecchi Error code: SOAP-ENV:Server

PASS

TODO: need to try with one policy that lets us pass

BUGS:

Vulnerability bug in ICE's proxy renewal (Advisory-SVG-2012-4073) PRE-CERTIFIED (adorigo 18/07/2012)

I used the ICE RPM from http://etics-repository.cern.ch/repository/download/registered/emi/emi.wms.ice/3.3.5/sl5_x86_64_gcc412EPEL/glite-wms-ice-3.3.5-4.sl5.x86_64.rpm and verified that the issue described in SVG advisory 4073 has been fixed, including at ICE startup (starting from a situation in which the vulnerability is present).

Vulnerability bug in ICE's proxy renewal (Advisory-SVG-2012-4039) PRE-CERTIFIED (adorigo 09/07/2012)

The bug has been fixed and verified. In particular, it has been verified that the vulnerability has disappeared and that under "normal" conditions proxy renewal still works correctly. Below is described how to check that proxy renewal works.

glite-wms-ice-proxy-renew can block indefinitely (https://savannah.cern.ch/bugs/?95584) PRE-CERTIFIED (adorigo 09/07/2012)

Set the following configuration parameters in /etc/glite-wms/glite_wms.conf (on the WMS node):

[root@cream-01 persist_dir]# egrep "proxy_renewal_freq|ice_log_level|renewal_timeout" /etc/glite-wms/glite_wms.conf
    ice_log_level   =   700;
    proxy_renewal_frequency   =   60;
    proxy_renewal_timeout  =  60;

Then open a console (as root) on any publicly reachable machine and start this command:

[root@cream-28 ~]# openssl s_server -cert /etc/grid-security/hostcert.pem -key /etc/grid-security/hostkey.pem -accept 7000

This simulates a non-responding, overloaded MyProxy server. Log into the WMS node and make sure that the proxy renewal daemon's cache is empty and that ICE has no cached proxies (for convenience, also remove old ICE log files):

[root@cream-01 persist_dir]# service gLite stop
[...]
[root@cream-01 persist_dir]# rm -f /var/ice/persist_dir/* /var/log/wms/ice.log*
\rm -f /var/glite/spool/glite-renewd/*
[root@cream-01 persist_dir]# service gLite start
Tail ICE's log file, filtering on proxy renewal messages:
[root@cream-01 persist_dir]# tail -f /var/log/wms/ice.log |grep iceCommandDelegationRenewal
Switch to a WMS UI machine and write this JDL:
dorigoa@cream-03 13:47:23 ~>cat wms.jdl 
[
Executable = "/bin/echo";
Arguments = "ciao";
InputSandbox = {};
stdoutput="stdout";
stderror="stderr";
OutputSandbox = {"stdout","stderr"};
requirements = RegExp("cream.*", other.GlueCEUniqueID);
myproxyserver="cream-28.pd.infn.it:7000"; // is the machines running the openssl server
]

Submit the above JDL and switch to the console where the tail is running. After a while you should see these log messages:

2012-07-09 13:47:52,785 DEBUG - iceCommandDelegationRenewal::renewAllDelegations() - Contacting MyProxy server [cream-28.pd.infn.it:7000] for user dn [/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alvise Dorigo-/dteam/Role=NULL/Capability=NULL] with proxy certificate [/var/ice/persist_dir/779DCED0D3865DD842211CF2C904581ED1E2A964.betterproxy] to renew it...
2012-07-09 13:48:52,822 ERROR - iceCommandDelegationRenewal::renewAllDelegations() - Proxy renewal failed: [ERROR - /usr/bin/glite-wms-ice-proxy-renew killed the renewal child after timeout of 60 seconds. The proxy /var/ice/persist_dir/779DCED0D3865DD842211CF2C904581ED1E2A964.betterproxy has NOT been renewed! ]

clearly showing that the proxy renewal process no longer blocks indefinitely on a non-responding server, but only for at most the 60 seconds set in glite_wms.conf.
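The behaviour verified above can be modelled with a simple watchdog pattern; an illustrative sketch (this is only a model of the timeout logic, not the actual glite-wms-ice-proxy-renew code):

```shell
# Model of the fix: a renewal child blocked on a dead server is killed
# once the configured timeout expires (here 2s instead of 60s).
timeout_secs=2
( sleep 30 ) &    # stand-in for a renewal child stuck on a non-responding server
child=$!
( sleep "$timeout_secs"; kill "$child" 2>/dev/null ) &
watchdog=$!
if wait "$child" 2>/dev/null; then
  msg="renewed"
else
  msg="killed renewal child after ${timeout_secs}s timeout"
fi
kill "$watchdog" 2>/dev/null
echo "$msg"
```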

Deregistration of a proxy (2) (https://savannah.cern.ch/bugs/?83453) PRE-CERTIFIED (adorigo 25/06/2012)

Submitted a job and verified the proxy deregistration in the syslog:

Connecting to the service https://cream-01.pd.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel09.cnaf.infn.it:9000/tsk_AU6HJZB4l9IPNeLdPQ

==========================================================================


From file /var/log/messages:
Jun 25 16:35:03 cream-01 glite-proxy-renewd[18810]: Proxy /var/proxycache/%2FC%3DIT%2FO%3DINFN%2FOU%3DPersonal%20Certificate%2FL%3DPadova%2FCN%3DAlvise%20Dorigo/9n4QO2qQq50mmLDohkmIhw/userproxy.pem of job https://devel09.cnaf.infn.it:9000/cJBETCHFmomxOnqC8ZJTiw has been registered as /var/glite/spool/glite-renewd/45a96bd6a16770e5fdc4c60bbae2646e.0
Jun 25 16:35:03 cream-01 glite-proxy-renewd[18810]: Proxy /var/proxycache/%2FC%3DIT%2FO%3DINFN%2FOU%3DPersonal%20Certificate%2FL%3DPadova%2FCN%3DAlvise%20Dorigo/9n4QO2qQq50mmLDohkmIhw/userproxy.pem of job https://devel09.cnaf.infn.it:9000/82dORq-A05f06LDYxwtlXQ has been registered as /var/glite/spool/glite-renewd/45a96bd6a16770e5fdc4c60bbae2646e.0
Jun 25 16:35:03 cream-01 glite_wms_wmproxy_server: submission from cream-12.pd.infn.it, DN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alvise Dorigo, FQAN=/dteam/Role=NULL/Capability=NULL, userid=18757 for jobid=https://devel09.cnaf.infn.it:9000/tsk_AU6HJZB4l9IPNeLdPQ


[dorigoa@cream-12 ~]$ glite-wms-job-status https://devel09.cnaf.infn.it:9000/tsk_AU6HJZB4l9IPNeLdPQ


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/tsk_AU6HJZB4l9IPNeLdPQ
Current Status:     Done (Success)
Exit code:          0
Submitted:          Mon Jun 25 16:35:03 2012 CEST
==========================================================================

- Nodes information for: 
   Status info for the Job : https://devel09.cnaf.infn.it:9000/82dORq-A05f06LDYxwtlXQ
   Current Status:     Done (Success)
   Logged Reason(s):
       - job completed
       - Job Terminated Successfully
   Exit code:          0
   Status Reason:      Job Terminated Successfully
   Destination:        cream-23.pd.infn.it:8443/cream-lsf-creamtest2
   Submitted:          Mon Jun 25 16:35:03 2012 CEST
==========================================================================

   Status info for the Job : https://devel09.cnaf.infn.it:9000/cJBETCHFmomxOnqC8ZJTiw
   Current Status:     Done (Success)
   Logged Reason(s):
       - job completed
       - Job Terminated Successfully
   Exit code:          0
   Status Reason:      Job Terminated Successfully
   Destination:        cream-23.pd.infn.it:8443/cream-lsf-creamtest2
   Submitted:          Mon Jun 25 16:35:03 2012 CEST
==========================================================================



Again from /var/log/messages:

Jun 25 16:35:56 cream-01 glite-proxy-renewd[18810]: Proxy /var/glite/spool/glite-renewd/45a96bd6a16770e5fdc4c60bbae2646e.0 of job https://devel09.cnaf.infn.it:9000/cJBETCHFmomxOnqC8ZJTiw has been unregistered
Jun 25 16:35:56 cream-01 glite-proxy-renewd[18810]: Proxy /var/glite/spool/glite-renewd/45a96bd6a16770e5fdc4c60bbae2646e.0 of job https://devel09.cnaf.infn.it:9000/82dORq-A05f06LDYxwtlXQ has been unregistered

some sensible information should be logged on syslog (https://savannah.cern.ch/bugs/?92657) PRE-CERTIFIED (mcecchi 29/03/12)

May 18 17:19:07 devel09 glite_wms_wmproxy_server: submission from ui.cnaf.infn.it, DN=/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Marco Cecchi, FQAN=/dteam/Role=NULL/Capability=NULL, userid=18264 for jobid=https://devel07.cnaf.infn.it:9000/_K-LYpekDA1xk9sisW5hBA

May 18 17:19:36 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/D6jb8a1zk8kfwLhm2PxAIw, destination=cmsrm-cream01.roma1.infn.it:8443/cream-lsf-cmsgcert
May 18 17:19:36 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/mlMh39hZcZSHzLozxmSUXQ, destination=ce203.cern.ch:8443/cream-lsf-grid_2nh_dteam
May 18 17:19:36 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/ToPUYBYlHPX8ucct9GMxJA, destination=ce207.cern.ch:8443/cream-lsf-grid_2nh_dteam
May 18 17:19:36 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/eqK70rofnD_ssOJVVGzZFg, destination=ce04.esc.qmul.ac.uk:8443/cream-sge-lcg_long
May 18 17:19:36 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/RWbT_HnCv_xhVIgxOz6Zew, destination=atlasce02.scope.unina.it:8443/cream-pbs-egeecert
May 18 17:19:37 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/Uen9xJpMD_muv4DgUa7XHg, destination=ce01.eela.if.ufrj.br:8443/cream-pbs-dteam
May 18 17:19:37 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/surSO68FUvkoMMJ5vwnslw, destination=ce-cr-02.ts.infn.it:8443/cream-lsf-cert
May 18 17:19:37 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/K6OlQy5IR0a_bs3jyIcASQ, destination=ce-grisbi.cbib.u-bordeaux2.fr:8443/cream-pbs-dteam
May 18 17:19:37 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/rQzNsXNGtVRtKjt0hW-n7w, destination=ce0.m3pec.u-bordeaux1.fr:8443/cream-pbs-dteam
May 18 17:19:37 devel09 glite-wms-workload_manager: jobid=https://devel07.cnaf.infn.it:9000/PVm7kNQu_h7CzrsFhGbIrQ, destination=cccreamceli06.in2p3.fr:8443/cream-sge-long

WMS UI emi-wmproxy-api-cpp and emi-wms-ui-api-python still use gethostbyaddr/gethostbyname (https://savannah.cern.ch/bugs/?89668) PRE-CERTIFIED (alvise 28/03/12)

Verified by grepping the source code.

Submission with rfc proxy doesn't work (https://savannah.cern.ch/bugs/?88128) PRE-CERTIFIED (mcecchi 14/6/12)

Create a proxy with voms-proxy-init and the -rfc option. 1) Authentication fails at the LCG-CE, as expected:

27 Mar, 16:01:17 -I- EventGlobusSubmitFailed::process_event(): Got globus submit failed event.
27 Mar, 16:01:17 -I- EventGlobusSubmitFailed::process_event(): For cluster: 1363, reason: 7 authentication failed: GSS Major Status: Authentication Failed GSS Minor Status Error Chain:  init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization init_sec_contex
27 Mar, 16:01:17 -I- EventGlobusSubmitFailed::process_event(): Job id = https://devel09.cnaf.infn.it:9000/kDLx5tj_Nxpt2ewI35AARQ
27 Mar, 16:01:17 -I- SubmitReader::internalRead(): Reading condor submit file of job https://devel09.cnaf.infn.it:9000/kDLx5tj_Nxpt2ewI35AARQ

2) Let's submit a job to CREAM:

[mcecchi@devel15 ~]$ glite-wms-job-submit -a --endpoint https://devel16.cnaf.infn.it:7443/glite_wms_wmproxy_server -r cream-38.pd.infn.it:8443/cream-pbs-creamtest2  ls_cream.jdl 
Connecting to the service https://devel16.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel16.cnaf.infn.it:9000/XRSoZaPhZds7iwhh_z4DDw

==========================================================================


[mcecchi@devel15 ~]$ glite-wms-job-status https://devel16.cnaf.infn.it:9000/XRSoZaPhZds7iwhh_z4DDw


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel16.cnaf.infn.it:9000/XRSoZaPhZds7iwhh_z4DDw
Current Status:     Running
Status Reason:      unavailable
Destination:        cream-38.pd.infn.it:8443/cream-pbs-creamtest2
Submitted:          Thu Jun 14 15:18:15 2012 CEST
==========================================================================

[mcecchi@devel15 ~]$ glite-wms-job-status https://devel16.cnaf.infn.it:9000/XRSoZaPhZds7iwhh_z4DDw


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel16.cnaf.infn.it:9000/XRSoZaPhZds7iwhh_z4DDw
Current Status:     Done(Success)
Logged Reason(s):
    - job completed
    - Job Terminated Successfully
Exit code:          0
Status Reason:      Job Terminated Successfully
Destination:        cream-38.pd.infn.it:8443/cream-pbs-creamtest2
Submitted:          Thu Jun 14 15:18:15 2012 CEST

EMI WMS wmproxy init.d script stop/start problems (https://savannah.cern.ch/bugs/?89577) PRE-CERTIFIED (mcecchi 23/03/12)

1. The restart command does not restart httpd, whereas stop + start does:

[root@devel09 ~]# /etc/init.d/glite-wms-wmproxy status
/usr/bin/glite_wms_wmproxy_server is running...
[root@devel09 ~]# ps aux | grep httpd
glite    11440  0.0  0.0  96440  2196 ?        S    11:31   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    11441  0.0  0.0  96440  2480 ?        S    11:31   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    11442  0.0  0.0  96440  2480 ?        S    11:31   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    11443  0.0  0.0  96440  2480 ?        S    11:31   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    11444  0.0  0.0  96440  2480 ?        S    11:31   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    11445  0.0  0.0  96440  2480 ?        S    11:31   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
root     15789  0.0  0.0  61192   740 pts/0    R+   11:52   0:00 grep httpd
root     24818  0.0  0.1  96440  4716 ?        Ss   Mar19   0:04 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
[root@devel09 ~]# /etc/init.d/glite-wms-wmproxy restart
Restarting /usr/bin/glite_wms_wmproxy_server... ok
[root@devel09 ~]# ps aux | grep httpd
glite    15889  0.0  0.0  96440  2196 ?        S    11:52   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    15890  0.0  0.0  96440  2488 ?        S    11:52   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    15891  0.0  0.0  96440  2480 ?        S    11:52   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    15892  0.0  0.0  96440  2480 ?        S    11:52   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    15893  0.0  0.0  96440  2480 ?        S    11:52   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
glite    15894  0.0  0.0  96440  2480 ?        S    11:52   0:00 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf
root     15897  0.0  0.0  61196   772 pts/0    S+   11:53   0:00 grep httpd
root     24818  0.0  0.1  96440  4716 ?        Ss   Mar19   0:04 /usr/sbin/httpd -k start -f /etc/glite-wms/glite_wms_wmproxy_httpd.conf

2. A start immediately following a stop often fails and has to be repeated to get the service working again:

[root@devel09 ~]# /etc/init.d/glite-wms-wmproxy stop; /etc/init.d/glite-wms-wmproxy start
Stopping /usr/bin/glite_wms_wmproxy_server... ok
Starting /usr/bin/glite_wms_wmproxy_server... ok

3. The stop and start commands fail when invoked via ssh ([Sun Oct 09 17:08:01 2011] [warn] PassEnv variable HOSTNAME was undefined):

[mcecchi@cnaf ~]$ ssh root@devel09 '/etc/init.d/glite-wms-wmproxy stop;/etc/init.d/glite-wms-wmproxy start'
root@devel09's password: 
Stopping /usr/bin/glite_wms_wmproxy_server... ok
Starting /usr/bin/glite_wms_wmproxy_server... ok

Make some WMS init scripts System V compatible (https://savannah.cern.ch/bugs/?91115) PRE-CERTIFIED (mcecchi 23/03/12)

[root@devel09 ~]# grep -1 chkconfig /etc/init.d/glite-wms-ice 

# chkconfig: 345 95 06
# description: startup script for the ICE process
[root@devel09 ~]# grep -1 chkconfig /etc/init.d/glite-wms-wm

# chkconfig: 345 94 06 
# description: WMS processing engine

Semi-automated service backends configuration for WMS (task #23845, EMI Development Tracker, Done) PRE-CERTIFIED (mcecchi 23/03/12)

[root@devel09 ~]# cat /etc/my.cnf
[mysqld]
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=500M
!includedir /etc/mysql/conf.d/

innodb_flush_log_at_trx_commit=2 and innodb_buffer_pool_size=500M are the values expected to be present.
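The check can be scripted; a minimal sketch that asserts the two expected settings (it checks a copy of the content shown above; on a real WMS node point cnf at /etc/my.cnf instead):

```shell
# Sketch: assert the two InnoDB settings yaim is expected to write.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
[mysqld]
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=500M
!includedir /etc/mysql/conf.d/
EOF
missing=0
for s in 'innodb_flush_log_at_trx_commit=2' 'innodb_buffer_pool_size=500M'; do
  grep -qx "$s" "$cnf" || { echo "MISSING: $s"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "my.cnf settings OK"
rm -f "$cnf"
```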

WMproxy GACLs do not support wildcards (as they used to do) (https://savannah.cern.ch/bugs/?87261) PRE-CERTIFIED (mcecchi 23/03/12)

with:

<gacl version="0.0.1">
  <entry> <voms> <fqan>/dtea*</fqan></voms> <allow> <exec/> </allow> </entry>
</gacl>

GRANT

<gacl version="0.0.1">
  <entry> <voms> <fqan>/dteaM*</fqan></voms> <allow> <exec/> </allow> </entry>
</gacl>

DENY

<gacl version="0.0.1">
  <entry> <voms> <fqan>/dteam</fqan></voms> <allow> <exec/> </allow> </entry>
</gacl>

DENY

<gacl version="0.0.1">
  <entry> <voms> <fqan>/dteam</fqan></voms> <allow> <exec/> </allow> </entry>
  <entry> <voms> <fqan>/dteam/*</fqan></voms> <allow> <exec/> </allow> </entry>
</gacl>

GRANT
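The wildcard semantics exercised above amount to shell-style globbing on the FQAN; an illustrative model of the GRANT/DENY matrix (this is NOT the actual WMProxy/gridsite GACL matcher, just a sketch of the expected behaviour):

```shell
# Model: glob-match an FQAN against a GACL pattern, mirroring the
# GRANT/DENY matrix above (illustrative only).
match() {  # usage: match FQAN PATTERN -> prints GRANT or DENY
  case "$1" in
    $2) echo GRANT ;;
    *)  echo DENY  ;;
  esac
}
fqan=/dteam/Role=NULL/Capability=NULL
match "$fqan" '/dtea*'    # GRANT
match "$fqan" '/dteaM*'   # DENY
match "$fqan" '/dteam'    # DENY: no wildcard, exact match fails
match "$fqan" '/dteam/*'  # GRANT
```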

WMS logs should keep track of the last 90 days (https://savannah.cern.ch/bugs/?89871) PRE-CERTIFIED (mcecchi 22/03/12)

[root@devel09 ~]# grep -r rotate\ 90 /etc/logrotate.d/
/etc/logrotate.d/wm:       rotate 90
/etc/logrotate.d/globus-gridftp:    rotate 90
/etc/logrotate.d/globus-gridftp:    rotate 90
/etc/logrotate.d/lcmaps:       rotate 90
/etc/logrotate.d/lm:       rotate 90
/etc/logrotate.d/jc:       rotate 90
/etc/logrotate.d/glite-wms-purger:       rotate 90
/etc/logrotate.d/wmproxy:       rotate 90
/etc/logrotate.d/argus:       rotate 90
/etc/logrotate.d/ice:       rotate 90
[root@devel09 ~]# grep -r daily /etc/logrotate.d/
/etc/logrotate.d/wm:       daily
/etc/logrotate.d/kill-stale-ftp:    daily
/etc/logrotate.d/globus-gridftp:    daily
/etc/logrotate.d/globus-gridftp:    daily
/etc/logrotate.d/lcmaps:       daily
/etc/logrotate.d/lm:       daily
/etc/logrotate.d/jc:       daily
/etc/logrotate.d/glite-wms-purger:       daily
/etc/logrotate.d/wmproxy:       daily
/etc/logrotate.d/argus:       daily
/etc/logrotate.d/ice:       daily
/etc/logrotate.d/glite-lb-server:   daily
/etc/logrotate.d/bdii:    daily

/etc/logrotate.d/kill-stale-ftp has "rotate 30", but it should be "rotate 90".
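For reference, the fix amounts to bumping the rotate directive in that file; a sketch (the log path and the stanza layout are assumptions, only the daily and rotate directives come from the output above):

```
# /etc/logrotate.d/kill-stale-ftp (sketch; log path hypothetical)
/var/log/wms/kill-stale-ftp.log {
    daily
    rotate 90
}
```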

yaim-wms changes for Argus based authZ (https://savannah.cern.ch/bugs/?90760) NOT PRE-CERTIFIED (mcecchi 22/03/12)

Modified siteinfo.def with:

USE_ARGUS=yes

ARGUS_PEPD_ENDPOINTS="https://argus01.lcg.cscs.ch:8154/authz https://argus02.lcg.cscs.ch:8154/authz https://argus03.lcg.cscs.ch:8154/authz"

ran:

/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n WMS

in glite_wms.conf:

ArgusAuthz = true;

ArgusPepdEndpoints = {"https://argus01.lcg.cscs.ch:8154/authz", "https://argus02.lcg.cscs.ch:8154/authz", "https://argus03.lcg.cscs.ch:8154/authz"};

glite-wms-check-daemons.sh should not restart daemons under the admin's nose (https://savannah.cern.ch/bugs/?89674) PRE-CERTIFIED (mcecchi 22/03/12)

[root@devel09 ~]# /etc/init.d/glite-wms-wm start
starting workload manager... ok
[root@devel09 ~]# /etc/init.d/glite-wms-wm status
/usr/bin/glite-wms-workload_manager (pid 486) is running...
[root@devel09 ~]# ll /var/run/glite-wms-w*
-rw-r--r-- 1 root root 4 Mar 22 10:02 /var/run/glite-wms-workload_manager.pid
[root@devel09 ~]# /etc/init.d/glite-wms-wm stop
stopping workload manager... ok
[root@devel09 ~]# date
Thu Mar 22 10:02:37 CET 2012
[root@devel09 ~]# ll /var/run/glite-wms-w*
ls: /var/run/glite-wms-w*: No such file or directory
[root@devel09 ~]# cat /etc/cron.d/glite-wms-check-daemons.cron 
HOME=/
MAILTO=SA3-italia

*/5 * * * * root . /usr/libexec/grid-env.sh ; sh /usr/libexec/glite-wms-check-daemons.sh > /dev/null 2>&1
[root@devel09 ~]#  sh /usr/libexec/glite-wms-check-daemons.sh
[root@devel09 ~]# /etc/init.d/glite-wms-wm status
/usr/bin/glite-wms-workload_manager is not running
[root@devel09 ~]# ps aux | grep workl
root       970  0.0  0.0  61192   756 pts/2    S+   10:03   0:00 grep workl
[root@devel09 ~]# /etc/init.d/glite-wms-wm start
starting workload manager... ok
[root@devel09 ~]# ps aux | grep workl
glite     1009 11.0  0.6 255060 26444 ?        Ss   10:04   0:00 /usr/bin/glite-wms-workload_manager --conf glite_wms.conf --daemon
root      1013  0.0  0.0  61196   764 pts/2    S+   10:04   0:00 grep workl
[root@devel09 ~]# kill -9 1009
[root@devel09 ~]#  sh /usr/libexec/glite-wms-check-daemons.sh
stopping workload manager... ok
starting workload manager... ok
[root@devel09 ~]# /etc/init.d/glite-wms-wm status
/usr/bin/glite-wms-workload_manager (pid 1196) is running...
[root@devel09 ~]# 

Wrong location for PID file (https://savannah.cern.ch/bugs/?89857) PRE-CERTIFIED (mcecchi 21/03/12)

[root@devel09 ~]# ls /var/run/*pid
/var/run/atd.pid            /var/run/crond.pid               /var/run/glite-wms-job_controller.pid    /var/run/gpm.pid        /var/run/klogd.pid       /var/run/ntpd.pid
/var/run/brcm_iscsiuio.pid  /var/run/exim.pid                /var/run/glite-wms-log_monitor.pid       /var/run/haldaemon.pid  /var/run/messagebus.pid  /var/run/sshd.pid
/var/run/condor_master.pid  /var/run/glite-wms-ice-safe.pid  /var/run/glite-wms-workload_manager.pid  /var/run/iscsid.pid     /var/run/nrpe.pid        /var/run/syslogd.pid

For ICE, the PID is now handled correctly:

[root@devel09 ~]# /etc/init.d/glite-wms-ice status
/usr/bin/glite-wms-ice-safe (pid 16820) is running...
[root@devel09 ~]# ll /var/run/glite-wms-ice-safe.pid 
-rw-r--r-- 1 root root 6 Mar 27 14:41 /var/run/glite-wms-ice-safe.pid
[root@devel09 ~]# cat /var/run/glite-wms-ice-safe.pid
16820
[root@devel09 ~]# /etc/init.d/glite-wms-ice stop
stopping ICE... ok
[root@devel09 ~]# /etc/init.d/glite-wms-ice start
starting ICE... ok
[root@devel09 ~]# ll /var/run/glite-wms-ice-safe.pid 
-rw-r--r-- 1 root root 6 Mar 27 14:42 /var/run/glite-wms-ice-safe.pid
[root@devel09 ~]# cat /var/run/glite-wms-ice-safe.pid
16969
[root@devel09 ~]# /etc/init.d/glite-wms-ice status
/usr/bin/glite-wms-ice-safe (pid 16969) is running...
[root@devel09 ~]# /etc/init.d/glite-wms-ice restart
stopping ICE... ok
starting ICE... ok
[root@devel09 ~]# ps -ef|grep ice
root      2942     1  0 Mar26 ?        00:00:00 gpm -m /dev/input/mice -t exps2
glite    17074     1  0 14:43 ?        00:00:00 /usr/bin/glite-wms-ice-safe --conf glite_wms.conf --daemon /tmp/icepid
glite    17080 17074  0 14:43 ?        00:00:00 sh -c /usr/bin/glite-wms-ice --conf glite_wms.conf /var/log/wms/ice_console.log 2>&1
glite    17081 17080  0 14:43 ?        00:00:00 /usr/bin/glite-wms-ice --conf glite_wms.conf /var/log/wms/ice_console.log
root     17114 23643  0 14:43 pts/1    00:00:00 grep ice

*Pid file of ICE and WM has glite ownership (https://savannah.cern.ch/bugs/?91834) - PRE-CERTIFIED (adorigo - 20121001)

Verify that the owner of the WM and ICE PID files is root. To do this, as root, stop WM and ICE and delete any leftover PID files; then restart the services and check the ownership as in the example below:

[root@cream-01 ~]# /etc/init.d/glite-wms-wm stop
stopping workload manager... ok
[root@cream-01 ~]# \rm /var/run/glite-wms-workload_manager.pid
[root@cream-01 ~]# /etc/init.d/glite-wms-ice stop
stopping ICE... ok
[root@cream-01 ~]# \rm /var/run/glite-wms-ice-safe.pid
[root@cream-01 ~]# /etc/init.d/glite-wms-wm start
starting workload manager... ok
[root@cream-01 ~]# /etc/init.d/glite-wms-ice start
starting ICE... ok
[root@cream-01 ~]# ll /var/run/glite-wms-workload_manager.pid /var/run/glite-wms-ice-safe.pid 
-rw-r--r-- 1 root root 6 Oct  1 11:39 /var/run/glite-wms-ice-safe.pid
-rw-r--r-- 1 root root 6 Oct  1 11:38 /var/run/glite-wms-workload_manager.pid

The job replanner should be configurable (https://savannah.cern.ch/bugs/?91941) PRE-CERTIFIED (mcecchi 21/03/12)

EnableReplanner=true in the WM conf

21 Mar, 16:51:54 -I: [Info] main(/home/condor/execute/dir_2479/userdir/emi.wms.wms-manager/src/main.cpp:468): WM startup completed...
21 Mar, 16:51:54 -I: [Info] operator()(/home/condor/execute/dir_2479/userdir/emi.wms.wms-manager/src/replanner.cpp:288): replanner in action
21 Mar, 16:51:57 -W: [Warning] get_site_name(/home/condor/execute/dir_2479/userdir/emi.wms.wms-ism/src/purchaser/ldap-utils.cpp:162): Cannot find GlueSiteUniqueID assignment.

EnableReplanner=false in the WM conf

21 Mar, 16:54:02 -I: [Info] main(/home/condor/execute/dir_2479/userdir/emi.wms.wms-manager/src/main.cpp:468): WM startup completed...
21 Mar, 16:54:05 -W: [Warning] get_site_name(/home/condor/execute/dir_2479/userdir/emi.wms.wms-ism/src/purchaser/ldap-utils.cpp:162): Cannot find GlueSiteUniqueID assignment.
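For reference, the flag is set in the WorkloadManager section of /etc/glite-wms/glite_wms.conf; a minimal sketch (surrounding attributes elided, exact placement assumed):

```
WorkloadManager = [
    ...
    EnableReplanner = true;
    ...
];
```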

GlueServiceStatusInfo: ?? (https://savannah.cern.ch/bugs/?89435) PRE-CERTIFIED (mcecchi 21/03/12)

[root@devel09 ~]# /var/lib/bdii/gip/provider/glite-info-provider-service-wmproxy-wrapper|grep -i servicestatusinfo 
GlueServiceStatusInfo: /usr/bin/glite_wms_wmproxy_server is running...

WMProxy limiter should log more at info level (https://savannah.cern.ch/bugs/?72280) PRE-CERTIFIED (mcecchi 21/03/12).

In wmp conf:

jobRegister  =  "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobRegister --load1 0 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300";
LogLevel  =  5;

Restart wmproxy; in the wmproxy log:

21 Mar, 16:28:52 -S- PID: 875 - "wmputils::doExecv": Child failure, exit code: 256
21 Mar, 16:28:52 -I- PID: 875 - "wmpgsoapoperations::ns1__jobRegister": ------------------------------- Fault description --------------------------------
21 Mar, 16:28:52 -I- PID: 875 - "wmpgsoapoperations::ns1__jobRegister": Method: jobRegister
21 Mar, 16:28:52 -I- PID: 875 - "wmpgsoapoperations::ns1__jobRegister": Code: 1228
21 Mar, 16:28:52 -I- PID: 875 - "wmpgsoapoperations::ns1__jobRegister": Description: System load is too high:
Threshold for Load Average(1 min): 0 => Detected value for Load Average(1 min):  0.19

* EMI WMS wmproxy rpm doesn't set execution permissions as it used to do in gLite (https://savannah.cern.ch/bugs/?89506) PRE-CERTIFIED (mcecchi 15/5/2012)

after installing the wmproxy rpm and without running yaim:

[root@devel09 ~]# ll /usr/libexec/glite_wms_wmproxy_dirmanager 
-rwsr-xr-x 1 root root 18989 May 15 10:26 /usr/libexec/glite_wms_wmproxy_dirmanager
[root@devel09 ~]# ll /usr/sbin/glite_wms_wmproxy_load_monitor
-rwsr-xr-x 1 root root 22915 May 15 10:26 /usr/sbin/glite_wms_wmproxy_load_monitor

yaim-wms: set ldap query filter expression for GLUE2 in WMS configuration (https://savannah.cern.ch/bugs/?91563) PRE-CERTIFIED (mcecchi)

There are GLUE2 filters for both CE and SE:

[root@devel09 ~]# grep G2LDAP /etc/glite-wms/glite_wms.conf
IsmIiG2LDAPCEFilterExt = "(|(&(objectclass=GLUE2ComputingService)(|(GLUE2ServiceType=org.glite.ce.ARC)(GLUE2ServiceType=org.glite.ce.CREAM)))(|(objectclass=GLUE2ComputingManager)(|(objectclass=GLUE2ComputingShare)(|(&(objectclass=GLUE2ComputingEndPoint)(GLUE2EndpointInterfaceName=org.glite.ce.CREAM))(|(objectclass=GLUE2ToStorageService)(|(&(objectclass=GLUE2MappingPolicy)(GLUE2PolicyScheme=org.glite.standard))(|(&(objectclass=GLUE2AccessPolicy)(GLUE2PolicyScheme=org.glite.standard))(|(objectclass=GLUE2ExecutionEnvironment)(|(objectclass=GLUE2ApplicationEnvironment)(|(objectclass=GLUE2Benchmark)))))))))))";
IsmIiG2LDAPSEFilterExt = "(|(objectclass=GLUE2StorageService)(|(objectclass=GLUE2StorageManager)(|(objectclass=GLUE2StorageShare)(|(objectclass=GLUE2StorageEndPoint)(|(objectclass=GLUE2MappingPolicy)(|(objectclass=GLUE2AccessPolicy)(|(objectclass=GLUE2DataStore)(|(objectclass=GLUE2StorageServiceCapacity)(|(objectclass=GLUE2StorageShareCapacity))))))))))";
[root@devel09 ~]#

JobController logfile name is misspelled (https://savannah.cern.ch/bugs/?32611) PRE-CERTIFIED (alvise)

Verified that in the glite_wms.conf file the log file name was correct:


    [root@devel09 glite-wms]# grep jobcontroller glite_wms.conf
    LogFile  =  "${WMS_LOCATION_LOG}/jobcontroller_events.log";

glite-wms-job-submit doesn't always pick up other WMProxy endpoints if load on WMS is high (https://savannah.cern.ch/bugs/?40370) HOPEFULLY FIXED (alvise)

[wms] GlueServiceStatusInfo content is ugly (https://savannah.cern.ch/bugs/?48068) PRE-CERTIFIED (alvise).

Verified that the output of the command "/etc/init.d/glite-wms-wmproxy status" is as requested.

[ yaim-wms ] CeForwardParameters should include several more parameters (https://savannah.cern.ch/bugs/?61315) PRE-CERTIFIED (alvise):


[root@devel09 ~]# grep CeF /etc/glite-wms/glite_wms.conf 
    CeForwardParameters  =  {"GlueHostMainMemoryVirtualSize","GlueHostMainMemoryRAMSize",
                                               "GlueCEPolicyMaxCPUTime", "GlueCEPolicyMaxObtainableCPUTime", "GlueCEPolicyMaxObtainableWallClockTime", "GlueCEPolicyMaxWallClockTime" };

Files specified with absolute paths shouldn't be used with inputsandboxbaseuri (https://savannah.cern.ch/bugs/?74832) PRE-CERTIFIED (alvise).

Verified that the JDL described in the comment is handled correctly by activating debug (--debug); the debug log showed that /etc/fstab is correctly staged from the UI node via gsiftp.

There's an un-catched out_of_range exception in the ICE component (https://savannah.cern.ch/bugs/?75099) PRE-CERTIFIED (alvise)

Tried on my build machine (which can run ICE without the WM), submitting a JDL with an empty "ReallyRunningToken" attribute. ICE didn't crash as it did before. An all-in-one test (WMProxy/WM/ICE) is not yet possible because of a problem with LCMAPS.

Too much flexibility in JDL syntax (https://savannah.cern.ch/bugs/?75802) PRE-CERTIFIED (alvise)

Verified with --debug how glite-wms-job-submit handles the Environment attribute:

dorigoa@cream-01 11:00:47 ~/emi/wmsui_emi2>grep -i environment jdl2 
environment = "FOO=bar";

dorigoa@cream-01 11:50:02 ~/emi/wmsui_emi2>stage/usr/bin/glite-wms-job-submit --debug -a -c ~/JDLs/WMS/wmp_gridit.conf jdl2
[...]
-----------------------------------------
07 March 2012, 11:50:45 -I- PID: 3397 (Debug) - Registering JDL [ stdoutput = "out3.out"; SignificantAttributes = { "Requirements","Rank" }; DefaultNodeRetryCount = 5; executable = "ssh1.sh"; Type = "job"; Environment = { "FOO=bar" }; AllowZippedISB = false; VirtualOrganisation = "dteam"; JobType = "normal"; DefaultRank =  -other.GlueCEStateEstimatedResponseTime; outputsandbox = { "out3.out","err2.err","fstab","grid-mapfile","groupmapfile","  passwd" }; InputSandbox = { "file:///etc/fstab","grid-mapfile","groupmapfile","gsiftp://cream-38.pd.infn.it/etc/passwd","file:///home/dorigoa/ssh1.sh" }; stderror = "err2.err"; inputsandboxbaseuri = "gsiftp://cream-38.pd.infn.it/etc/grid-security"; rank =  -other.GlueCEStateEstimatedResponseTime; MyProxyServer = "myproxy.cern.ch"; requirements = other.GlueCEStateStatus == "Production" || other.GlueCEStateStatus == "testbedb" ]
[...]

So the mangling  Environment = "FOO=bar"; -> Environment = { "FOO=bar" };  occurs correctly.

getaddrinfo() sorts results according to RFC3484, but random ordering is lost (https://savannah.cern.ch/bugs/?82779) PRE-CERTIFIED (alvise).

I only did a shallow test; a thorough test needs an alias pointing to at least 3 or 4 different WM nodes. The alias provided on the bug's Savannah page resolves to just 2 physical hosts, and I observed that both hosts are chosen by the UI while submitting several jobs. I did this test from my EMI2 WMS UI work area, as I do not have any WMS UI EMI2 machine to try on.

glite-wms-job-status needs a json-compliant format (https://savannah.cern.ch/bugs/?82995) PRE-CERTIFIED (alvise), from my WMS UI EMI2 work area:


dorigoa@lxgrid05 14:20:13 ~/emi/wmsui_emi2>stage/usr/bin/glite-wms-job-status --json https://wms014.cnaf.infn.it:9000/pVQojatZbyoj_Pyab66_dw

{ "result": "success" , "https://wms014.cnaf.infn.it:9000/pVQojatZbyoj_Pyab66_dw": { "Current Status": "Done(Success)", "Logged Reason": {"0": "job completed","1": "Job Terminated Successfully"}, "Exit code": "0", "Status Reason": "Job Terminated Successfully", "Destination": "grive02.ibcp.fr:8443/cream-pbs-dteam", "Submitted": "Mon Mar 12 14:11:55 2012 CET", "Done": "1331558020"}   }
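Since the output is now machine-readable, it can be validated before scripting on it; a small sketch (the sample is abridged from the reply above, and python3 with its stdlib json.tool is an assumption about the UI host):

```shell
# Sketch: check that a --json reply parses as JSON before consuming it.
reply='{"result": "success", "Exit code": "0", "Current Status": "Done(Success)"}'
if printf '%s' "$reply" | python3 -m json.tool > /dev/null 2>&1; then
  json_ok=1
  echo "valid JSON"
else
  json_ok=0
  echo "invalid JSON"
fi
```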

Last LB event logged by ICE when job aborted for proxy expired should be ABORTED (https://savannah.cern.ch/bugs/?84839) PRE-CERTIFIED (alvise)

Submitted to ICE (running from my EMI2 work area) a job sleeping for 5 minutes with a proxy valid for 3 minutes (myproxyserver not set, so no proxy renewal). The last event logged by ICE (as shown in its log) is:


2012-03-13 10:49:33,616 INFO - iceLBLogger::logEvent() - Job Aborted Event, reason=[Proxy is expired; Job has been terminated (got SIGTERM)] - [GRIDJobID="https://grid005.pd.infn.it:9000/0001331632035.314183" CREAMJobID="https://cream-23.pd.infn.it:8443/CREAM017935418"]

glite-wms-job-status needs a better handling of purged-related error code. (https://savannah.cern.ch/bugs/?85063) HOPEFULLY-FIXED (alvise). Reproducing the scenario that triggered the problem is highly improbable.

pkg-config info for wmproxy-api-cpp should be enriched (https://savannah.cern.ch/bugs/?85799) PRE-CERTIFIED (alvise, 30/03/2012):

[root@devel09 ~]# rpm -ql glite-wms-wmproxy-api-cpp-devel
/usr/include/glite
/usr/include/glite/wms
/usr/include/glite/wms/wmproxyapi
/usr/include/glite/wms/wmproxyapi/wmproxy_api.h
/usr/include/glite/wms/wmproxyapi/wmproxy_api_utilities.h
/usr/lib64/libglite_wms_wmproxy_api_cpp.so
/usr/lib64/pkgconfig/wmproxy-api-cpp.pc
[root@devel09 ~]# cat /usr/lib64/pkgconfig/wmproxy-api-cpp.pc
prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib64
includedir=${prefix}/include

Name: wmproxy api cpp
Description: WMProxy C/C++ APIs
Version: 3.3.3
Requires: emi-gridsite-openssl
Libs: -L${libdir} -lglite_wms_wmproxy_api_cpp
Cflags: -I${includedir}
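For reference, pkg-config expands the ${...} variables defined at the top of the file; this hand-rolled sketch (illustrative only, not what pkg-config actually runs) reproduces the expansion for the Libs and Cflags fields above:

```python
import re

# The .pc file contents shown above (variable definitions first, then fields)
pc = """prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib64
includedir=${prefix}/include
Name: wmproxy api cpp
Libs: -L${libdir} -lglite_wms_wmproxy_api_cpp
Cflags: -I${includedir}"""

def expand(value, variables):
    # Substitute ${var} references using previously defined variables
    return re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], value)

variables, fields = {}, {}
for line in pc.splitlines():
    if re.match(r"\w+=", line):          # variable definition line
        name, _, value = line.partition("=")
        variables[name] = expand(value, variables)
    elif ": " in line:                   # keyword field line
        key, _, value = line.partition(": ")
        fields[key] = expand(value, variables)

print(fields["Libs"])    # -L/usr/lib64 -lglite_wms_wmproxy_api_cpp
print(fields["Cflags"])  # -I/usr/include
```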

queryDb has 2 handling user's options (see GGUS ticket for more info) (https://savannah.cern.ch/bugs/?86267) PRE-CERTIFIED (alvise). The verification procedure is the same one I described in the related ticket: https://ggus.eu/tech/ticket_show.php?ticket=73658.

glite-wms-job-list-match --help shows an un-implemented (and useless) option "--default-jdl" (https://savannah.cern.ch/bugs/?87444) PRE-CERTIFIED (alvise)

The command glite-wms-job-list-match --help doesn't show that option anymore.

EMI WMS wmproxy rpm doesn't set execution permissions as it used to do in gLite (https://savannah.cern.ch/bugs/?89506) PRE-CERTIFIED (alvise):

[root@devel09 ~]# ll /usr/sbin/glite_wms_wmproxy_load_monitor /usr/bin/glite_wms_wmproxy_server /usr/bin/glite-wms-wmproxy-purge-proxycache /usr/libexec/glite_wms_wmproxy_dirmanager
-rwxr-xr-x 1 nobody nobody    1876 Mar  2 15:14 /usr/bin/glite-wms-wmproxy-purge-proxycache
-rwxr-xr-x 1 nobody nobody 3059020 Mar  2 15:14 /usr/bin/glite_wms_wmproxy_server
-rwsr-xr-x 1 nobody nobody   22637 Mar  2 15:14 /usr/libexec/glite_wms_wmproxy_dirmanager
-rwsr-xr-x 1 nobody nobody   22915 Mar  2 15:14 /usr/sbin/glite_wms_wmproxy_load_monitor

Ownership is nobody:nobody rather than root:root, and the setuid bits are set only where expected (dirmanager and load_monitor).

yaim-wms creates wms.proxy in wrong path (https://savannah.cern.ch/bugs/?90129) PRE-CERTIFIED (alvise)

The path of wms.proxy is now correct:

[root@devel09 ~]# ll ${WMS_LOCATION_VAR}/glite/wms.proxy 
-r-------- 1 glite glite 2824 Mar 14 12:00 /var/glite/wms.proxy
[root@devel09 ~]# ll ${WMS_LOCATION_VAR}/wms.proxy 
ls: /var/wms.proxy: No such file or directory

ICE log verbosity should be reduced to 300 (https://savannah.cern.ch/bugs/?91078) PRE-CERTIFIED (alvise):

[root@devel09 etc]# grep ice_log_level /etc/glite-wms/glite_wms.conf
    ice_log_level   =   300;

move lcmaps.log from /var/log/glite to WMS_LOCATION_LOG (https://savannah.cern.ch/bugs/?91484) PRE-CERTIFIED (alvise):


[root@devel09 etc]# ll $WMS_LOCATION_LOG/lcmaps.log ; ll /var/log/glite/lcmaps.log
-rw-r--r-- 1 glite glite 588 Mar 19 09:41 /var/log/wms/lcmaps.log
ls: /var/log/glite/lcmaps.log: No such file or directory

WMS: use logrotate uniformly in ice, lm, jc, wm, wmp (https://savannah.cern.ch/bugs/?91486) PRE-CERTIFIED (22/03/12 mcecchi)

The logrotate entries have disappeared from the cron jobs:

[root@devel09 ~]# grep -r rotate /etc/cron.d
[root@devel09 ~]# 

because it is consistently managed here:

[root@devel09 ~]# ll /etc/logrotate.d/
total 96
-rw-r--r-- 1 root root 111 Mar 22 15:40 argus
-rw-r--r-- 1 root root 106 Mar 10 00:13 bdii
-rw-r--r-- 1 root root 109 Mar 22 15:39 fetch-crl
-rw-r--r-- 1 root root 194 Mar 10 11:23 glite-lb-server
-rw-r--r-- 1 root root 128 Mar 22 15:40 glite-wms-purger
-rw-r--r-- 1 root root 240 Nov 22 05:06 globus-gridftp
-rw-r--r-- 1 root root 167 Feb 27 20:09 httpd
-rw-r--r-- 1 root root 109 Mar 22 15:40 ice
-rw-r--r-- 1 root root 126 Mar 22 15:40 jc
-rw-r--r-- 1 root root  83 Mar 10 06:34 kill-stale-ftp
-rw-r--r-- 1 root root 112 Mar 22 15:40 lcmaps
-rw-r--r-- 1 root root 123 Mar 22 15:40 lm
-rw-r--r-- 1 root root 129 Mar 22 15:40 wm
-rw-r--r-- 1 root root 192 Mar 22 15:40 wmproxy

remove several dismissed parameters from the WMS configuration (https://savannah.cern.ch/bugs/?91488) PRE-CERTIFIED (alvise)

Verified that the parameters cited in the Savannah bug are no longer present in glite_wms.conf; the following command produced no output:

grep -E 'log_file_max_size|log_rotation_base_file|log_rotation_max_file_number|ice.input_type|wmp.input_type|wmp.locallogger|wm.dispatcher_type|wm.enable_bulk_mm|wm.ism_ii_ldapsearch_async' /etc/glite-wms/glite_wms.conf

WMS needs cron job to kill stale GridFTP processes (https://savannah.cern.ch/bugs/?67489) PRE-CERTIFIED (alvise, 27/03/2012)

Rebuilt the kill-stale-ftp RPM from the branch; installed it on devel09.

[root@devel09 ~]# cat /etc/cron.d/kill-stale-ftp.cron 
PATH=/sbin:/bin:/usr/sbin:/usr/bin
5,15,25,35,45,55 * * * * root /sbin/kill-stale-ftp.sh >> /var/log/kill-stale-ftp.log 2>&1
[root@devel09 ~]# ll /sbin/kill-stale-ftp.sh
-rwxr-xr-x 1 root root 841 Mar 27 12:29 /sbin/kill-stale-ftp.sh
The path is now correct with my commit of today (27/03/2012). Moreover, the script now works when invoked by cron:
[root@devel09 ~]# tail -2 /var/log/kill-stale-ftp.log
=== START Tue Mar 27 14:05:01 CEST 2012 PID 6617
=== READY Tue Mar 27 14:05:01 CEST 2012 PID 6617
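The minutes field of the cron entry above fires the script every ten minutes; a quick sanity check on the schedule:

```python
# Minutes field from /etc/cron.d/kill-stale-ftp.cron
minutes = [int(m) for m in "5,15,25,35,45,55".split(",")]

# Gap between consecutive firings, wrapping around the hour
gaps = [(b - a) % 60 for a, b in zip(minutes, minutes[1:] + minutes[:1])]
print(gaps)  # [10, 10, 10, 10, 10, 10]
```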

WMS UI depends on a buggy libtar (on SL5 at least) (https://savannah.cern.ch/bugs/?89443) PRE-CERTIFIED (alvise, 28/03/2012). Tried this JDL:

dorigoa@lxgrid05 16:01:39 ~/emi/wmsui_emi2>cat ~/JDLs/WMS/wms_test_tar_bug.jdl
[ 
AllowZippedISB = true;
Executable = "/bin/ls" ; 
Arguments = "-lha " ; 
Stdoutput = "ls.out" ; 
InputSandbox = {"isb1", "isb2","isb3", "temp/isb4"}; 
OutputSandbox = { ".BrokerInfo", "ls.out"} ; 
Retrycount = 2; 
ShallowRetryCount = -1; 
usertags = [ bug = "#82687" ]; 
VirtualOrganisation="dteam"; 
]
dorigoa@lxgrid05 16:01:41 ~/emi/wmsui_emi2>stage/usr/bin/glite-wms-job-submit --debug -a -e https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server ~/JDLs/WMS/wms_test_tar_bug.jdl
[ ... ]
28 March 2012, 16:02:21 -I- PID: 14236 (Debug) - File Transfer (gsiftp) 
 Command: /usr/bin/globus-url-copy
Source: file:///tmp/ISBfiles_YaMkV2gJJbddD38QUNR5DA_0.tar.gz
Destination: gsiftp://devel09.cnaf.infn.it:2811/var/SandboxDir/p_/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2fp_5fNucDphCF_5fynIE-0XKnxg/input/ISBfiles_YaMkV2gJJbddD38QUNR5DA_0.tar.gz
-----------------------------------------
-----------------------------------------
28 March 2012, 16:02:22 -I- PID: 14236 (Debug) - File Transfer (gsiftp) Transfer successfully done
[ ... ]
So the .tar.gz file has been correctly created, transferred and removed. Verified that the source code no longer uses libtar functions:
dorigoa@lxgrid05 16:11:05 ~/emi/wmsui_emi2>grep -r libtar emi.wms-ui.wms-ui-commands/src/
emi.wms-ui.wms-ui-commands/src/utilities/options_utils.cpp:* of the archiving tool (libtar; if zipped feature is allowed).
emi.wms-ui.wms-ui-commands/src/utilities/options_utils.h:               * of the archiving tool (libtar; if zipped feature is allowed).
emi.wms-ui.wms-ui-commands/src/services/jobsubmit.cpp~://#include "libtar.h"
emi.wms-ui.wms-ui-commands/src/services/jobsubmit.cpp://#include "libtar.h"

Complete procedure to verify the bug and verify the fix (as reported in the savannah bug):

- Have root or sudo access to a UI with EMI1 installation 
- create the path /home/alex/J0 
- create the NON empty files: 
-bash-3.2# cd /home/alex/J0 
-bash-3.2# ls -l 
total 12 
-rw-r--r-- 1 root root 413 May 11 14:38 hoco_ltsh.e 
-rw-r--r-- 1 root root 413 May 11 14:38 ltsh.sh 
-rw-r--r-- 1 root root 413 May 11 14:38 plantilla_venus.dat 
(make sure they are world-readable) 
- Create this JDL file: 
[dorigoa@cream-12 ~]$ cat JDLs/WMS/JDL_bug_89443.jdl 
[ 
StdOutput = "myjob.out"; 
ShallowRetryCount = 10; 
SignificantAttributes = { "Requirements","Rank","FuzzyRank" }; 
RetryCount = 3; 
Executable = "ltsh.sh"; 
Type = "job"; 
Arguments = "hoco_ltsh.e 0 1 200 114611111"; 
AllowZippedISB = true; 
VirtualOrganisation = "gridit"; 
JobType = "normal"; 
DefaultRank = -other.GlueCEStateEstimatedResponseTime; 
ZippedISB = { "ISBfiles_rjKoznMzsjvH6Nuvp0AhMQ_0.tar.gz" }; 
OutputSandbox = { "myjob.out","myjob.err","out.tar.gz" }; 
InputSandbox = { "file:///home/alex/J0/plantilla_venu...","file:///home/alex/J0/ltsh.sh","file:///home/alex/J0/hoco_ltsh.e" }; 
StdError = "myjob.err"; 
rank = -other.GlueCEStateEstimatedResponseTime; 
MyProxyServer = "myproxy.cnaf.infn.it"; 
requirements = ( regexp("ng-ce.grid.unipg.it:8443/cream-pbs-grid",other.GlueCEUniqueID) )&& ( other.GlueCEStateStatus == "Production" ) 
] 
Do not change anything in it; it must be submitted "as is". 
Submit this JDL with this command: 
$ glite-wms-job-submit --register-only -a --debug -e https://prod-wms-01.ct.infn.it:7443... <YOUR_JDL_CREATED_IN_THE_PREVIOUS_STEP> >&! log 
(">&!" redirects stdout/stderr to a file under tcsh; adapt the redirection to your shell.)

Then grep ZIP in the log just created: 
11 May 2012, 15:26:14 -I- PID: 11561 (Debug) - ISB ZIPPED file successfully created: /tmp/ISBfiles_ZT4DysizXpjHOT-hmzQf2A_0.tar.gz 
ISB ZIP file : /tmp/ISBfiles_ZT4DysizXpjHOT-hmzQf2A_0.tar.gz 
Decompress it: 
dorigoa@cream-12 15:29:34 ~/JDLs/WMS>gunzip /tmp/ISBfiles_QPGfonkfQyOTbXa6uDpnZQ_0.tar.gz 
dorigoa@cream-12 15:29:38 ~/JDLs/WMS>tar tvf /tmp/ISBfiles_QPGfonkfQyOTbXa6uDpnZQ_0.tar 
-rw-r--r-- root/root 413 2012-05-11 14:38:09 SandboxDir/nJ/https_3a_2f_2fprod-wms-01.ct.infn.it_3a9000_2fnJmsSZ3XIaff3gbwkF0TVQ/input/plantilla_venus.dat 
-rw-r--r-- root/root 413 2012-05-11 14:38:06 SandboxDir/nJ/https_3a_2f_2fprod-wms-01.ct.infn.it_3a9000_2fnJmsSZ3XIaff3gbwkF0TVQ/input/ltsh.sh 
-rw-r--r-- root/root 413 2012-05-11 14:38:12 SandboxDir/nJ/https_3a_2f_2fprod-wms-01.ct.infn.it_3a9000_2fnJmsSZ3XIaff3gbwkF0TVQ/input/hoco_ltsh. 
You can see that the filename hoco_ltsh.e has been truncated (to hoco_ltsh.) in the archive. 
Repeat the same procedure on an EMI2 UI; the output will differ slightly as regards the location of the ISB.....tar.gz file, but again unzip it and inspect it with "tar tvf": the last file should no longer be truncated.
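The truncation is consistent with the 100-character name field of the plain ustar tar header (the truncated path above is exactly 100 characters); archives in GNU or PAX format keep longer names intact, as this local sketch with Python's tarfile shows:

```python
import io
import tarfile

# A member path longer than the 100-character ustar name field
long_name = "SandboxDir/" + "x" * 100 + "/input/hoco_ltsh.e"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    info = tarfile.TarInfo(long_name)
    data = b"dummy payload"
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()

print(names[0] == long_name)  # True: the long name survives intact
```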

ICE should use env vars in its configuration (https://savannah.cern.ch/bugs/?90830) PRE-CERTIFIED (alvise, 29/03/2012).

Check glite_wms.conf:

[root@devel09 siteinfo]# grep -E 'persist_dir|Input|ice_host_cert|ice_host_key' /etc/glite-wms/glite_wms.conf
    ice_host_cert   =   "${GLITE_HOST_CERT}";
    Input   =   "${WMS_LOCATION_VAR}/ice/jobdir";
    persist_dir   =   "${WMS_LOCATION_VAR}/ice/persist_dir";
    ice_host_key   =   "${GLITE_HOST_KEY}";
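The ${...} values are resolved from the environment at run time; the expansion behaviour can be sketched with os.path.expandvars (environment values assumed for illustration):

```python
import os

# Simulate the environment set up for the WMS (values assumed for illustration)
os.environ["WMS_LOCATION_VAR"] = "/var"
os.environ["GLITE_HOST_CERT"] = "/etc/grid-security/hostcert.pem"

conf = {
    "Input": "${WMS_LOCATION_VAR}/ice/jobdir",
    "persist_dir": "${WMS_LOCATION_VAR}/ice/persist_dir",
    "ice_host_cert": "${GLITE_HOST_CERT}",
}
expanded = {k: os.path.expandvars(v) for k, v in conf.items()}
print(expanded["Input"])  # /var/ice/jobdir
```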

cron job deletes /var/proxycache (https://savannah.cern.ch/bugs/?90640) PRE-CERTIFIED (alvise, 29/03/2012). Verified the usage of "-mindepth 1" as explained in the bug's comment on Savannah:
[root@devel09 cron.d]# grep proxycache *
glite-wms-wmproxy-purge-proxycache.cron:0 */6 * * * root . /usr/libexec/grid-env.sh ; /usr/bin/glite-wms-wmproxy-purge-proxycache /var/proxycache > /var/log/wms/glite-wms-wmproxy-purge-proxycache.log 2>&1

[root@devel09 cron.d]# grep find /usr/bin/glite-wms-wmproxy-purge-proxycache
find $1 -mindepth 1 -cmin +60 > $tmp_file
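Without -mindepth 1, find would also print the starting directory itself, so an age-based purge could end up removing /var/proxycache along with its contents; a small emulation of the difference:

```python
import os
import tempfile

# Build a throwaway proxycache-like tree
root = tempfile.mkdtemp(prefix="proxycache")
os.makedirs(os.path.join(root, "user1"))
open(os.path.join(root, "user1", "proxy"), "w").close()

def find(start, mindepth=0):
    """Rough emulation of `find start -mindepth N` (paths only, no age test)."""
    results = []
    for dirpath, dirnames, filenames in os.walk(start):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            rel = os.path.relpath(path, start)
            depth = 0 if rel == "." else rel.count(os.sep) + 1
            if depth >= mindepth:
                results.append(path)
    return results

print(root in find(root))              # True: the cache dir itself is listed
print(root in find(root, mindepth=1))  # False: -mindepth 1 protects the top dir
```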

ICE jobdir issue - 1 bad CE can block all jobs (https://savannah.cern.ch/bugs/?80751) PRE-CERTIFIED (alvise, 29/03/2012). This is HOPEFULLY FIXED: I verified that the code fixing the problem is in place, but testing it directly is very difficult, because it would require simulating a CE that continuously times out on connection.

EMI-1 WMS does not propagate user job exit code (https://savannah.cern.ch/bugs/?92922) PRE-CERTIFIED (mcecchi, 7/5/2012)

Submitted this job:

[
Executable = "/bin/false";
Arguments = "";
StdOutput = "out.log";
StdError = "err.log";
InputSandbox = {};
OutputSandbox = {};
myproxyserver="";
requirements = !RegExp("cream.*", other.GlueCEUniqueID);
RetryCount = 0;
ShallowRetryCount = 1;
]

and got:

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel16.cnaf.infn.it:9000/SWI6lM88RZ0noSpQhmz1EQ
Current Status:     Done (Exit Code !=0)
Exit code:          1
Status Reason:      Warning: job exit code != 0
Destination:        lyogrid02.in2p3.fr:2119/jobmanager-pbs-dteam
Submitted:          Thu Jun  7 11:06:19 2012 CEST
==========================================================================
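For comparison, /bin/false always exits with a non-zero status (1 on Linux), which is exactly the exit code the WMS now propagates; a minimal local check:

```python
import subprocess

# /bin/false exits non-zero; the WMS should propagate this as "Exit code: 1"
result = subprocess.run(["/bin/false"])
print(result.returncode)  # 1
```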

glite_wms_wmproxy_server segfaults after job registration failure (https://savannah.cern.ch/bugs/?94845) PRE-CERTIFIED (mcecchi, 18/6/2012)

This has been checked extensively throughout all the tests. Submit a collection and check that it is properly registered.

LB failover mechanism in WMproxy needs to be reviewed (https://savannah.cern.ch/bugs/?90034) PRE-CERTIFIED (mcecchi, 18/6/2012)

In the WorkloadManagerProxy section of the configuration, put an invalid URL as the first LB in the vector:

LBServer = {"aaa.cnaf.infn.it:9000", "devel09.cnaf.infn.it:9000"};

Submit a job from the UI; after a while you will see that the second LB in the vector is picked up:

[mcecchi@ui ~]$ glite-wms-job-submit -a --endpoint https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server coll_1.jdl 

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel09.cnaf.infn.it:9000/T3olEsUGW4Tgi-EAp4hUcw

==========================================================================
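The expected failover behaviour can be sketched as a simple try-in-order loop (hypothetical function names, not the actual WMProxy code):

```python
def register_job(lb_servers, connect):
    """Try each LB server in order; return the first that accepts the job."""
    last_error = None
    for server in lb_servers:
        try:
            connect(server)
            return server
        except ConnectionError as exc:
            last_error = exc  # remember the failure, fall through to next server
    raise last_error

def fake_connect(server):
    # Simulate: the first LB host does not resolve, the second one works
    if server.startswith("aaa."):
        raise ConnectionError("cannot resolve " + server)

chosen = register_job(
    ["aaa.cnaf.infn.it:9000", "devel09.cnaf.infn.it:9000"], fake_connect)
print(chosen)  # devel09.cnaf.infn.it:9000
```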

Cancellation of a dag's node doesn't work (https://savannah.cern.ch/bugs/?81651) PRE-CERTIFIED (mcecchi, 18/6/2012)

[mcecchi@ui ~]$ glite-wms-job-cancel https://devel09.cnaf.infn.it:9000/UE6W4uCneruHXY4R05azWg

Are you sure you want to remove specified job(s) [y/n]y : y

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


============================= glite-wms-job-cancel Success =============================

The cancellation request has been successfully submitted for the following job(s):

- https://devel09.cnaf.infn.it:9000/UE6W4uCneruHXY4R05azWg

========================================================================================

[mcecchi@ui ~]$ glite-wms-job-status https://devel09.cnaf.infn.it:9000/u-ig_gAOkNyOaHMYf2kyHA


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/u-ig_gAOkNyOaHMYf2kyHA
Current Status:     Running 
Submitted:          Mon Jun 18 12:27:14 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel09.cnaf.infn.it:9000/UE6W4uCneruHXY4R05azWg
    Current Status:     Running 
    Status Reason:      unavailable
    Destination:        ce03.ific.uv.es:8443/cream-pbs-short
    Submitted:          Mon Jun 18 12:27:14 2012 CEST
==========================================================================
    
[mcecchi@ui ~]$ glite-wms-job-status https://devel09.cnaf.infn.it:9000/u-ig_gAOkNyOaHMYf2kyHA


======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://devel09.cnaf.infn.it:9000/u-ig_gAOkNyOaHMYf2kyHA
Current Status:     Cleared 
Submitted:          Mon Jun 18 12:27:14 2012 CEST
==========================================================================

- Nodes information for: 
    Status info for the Job : https://devel09.cnaf.infn.it:9000/UE6W4uCneruHXY4R05azWg
    Current Status:     Cancelled 
    Logged Reason(s):
        - Cancelled by user
    Status Reason:      Cancelled by user
    Destination:        ce03.ific.uv.es:8443/cream-pbs-short
    Submitted:          Mon Jun 18 12:27:14 2012 CEST
==========================================================================

EMI WMS WM might abort resubmitted jobs (https://savannah.cern.ch/bugs/?89508) PRE-CERTIFIED (mcecchi, 19/6/2012)

This problem was caused by using a throwing call to read the value of GlueCEInfoHostName from the CE ad. A non-throwing one is now used:

ce_ad->EvaluateAttrString("GlueCEInfoHostName", ceinfohostname);

To test it, submit to a queue that does not publish GlueCEInfoHostName. The submission should run fine, without giving this error:

ClassAd error: attribute "GlueCEInfoHostName" does not exist or has the wrong type (expecting "std::string"))
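The fix swaps a lookup that throws on a missing attribute for one that reports failure through its return value; the same distinction rendered in Python (the real code uses the ClassAd C++ API shown above):

```python
ce_ad = {"GlueCEUniqueID": "ce03.ific.uv.es:8443/cream-pbs-short"}
# note: GlueCEInfoHostName is not published by this CE

# Throwing style (the old code): a missing attribute raises and aborts the match
try:
    hostname = ce_ad["GlueCEInfoHostName"]
except KeyError:
    hostname = None

# Non-throwing style (the fix): absence is just an unset value
hostname = ce_ad.get("GlueCEInfoHostName")
print(hostname)  # None: the job is not aborted, matchmaking continues
```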

WMProxy code requires FQANs (https://savannah.cern.ch/bugs/?72169) PRE-CERTIFIED (mcecchi, 19/6/2012)

The WMP code has been changed as requested in the ticket:

// gacl file has a valid entry for user proxy without fqan
if (execDN || execAU) {
    exec = execDN = execAU = true;
}

Various issues found while testing

18/04/2012

1) [dorigoa@cream-12 ~]$ glite-wms-job-submit -c ~/JDLs/WMS/wmp_devel09.conf -a ~/JDLs/WMS/wms.jdl

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server

Warning - Unable to submit the job to the service: https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server

Proxy file doesn't exist or has bad permissions

Error code: SOAP-ENV:Server

FIXED 17/04/12, commit in wmproxy. The problem was due to the recent authN/Z restructuring and occurred on a jobSubmit operation (i.e. a job submitted without ISB).

May 4 2012:

2) load_monitor gives: Can't do setuid (cannot exec sperl)

FIXED by adding the perl-suidperl dependency in the metapackage.

Using repo from 9/7/2012:

  dependency: perl-suidperl
   provider: perl-suidperl.i386 4:5.8.8-32.el5_7.6
   provider: perl-suidperl.x86_64 4:5.8.8-32.el5_7.6
   provider: perl-suidperl.x86_64 4:5.8.8-32.el5_6.3

3) With 'AsyncJobStart = false;' the wmproxy crashes every second submission. The problem occurs wherever there is:

if (conf.getAsyncJobStart()) {
    // Copy environment and restore it right after FCGI_Finish
    char** backupenv = copyEnvironment(environ);
    FCGI_Finish(); // returns control to client
    environ = backupenv;
    // From here on, execution is asynchronous
}

/usr/bin/glite_wms_wmproxy_server
/lib64/libpthread.so.0
/lib64/libc.so.6(gsignal+0x35)
/lib64/libc.so.6(abort+0x110)
/lib64/libc.so.6
/lib64/libc.so.6
/lib64/libc.so.6(cfree+0x4b)
classad::ClassAd::~ClassAd()
glite::jdl::Ad::~Ad()
jobStart(jobStartResponse&, std::string const&, soap*)
ns1__jobStart(soap*, std::string, ns1__jobStartResponse&)
soap_serve_ns1__jobStart(soap*)
soap_serve_request(soap*)
glite::wms::wmproxy::server::WMProxyServe::wmproxy_soap_serve(soap*)
glite::wms::wmproxy::server::WMProxyServe::serve()
/usr/bin/glite_wms_wmproxy_server(main+0x667)
/lib64/libc.so.6(__libc_start_main+0xf4)
glite::wmsutils::exception::Exception::getStackTrace()

FIXED 14/05/12, commit in wmproxy (copyEnvironment)

4) submitting dag1.jdl produces this stack trace:

getSandboxBulkDestURI(getSandboxBulkDestURIResponse&, std::string const&, std::string const&)
ns1__getSandboxBulkDestURI(soap*, std::string, std::string, ns1__getSandboxBulkDestURIResponse&)
soap_serve_ns1__getSandboxBulkDestURI(soap*)
soap_serve_request(soap*)
glite::wms::wmproxy::server::WMProxyServe::wmproxy_soap_serve(soap*)
glite::wms::wmproxy::server::WMProxyServe::serve()
/usr/bin/glite_wms_wmproxy_server(main+0x667)
/lib64/libc.so.6(__libc_start_main+0xf4)
glite::wmsutils::exception::Exception::getStackTrace()

FIXED 14/05/12 by an update in LB bkserver from the latest RC

[mcecchi@ui ~]$ glite-wms-job-submit -a --endpoint https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server dag1.jdl 

Connecting to the service https://devel09.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://devel09.cnaf.infn.it:9000/RDzM29cERl8imVCMnFoRyA

==========================================================================

5) WM DOES NOT MATCH ANYTHING

FIXED

Fixed by moving the requirements onto WmsRequirements only in the JDL. No requirements on ce_ad and no need for symmetric_match anymore. The expression is now:

WmsRequirements = ((ShortDeadlineJob =?= TRUE ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID))
  && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs)
  && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true)
  && (member(CertificateSubject, other.GlueCEAccessControlBaseRule)
      || member(strcat("VO:", VirtualOrganisation), other.GlueCEAccessControlBaseRule)
      || FQANmember(strcat("VOMS:", VOMS_FQAN), other.GlueCEAccessControlBaseRule))
  && !FQANmember(strcat("DENY:", VOMS_FQAN), other.GlueCEAccessControlBaseRule)
  && (IsUndefined(other.OutputSE) || member(other.OutputSE, GlueCESEBindGroupSEUniqueID)));
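The access-control part of the expression boils down to membership tests over GlueCEAccessControlBaseRule; a deliberately simplified Python rendering (illustrative only; real FQAN matching is more involved than plain string membership):

```python
def ce_accessible(acl, subject, vo, fqan):
    """Simplified WmsRequirements access check: allow by DN, VO or VOMS rule,
    unless an explicit DENY rule for the FQAN is present."""
    allowed = (subject in acl
               or "VO:" + vo in acl
               or "VOMS:" + fqan in acl)
    denied = "DENY:" + fqan in acl
    return allowed and not denied

acl = ["VO:dteam", "DENY:/ops/Role=pilot"]
print(ce_accessible(acl, "/C=IT/O=INFN/CN=user", "dteam", "/dteam"))          # True
print(ce_accessible(acl, "/C=IT/O=INFN/CN=user", "ops", "/ops/Role=pilot"))   # False
```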

6) Load script on wmproxy has security issues:

Insecure $ENV{PATH} while running setuid at /usr/sbin/glite_wms_wmproxy_load_monitor line 26.

FIXED, commit in wmproxy; the load script is now interpreted with perl -U

TESTED on 16/5/2012

16 May, 16:13:55 -D- PID: 6886 - "wmpcommon::callLoadScriptFile": Executing command:  /usr/sbin/glite_wms_wmproxy_load_monitor --oper jobRegister --load1 22 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300
16 May, 16:13:55 -D- PID: 6886 - "wmpcommon::callLoadScriptFile": Executing load script file: /usr/sbin/glite_wms_wmproxy_load_monitor

22/05/12. INSTALLING FROM EMI2 RC4

7) [root@devel09 ~]# /usr/bin/glite-wms-workload_manager
22 May, 11:51:06 -I: [Info] main(main.cpp:289): This is the gLite Workload Manager, running with pid 2454
22 May, 11:51:06 -I: [Info] main(main.cpp:297): loading broker dll libglite_wms_helper_broker_ism.so
cannot load dynamic library libglite_wms_helper_broker_ism.so: /usr/lib64/libgsoap++.so.0: undefined symbol: soap_faultstring

25/5/12 FIXED, commit in broker-info Makefile.am

22/05/12

8) Slower matchmaking after the authZ check introduced by configuration? (0/4310 [1])

FIXED

15 Jun, 10:07:19 -I: [Info] checkRequirement(/home/mcecchi/34/emi.wms.wms-matchmaking/src/matchmakerISMImpl.cpp:105): MM for job: https://devel09.cnaf.infn.it:9000/0LXzbbGARFooXwYBAPOXEQ (0/1145 [0] )

fixed using artefacts from remote build

9) Submission to CREAM DOES NOT WORK with collections and dags

FIXED by commit in wmproxy (setJobFileSystem)

        - Cannot move ISB (retry_copy ${globus_transfer_cmd}
gsiftp://devel09.cnaf.infn.it:2811/var/SandboxDir/tu/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2ftutvLp_5fPTFrLUqQH4OSS-A/input/Test.sh
file:///scratch/9462489.1.medium/home_crm07_232749015/CREAM232749015/Test.sh):

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : globus_l_gfs_file_open failed.
500-globus_xio: Unable to open file
/var/SandboxDir/tu/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2ftutvLp_5fPTFrLUqQH4OSS-A/input/Test.sh
500-globus_xio: System error in open: Permission denied
500-globus_xio: A system call failed: Permission denied
500 End.
        - Cannot move ISB (retry_copy ${globus_transfer_cmd}
gsiftp://devel09.cnaf.infn.it:2811/var/SandboxDir/tu/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2ftutvLp_5fPTFrLUqQH4OSS-A/input/Test.sh
file:///scratch/9462489.1.medium/home_crm07_232749015/CREAM232749015/Test.sh):
error: globus_ftp_client: the server responded with an error500
500-Command failed. : globus_l_gfs_file_open failed.500-globus_xio:
Unable to open file
/var/SandboxDir/tu/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2ftutvLp_5fPTFrLUqQH4OSS-A/input/Test.sh500-globus_xio:
System error in open: Permission denied500-globus_xio: A system call
failed: Permission denied500 End.; Cannot move ISB (retry_copy
${globus_transfer_cmd}
gsiftp://devel09.cnaf.infn.it:2811/var/SandboxDir/tu/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2ftutvLp_5fPTFrLUqQH4OSS-A/input/Test.sh
file:///scratch/9462489.1.medium/home_crm07_232749015/CREAM232749015/Test.sh):
error: globus_ftp_client: the server responded with an error 500
500-Command failed. : globus_l_gfs_file_open failed.  500-globus_xio:
Unable to open file
/var/SandboxDir/tu/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2ftutvLp_5fPTFrLUqQH4OSS-A/input/Test.sh
 500-globus_xio: System error in open: Permission denied
500-globus_xio: A system call failed: Permission denied  500 End.
    Status Reason:       failed (LB query failed)

10) Proxy exception: Unable to get Not Before date from Proxy

It happens on submission when the wmproxy serving processes restart. FIXED, committed in wmproxy (initwmp before authZ)

/var/log/wms/wmproxy.log-18 Jun, 10:14:31 -I- PID: 22575 - "wmproxy::main": Maximum core request count reached: 50
/var/log/wms/wmproxy.log-18 Jun, 10:14:31 -I- PID: 22575 - "wmproxy::main": Exiting WM proxy serving process ...
/var/log/wms/wmproxy.log-18 Jun, 10:14:31 -I- PID: 24999 - "wmproxy::main": ------- Starting Server Instance -------
/var/log/wms/wmproxy.log-18 Jun, 10:14:31 -I- PID: 24999 - "wmproxy::main": WM proxy serving process started
/var/log/wms/wmproxy.log-18 Jun, 10:14:31 -I- PID: 24999 - "wmproxy::main": ---------------------------------------

The reason is that SandboxDir is missing from the path:

15 Jun, 12:08:23 -D- PID: 31416 - "WMPAuthorizer::checkProxyValidity": Proxy path: /var//ji/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2fji-MZN8xuc8UXCvqceHaHQ/user.proxy
15 Jun, 12:08:23 -E- PID: 31416 - "WMPAuthorizer::getNotBefore": Unable to get Not Before date from Proxy
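The double slash in the logged path (/var//ji/...) is the telltale sign of an empty component where SandboxDir should be; in miniature (component names illustrative):

```python
# Components of the proxy path as assembled by the server (names illustrative)
sandbox_dir = ""  # the component that went missing after the restart
broken = "/".join(["/var", sandbox_dir, "ji", "user.proxy"])
fixed = "/".join(["/var", "SandboxDir", "ji", "user.proxy"])
print(broken)  # /var//ji/user.proxy
print(fixed)   # /var/SandboxDir/ji/user.proxy
```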

11) wms-wm stop doesn't delete pid file (and check-daemons kicks in)

FIXED by commit in wm init script

12) Configuration of Condor 7.8.0

FIXED by commit in yaim

13) Build against Condor 7.8.0. FIXED, commit in jobsubmission.

A fix was also NEEDED in condorg.pc to build from an FHS Condor.

14) Restarting /usr/bin/glite_wms_wmproxy_server... -bash: condorc-initialize: command not found

condorc-initialize was meant for the gLite CE and is not needed anymore; the call was in /opt/glite/yaim/functions/config_gliteservices_wms. FIXED, commit in yaim.

15) Job to JC always stay in Running

FIXED by commit in jobsubmission (was due to the recent cleanup)

16) Crashes in WM with GLUE 2.0 purchaser enabled. FIXED (among many others) by commit in ISM

a) glite-wms-workload_manager: /usr/include/boost/shared_ptr.hpp:247: typename boost::detail::shared_ptr_traits::reference boost::shared_ptr::operator*() const [with T = classad::ClassAd]: Assertion `px != 0' failed.

b) 15 Jun, 10:25:30 -D: [Debug] fetch_bdii_ce_info_g2(ldap-utils-g2.cpp:1290): #6993 LDAP entries received in 12 seconds
15 Jun, 10:25:30 -E: [Error] handle_synch_signal(signal_handling.cpp:77): Got a synchronous signal (6), stack trace:
/usr/bin/glite-wms-workload_manager
/lib64/libpthread.so.0
/lib64/libc.so.6(gsignal+0x35)
/lib64/libc.so.6(abort+0x110)
/lib64/libc.so.6(__assert_fail+0xf6)
boost::shared_ptr<classad::ClassAd>::operator*() const
/usr/lib64/libglite_wms_ism_ii_g2_purchaser.so.0(_ZN5glite3wms3ism9purchaser21fetch_bdii_ce_info_g2ERKSsmS4_lS4_RSt3mapISsN5boo
/usr/lib64/libglite_wms_ism_ii_g2_purchaser.so.0(_ZN5glite3wms3ism9purchaser18fetch_bdii_info_g2ERKSsmS4_lS4_RSt3mapISsN5boost1
glite::wms::ism::purchaser::ism_ii_g2_purchaser::operator()()
void boost::_mfi::mf0<void, glite::wms::ism::purchaser::ism_purchaser>::call<boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser> >(boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser>&, void const*) const
void boost::_mfi::mf0<void, glite::wms::ism::purchaser::ism_purchaser>::operator()<boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser> >(boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser>&) const
/usr/bin/glite-wms-workload_manager(_ZN5boost3_bi5list1INS0_5valueINS_10shared_ptrIN5glite3wms3ism9purchaser13ism_purchaserEEEE
/usr/bin/glite-wms-workload_manager(_ZN5boost3_bi6bind_tIvNS_4_mfi3mf0IvN5glite3wms3ism9purchaser13ism_purchaserEEENS0_5list1IN
/usr/bin/glite-wms-workload_manager(_ZN5boost6detail8function26void_function_obj_invoker0INS_3_bi6bind_tIvNS_4_mfi3mf0IvN5glite
boost::function0<void, std::allocator >::operator()() const
/usr/bin/glite-wms-workload_manager
/usr/bin/glite-wms-workload_manager
boost::function0<void, std::allocator >::operator()() const
glite::wms::manager::server::Events::run()
boost::_mfi::mf0<void, glite::wms::manager::server::Events>::operator()(glite::wms::manager::server::Events*) const

17) Crash in LM FIXED commit in CVS (SubmitHost not allocated by Condor APIs)

Program received signal SIGSEGV, Segmentation fault.
0x0000003d02e782b0 in strlen () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003d02e782b0 in strlen () from /lib64/libc.so.6
#1 0x00000034e489c500 in std::basic_string<char, std::char_traits, std::allocator >::basic_string(char const*, std::allocator const&) () from /usr/lib64/libstdc++.so.6
#2 0x0000003969e56708 in glite::wms::jobsubmission::logmonitor::processer::EventSubmit::process_event() () from /usr/lib64/libglite_wms_jss_logmonitor.so.0
#3 0x0000003969e30526 in glite::wms::jobsubmission::logmonitor::CondorMonitor::process_next_event() () from /usr/lib64/libglite_wms_jss_logmonitor.so.0
#4 0x000000000040fc18 in glite::wms::jobsubmission::daemons::MonitorLoop::run() ()
#5 0x000000000040c39a in (anonymous namespace)::run_instance(std::basic_string<char, std::char_traits, std::allocator > const&, glite::wms::common::utilities::LineParser const&, std::auto_ptr<glite::wms::jobsubmission::jccommon::LockFile>&, glite::wms::jobsubmission::daemons::MonitorLoop::run_code_t&) ()
#6 0x000000000040c72b in main ()

18) Scheduled status missing in jc/lm FIXED

Committed in LM, removed getSubmitHost()

19) Condor 7.8.0 rpm FIXED changed spec file in condor-emi-7.8.0

Removed libclassad.so, the executables (classad_version, ...) and the includes, because they conflict with the classads and classads-devel packages; LEFT libclassad_7_8_0.so and libclassad.so.3.

20) proxy renewal INVALID: this is a problem of devel09; it works fine on a clean installation.

This error:

glite_renewal_RegisterProxy Exit code: 13 LB[Proxy] Error not available (empty messages)
15 Jun, 15:55:26 -S- PID: 26528 - "WMPEventLogger::registerProxyRenewal": Register job failed glite_renewal_RegisterProxy Exit code: 13 LB[Proxy] Error not available (empty messages)

means:

[root@devel09 ~]# /etc/init.d/glite-proxy-renewald status
glite-proxy-renewd not running

21) UI BUG: zipped ISB doesn't work with DAGs. INVALID: seems to have been fixed by some commit in the WM.

15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::getDN_SSL": Getting user DN...
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::getDN_SSL": User DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alvise Dorigo
15 Jun, 17:11:44 -D- PID: 16909 - "WMPEventlogger::registerSubJobs": Registering DAG subjobs to LB Proxy...
15 Jun, 17:11:44 -D- PID: 16909 - "WMPEventlogger::setLoggingJob": Setting job for logging to LB Proxy...
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::getDN_SSL": Getting user DN...
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::getDN_SSL": User DN: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alvise Dorigo
15 Jun, 17:11:44 -D- PID: 16909 - "wmpcoreoperations::submit": registerSubJobs OK, writing flag file: /var/SandboxDir/M4/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2fM4GwcN3q0KYZz-ed7SHHQA/.registersubjobsok
15 Jun, 17:11:44 -D- PID: 16909 - "wmpcoreoperations::submit": Uncompressing zip file: ISBfiles_1kZPt8uCtNxrEBxTLVHzlQ_0.tar.gz
15 Jun, 17:11:44 -D- PID: 16909 - "wmpcoreoperations::submit": Absolute path: /var/SandboxDir/M4/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2fM4GwcN3q0KYZz-ed7SHHQA/input/ISBfiles_1kZPt8uCtNxrEBxTLVHzlQ_0.tar.gz
15 Jun, 17:11:44 -D- PID: 16909 - "wmpcoreoperations::submit": Target directory: /var
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::doExecv": Forking process...
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::doExecv": Parent PID wait: 16909 waiting for: 19967
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::doExecv": Parent PID after wait: 16909 waiting for: 19967
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::doExecv": Child wait succesfully (WIFEXITED(status))
15 Jun, 17:11:44 -D- PID: 16909 - "wmputils::doExecv": WEXITSTATUS(status): 2
15 Jun, 17:11:44 -S- PID: 16909 - "wmputils::doExecv": Child failure, exit code: 512
15 Jun, 17:11:44 -S- PID: 16909 - "wmputils::doExecv": Child failure, exit code: 512
15 Jun, 17:11:44 -C- PID: 16909 - "wmputils::untarFile": Unable to untar ISB file:/var/SandboxDir/M4/https_3a_2f_2fdevel09.cnaf.infn.it_3a9000_2fM4GwcN3q0KYZz-ed7SHHQA/input/ISBfiles_1kZPt8uCtNxrEBxTLVHzlQ_0.tar.gz
15 Jun, 17:11:44 -E- PID: 16909 - "wmpcoreoperations::submit": Logging LOG_ENQUEUE_FAIL, std::exception Unable to untar ISB file
(please contact server administrator)
15 Jun, 17:11:44 -D- PID: 16909 - "WMPEventlogger::logEvent": Logging to LB Proxy...
15 Jun, 17:11:44 -D- PID: 16909 - "WMPEventlogger::logEvent": Logging Enqueue FAIL event...
15 Jun, 17:11:44 -D- PID: 16909 - "wmpcoreoperations::submit": Removing lock...
15 Jun, 17:11:44 -D- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": jobStart operation exception: The Operation is not allowed: Standard exception: Unable to untar ISB file
(please contact server administrator)
15 Jun, 17:11:44 -I- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": ------------------------------- Fault description --------------------------------
15 Jun, 17:11:44 -I- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": Method: jobStart
15 Jun, 17:11:44 -I- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": Code: 900
15 Jun, 17:11:44 -I- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": Description: The Operation is not allowed: Standard exception: Unable to untar ISB file
(please contact server administrator)
15 Jun, 17:11:44 -D- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": Stack: 
15 Jun, 17:11:44 -D- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": JobOperationException: The Operation is not allowed: Standard exception: Unable to untar ISB file
(please contact server administrator)
   at submit()[coreoperations.cpp:1726]
   at jobStart()[coreoperations.cpp:1838]

15 Jun, 17:11:44 -I- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": ----------------------------------------------------------------------------------
15 Jun, 17:11:44 -D- PID: 16909 - "wmpgsoapoperations::ns1__jobStart": jobStart operation completed

22) fetch_bdii_ce_info: no such object. NOT SEEN ANYMORE; it seems to have been a network error or something similar


[root@devel09 ~]# grep such /var/log/wms/workload_manager_events.log
15 Jun, 17:16:35 -W: [Warning] fetch_bdii_ce_info(ldap-utils.cpp:688): No such object
15 Jun, 17:17:56 -W: [Warning] fetch_bdii_se_info(ldap-utils.cpp:346): No such object

23) cancel status not properly managed by the dagman-less engine. FIXED with a commit in the WM DAG engine

cancelling a job did not cause its dependent nodes to be aborted

24) cancellation to Condor does not work. FIXED with a commit in jobsubmission (restored irepository in JC). The Condor ID was apparently empty, producing the malformed constraint below:

18 Jun, 12:29:28 -D- cancelJob(...): Condor id of job was:
18 Jun, 12:29:28 -S- cancelJob(...): Job cancellation refused.
18 Jun, 12:29:28 -S- cancelJob(...): Condor ID =
18 Jun, 12:29:28 -S- cancelJob(...): Reason: "
Couldn't find/remove all jobs matching constraint (ClusterId== && ProcId==0 && JobStatus!=3)

whereas on devel11 (EMI-1), cancellation works:

18 Jun, 12:46:26 -I- ControllerLoop::run(): Got new remove request (JOB ID = https://devel11.cnaf.infn.it:9000/_UFQCsCUNGpVmZOU-q1-HQ)...
18 Jun, 12:46:26 -I- JobControllerReal::cancel(...): Asked to remove job: https://devel11.cnaf.infn.it:9000/_UFQCsCUNGpVmZOU-q1-HQ
18 Jun, 12:46:26 -I- cancelJob(...): Job has been succesfully removed.
18 Jun, 12:46:26 -V- JobControllerReal::cancel(...): Job https://devel11.cnaf.infn.it:9000/_UFQCsCUNGpVmZOU-q1-HQ successfully marked for removal.

25) check log rotation. FIXED with a commit in yaim (copytruncate)

lsof showed that the LM was still holding a rotated log.1 file open

and so was the JC:

-rw-r--r-- 1 glite glite         0 Jun 19 04:02 jobcontroller_events.log
-rw-r--r-- 1 glite glite    349597 Jun 19 11:47 jobcontroller_events.log.1

[root@devel09 ~]# ps aux | grep job_
root      4193  0.0  0.0  61212   760 pts/0    S+   11:48   0:00 grep job_
glite    16261  0.0  0.1 226112  6616 ?        Ss   Jun18   0:00 /usr/bin/glite-wms-job_controller -c glite_wms.conf
[root@devel09 ~]# lsof -p 16261|grep .log
glite-wms 16261 glite  mem    REG  253,0   671439 17115535 /usr/lib64/libglite_wms_logger.so.0.0.0
glite-wms 16261 glite    3u   REG  253,0   349597 29524130 /var/log/wms/jobcontroller_events.log.1
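The copytruncate fix makes sense here: with logrotate's default rename+create behaviour the daemon keeps its descriptor on the renamed log.1 (exactly what lsof shows above), while copytruncate copies the log and truncates it in place so the open descriptor stays on the live file. A sketch of what the yaim-generated logrotate stanza would look like (file name and options assumed, not taken from the actual commit):

```
/var/log/wms/jobcontroller_events.log {
    daily
    rotate 30
    compress
    missingok
    notifempty
    # truncate in place instead of rename+create, so JC/LM keep
    # writing to the live file rather than the rotated .1 copy
    copytruncate
}
```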

26) first G2 sync purchasing is empty. FIXED

The first G2 sync purchasing used config.ns()->ii_dn() as the base DN,

instead of "o=glue"

27) new crash in G2 purchaser. FIXED: it is the same problem shown in point 28. With the present design, the two purchasers cannot work together.

19 Jun, 15:56:20 -E: [Error] handle_synch_signal(/home/mcecchi/34/emi.wms.wms-manager/src/signal_handling.cpp:77): Got a synchronous signal (11), stack trace:
/usr/bin/glite-wms-workload_manager
/lib64/libpthread.so.0
std::string::compare(std::string const&) const
bool std::operator< <char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
std::less<std::string>::operator()(std::string const&, std::string const&) const
/usr/lib64/libglite_wms_ism_ii_g2_purchaser.so.0(_ZNSt8_Rb_treeISsSt4pairIKSsN5boost6tuples5tupleIiiNS2_10shared_ptrIN7classad7
/usr/lib64/libglite_wms_ism_ii_g2_purchaser.so.0(_ZNSt3mapISsN5boost6tuples5tupleIiiNS0_10shared_ptrIN7classad7ClassAdEEENS0_8f
/usr/lib64/libglite_wms_ism_ii_g2_purchaser.so.0
glite::wms::ism::purchaser::ism_ii_g2_purchaser::operator()()
void boost::_mfi::mf0<void, glite::wms::ism::purchaser::ism_purchaser>::call<boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser> >(boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser>&, void const*) const
void boost::_mfi::mf0<void, glite::wms::ism::purchaser::ism_purchaser>::operator()<boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser> >(boost::shared_ptr<glite::wms::ism::purchaser::ism_purchaser>&) const
/usr/bin/glite-wms-workload_manager(_ZN5boost3_bi5list1INS0_5valueINS_10shared_ptrIN5glite3wms3ism9purchaser13ism_purchaserEEEE
/usr/bin/glite-wms-workload_manager(_ZN5boost3_bi6bind_tIvNS_4_mfi3mf0IvN5glite3wms3ism9purchaser13ism_purchaserEEENS0_5list1IN
/usr/bin/glite-wms-workload_manager(_ZN5boost6detail8function26void_function_obj_invoker0INS_3_bi6bind_tIvNS_4_mfi3mf0IvN5glite
boost::function0<void, std::allocator<void> >::operator()() const
/usr/bin/glite-wms-workload_manager
/usr/bin/glite-wms-workload_manager
boost::function0<void, std::allocator<void> >::operator()() const
glite::wms::manager::server::Events::run()
boost::_mfi::mf0<void, glite::wms::manager::server::Events>::operator()(glite::wms::manager::server::Events*) const

28) G1 and G2 purchasers do not work together? Yes, this is a design issue. FIXED

EnableIsmIiGlue13Purchasing = true;

EnableIsmIiGlue20Purchasing = true;

but:

19 Jun, 16:54:39 -I: [Info] checkRequirement(/home/mcecchi/34/emi.wms.wms-matchmaking/src/matchmakerISMImpl.cpp:110): MM for listmatch (0/1124 [0] )

at the beginning, and then:

19 Jun, 16:55:09 -D: [Debug] schedule_at(/home/mcecchi/34/emi.wms.wms-manager/src/events.cpp:156): timed event scheduled at 1340117710 with priority 20
19 Jun, 16:55:09 -D: [Debug] operator()(/home/mcecchi/34/emi.wms.wms-manager/src/match_request.cpp:69): considering match https://localhost:6000/hsgQjAjmC8fJs30scM8mAQ /tmp/13043.20120619165508248 -1 0
19 Jun, 16:55:09 -I: [Info] checkRequirement(/home/mcecchi/34/emi.wms.wms-matchmaking/src/matchmakerISMImpl.cpp:110): MM for listmatch (2/81 [0] )
19 Jun, 16:55:10 -D: [Debug] schedule_at(/home/mcecchi/34/emi.wms.wms-manager/src/events.cpp:156): timed event scheduled at 1340117711 with priority 20

19 Jun, 17:03:50 -I: [Info] checkRequirement(/home/mcecchi/34/emi.wms.wms-matchmaking/src/matchmakerISMImpl.cpp:110): MM for listmatch (81/81 [0] )
19 Jun, 17:03:51 -D: [Debug] schedule_at(/home/mcecchi/34/emi.wms.wms-manager/src/events.cpp:156): timed event scheduled at 1340118232 with priority 20

There are then several crashes when one purchaser causes the switch-over without the other one knowing.

29) proxy renewal not installed. FIXED with a commit in the ICE and wmproxy spec files (provides glite-px-proxyrenewal)

30) 18/07/2012 ICE log level: from 300 to 500. Tagged yaim 0_7, but not part of the present registered build

31) 18/07/2012 ice persist_dir is OK

32) 19/07/2012 SL6 gridftp not working. FIXED by updating to a more recent EMI-2 repo

[root@devel08 grid-security]# LCAS_DEBUG_LEVEL=5 LCAS_LOG_LEVEL=5 LCMAPS_DEBUG_LEVEL=5 LCMAPS_LOG_LEVEL=5 /usr/sbin/globus-gridftp-server -ns -p 24024 -d all > /tmp/ftplog.txt 2>&1
[root@devel08 grid-security]# cat /tmp/ftplog.txt 
[19911] Thu Jul 19 14:58:45 2012 :: GFork functionality not enabled.:
globus_gfork: GFork error: Env not set

[19911] Thu Jul 19 14:58:45 2012 :: Configuration read from /etc/gridftp.conf.
[19911] Thu Jul 19 14:58:45 2012 :: Server started in daemon mode.
[19911] Thu Jul 19 14:58:48 2012 :: New connection from: ui.cnaf.infn.it:38435
[19911] Thu Jul 19 14:58:48 2012 :: ui.cnaf.infn.it:38435: [CLIENT]: USER :globus-mapping:
[19911] Thu Jul 19 14:58:48 2012 :: ui.cnaf.infn.it:38435: [SERVER]: 331 Password required for :globus-mapping:.
[19911] Thu Jul 19 14:58:48 2012 :: ui.cnaf.infn.it:38435: [CLIENT]: PASS dummy
[19911] Thu Jul 19 14:58:49 2012 :: ui.cnaf.infn.it:38435: [CLIENT]: PASS dummy
[19911] Thu Jul 19 14:58:49 2012 :: ui.cnaf.infn.it:38435: [SERVER]: 530-Login incorrect. : globus_gss_assist: Error invoking callout
530-globus_callout_module: The callout returned an error
530-an unknown error occurred
530 End.

33) 19/07/2012 SL6 LB not working. FIXED by updating to a more recent EMI-2 repo

34) 19/07/2012 SL6 WM not working. FIXED

crash while loading the II library

disappeared after reinstalling classad(-devel), common and other libraries

35) 31/7/2012 Again? FIXED with a cleanup/install/yaim of the latest EMI-2

31 Jul, 11:58:32 -D- PID: 6649 - "wmpgsoapoperations::ns1__jobStart": jobStart operation called
31 Jul, 11:58:32 -D- PID: 6649 - "WMPAuthorizer::checkProxyValidity": Proxy path: /var//oz/https_3a_2f_2fdevel08.cnaf.infn.it_3a9000_2foz-mWzYgnJWchPD-_5faB7nA/user.proxy
31 Jul, 11:58:32 -E- PID: 6649 - "WMPAuthorizer::getNotBefore": Unable to get Not Before date from Proxy
31 Jul, 11:58:32 -D- PID: 6649 - "wmpgsoapoperations::ns1__jobStart": jobStart operation exception: Proxy exception: Unable to get Not Before date from Proxy
Topic attachments
Attachment               Size     Date              Who          Comment
wms34_clean_install      307.8 K  2012-07-18 12:15  MarcoCecchi  EMI-2 WMS v. 3.4.0 clean install
wms34_clean_install_sl6  50.2 K   2012-07-31 13:45  MarcoCecchi  SL6 x86_64
Topic revision: r107 - 2012-10-01 - AlviseDorigo
 