-
/opt/glite/yaim/functions/config_glite_lb: line 99: /opt/glite/etc/glite-lb-dbsetup.sql: No such file or directory
ERROR 1146 (42S02) at line 1: Table 'lbserver20.short_fields' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'lbserver20.long_fields' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'lbserver20.states' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'lbserver20.events' doesn't exist
/opt/glite/yaim/functions/config_glite_lb: line 190: /opt/glite/etc/init.d/glite-lb-bkserverd: No such file or directory
/opt/glite/yaim/functions/config_glite_lb: line 200: /opt/glite/etc/init.d/glite-lb-bkserverd: No such file or directory
ABORT: Service glite-lb-bkserverd failed to start!
ERROR: Error during the execution of function: config_glite_lb
ERROR: Error during the configuration.Exiting. [FAILED]
ERROR: One of the functions returned with error without specifying it's nature !
We need to install LB. FIXED
-
DEBUG: Skipping function: config_glite_lb_setenv because it is not defined
DEBUG: Skipping function: config_glite_lb because it is not defined
ERROR: Error during the configuration.Exiting. [FAILED]
install glite-yaim-lb FIXED
-
ERROR: Unable to execute /etc/init.d/globus-gridftp.
ERROR: Error during the execution of function: config_globus_gridftp
install glite-initscript-globus-gridftp-1.0.2-1.noarch.rpm FIXED
-
Syntax error on line 242 of /opt/glite/etc/glite_wms_wmproxy_httpd.conf:
FastCgiConfig: invalid option: -intial-env
fix the template /opt/glite/etc/glite_wms_wmproxy_httpd.conf.template FIXED
- The ICE start/stop script doesn't work
fix committed in CVS FIXED
- glite-yaim-wms ame:-ame:
The version of yaim-wms is written directly into the Makefile during the CVS checkout (see keyword substitution). Since HEAD is used for the development of this component, the keyword substitution is not correct. When a tag is available the version will be correctly defined. FIXED
-
Starting program: /opt/glite/bin/glite_wms_wmproxy_server
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread 0x2b227bb94260 (LWP 25595)]
Program received signal SIGSEGV, Segmentation fault.
0x00000000004e94a8 in __static_initialization_and_destruction_0 ()
(gdb) bt
#0 0x00000000004e94a8 in __static_initialization_and_destruction_0 ()
#1 0x0000000000565c26 in __do_global_ctors_aux ()
#2 0x000000000045665b in _init ()
#3 0x00002b2274847010 in __CTOR_LIST__ () from /opt/glite/lib64/libglite_lb_clientpp_gcc64dbg.so.4
#4 0x0000000000565ba7 in __libc_csu_init ()
#5 0x0000003835e1d92e in __libc_start_main () from /lib64/libc.so.6
#6 0x0000000000457fc9 in _start ()
(gdb)
FIXED
-
Warning - Unable to register the job to the service: https://cream-44.pd.infn.it:7443/glite_wms_wmproxy_server
Unable to create job local directory
(please contact server administrator)
[root@cream-03 ~]# ls -l /opt/glite/bin/glite_wms_wmproxy_dirmanager
-r-sr-xr-x 1 nobody nobody 46363 Jul 8 11:00 /opt/glite/bin/glite_wms_wmproxy_dirmanager
FIXED. Implemented postun in ETICS.
- Bug: #72573
[root@cream-03 ~]# service gLite status
[...]
*** glite-lb-locallogger:
glite-lb-logd not running
[...]
FIXED Using new LB tag (patch 4423)
- lbproxy not started
GLITE_LB_TYPE=proxy is the default behaviour FIXED
-
[Fri Jul 09 18:05:21 2010] [error] Certificate Verification: Error (24): invalid CA certificate
[Fri Jul 09 18:05:21 2010] [error] Certificate Verification: Error (26): unsupported certificate purpose
Trying a simple list-match. Is it a warning?
- In /opt/glite/etc/lcmaps/lcmaps.db substitute path = ${moddir} with path = /opt/glite/lib64/modules
FIXED
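The substitution above can be sketched as a small text rewrite. This is only an illustration, assuming the file contains lines of the form path = ${moddir}; the helper name fix_moddir and the sample text are hypothetical, and the sketch works on a string rather than touching /opt/glite/etc/lcmaps/lcmaps.db directly.

```python
import re

NEW_DIR = "/opt/glite/lib64/modules"

# Matches a whole "path = ${moddir}" line, tolerating spaces around "=".
MODDIR_LINE = re.compile(r"^([ \t]*path[ \t]*=[ \t]*)\$\{moddir\}[ \t]*$", re.MULTILINE)


def fix_moddir(text, new_dir=NEW_DIR):
    """Replace every 'path = ${moddir}' line with an absolute module path."""
    return MODDIR_LINE.sub(lambda m: m.group(1) + new_dir, text)


# Hypothetical file content, not copied from a real lcmaps.db.
sample = 'path = ${moddir}\nother = "x"\n'
fixed = fix_moddir(sample)
print(fixed)
```

Lines that do not match the pattern are left untouched, so the rewrite is safe to apply more than once.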
- Remove SDJRequirements and WmsRequirements from section WorkloadManagerProxy of /opt/glite/etc/glite_wms.conf
FIXED
- In section WorkloadManager of /opt/glite/etc/glite_wms.conf put:
WmsRequirements = ( (ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) &&
(other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs);
FIXED
- Correct /opt/glite/etc/glite_wms_wmproxy.gacl: insert a / before the "vo" and the rule /"vo"/Role=NULL/Capability=NULL
FIXED
- Cleared event is not logged:
27 Jul, 17:41:22 -E- PID: 25792 - "wmputils::doPurge": [Error] remove_path(purger.cpp:256): LB event logging failed LB server (bkserver,lbproxy) store protocol error (1417) - edg_wll_LogEvent():
LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR:
LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent(): edg_wll_log_connect error
Transport endpoint is not connected;; edg_wll_gss_connect();; System Error: Connection refused
27 Jul, 17:41:22 -S- PID: 25792 - "wmpcoreoperations::jobpurge": Unable to complete job purge
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": ------------------------------- Fault Description --------------------------------
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Method: jobPurge
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Code: 1202
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Description: The Operation is not allowed: Unable to complete job purge
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": Stack:
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": JobOperationException: The Operation is not allowed: Unable to complete job purge
at jobpurge()[wmpcoreoperations.cpp:2640]
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": at jobpurge()[wmpcoreoperations.cpp:2546]
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": at jobPurge()[wmpcoreoperations.cpp:2667]
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": ----------------------------------------------------------------------------------
27 Jul, 17:41:22 -D- PID: 25792 - "wmpgsoapoperations::ns1__jobPurge": jobPurge operation completed
FIXED for normal jobs
- Submission to LCG CE doesn't work: Got a job held event, reason: Failed to initialize GAHP
Possible solutions are:
- install condor-lcg-1.1.0-1
- set GT2_GAHP = /opt/condor-7.4.1/sbin/gahp_server and GRID_MONITOR = /opt/condor-7.4.1/libexec/glite/grid_monitor.sh in /opt/condor-c/local.<$HOSTNAME>/condor_config.local
Waiting for Marteen's investigation... FIXED. Decided to use the second option. Changes done in yaim-wms.
- BUG #73192
Submission failed:
[ale@egee-rb-03 UI]$ glite-wms-job-submit -a --config ~/UI/etc/wmp_cream-03.conf jdl/env.jdl
Connecting to the service https://cream-03.pd.infn.it:7443/glite_wms_wmproxy_server
Warning - Unable to register the job to the service: https://cream-03.pd.infn.it:7443/glite_wms_wmproxy_server
LB: :2652
Set logging job failed
edg_wll_SetLoggingJob
LB[Proxy] Error: GSSAPI Error
(failed to load GSI credentials: GSS Major Status: General failure
(GSS Minor Status Error Chain:
globus_gsi_gssapi: Error with GSI credential
globus_gsi_gssapi: Error with gss credential handle
globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
Valid credentials could not be found in any of the possible locations specified by the credential search order.
Possible (ugly) workarounds for this problem are:
- chown root.glite /etc/grid-security/host*.pem
- cp /etc/grid-security/hostcert.pem /home/glite/.globus/usercert.pem and cp /etc/grid-security/hostkey.pem /home/glite/.globus/userkey.pem
FIXED using new LB client (patch 4423)
- WM requires a huge amount of memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15007 glite 25 0 2657m 2.8g 8320 S 0.0 77.0 2:34.79 glite-wms-workl
Monitoring. Google malloc is used by default and is correctly set by yaim.
-
*** glite-lb-bkserverd:
Starting glite-lb-bkserver ...Warning: MySQL library version mismatch (compiled '50045', runtime '50077')
done
Salvet said: Installed MySQL library is slightly newer than library used for glite-lb-bkserver build. This is normal. No Problem
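The two numbers in the warning are MySQL version identifiers, encoded as major*10000 + minor*100 + patch. A minimal sketch of decoding them (the function name is hypothetical) shows why the mismatch is harmless: both libraries belong to the same 5.0 series.

```python
def decode_mysql_version(version_id):
    """Split a MySQL numeric version id into (major, minor, patch)."""
    major, rest = divmod(version_id, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


compiled = decode_mysql_version(50045)  # version used at build time
runtime = decode_mysql_version(50077)   # version installed on the node

# Same major.minor series means only the patch level differs.
same_series = compiled[:2] == runtime[:2]
print(compiled, runtime, same_series)
```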
- Bug #73206
Collection doesn't work. The only suspicious messages are in /var/log/messages:
Sep 10 15:50:48 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:48Z : event=wms.wmpserver_setJobFileSystem() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
Sep 10 15:50:48 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:48Z : event=wms.wmpserver_setSubjobFileSystem() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
Sep 10 15:50:50 cream-44 glite_wms_wmproxy_server[6179]: ts=2010-09-10T13:50:50Z : event=wms.wmpserver_submit() : userid=18118 jobid=https://devel07.cnaf.infn.it:9000/pvRt15v6hiIBbEKj0UeeAw
Sep 10 15:50:50 cream-44 kernel: glite_wms_wmpro[6179] general protection rip:3e86c797e0 rsp:7fff60c65148 error:0
FIXED
- Bug #72970
I'm running on the WMS an lbserver in "both" mode:
9197 ? S 0:00 /opt/glite/bin/glite-lb-bkserverd --notif-il-sock=/tmp/glite-lb-notif.sock --notif-il-fprefix=/var/tmp/glite-lb-notif -c /home/glite/.certs/hostcert.pem -k /home/glite/.certs/hostkey.pem -i /var/glite/glite-lb-bkserverd.pid --dump-prefix /var/glite/dump --purge-prefix /var/glite/purge -B --proxy-il-sock /tmp/glite-lbproxy-ilog.sock --proxy-il-fprefix /tmp/glite-lbproxy-ilog_events --policy /opt/glite/etc/glite-lb/glite-lb-authz.conf
Then I started locallogger:
[root@devel08 glite]# /opt/glite/etc/init.d/glite-lb-locallogger start
Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
done
Starting glite-lb-interlogd ... done
but:
[root@devel08 glite]# /opt/glite/etc/init.d/glite-lb-locallogger status
glite-lb-logd not running
instead:
[root@devel08 ~]# ps ax | grep lb-log
10049 ? Ss 0:00 /opt/glite/bin/glite-lb-logd -i /var/glite/glite-lb-logd.pid -c /home/glite/.certs/hostcert.pem -k /home/glite/.certs/hostkey.pem
12135 pts/0 S+ 0:00 grep lb-log
FIXED Using new LB tag (patch 4423)
- Problem during installation:
Error: Missing Dependency: org.glite.build.common-cpp >= 3.2.1 is needed by package glite-security-lcmaps-without-gsi-1.4.8-5.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412)
Error: Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-3.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412)
Error: Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-WMS-3.3.0-0.sl5.x86_64 (ETICS-volatile-build-5b42070f-c48b-4e7b-9819-48810222a0b3-sl5_x86_64_gcc412)
two packages missing... FIXED
- Resubmission for jobs submitted with -r option is not done. Jobs are aborted with "hit job shallow retry count (0)" even if ShallowRetryCount was set in the JDL FIXED .
- Problems with DAG. DAG job reported as failed:
Current Status: Done (Exit Code !=0)
Exit code: 1
Status Reason: Warning: job exit code != 0
Destination: dagman
and nodes stuck in "Submitted". FIXED .
- BUG #75223
FIXED When a job fails, the logged reason is wrong
Event: Done
- Arrived = Thu Nov 11 14:06:54 2010 CET
- Exit code = 1
- Host = cream-46.pd.infn.it
- Reason = LM_log_done_beginThu Nov 11 14:03:17 CET 2010: prologue failed with error 1
- Source = LogMonitor
- Src instance = unique
- Status code = FAILED
- Timestamp = Thu Nov 11 14:06:54 2010 CET
Another type of error:
Event: Done
- Arrived = Fri Nov 26 12:15:30 2010 CET
- Exit code = 0
- Host = pamelawn23.na.infn.it
- Reason = Fri Nov 26 12:13:45 CET 2010: Cannot
- Source = LRMS
- Status code = OK
- Timestamp = Fri Nov 26 12:13:46 2010 CET
- User = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle
- Problem installing new build
(07/11/2010) FIXED.
--> Missing Dependency: mod_fastcgi >= 2.4.3 is needed by package glite-wms-wmproxy-3.3.0-4.sl5.x86_64 (ETICS-name-patch_2876_1)
- Problem starting bdii:
Starting BDII slapd: Traceback (most recent call last):
File "/usr/sbin/bdii-update", line 821, in ?
create_daemon(config['BDII_LOG_FILE'])
File "/usr/sbin/bdii-update", line 168, in create_daemon
e = os.open(log_file, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0644)
OSError: [Errno 13] Permission denied: '/var/log/bdii/bdii-update.log'
[ OK ]
BDII update process failed to startStarting BDII update pro[FAILED]
Make sure that the installed version comes from the ETICS repo: bdii-5.0.9-1
- BUG #75099
FIXED .
2010-11-09 13:01:07,649 DEBUG - iceCommandEventQuery::execute() - TID=[203357600] Database ID=[1265375986000]
2010-11-09 13:01:07,650 DEBUG - iceCommandEventQuery::execute() - TID=[203357600] Exec time ID=[6]
2010-11-09 13:01:07,651 INFO - scoped_timer iceCommandEventQuery::processEvents() - TID=[203357600] All Events Proc Time 1289304067.650547 1289304067.651501 0.000954
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::at
Aborted
It happens when shallow resubmission is set to -1.
- Submission failed Probably FIXED using a new LB server (2.1)
Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
HTTP Error
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
[no address given] and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
Error code: SOAP-ENV:Server
In the /var/log/messages file of the WMS you should see:
Nov 10 17:10:44 cream-46 kernel: glite_wms_wmpro[15571]: segfault at 0000000000000000 rip 00002b6b3ca32548 rsp 00007fffa962f320 error 4
Instead the wmproxy log says:
10 Nov, 17:10:35 -S- PID: 15571 - "WMPEventlogger::registerJob": Register job failed
edg_wll_RegisterJobProxy
Exit code: 22
LB[Proxy] Error: Invalid argument
(edg_wll_RegisterJobMaster(): unable to register job
Resource temporarily unavailable;; Logging library ERROR:
Resource temporarily unavailable;; edg_wll_DoLogEventServer(): edg_wll_log_direct_read error
LB server (bkserver,lbproxy) store protocol error;; edg_wll_log_proto_client_direct(): error reading answer from L&B direct server
LB server (bkserver,lbproxy) store protocol error;; get_reply_gss(): error reading reply
LB server (bkserver,lbproxy) store protocol error;; gss_reader(): error reading message
Transport endpoint is not connected;; edg_wll_gss_read_full;; GSS Error: EOF occured;)
The httpd-wmproxy-errors.log says:
[Wed Nov 10 17:10:44 2010] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
- Cleared event not logged FIXED .
[ale@cream-03 ~]$ glite-wms-job-output https://devel07.cnaf.infn.it:9000/IvaPn8c4ezVSiHbQP8FJOg
Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
Warning - JobPurging not allowed
(The Operation is not allowed: Unable to complete job purge)
You need to set the WMS DN in both the "READ_ALL" and "LOG_WMS_EVENTS" actions of the glite-lb-authz.conf file on the LB server.
- Cleared event not logged FIXED (2011/01/19).
[ale@cream-03 ~]$ glite-wms-job-output https://devel07.cnaf.infn.it:9000/Hn0VS-oMpYlT7bdAIXNB7g
Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
Warning - JobPurging not allowed
(Proxy exception: The delegated Proxy has expired)
This happens when the job proxy has expired.
- Bug #75368
FIXED A "DONE OK" job is marked as "ABORTED". The problem is that a failed job should not be aborted (resubmission is possible only for Done Failed jobs). Another case:
Event: Abort
- Arrived = Wed Nov 17 13:31:54 2010 CET
- Host = cream-46.pd.infn.it
- Reason = BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:Connection timed out-qsub: cannot connect to server gridba3.ba.infn.it (errno=110) Connection timed out-TERM environment variable not set.-) N/A (jobId = CREAM809848230)
- Source = LogMonitor
- Timestamp = Wed Nov 17 13:31:53 2010 CET
- User = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
---
Event: Resubmission
- Arrived = Wed Nov 17 13:31:54 2010 CET
- Host = cream-46.pd.infn.it
- Reason = Job resubmitted by ICE
- Result = WILLRESUB
- Source = LogMonitor
- Tag = unavailable
- Timestamp = Wed Nov 17 13:31:54 2010 CET
- User = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/CN=proxy/CN=proxy
- St9exception FIXED (2011/02/01). After the "get-output" of a dag, the nodes are in "DONE-OK" state. If you try a get-output for a node at this point you get (see also 53.):
[ale@cream-03 ~]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/-ZpRldEJcENxhNiwMqidiw
Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
Error - getOutputFileList Error
(St9exception)
Instead the right message should be found in the wmproxy log file:
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": JobId Exception: The Operation is not allowed: The job has not been registered from this Workload Manager Proxy server (https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server) or it has been purged
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ------------------------------- Fault Description --------------------------------
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Method: getOutputFileList
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Code: 1202
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Description: St9exception
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": Stack:
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
19 Jan, 12:55:46 -D- PID: 22916 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
- The state of the nodes doesn't change after the output retrieval of the parent of a collection. Bug #77876
FIXED (2011/02/14)
- Dag job doesn't work FIXED . Probably the name of the token is wrongly set.
- Bug #73715
FIXED The Really Running event is not logged by JC/LM (it works for ICE).
- The "type" attribute is case sensitive FIXED installing the new jdl's rpm on the UI
Error - AdSyntaxException
The following parsing error(s) have been found:
'node_type' must be "dag"
- Bug #75402
FIXED (to be verified with the new tag). Synchronization loss between the real validity of the proxy and the expiration time saved in ICE's database; this can happen when the copy of the new proxy fails:
2010-11-15 10:57:41,869 INFO - DNProxyManager::setUserProxyIfLonger_Legacy() - Setting user proxy to [ /var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] copied to /var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] because the old one is less long-lived.
2010-11-15 10:57:42,019 ERROR - DNProxyManager::setUserProxyIfLonger_Legacy() - Error copying proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] to [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy].
2010-11-15 10:57:42,019 DEBUG - DNProxyManager::setUserProxyIfLonger_Legacy() - New proxy [/var/glite/SandboxDir/jI/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fjIwx1BAneLSLa93u3CEeAQ/user.proxy] has been copied into [/var/glite/ice/persist_dir/B23D0D7177A8B6234F1985493FA09FF41A4FA98C.proxy] - New Expiration Time is [Tue Nov 16 10:56:34 2010]
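The log above shows the expiration time being updated even though the copy failed. One way to keep the two in sync is to record the new expiration time only after the copy has demonstrably succeeded. The following is a minimal sketch of that idea, not ICE's actual implementation (ICE is not written in Python); the function names and the dict standing in for ICE's database are assumptions made for illustration.

```python
import os
import shutil
import tempfile


def replace_proxy(src, dst):
    """Copy src over dst via a temporary file; return True only on success."""
    tmp = None
    try:
        dst_dir = os.path.dirname(os.path.abspath(dst))
        fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".proxy-")
        os.close(fd)
        shutil.copyfile(src, tmp)
        os.chmod(tmp, 0o600)
        # Atomic rename: dst holds either the old or the new proxy, never a partial file.
        os.replace(tmp, dst)
        return True
    except OSError:
        if tmp is not None and os.path.exists(tmp):
            os.remove(tmp)
        return False


def update_expiration(db, key, src, dst, new_expiry):
    """Record the new expiration time only if the proxy file was really replaced."""
    if replace_proxy(src, dst):
        db[key] = new_expiry
```

If the copy fails, the stored proxy and the stored expiration time both stay at their previous values, so they cannot drift apart.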
- Problem with expired job proxies in ICE FIXED (to be verified with the new tag). When the proxy expires, the jobs in the ICE queue are not correctly removed
2010-11-17 10:08:37,269 DEBUG - iceCommandLBLogging::execute - TID=[] Will not LOG anything to LB for Job [https://devel17.cnaf.infn.it:9000/ZSwHnJmAMblpLYKwz69mFQ] for reason: CreamJobID [CREAM898633002] disappeared from ICE database !
- ICE logs Cancel events with a wrong sequence code FIXED (2011-01-19)
- ICE aborted. Monitoring. ICE sometimes aborts with these messages (see also 49.):
t1169246528:p13952: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_destroy() failed
[Thread System] mutex is locked (EBUSY)t1295124800:p13952: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed
or
t1313102144:p12551: Fatal error: [Thread System] GLOBUSTHREAD: globus_thread_setspecific() failed
[Thread System] invalid value passed to thread interface (EINVAL)Aborted (core dumped)
- glite-wms-ice-db-rm, various errors FIXED (2011/01/19)
[root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm -h
/opt/glite/bin/glite-wms-ice-db-rm: invalid option -- h
Type /opt/glite/bin/glite-wms-ice-db-rm -h for help
[root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm
Must specify at least one of the options --from-file <pathfile> or <gridjobid>
[root@cream-46 persist_dir]# /opt/glite/bin/glite-wms-ice-db-rm --from-file a
/opt/glite/bin/glite-wms-ice-db-rm: unrecognized option `--from-file'
- feedback doesn't work FIXED The "old" token is not removed!
[root@devel09 ~]# ls -l /var/glite/SandboxDir/B4/https_3a_2f_2fdevel17.cnaf.infn.it_3a9000_2fB4ni_5fIMNzn6cY6W8llClSA/
total 20
drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 input
-rw-r--r-- 1 dteam035 dteam 133 Nov 30 10:41 Maradona.output
drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 output
drwxrwx--- 2 dteam035 glite 4096 Nov 30 10:17 peek
-rw-r--r-- 1 glite glite 0 Nov 30 10:19 token.txt_0_
-rw-r--r-- 1 glite glite 0 Nov 30 10:17 token.txt_1
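When hunting for a job's token files it helps to map a job ID to its sandbox directory. The sketch below is inferred only from the paths that appear in this log (for example the jobresubmit entry for https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng): characters other than letters, digits, "." and "-" appear as "_" plus their lowercase hex code, and the parent directory is the first two characters of the unique part. Both rules are assumptions drawn from these examples, not a documented WMS interface, and the function names are hypothetical.

```python
import posixpath
import string
from urllib.parse import urlparse

# Characters that appear unescaped in the directory names seen in the logs.
SAFE = set(string.ascii_letters + string.digits + ".-")


def encode_job_id(job_id):
    """Escape a job ID the way the sandbox directory names in the logs look."""
    return "".join(c if c in SAFE else "_%02x" % ord(c) for c in job_id)


def sandbox_dir(job_id, root="/var/glite/SandboxDir"):
    """Guess the sandbox directory for a job ID (assumed layout, see above)."""
    unique = urlparse(job_id).path.lstrip("/")
    return posixpath.join(root, unique[:2], encode_job_id(job_id))


print(sandbox_dir("https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng"))
```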
- Misleading messages in Maradona file: FIXED (2011/02/02)
LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0
LM_log_done_end
jw exit status = 1
Then the job fails with this message:
- Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
Thu Dec 2 12:08:53 CET 2010: 0LM_log_done_beginThu Dec 2 12:08:53 CET 2010: 0LM_log_done_end
- Feedback doesn't work with LCG-CE, so these CEs must be excluded in the MM FIXED (2011/02/10) The proposed fix is that yaim sets:
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".*sdj$", other.GlueCEUniqueID) : !RegExp(".*sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && ((EnableWmsFeedback =?= TRUE) ? (RegExp("cream", other.GlueCEImplementationName, "i")) : true)
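To make the effect of this expression easier to check, here is a rough Python model of its three conditions. It is a sketch under simplifying assumptions: ClassAd undefined values are modelled as None for the two job attributes only, the CE attributes are assumed to be always defined, and RegExp is treated as an unanchored search. The function and parameter names are hypothetical.

```python
import re


def wms_requirements(ce_id, impl_name, max_total_jobs, state_total_jobs,
                     short_deadline_job=None, enable_wms_feedback=None):
    """Return True if the CE would pass the WmsRequirements expression."""
    # (ShortDeadlineJob =?= TRUE) ? sdj queue : not an sdj queue
    is_sdj_queue = re.search(r".*sdj$", ce_id) is not None
    sdj_ok = is_sdj_queue if short_deadline_job is True else not is_sdj_queue

    # MaxTotalJobs == 0 || TotalJobs < MaxTotalJobs
    capacity_ok = max_total_jobs == 0 or state_total_jobs < max_total_jobs

    # (EnableWmsFeedback =?= TRUE) ? implementation name contains "cream" : true
    if enable_wms_feedback is True:
        feedback_ok = re.search("cream", impl_name, re.IGNORECASE) is not None
    else:
        feedback_ok = True

    return sdj_ok and capacity_ok and feedback_ok


# Hypothetical CE identifiers, used only to exercise the three clauses.
print(wms_requirements("ce.example.org:2119/jobmanager-lcgpbs-dteam", "LCG-CE",
                       100, 10, enable_wms_feedback=True))
```

With feedback enabled, a CE whose implementation name does not contain "cream" is rejected, which is the exclusion of LCG-CEs described above; without feedback the last clause has no effect.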
- WM defines a rescheduled request as a "Submission" FIXED (2011/01/25)
- Bug: #76097
During the first mm of a node the "UserTag CEInfoHostName" is not logged FIXED
- Under high load both WM and ICE report: t1168148800:p30771: Fatal error: [Thread System] GLOBUSTHREAD: pthread_mutex_lock() failed. Supposedly fixed by the 'newest' GPT in glite 32. Pay attention. (see also 42.)
- repo 110114
Yaim fails FIXED (2011/01/31):
cp: cannot stat `/opt/glite/sbin/glite_wms_wmproxy_load_monitor.template': No such file or directory
chmod: cannot access `/opt/glite/sbin/glite_wms_wmproxy_load_monitor': No such file or directory
- repo 110114
Submission failed FIXED Problem was with fast-cgi in gsoap (2011/01/27) :
Status: 500 Internal Server Error
Server: gSOAP/2.7
Content-Type: text/xml; charset=utf-8
Content-Length: 831
Connection: close
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" xmlns:jsdlposix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix" xmlns:delegation1="http://www.gridsite.org/namespaces/delegation-1" xmlns:delegationns="http://www.gridsite.org/namespaces/delegation-2" xmlns:ns1="http://glite.org/wms/wmproxy"><SOAP-ENV:Body><SOAP-ENV:Fault SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><faultcode>SOAP-ENV:Client</faultcode><faultstring>End of file or no input: 'Invalid argument'</faultstring></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>[Fri Jan 14 16:45:46 2011] [error] [client 193.206.210.108] FastCGI: incomplete headers (0 bytes) received from server "/opt/glite/bin/glite_wms_wmproxy_server"
- repo 110114
Cannot take shallow resubmission token FIXED (2011/01/31). In fact, in the sandbox dir of the job the token is named: token.txt_0. (note the trailing dot)
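A quick way to spot this kind of problem is to list the token files in a sandbox directory and flag the names that do not have the expected shape. The sketch below assumes that a well-formed token is named token.txt_<number>, as in the token.txt_1 entries elsewhere in this log; that pattern and the function name are assumptions, and names that do not match are only reported for inspection, not declared wrong.

```python
import re

# Assumed shape of a well-formed token file name: token.txt_<number>
TOKEN_RE = re.compile(r"token\.txt_\d+")


def classify_tokens(filenames):
    """Split token-like file names into (well_formed, to_inspect)."""
    well_formed, to_inspect = [], []
    for name in filenames:
        if not name.startswith("token.txt"):
            continue  # not a token file at all
        if TOKEN_RE.fullmatch(name):
            well_formed.append(name)
        else:
            to_inspect.append(name)
    return well_formed, to_inspect


ok, suspicious = classify_tokens(["input", "output", "token.txt_0.", "token.txt_1"])
print(ok, suspicious)
```

In practice the list of names would come from os.listdir() on the job's sandbox directory.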
- repo 110114
Misleading output message from UI FIXED (2011/02/01) When a user tries to retrieve the output files of another user's job, the message is (see also 34.):
[ale@cream-03 UI]$ glite-wms-job-output https://devel17.cnaf.infn.it:9000/cxyj4EodCZx7qY2vretM6g
Connecting to the service https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
Error - getOutputFileList Error
(St9exception)
Instead in the wmproxy log file you should find the right reason:
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": JobId Exception: User not authorized to perform this operation
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ------------------------------- Fault Description --------------------------------
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Method: getOutputFileList
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Code: 1202
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Description: St9exception
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": Stack:
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": ----------------------------------------------------------------------------------
19 Jan, 12:16:55 -D- PID: 17941 - "wmpgsoapoperations::ns1__getOutputFileList": getOutputFileList operation completed
- repo 110114
deep resubmission doesn't work FIXED (2011/02/07) In fact when the old token is grabbed, wm doesn't create the new one
31 Jan, 16:37:21 -I: [Info] operator()(dispatcher_utils.cpp:228): new jobresubmit for https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng
31 Jan, 16:37:21 -E: [Error] operator()(submit_request.cpp:536): cannot rename temporary token /var/glite/SandboxDir/Ha/https_3a_2f_2fcream-44.pd.infn.it_3a9000_2fHam9zx-h36iNR0FVtjT9Ng/token.txt_0. (error 2)
31 Jan, 16:37:21 -I: [Info] postpone(submit_request.cpp:227): postponing https://cream-44.pd.infn.it:9000/Ham9zx-h36iNR0FVtjT9Ng (cannot create token for shallow resubmission)
- repo 110114
Bug #77366: Sometimes submission fails with an LB error. Hopefully FIXED (2011/02/10):
Warning - Unable to register the job to the service: https://cream-46.pd.infn.it:7443/glite_wms_wmproxy_server
Register COLLECTIONfailed to LB server:devel17.cnaf.infn.it:9000
edg_wll_RegisterJobProxy/Sync
Exit code: 22
LB[Proxy] Error: Invalid argument
(edg_wll_RegisterJobMaster(): unable to register job
Resource temporarily unavailable;; Logging library ERROR:
Resource temporarily unavailable;; edg_wll_DoLogEventServer(): edg_wll_log_direct_read error
LB server (bkserver,lbproxy) store protocol error;; edg_wll_log_proto_client_direct(): error reading answer from L&B direct server
LB server (bkserver,lbproxy) store protocol error;; get_reply_gss(): error reading reply
LB server (bkserver,lbproxy) store protocol error;; gss_reader(): error reading message
Transport endpoint is not connected;; edg_wll_gss_read_full;; GSS Error: EOF occured;)
Method: jobRegister
Error - Operation failed
Unable to find any endpoint where to perform service request
- repo 110114
Bug #77055
FIXED (2011/01/31) "MyProxyServer: wrong type caught for attribute" for parametric jobs
- repo 110114
Feedback doesn't work FIXED (2011/02/07)
20 Jan, 13:28:13 -I: [Info] operator()(replanner.cpp:226): created replanning request for job https://cream-46.pd.infn.it:9000/P4kfEQQnyw3g9NL7qyfA3Q with token /var/glite/SandboxDir/P4/https_3a_2f_2fcream-46.pd.infn.it_3a9000_2fP4kfEQQnyw3g9NL7qyfA3Q/token.txt_1
20 Jan, 13:28:14 -I: [Info] operator()(dispatcher_utils.cpp:310): cannot create LB context (2) for [...]
- repo 110114
Before resubmission ICE must make sure that the job's proxy is valid for at least "n" minutes. Hopefully FIXED (2011/02/10)
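The check itself is a simple comparison between the remaining proxy lifetime and the required margin. A minimal sketch follows, assuming the expiration time is already known as a timezone-aware datetime; the function name and the example values are hypothetical, and this is not how ICE implements it.

```python
from datetime import datetime, timedelta, timezone


def proxy_valid_for(expires_at, min_minutes, now=None):
    """Return True if the proxy stays valid for at least min_minutes from now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return expires_at - now >= timedelta(minutes=min_minutes)


# Example values chosen for illustration only.
now = datetime(2011, 2, 10, 12, 0, tzinfo=timezone.utc)
expires = datetime(2011, 2, 10, 12, 20, tzinfo=timezone.utc)
can_resubmit = proxy_valid_for(expires, 30, now=now)
print(can_resubmit)
```

A job whose proxy fails this check would be skipped for resubmission rather than handed to the CE with a credential about to expire.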
- WMS repo 110131
& LB patch #4623
Yaim error: sed: can't read /opt/glite/etc/gip/glite-info-generic.conf: No such file or directory FIXED (2011/02/07) Probably this package: glite-info-generic is missing. After configuration we have:
[root@cream-44 ~]# ls -l /opt/glite/etc/gip/glite-info-generic.conf
-rw-r--r-- 1 root root 0 Jan 31 15:45 /opt/glite/etc/gip/glite-info-generic.conf
- repo 110204
FIXED (2011/02/14) Cron Purger doesn't work:
[root@cream-46 ~]# /opt/glite/sbin/glite-wms-purgeStorage --help
glite-wms-purgeStorage: glite-wms-purgeStorage.cpp:87: bool<unnamed>::lb_proxy(): Assertion `f_conf' failed.
Aborted
- repo 110204
NOT FIXED (2011/02/11) All the queries done for replanning time out (open bug #78047)
10 Feb, 10:35:20 -D: [Debug] get_scheduled_jobs(lb_utils.cpp:153): error (110) while querying for scheduled jobs
10 Feb, 10:35:20 -D: [Debug] operator()(replanner.cpp:382): no jobs in scheduled state for more than 1800 seconds for replanning
- repo 110204
NOT FIXED (2011/02/11) If a collection is aborted for "request expired" the nodes don't change status. The problem is due to an LB query (probably tied to point 61.):
10 Feb, 07:01:48 -E: [Error] unrecoverable_collection(submit_request.cpp:108): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A: unable to retrieve children information from jobstatus
10 Feb, 07:01:48 -E: [Error] unrecoverable(submit_request.cpp:126): https://cream-46.pd.infn.it:9000/MRy3d5geSzmep67b0VXV1A failed (request expired)
- repo 110204
NOT FIXED (2011/02/11) Submitting a collection with 192 nodes you obtain this error (see bug #70061):
Status info for the Job : https://cream-44.pd.infn.it:9000/Tck2cFcOKuOjM4gfH9Ermg
Current Status: Waiting
Status Reason: jobid: unable to complete the operation: the attribute has not been initialised yet
Submitted: Fri Feb 11 17:06:23 2011 CET
- repo 110214
NOT FIXED (2011/02/14) The glite-WMS metapackage is wrong (too many dependencies, and google-perftools is missing)
- repo 110214
NOT FIXED (2011/02/14). There is a mistake in /opt/glite/yaim/functions/config_info_service_wms (a "\" is missing at line 86); yaim reports this message:
sed: -e expression #3, char 84: unknown option to `s'
- repo 110214
NOT FIXED (2011/02/14) When you cancel a collection some nodes (the ones which were in state Running or DONE OK) are put in state "Cleared".
- repo 110214
NOT FIXED (2011/02/14) WM dies with these messages:
14 Feb, 16:05:46 -E: [Error] unrecoverable(submit_request.cpp:126): https://devel09.cnaf.infn.it:9000/3hTRNHN65Dv64kYDLKjZkQ failed (hit job shallow retry count (2))
14 Feb, 16:05:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:107): MM for job: https://devel09.cnaf.infn.it:9000/dsX5dYARJ2rAnk--uyYcJg (19/2976 [0] )
14 Feb, 16:05:46 -E: [Error] handle_synch_signal(signal_handling.cpp:77): Got a synchronous signal (11), stack trace:
/opt/glite/bin/glite-wms-workload_manager
/lib64/libpthread.so.0
classad::EvalState::SetRootScope()
classad::ClassAd::EvaluateAttr(std::string const&, classad::Value&) const
glite::wmsutils::classads::evaluate_attribute(classad::ClassAd const&, std::string const&)
/opt/glite/lib64/libglite_wms_matchmaking.so.0(_ZN5glite3wms11matchmaking17matchmakerISMImpl16checkRequirementERN7classad7Class
glite::wms::broker::RBSimpleISMImpl::findSuitableCEs(classad::ClassAd const*)
glite::wms::broker::ResourceBroker::findSuitableCEs(classad::ClassAd const*)
/opt/glite/lib64/libglite_wms_helper_broker_ism.so
glite::wms::helper::broker::Helper::resolve(classad::ClassAd const*, boost::shared_ptr<std::string>) const
glite::wms::helper::Helper::resolve(classad::ClassAd const*, boost::shared_ptr<std::string>) const
glite::wms::helper::RequestStateMachine::next_step(classad::ClassAd const*, boost::shared_ptr<std::string>)
glite::wms::helper::Request::Impl::resolve()
glite::wms::manager::server::Plan(classad::ClassAd const&, boost::shared_ptr<std::string>)
glite::wms::manager::server::WMReal::submit(classad::ClassAd const&, boost::shared_ptr<_edg_wll_Context>, boost::shared_ptr<std::string>, bool)
glite::wms::manager::server::SubmitProcessor::operator()()
boost::function0<void, std::allocator<void> >::operator()() const
glite::wms::manager::server::Events::run()
boost::function0<void, std::allocator<boost::function_base> >::operator()() const
/usr/lib64/libboost_thread.so.2