Difference: TroubleshootingGuide (4 vs. 5)

Revision 5, 2011-04-29 - MassimoSgaravatto

Line: 1 to 1
 
META TOPICPARENT name="SystemAdministratorDocumentation"

Troubleshooting guide for CREAM

Line: 45 to 43
 

0.1 Test gridftp

Changed:
<
<
Try a gsiftp (e.g. using globus-url-copy or uberftp) towards that CREAM CE. E.g.:
>
>
Try a gsiftp (e.g. using globus-url-copy or uberftp) towards that CREAM CE. E.g.:
 
uberftp <hostname-of-cream-ce> "ls /etc"
Line: 301 to 287
 
  • the grid-proxy-info executable was not found (or it was not in the path) on the WN
  • the which executable is not installed on the WN
Changed:
<
<

0.1 Cannot create the job's working directory! [failure reason = ">>> sudoers file: Alias `XYZ' already defined, line ABC <<<"]

>
>

0.1 Cannot create the job's working directory! [failure reason = ">>> sudoers file: Alias `XYZ' already defined, line ABC <<<"]

  This means that there is a syntax error in the sudoers file created by yaim-cream-ce. A likely reason is that the same VO has been enabled more than once in the siteinfo.def.
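A quick way to look for a VO that is listed twice is sketched below. It assumes that the VOs are enabled through a space-separated VOS variable in the YAIM configuration file, which is not stated above; the function name duplicate_vos is hypothetical.

```shell
# Print every VO name that appears more than once in the VOS variable
# of a YAIM configuration file. $1 = path to that file (assumption:
# the VOs are enabled through a space-separated VOS variable).
duplicate_vos() {
    # Source the file in a subshell so its variables do not leak out.
    (
        . "$1" >/dev/null 2>&1
        # One VO per line, then keep only the names seen more than once.
        printf '%s\n' $VOS | sort | uniq -d
    )
}
```

Calling duplicate_vos with the full path of the siteinfo.def prints nothing when every VO is listed once. After fixing the file and rerunning yaim, visudo -c can be used to check that the sudoers file parses.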
Line: 317 to 301
 

0.1 User ABC not authorized for operation XYZ

Added:
>
>
This means that there was an authorization problem.
 
Added:
>
>
If the authorization is managed via gJAF, check first of all if the relevant VOMS role has been enabled in the grid-mapfile.
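The grid-mapfile check can be done with a plain text search, as in this minimal sketch. The function name gridmap_entries_for and the example FQAN are hypothetical, and the grid-mapfile location has to be supplied by the administrator.

```shell
# Print the grid-mapfile entries that mention a given VOMS FQAN.
# $1 = path to the grid-mapfile
# $2 = FQAN to look for, e.g. /dteam/Role=lcgadmin (hypothetical example)
gridmap_entries_for() {
    # -F treats the FQAN as a fixed string, so "/" and "=" need no escaping.
    grep -F -- "$2" "$1"
}
```

An empty result (and a non-zero exit status) suggests that the role has not been enabled.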
 
Added:
>
>
Then check if you have all the relevant .lsc files in /etc/grid-security/vomsdir/<VO> (there must be a file for each "supported" VOMS server for that VO) and if they are correct (please note that the VOMS server certificate in /etc/grid-security/vomsdir is not needed anymore and it is used only if the relevant lsc file is not found).
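A rough sanity check of the .lsc files could look like the sketch below. It assumes that a well-formed .lsc file holds at least two lines starting with "/" (the DN of the VOMS server certificate followed by the DN of its CA); it does not verify that the DNs are the right ones. The function name check_lsc_files is hypothetical.

```shell
# Report .lsc files that do not look well-formed in the vomsdir of a VO.
# $1 = vomsdir of the VO, i.e. /etc/grid-security/vomsdir/<VO>
check_lsc_files() {
    found=0
    for f in "$1"/*.lsc; do
        # Skip the unexpanded pattern when the directory has no .lsc file.
        [ -e "$f" ] || continue
        found=1
        # Count the lines that start with "/", i.e. that look like a DN.
        dn_lines=$(grep -c '^/' "$f" || true)
        if [ "$dn_lines" -lt 2 ]; then
            echo "SUSPECT: $f (expected at least 2 DN lines, found $dn_lines)"
        else
            echo "OK: $f"
        fi
    done
    [ "$found" -eq 1 ] || echo "NO LSC FILES in $1"
}
```

The number of files reported should match the number of VOMS servers supported for that VO.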
 
Changed:
<
<

1 Other problems

>
>
If this was not enough, edit /etc/glite-ce-cream/log4j.properties
 
Added:
>
>
replacing:
 
Changed:
<
<

0.1 Job failure with reason=999

>
>
log4j.logger.org.glite=info, fileout
 
Added:
>
>
with:

log4j.logger.org.glite=debug, fileout

and comment out the following lines:

log4j.logger.org.glite.security=off
log4j.logger.org.glite.voms=off

Then restart tomcat.

The reason for the authorization problem should then be explained in glite-ce-cream.log.
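The log4j edit described above can also be scripted. The following is a minimal sketch, assuming the three lines appear in the file exactly as quoted; the function name enable_cream_debug is hypothetical, and a backup copy is kept next to the original.

```shell
# Switch org.glite logging to debug and comment out the two "off" lines.
# $1 = path to log4j.properties; the original is saved as $1.bak
enable_cream_debug() {
    cp -p "$1" "$1.bak" &&
    sed -e 's/^log4j\.logger\.org\.glite=info, fileout/log4j.logger.org.glite=debug, fileout/' \
        -e 's/^\(log4j\.logger\.org\.glite\.security=off\)/#\1/' \
        -e 's/^\(log4j\.logger\.org\.glite\.voms=off\)/#\1/' \
        "$1.bak" > "$1"
}
```

It would be called as enable_cream_debug /etc/glite-ce-cream/log4j.properties, followed by the tomcat restart. Copying the .bak file back restores the previous logging level once the problem has been understood.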

0.1 Cannot generate the job wrapper! the problem seems to be related to the jdl: Number mismatch for maxOutputSandboxSize = -1,000000000000000E+00"

This happens for submissions through the WMS to a CREAM CE deployed on a machine installed with a language other than English (en_US). The cause is a different representation of decimal numbers: the value in the error message above is written with a decimal comma instead of a decimal point. The workaround in this case is to uncomment the line:

LANG=en_US

in $CATALINA_HOME/conf/tomcat5.conf and then restart tomcat.
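The workaround can be applied with a one-line sed, sketched here under the assumption that the line is present in the file as "#LANG=en_US" (possibly with spaces around the "#"). The function name uncomment_lang is hypothetical, and a backup copy is kept.

```shell
# Uncomment the LANG=en_US line in a tomcat configuration file.
# $1 = path to the file, e.g. $CATALINA_HOME/conf/tomcat5.conf
uncomment_lang() {
    cp -p "$1" "$1.bak" &&
    # Strip the leading "#" (and surrounding blanks) from that line only.
    sed -e 's/^[[:space:]]*#[[:space:]]*\(LANG=en_US\)/\1/' "$1.bak" > "$1"
}
```

Other commented lines are left untouched, so the function can be rerun safely.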

1 Other problems

1.1 Job failure with reason=999

  This can happen with the new BLAH BLparser when it is not able to detect the status of the job for more than x seconds. The default value for x is 86400. This value can be modified by setting the attribute alldone_interval in /etc/blah.config, e.g.:
Line: 344 to 364
 

0.1 Jobs submitted to LSF fail with errorcode 127

Changed:
<
<
This is likely a problem with staging of files from/to the CE node to/from the WN. Check if the relevant LSF daemons run properly.
>
>
This is likely a problem with staging of files from/to the CE node to/from the WN. Check if the relevant LSF daemons run properly.
 

1 Other troubleshooting hints

 