Known issues

  This will be fixed in EMI 2.

CREAM jobs are cancelled with status reason=3 in a GE system

If the environment of the BUpdaterSGE process does not include the GE environment variables, BUpdaterSGE cannot execute the GE client commands (qstat, qconf).

As a consequence, BUpdaterSGE assumes that the jobs have been cancelled, because it receives no information from qstat or qacct. You can check the environment of the BUpdaterSGE process with the following commands, looking for the GE environment variables (SGE_EXECD, SGE_QMASTER, SGE_ROOT, SGE_CLUSTER_NAME, SGE_CELL):

# ps xuawww | grep -i sge
tomcat    7423  0.6  0.5  37184 21328 ?        S    Nov23 103:56 /usr/bin/BUpdaterSGE
root     30622  0.0  0.0  61180   804 pts/0    R+   13:41   0:00 grep -i sge

# cat /proc/7423/environ 
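
The contents of /proc/<pid>/environ are NUL-separated, so the raw output of cat is hard to read. A minimal sketch of a helper that lists only the GE variables of a process is shown here; the function name sge_env_of_pid is hypothetical and not part of any gLite or GE package, and the sketch assumes a Linux /proc filesystem.

```shell
# Hypothetical helper: print the SGE_* variables found in the
# environment of the process whose PID is given as first argument.
sge_env_of_pid() {
    # /proc/<pid>/environ separates entries with NUL bytes;
    # turn them into newlines so grep can filter line by line.
    tr '\0' '\n' < "/proc/$1/environ" | grep '^SGE_'
}
```

For the process listed above, this would be called as "sge_env_of_pid 7423". If nothing is printed and the function returns a non-zero status, the daemon is running without the GE environment.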

This can happen if the BUpdaterSGE daemon is restarted by a user other than root (for example, tomcat starts the daemon at boot time and restarts it if it has died) without sourcing the proper environment. The workaround is to force the environment to be loaded in /etc/init.d/gLite and /etc/init.d/glite-ce-blahparser. This can be done simply by adding a line like the one below at the beginning of both scripts:

 . /etc/profile.d/sge.sh
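
A slightly more defensive variant of the same idea is sketched below. It only sources the file when it is readable and prints a warning otherwise. The function name load_sge_env is hypothetical, and the default path is the one from the line above; adjust it if your GE settings file lives elsewhere.

```shell
# Hypothetical helper: source the GE settings file if it is readable,
# otherwise warn on stderr and return a non-zero status.
load_sge_env() {
    profile="${1:-/etc/profile.d/sge.sh}"
    if [ -r "$profile" ]; then
        . "$profile"
    else
        echo "warning: $profile not readable, GE environment not loaded" >&2
        return 1
    fi
}
```

Calling "load_sge_env" with no argument near the top of each init script has the same effect as the plain source line, but leaves a trace in the script output when the settings file is missing.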

Significant changes introduced with Torque 2.5.7-1

The updated EPEL5 build of torque-2.5.7-1, unlike previous versions, enables munge[1] as an inter-node authentication method. Please see

