Difference: KnownIssues (13 vs. 14)

Revision 14 - 2011-11-26 - MassimoSgaravatto

Line: 1 to 1
 
META TOPICPARENT name="GeneralDocumentation"

Known issues

Line: 31 to 31
 
Deleted:
<
<

No dynamic info published for one VOview

For one VOView the lcg-info-dynamic-scheduler doesn't publish information, and therefore the values defined in the static ldif file are used.

As found by Jan Astalos (thanks!), this is caused by a missing blank line at the end of /var/lib/bdii/gip/ldif/static-file-CE.ldif, which is created by YAIM.

Until the fix is installed, the workaround is to run:

echo >> /var/lib/bdii/gip/ldif/static-file-CE.ldif 

after configuring via YAIM.

Relevant bug: http://savannah.cern.ch/bugs/?86191

 

Significant changes introduced with Torque 2.5.7-1

The updated EPEL5 build of torque-2.5.7-1 as compared to previous versions enables munge[1] as an inter node authentication method. Please see

Line: 61 to 44
 
Deleted:
<
<

Problems if Torque is not configured to suppress mails

Torque should be configured to suppress all mails (mail_domain=never). Otherwise the bupdater process of the blparser will keep dying.

Relevant bug: https://savannah.cern.ch/bugs/index.php?86238

Condor and SGE support

Condor and SGE are not yet fully supported as batch systems for CREAM.

 

Execution of DAG jobs

Execution of DAG jobs on the CREAM based CE through the gLite WMS is not implemented yet.

Deleted:
<
<

Memory issues with new BLAH Blparser

If the new Blparser is used, problems can arise when the BLAH registry becomes very large: the submission process can slow down and memory usage can become a problem.

Until the fix is installed, there are two possible workarounds:

  • Reduce the number of blahpd instances (the default value is 50) by changing the value of cream_concurrency_level in cream-config.xml. To apply the change, restart tomcat. This should help address the issue, but it also means fewer parallel instances interacting with the batch system, and so a possible reduction in submission throughput to the batch system.
  • Reduce the value of purge_interval in blah.config. This value is expressed in seconds. A job is removed from the BLAH registry (and is therefore no longer managed by BLAH, and hence by CREAM) purge_interval seconds after its submission. To apply the change, restart the blparser (/etc/init.d/glite-ce-blahparser restart).

Relevant bug: https://savannah.cern.ch/bugs/index.php?75854

 

qsub crashes

Line: 181 to 140
  Problems in CREAM software or in other software modules affecting a CREAM based CE that have already been fixed (i.e. they do not affect the latest release of the software in EMI)
Added:
>
>

No dynamic info published for one VOview

For one VOView the lcg-info-dynamic-scheduler doesn't publish information, and therefore the values defined in the static ldif file are used.

As found by Jan Astalos (thanks!), this is caused by a missing blank line at the end of /var/lib/bdii/gip/ldif/static-file-CE.ldif, which is created by YAIM.

Until the fix is installed, the workaround is to run:

echo >> /var/lib/bdii/gip/ldif/static-file-CE.ldif 

after configuring via YAIM.
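Since YAIM may be re-run several times, a guarded variant of the workaround can avoid appending a new blank line on every run. The following is only a minimal sketch: the function name ensure_trailing_blank_line is hypothetical and not part of YAIM or the BDII tooling, and it assumes the file already ends with a newline character, as text files normally do.

```shell
# Append a blank line to an LDIF file only if its last line is not already empty.
# Hypothetical helper, not shipped with YAIM or BDII.
ensure_trailing_blank_line() {
    ldif_file="$1"
    # Command substitution strips trailing newlines, so an empty last line
    # (or an empty file) yields an empty string and nothing is appended.
    if [ -n "$(tail -n 1 "$ldif_file")" ]; then
        echo >> "$ldif_file"
    fi
}
```

It could then be called after each YAIM run as: ensure_trailing_blank_line /var/lib/bdii/gip/ldif/static-file-CE.ldif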

Relevant bug: http://savannah.cern.ch/bugs/?86191

Fix provided with CREAM CE 1.13.3 (see http://savannah.cern.ch/task/?24022) released with EMI-1 Update 10

Problems if Torque is not configured to suppress mails

Torque should be configured to suppress all mails (mail_domain=never). Otherwise the bupdater process of the blparser will keep dying.
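One way to check whether a Torque server already suppresses mails is to inspect the output of qmgr -c 'print server'. The sketch below assumes that this output lists server attributes as "set server <attribute> = <value>" lines; the function name mail_suppressed is hypothetical.

```shell
# Succeeds if the "print server" dump read on stdin sets mail_domain to never.
# Hypothetical helper; it only inspects text and does not call qmgr itself.
mail_suppressed() {
    grep -Eq '^set server mail_domain *= *never *$'
}

# Possible usage on the Torque server host (not executed here):
#   qmgr -c 'print server' | mail_suppressed \
#       || qmgr -c 'set server mail_domain = never'
```

Keeping the check separate from the qmgr call makes it possible to test the helper against a saved configuration dump before touching a production server.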

Relevant bug: https://savannah.cern.ch/bugs/index.php?86238

Fix provided with BLAH 1.16.3 (see http://savannah.cern.ch/task/?22845) released with EMI-1 Update 10

Memory issues with new BLAH Blparser

If the new Blparser is used, problems can arise when the BLAH registry becomes very large: the submission process can slow down and memory usage can become a problem.

Until the fix is installed, there are two possible workarounds:

  • Reduce the number of blahpd instances (the default value is 50) by changing the value of cream_concurrency_level in cream-config.xml. To apply the change, restart tomcat. This should help address the issue, but it also means fewer parallel instances interacting with the batch system, and so a possible reduction in submission throughput to the batch system.
  • Reduce the value of purge_interval in blah.config. This value is expressed in seconds. A job is removed from the BLAH registry (and is therefore no longer managed by BLAH, and hence by CREAM) purge_interval seconds after its submission. To apply the change, restart the blparser (/etc/init.d/glite-ce-blahparser restart).
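Before lowering purge_interval it can help to check the value currently configured. The sketch below assumes that blah.config uses plain key=value lines with "#" comments; the function name blah_setting is hypothetical, and the location of blah.config on a given installation is also an assumption that should be verified.

```shell
# Print the value of a key=value setting from a blah.config-style file.
# Hypothetical helper, not part of BLAH.
blah_setting() {
    key="$1"
    config_file="$2"
    # Keep only uncommented assignments of the key, take the last one,
    # and strip optional surrounding quotes.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$config_file" |
        tail -n 1 | tr -d "\"'"
}
```

For example, blah_setting purge_interval /etc/blah.config would print the configured number of seconds, assuming the file lives at that path.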

Relevant bug: https://savannah.cern.ch/bugs/index.php?75854

Fix provided with BLAH 1.16.3 (see http://savannah.cern.ch/task/?22845) released with EMI-1 Update 10

 

Problems with Torque 2.5.7-1

 