Difference: NotesAboutCreamWithoutTorqueWithMPI-EMI-2SL6 (1 vs. 4)

Revision 4 2012-08-09 - PaoloVeronesi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Notes about Installation and Configuration of a CREAM Computing Element - EMI-2 - SL6 (external Torque, external Argus, MPI enabled)

  • These notes are provided by site admins on a best-effort basis as a contribution to the IGI communities and MUST NOT be considered a substitute for the official IGI documentation.

Revision 3 2012-05-31 - PaoloVeronesi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Notes about Installation and Configuration of a CREAM Computing Element - EMI-2 - SL6 (external Torque, external Argus, MPI enabled)

  • These notes are provided by site admins on a best-effort basis as a contribution to the IGI communities and MUST NOT be considered a substitute for the official IGI documentation.
Line: 71 to 71
 

vo.d directory

Create the directory siteinfo/vo.d and fill it with one file for each supported VO. You can download them from HERE, and here is an example for some VOs. Information about the individual VOs is available at the CENTRAL OPERATIONS PORTAL.
Added:
>
>
# cat /root/siteinfo/vo.d/comput-er.it
SW_DIR=$VO_SW_DIR/computer
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/computer
VOMS_SERVERS="'vomss://voms2.cnaf.infn.it:8443/voms/comput-er.it?/comput-er.it'"
VOMSES="'comput-er.it voms2.cnaf.infn.it 15007 /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it comput-er.it' 'comput-er.it voms-02.pd.infn.it 15007 /C=IT/O=INFN/OU=Host/L=Padova/CN=voms-02.pd.infn.it comput-er.it'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/dteam
SW_DIR=$VO_SW_DIR/dteam
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/dteam
VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
VOMSES="'dteam lcg-voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch dteam 24' 'dteam voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch dteam 24' 'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"

# cat /root/siteinfo/vo.d/gridit
SW_DIR=$VO_SW_DIR/gridit
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/gridit
VOMS_SERVERS="'vomss://voms.cnaf.infn.it:8443/voms/gridit?/gridit' 'vomss://voms-01.pd.infn.it:8443/voms/gridit?/gridit'"
VOMSES="'gridit voms.cnaf.infn.it 15008 /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms.cnaf.infn.it gridit' 'gridit voms-01.pd.infn.it 15008 /C=IT/O=INFN/OU=Host/L=Padova/CN=voms-01.pd.infn.it gridit'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/igi.italiangrid.it
SW_DIR=$VO_SW_DIR/igi
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/igi
VOMS_SERVERS="'vomss://vomsmania.cnaf.infn.it:8443/voms/igi.italiangrid.it?/igi.italiangrid.it'"
VOMSES="'igi.italiangrid.it vomsmania.cnaf.infn.it 15003 /C=IT/O=INFN/OU=Host/L=CNAF/CN=vomsmania.cnaf.infn.it igi.italiangrid.it'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/infngrid
SW_DIR=$VO_SW_DIR/infngrid
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/infngrid
VOMS_SERVERS="'vomss://voms.cnaf.infn.it:8443/voms/infngrid?/infngrid' 'vomss://voms-01.pd.infn.it:8443/voms/infngrid?/infngrid'"
VOMSES="'infngrid voms.cnaf.infn.it 15000 /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms.cnaf.infn.it infngrid' 'infngrid voms-01.pd.infn.it 15000 /C=IT/O=INFN/OU=Host/L=Padova/CN=voms-01.pd.infn.it infngrid'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/ops
SW_DIR=$VO_SW_DIR/ops
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/ops
VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
 

users and groups

You can download them from HERE.

Revision 2 2012-05-30 - PaoloVeronesi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Notes about Installation and Configuration of a CREAM Computing Element - EMI-2 - SL6 (external Torque, external Argus, MPI enabled)

  • These notes are provided by site admins on a best-effort basis as a contribution to the IGI communities and MUST NOT be considered a substitute for the official IGI documentation.
Line: 52 to 52
 
# yum clean all 
Changed:
<
<
# yum install ca-policy-egi-core emi-cream-ce emi-torque-utils
>
>
# yum install ca-policy-egi-core emi-cream-ce emi-torque-utils glite-mpi
 

Service configuration

Line: 61 to 61
 # cp -vr /opt/glite/yaim/examples/siteinfo .
Added:
>
>

host certificate

# ll /etc/grid-security/host*
-rw-r--r-- 1 root root 1440 Oct 18 09:31 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 18 09:31 /etc/grid-security/hostkey.pem
 

vo.d directory

Create the directory siteinfo/vo.d and fill it with one file for each supported VO. You can download them from HERE, and here is an example for some VOs. Information about the individual VOs is available at the CENTRAL OPERATIONS PORTAL.
Line: 83 to 90
 

site-info.def

KISS: Keep it simple, stupid! For your convenience there is an explanation of each yaim variable. For more details look HERE.
Added:
>
>
SUGGESTION: use the same site-info.def for CREAM and WNs; for this reason, this example file contains yaim variables used by CREAM, TORQUE or emi-WN.
 
Changed:
<
<
# cat siteinfo/site-info.def
BATCH_SERVER=batch.cnaf.infn.it
>
>
# cat site-info.def
 CE_HOST=cream-01.cnaf.infn.it
Added:
>
>
SITE_NAME=IGI-BOLOGNA

BATCH_SERVER=batch.cnaf.infn.it
BATCH_LOG_DIR=/var/torque

#BDII_HOST=egee-bdii.cnaf.infn.it

CE_BATCH_SYS=torque
JOB_MANAGER=pbs
BATCH_VERSION=torque-2.5.7
#CE_DATADIR=

CE_INBOUNDIP=FALSE
CE_OUTBOUNDIP=TRUE
CE_OS="ScientificSL"
CE_OS_RELEASE=6.2
CE_OS_VERSION="Carbon"

CE_RUNTIMEENV="IGI-BOLOGNA"

CE_PHYSCPU=8
CE_LOGCPU=16
CE_MINPHYSMEM=16000
CE_MINVIRTMEM=32000

 CE_SMPSIZE=8
Added:
>
>
CE_CPU_MODEL=Xeon
CE_CPU_SPEED=2493
CE_CPU_VENDOR=intel
CE_CAPABILITY="CPUScalingReferenceSI00=1039 glexec"
CE_OTHERDESCR="Cores=1,Benchmark=4.156-HEP-SPEC06"
CE_SF00=951
CE_SI00=1039
CE_OS_ARCH=x86_64

CREAM_PEPC_RESOURCEID="http://cnaf.infn.it/cremino"

 USERS_CONF=/root/siteinfo/ig-users.conf
 GROUPS_CONF=/root/siteinfo/ig-users.conf
Line: 95 to 139
 QUEUES="cert prod"
 CERT_GROUP_ENABLE="dteam infngrid ops /dteam/ROLE=lcgadmin /dteam/ROLE=production /ops/ROLE=lcgadmin /ops/ROLE=pilot /infngrid/ROLE=SoftwareManager /infngrid/ROLE=pilot"
 PROD_GROUP_ENABLE="comput-er.it gridit igi.italiangrid.it /comput-er.it/ROLE=SoftwareManager /gridit/ROLE=SoftwareManager /igi.italiangrid.it/ROLE=SoftwareManager"
Added:
>
>
VO_SW_DIR=/opt/exp_soft
 WN_LIST="/root/siteinfo/wn-list.conf"
 MUNGE_KEY_FILE=/etc/munge/munge.key
 CONFIG_MAUI="no"
Changed:
<
<
SITE_NAME=IGI-BOLOGNA
>
>
MYSQL_PASSWORD=*********************************
 APEL_DB_PASSWORD=not_used
 APEL_MYSQL_HOST=not_used
Added:
>
>
SE_LIST="darkstorm.cnaf.infn.it"
SE_MOUNT_INFO_LIST="none"
 

WN list

Line: 113 to 160
 wn06.cnaf.infn.it
Added:
>
>

services/glite-mpi_ce

# cp /opt/glite/yaim/examples/siteinfo/services/glite-mpi_ce /root/siteinfo/services/
 
Changed:
<
<

site-info.def

SUGGESTION: use the same site-info.def for CREAM and WNs; for this reason, this example file contains yaim variables used by CREAM, TORQUE or emi-WN.

The settings of some VOs are also included.

>
>
# cat services/glite-mpi_ce
# Setup configuration variables that are common to both the CE and WN
 
Changed:
<
<
For your convenience there is an explanation of each yaim variable. For more details look at [8, 9, 10].
<--/twistyPlugin-->
>
>
if [ -r ${config_dir}/services/glite-mpi ]; then
   source ${config_dir}/services/glite-mpi
fi

# The MPI CE config function can create a submit filter for
# Torque to ensure that CPU allocation is performed correctly.
# Change this variable to "yes" to have YAIM create this filter.
# Warning: if you have an existing torque.cfg it will be modified.
MPI_SUBMIT_FILTER=${MPI_SUBMIT_FILTER:-"yes"}

 
Deleted:
<
<
<--/twistyPlugin twikiMakeVisibleInline-->
 

services/glite-creamce

Added:
>
>
# cat /root/siteinfo/services/glite-creamce
 #
 # YAIM creamCE specific variables
 #
Deleted:
<
<
# LSF settings: path where lsf.conf is located
#BATCH_CONF_DIR=lsf_install_path/conf
 #
 # CE-monitor host (by default CE-monitor is installed on the same machine as
 # cream-CE)
 CEMON_HOST=$CE_HOST
 #
 # CREAM database user
Changed:
<
<
CREAM_DB_USER=*********
>
>
CREAM_DB_USER=********************
CREAM_DB_PASSWORD=****************************
 #
Deleted:
<
<
CREAM_DB_PASSWORD=*********
 # Machine hosting the BLAH blparser.
 # In this machine batch system logs must be accessible.
Deleted:
<
<
#BLPARSER_HOST=set_to_fully_qualified_host_name_of_machine_hosting_blparser_server
 BLPARSER_HOST=$CE_HOST
Deleted:
<
<
<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

services/dgas_sensors

#
# YAIM DGAS Sensors specific variables
#


################################
# DGAS configuration variables #
################################
# For any details about DGAS variables please refer to the guide:
# http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:dgas

# Reference Resource HLR for the site.
DGAS_HLR_RESOURCE="prod-hlr-01.pd.infn.it"

# Specify the type of job which the CE has to process.
# Set ”all” on “the main CE” of the site, ”grid” on the others.
# Default value: all
#DGAS_JOBS_TO_PROCESS="all"

# This parameter can be used to specify the list of VOs to publish.
# If the parameter is specified, the sensors (pushd) will forward
# to the Site HLR just records belonging to one of the specified VOs.
# Leave commented if you want to send records for ALL VOs
# Default value: parameter not specified
#DGAS_VO_TO_PROCESS="vo1;vo2;vo3..."


# Bound date on jobs backward processing.
# The backward processing does not consider jobs prior to that date.
# Default value: 2009-01-01.
#DGAS_IGNORE_JOBS_LOGGED_BEFORE="2011-11-01"

# Main CE of the site.
# ATTENTION: set this variable only in the case of site with a “singleLRMS”
# in which there are more than one CEs or local submission hosts (i.e. host
# from which you may submit jobs directly to the batch system).
# In this case, DGAS_USE_CE_HOSTNAME parameter must be set to the same value
# for all hosts sharing the lrms and this value can be arbitrary chosen among
# these submitting hostnames (you may choose the best one).
# Otherwise leave it commented.
# we have 2 CEs, cremino is the main one
DGAS_USE_CE_HOSTNAME="cremino.cnaf.infn.it"

# Path for the batch-system log files.
# * for torque/pbs:
# DGAS_ACCT_DIR=/var/torque/server_priv/accounting
# * for LSF:
# DGAS_ACCT_DIR=lsf_install_path/work/cluster_name/logdir
# * for SGE:
# DGAS_ACCT_DIR=/opt/sge/default/common/
DGAS_ACCT_DIR=/var/torque/server_priv/accounting

# Full path to the 'condor_history' command, used to gather DGAS usage records
# when Condor is used as a batch system. Otherwise leave it commented.
#DGAS_CONDOR_HISTORY_COMMAND=""
<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->
host certificate

# ll /etc/grid-security/host*
-rw-r--r-- 1 root root 1440 Oct 18 09:31 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 18 09:31 /etc/grid-security/hostkey.pem
<--/twistyPlugin-->
 
Changed:
<
<
<--/twistyPlugin twikiMakeVisibleInline-->

munge configuration

IMPORTANT: Unlike previous versions, the updated EPEL5 build of torque-2.5.7-1 enables munge as an inter-node authentication method.

  • verify that munge is correctly installed:
# rpm -qa | grep munge
munge-libs-0.5.8-8.el5
munge-0.5.8-8.el5
  • On one host (for example the batch server) generate a key by launching:
# /usr/sbin/create-munge-key

# ls -ltr /etc/munge/
total 4
-r-------- 1 munge munge 1024 Jan 13 14:32 munge.key
  • Copy the key /etc/munge/munge.key to every host of your cluster, adjusting the permissions:
# chown munge:munge /etc/munge/munge.key
  • Start the munge daemon on each node:
# service munge start
Starting MUNGE:                                            [  OK  ]

# chkconfig munge on
>
>
# Value to be published as GlueCEStateStatus instead of Production
#CREAM_CE_STATE=Special
 
Changed:
<
<
<--/twistyPlugin-->
>
>

services/dgas_sensors (not available yet)

TODO
 
Changed:
<
<
<--/twistyPlugin twikiMakeVisibleInline-->
>
>

yaim check

 Verify that all the yaim variables are set by launching:
Changed:
<
<
# /opt/glite/yaim/bin/yaim -v -s site-info_cremino.def -n creamCE -n TORQUE_server -n TORQUE_utils -n DGAS_sensors
>
>
# /opt/glite/yaim/bin/yaim -v -s /root/siteinfo/site-info.def -n creamCE -n TORQUE_utils
 
Changed:
<
<
see details
<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

# /opt/glite/yaim/bin/yaim -c -s site-info_cremino.def -n creamCE -n TORQUE_server -n TORQUE_utils -n DGAS_sensors

see details

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

Software Area settings

If the Software Area is hosted on your CE, you have to create it and export it to the WNs. In site-info.def we set:

VO_SW_DIR=/opt/exp_soft
  • directory creation
mkdir /opt/exp_soft/
  • edit /etc/exports creating a line like the following:
/opt/exp_soft/ *.cnaf.infn.it(rw,sync,no_root_squash)
  • check nfs and portmap status
# service nfs status
rpc.mountd is stopped
nfsd is stopped

# service portmap status
portmap is stopped

# service portmap start
Starting portmap:                                          [  OK  ]

# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]

# chkconfig nfs on
# chkconfig portmap on
  • after any modification in /etc/exports, re-export the file systems by launching
# exportfs -ra
or simply restart the nfs daemon

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

walltime workaround

If the queues publish:
GlueCEStateWaitingJobs: 444444
and in the log /var/log/bdii/bdii-update.log you notice errors like the following:
Traceback (most recent call last):
  File "/usr/libexec/lcg-info-dynamic-scheduler", line 435, in ?
    wrt = qwt * nwait
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
then the queues probably have no "resources_default.walltime" parameter configured.

So define it for each queue by launching, for example:

>
>

yaim config

 
Changed:
<
<
# qmgr -c "set queue prod resources_default.walltime = 01:00:00"
# qmgr -c "set queue cert resources_default.walltime = 01:00:00"
# qmgr -c "set queue cloudtf resources_default.walltime = 01:00:00"
>
>
# /opt/glite/yaim/bin/yaim -c -s /root/siteinfo/site-info.def -n creamCE -n TORQUE_utils
 
Deleted:
<
<
<--/twistyPlugin-->
 

Service Checks

Deleted:
<
<
<--/twistyPlugin twikiMakeVisibleInline-->
 
  • After installing the service, to check that everything was installed properly, have a look at the Service CREAM Reference Card
  • You can also perform some checks after the installation and configuration of your CREAM CE
Deleted:
<
<

TORQUE checks:

  • check the pbs settings:
# qmgr -c 'p s'
  • check the WNs state
# pbsnodes -a

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

maui settings

In order to reserve a job slot for test jobs, you need to apply some settings in the maui configuration (/var/spool/maui/maui.cfg)

Suppose you have enabled the test VOs (ops, dteam and infngrid) on the "cert" queue and that you have 8 job slots available. Add the following lines in the /var/spool/maui/maui.cfg file:

CLASSWEIGHT 1
QOSWEIGHT 1

QOSCFG[normal] MAXJOB=7

CLASSCFG[prod] QDEF=normal
CLASSCFG[cert] PRIORITY=5000

After the modification restart maui.

To prevent yaim from overwriting this file during host reconfiguration, set:

CONFIG_MAUI="no"

in your site-info.def (the first time you launch the yaim script, it has to be set to "yes").

<--/twistyPlugin-->
 

Revisions

| Date | Comment | By |
| 2012-05-25 | First draft | Paolo Veronesi |

Revision 1 2012-05-25 - PaoloVeronesi

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

Notes about Installation and Configuration of a CREAM Computing Element - EMI-2 - SL6 (external Torque, external Argus, MPI enabled)

  • These notes are provided by site admins on a best-effort basis as a contribution to the IGI communities and MUST NOT be considered a substitute for the official IGI documentation.
  • This document is addressed to site administrators responsible for middleware installation and configuration.
  • The goal of this page is to provide some hints and examples on how to install and configure an EMI-2 CREAM CE service in no-cluster mode, with TORQUE as the batch system installed on a different host, using an external ARGUS server for user authorization, and with MPI enabled.
| CREAM | CLUSTER MODE | ARGUS | MPI | TORQUE | WNODES |
| EMI-2 SL6 | no | external server | enabled | external server | TODO |

References

  1. About IGI - Italian Grid infrastructure
    1. About IGI Release
    2. IGI Official Installation and Configuration guide
  2. EMI-2 Release
    1. CREAM
    2. CREAM TORQUE module
  3. Yaim Guide
    1. site-info.def yaim variables
    2. CREAM yaim variables
    3. TORQUE Yaim variables
  4. Troubleshooting Guide for Operational Errors on EGI Sites
  5. Grid Administration FAQs page

Service installation

O.S. and Repos

  • Start from a fresh installation of Scientific Linux 6.x (x86_64).
# cat /etc/redhat-release 
Scientific Linux release 6.2 (Carbon)

  • Install the additional repositories: EPEL, Certification Authority, EMI-2

# yum install yum-priorities yum-protectbase epel-release
# rpm -ivh http://emisoft.web.cern.ch/emisoft/dist/EMI/2/sl6/x86_64/base/emi-release-2.0.0-1.sl6.noarch.rpm

# cd /etc/yum.repos.d/
# wget http://repo-pd.italiangrid.it/mrepo/repos/egi-trustanchors.repo

  • Be sure that SELINUX is disabled (or permissive). Details on how to disable SELINUX are here:

# getenforce 
Disabled
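
The SELinux check can also be scripted. The following is a minimal sketch, not part of the official procedure; the function name selinux_mode_ok is hypothetical, and it only classifies the string that getenforce prints.

```shell
# Return 0 when the given SELinux mode is acceptable for this setup
# (Disabled or Permissive), 1 otherwise.
# selinux_mode_ok is a hypothetical helper name.
selinux_mode_ok() {
    case "$1" in
        Disabled|Permissive) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use on the CE (getenforce is the command shown above):
#   selinux_mode_ok "$(getenforce)" || echo "SELinux is enforcing" >&2
```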

yum install

# yum clean all 

# yum install ca-policy-egi-core  emi-cream-ce emi-torque-utils

Service configuration

You have to copy the configuration files to another path, for example /root, and set them properly (see later):
# cp -vr /opt/glite/yaim/examples/siteinfo .

vo.d directory

Create the directory siteinfo/vo.d and fill it with one file for each supported VO. You can download them from HERE, and here is an example for some VOs. Information about the individual VOs is available at the CENTRAL OPERATIONS PORTAL.
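
As a quick consistency check, one can list the VOs that have no file in vo.d yet. This is only a sketch: the helper name missing_vo_files is hypothetical, and it assumes each file is named exactly like its VO, as in the file names used in these notes.

```shell
# Print every VO from the list that has no matching file in the vo.d directory.
# $1 = vo.d directory, remaining arguments = VO names (the words of VOS).
# missing_vo_files is a hypothetical helper name.
missing_vo_files() {
    dir=$1
    shift
    for vo in "$@"; do
        [ -f "$dir/$vo" ] || printf '%s\n' "$vo"
    done
}

# Possible use:
#   missing_vo_files /root/siteinfo/vo.d comput-er.it dteam igi.italiangrid.it infngrid ops gridit
```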

users and groups

You can download them from HERE.

Munge

Copy the key /etc/munge/munge.key from the Torque server to every host of your cluster, adjust the permissions and start the service:
# chown munge:munge /etc/munge/munge.key

# ls -ltr /etc/munge/
total 4
-r-------- 1 munge munge 1024 Jan 13 14:32 munge.key

# chkconfig munge on
# /etc/init.d/munge restart
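
Since every host must hold the same key, it may help to compare a copy against the original. A minimal sketch follows; same_key is a hypothetical helper, and the path of the second file in the usage comment is an assumption.

```shell
# Return 0 when two key files have identical content, non-zero otherwise.
# cmp -s compares byte by byte and prints nothing.
# same_key is a hypothetical helper name.
same_key() {
    cmp -s "$1" "$2"
}

# Possible use, with a hypothetical copy fetched back from a worker node:
#   same_key /etc/munge/munge.key /tmp/munge.key.wn05 || echo "key mismatch" >&2
```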

site-info.def

KISS: Keep it simple, stupid! For your convenience there is an explanation of each yaim variable. For more details look HERE.
# cat siteinfo/site-info.def 
BATCH_SERVER=batch.cnaf.infn.it
CE_HOST=cream-01.cnaf.infn.it
CE_SMPSIZE=8
USERS_CONF=/root/siteinfo/ig-users.conf
GROUPS_CONF=/root/siteinfo/ig-users.conf

VOS="comput-er.it dteam igi.italiangrid.it infngrid ops gridit"
QUEUES="cert prod"
CERT_GROUP_ENABLE="dteam infngrid ops /dteam/ROLE=lcgadmin /dteam/ROLE=production /ops/ROLE=lcgadmin /ops/ROLE=pilot /infngrid/ROLE=SoftwareManager /infngrid/ROLE=pilot"
PROD_GROUP_ENABLE="comput-er.it gridit igi.italiangrid.it /comput-er.it/ROLE=SoftwareManager /gridit/ROLE=SoftwareManager /igi.italiangrid.it/ROLE=SoftwareManager"

WN_LIST="/root/siteinfo/wn-list.conf"
MUNGE_KEY_FILE=/etc/munge/munge.key
CONFIG_MAUI="no"

SITE_NAME=IGI-BOLOGNA
APEL_DB_PASSWORD=not_used
APEL_MYSQL_HOST=not_used
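
Before running yaim, a rough pre-check can report variables that have no uncommented assignment in the file. This sketch does not replace the yaim verification step; missing_vars is a hypothetical helper and the list of names to check is up to the site.

```shell
# Print each variable name that has no uncommented assignment in the file.
# $1 = site-info.def path, remaining arguments = variable names to look for.
# missing_vars is a hypothetical helper name.
missing_vars() {
    file=$1
    shift
    for var in "$@"; do
        grep -q "^[[:space:]]*$var=" "$file" || printf '%s\n' "$var"
    done
}

# Possible use:
#   missing_vars /root/siteinfo/site-info.def CE_HOST BATCH_SERVER VOS QUEUES WN_LIST
```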

WN list

List the WNs in this file, one per line, for example:
# less /root/siteinfo/wn-list.conf
wn05.cnaf.infn.it
wn06.cnaf.infn.it
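
When the list grows, duplicated entries are easy to miss. The sketch below prints host names that appear more than once; dup_hosts is a hypothetical helper and it assumes one host name per line.

```shell
# Print host names that occur more than once in the WN list.
# Blank lines are ignored. dup_hosts is a hypothetical helper name.
dup_hosts() {
    grep -v '^[[:space:]]*$' "$1" | sort | uniq -d
}

# Possible use:
#   dup_hosts /root/siteinfo/wn-list.conf
```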

site-info.def

SUGGESTION: use the same site-info.def for CREAM and WNs; for this reason, this example file contains yaim variables used by CREAM, TORQUE or emi-WN.

The settings of some VOs are also included.

For your convenience there is an explanation of each yaim variable. For more details look at [8, 9, 10].

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

services/glite-creamce

#
# YAIM creamCE specific variables
#

# LSF settings: path where lsf.conf is located
#BATCH_CONF_DIR=lsf_install_path/conf
#
# CE-monitor host (by default CE-monitor is installed on the same machine as 
# cream-CE)
CEMON_HOST=$CE_HOST
#
# CREAM database user
CREAM_DB_USER=*********
#
CREAM_DB_PASSWORD=*********
# Machine hosting the BLAH blparser.
# In this machine batch system logs must be accessible.
#BLPARSER_HOST=set_to_fully_qualified_host_name_of_machine_hosting_blparser_server
BLPARSER_HOST=$CE_HOST
<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

services/dgas_sensors

#
# YAIM DGAS Sensors specific variables
#


################################
# DGAS configuration variables #
################################
# For any details about DGAS variables please refer to the guide:
# http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:dgas

# Reference Resource HLR for the site.
DGAS_HLR_RESOURCE="prod-hlr-01.pd.infn.it"

# Specify the type of job which the CE has to process.
# Set ”all” on “the main CE” of the site, ”grid” on the others.
# Default value: all
#DGAS_JOBS_TO_PROCESS="all"

# This parameter can be used to specify the list of VOs to publish.
# If the parameter is specified, the sensors (pushd) will forward
# to the Site HLR just records belonging to one of the specified VOs.
# Leave commented if you want to send records for ALL VOs
# Default value: parameter not specified
#DGAS_VO_TO_PROCESS="vo1;vo2;vo3..."


# Bound date on jobs backward processing.
# The backward processing does not consider jobs prior to that date.
# Default value: 2009-01-01.
#DGAS_IGNORE_JOBS_LOGGED_BEFORE="2011-11-01"

# Main CE of the site.
# ATTENTION: set this variable only in the case of site with a “singleLRMS”
# in which there are more than one CEs or local submission hosts (i.e. host
# from which you may submit jobs directly to the batch system).
# In this case, DGAS_USE_CE_HOSTNAME parameter must be set to the same value
# for all hosts sharing the lrms and this value can be arbitrary chosen among
# these submitting hostnames (you may choose the best one).
# Otherwise leave it commented.
# we have 2 CEs, cremino is the main one
DGAS_USE_CE_HOSTNAME="cremino.cnaf.infn.it"

# Path for the batch-system log files.
# * for torque/pbs:
# DGAS_ACCT_DIR=/var/torque/server_priv/accounting
# * for LSF:
# DGAS_ACCT_DIR=lsf_install_path/work/cluster_name/logdir
# * for SGE:
# DGAS_ACCT_DIR=/opt/sge/default/common/
DGAS_ACCT_DIR=/var/torque/server_priv/accounting

# Full path to the 'condor_history' command, used to gather DGAS usage records
# when Condor is used as a batch system. Otherwise leave it commented.
#DGAS_CONDOR_HISTORY_COMMAND=""
<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->
host certificate

# ll /etc/grid-security/host*
-rw-r--r-- 1 root root 1440 Oct 18 09:31 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 18 09:31 /etc/grid-security/hostkey.pem
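
The permissions in the listing above (644 for the certificate, 400 for the key) can be checked with a small helper. This is a sketch assuming GNU coreutils stat, as found on Scientific Linux; mode_is is a hypothetical name.

```shell
# Return 0 when the file's permission bits equal the expected octal mode.
# stat -c '%a' is the GNU coreutils form; other platforms use different options.
# mode_is is a hypothetical helper name.
mode_is() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Possible use, with the modes shown in the listing above:
#   mode_is /etc/grid-security/hostcert.pem 644 || echo "check hostcert.pem mode" >&2
#   mode_is /etc/grid-security/hostkey.pem 400 || echo "check hostkey.pem mode" >&2
```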
<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

munge configuration

IMPORTANT: Unlike previous versions, the updated EPEL5 build of torque-2.5.7-1 enables munge as an inter-node authentication method.

  • verify that munge is correctly installed:
# rpm -qa | grep munge
munge-libs-0.5.8-8.el5
munge-0.5.8-8.el5
  • On one host (for example the batch server) generate a key by launching:
# /usr/sbin/create-munge-key

# ls -ltr /etc/munge/
total 4
-r-------- 1 munge munge 1024 Jan 13 14:32 munge.key
  • Copy the key /etc/munge/munge.key to every host of your cluster, adjusting the permissions:
# chown munge:munge /etc/munge/munge.key
  • Start the munge daemon on each node:
# service munge start
Starting MUNGE:                                            [  OK  ]

# chkconfig munge on

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->
Verify that all the yaim variables are set by launching:
# /opt/glite/yaim/bin/yaim -v -s site-info_cremino.def -n creamCE -n TORQUE_server -n TORQUE_utils -n DGAS_sensors

see details

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

# /opt/glite/yaim/bin/yaim -c -s site-info_cremino.def -n creamCE -n TORQUE_server -n TORQUE_utils -n DGAS_sensors

see details
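
The two yaim steps above (verify with -v, then configure with -c) can be chained so that configuration only runs when verification succeeds. This is a sketch, not an official wrapper: yaim_verify_then_config and the YAIM variable are hypothetical names, and it relies on yaim returning a non-zero exit status when verification fails, which is an assumption.

```shell
# Path to yaim; overridable so the function can be tried with a stub.
YAIM=${YAIM:-/opt/glite/yaim/bin/yaim}

# Run "yaim -v" and, only if it succeeds, "yaim -c" with the same arguments.
# $1 = site-info.def path, remaining arguments = node types (e.g. creamCE).
# yaim_verify_then_config is a hypothetical helper name.
yaim_verify_then_config() {
    siteinfo=$1
    shift
    nodes=""
    for n in "$@"; do
        nodes="$nodes -n $n"
    done
    # $nodes is intentionally unquoted so it splits into separate arguments.
    "$YAIM" -v -s "$siteinfo" $nodes && "$YAIM" -c -s "$siteinfo" $nodes
}

# Possible use:
#   yaim_verify_then_config /root/siteinfo/site-info.def creamCE TORQUE_utils
```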

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

Software Area settings

If the Software Area is hosted on your CE, you have to create it and export it to the WNs. In site-info.def we set:

VO_SW_DIR=/opt/exp_soft
  • directory creation
mkdir /opt/exp_soft/
  • edit /etc/exports creating a line like the following:
/opt/exp_soft/ *.cnaf.infn.it(rw,sync,no_root_squash)
  • check nfs and portmap status
# service nfs status
rpc.mountd is stopped
nfsd is stopped

# service portmap status
portmap is stopped

# service portmap start
Starting portmap:                                          [  OK  ]

# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]

# chkconfig nfs on
# chkconfig portmap on
  • after any modification in /etc/exports, re-export the file systems by launching
# exportfs -ra
or simply restart the nfs daemon
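
If the export line is added by a script, it is convenient to make the step idempotent so that repeated runs do not duplicate it. A minimal sketch, where ensure_export is a hypothetical helper name:

```shell
# Append the export line to the file only if an identical line is not there yet.
# $1 = exports file, $2 = full export line.
# ensure_export is a hypothetical helper name.
ensure_export() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Possible use, followed by the re-export shown above:
#   ensure_export /etc/exports '/opt/exp_soft/ *.cnaf.infn.it(rw,sync,no_root_squash)'
#   exportfs -ra
```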

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

walltime workaround

If the queues publish:
GlueCEStateWaitingJobs: 444444
and in the log /var/log/bdii/bdii-update.log you notice errors like the following:
Traceback (most recent call last):
  File "/usr/libexec/lcg-info-dynamic-scheduler", line 435, in ?
    wrt = qwt * nwait
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
then the queues probably have no "resources_default.walltime" parameter configured.

So define it for each queue by launching, for example:

# qmgr -c "set queue prod resources_default.walltime = 01:00:00"
# qmgr -c "set queue cert resources_default.walltime = 01:00:00"
# qmgr -c "set queue cloudtf resources_default.walltime = 01:00:00"
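
With many queues, the same commands can be generated in a loop. The sketch below only prints the qmgr commands so they can be reviewed before being run on the batch server; walltime_cmds is a hypothetical helper name.

```shell
# Print (do not run) one qmgr command per queue.
# $1 = walltime as HH:MM:SS, remaining arguments = queue names.
# walltime_cmds is a hypothetical helper name.
walltime_cmds() {
    wt=$1
    shift
    for q in "$@"; do
        printf 'qmgr -c "set queue %s resources_default.walltime = %s"\n' "$q" "$wt"
    done
}

# Possible use:
#   walltime_cmds 01:00:00 prod cert cloudtf
```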
<--/twistyPlugin-->

Service Checks

<--/twistyPlugin twikiMakeVisibleInline-->

  • After installing the service, to check that everything was installed properly, have a look at the Service CREAM Reference Card
  • You can also perform some checks after the installation and configuration of your CREAM CE

TORQUE checks:

  • check the pbs settings:
# qmgr -c 'p s'
  • check the WNs state
# pbsnodes -a
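
To spot problem nodes quickly, the node listing can be filtered. This sketch assumes the usual layout of the listing, with each node name starting in the first column and its attributes on indented "key = value" lines; bad_nodes is a hypothetical helper name.

```shell
# Read a pbsnodes -a style listing on stdin and print the nodes whose
# state contains "down" or "offline".
# Assumed layout: node name in column 1, attributes indented as "key = value".
# bad_nodes is a hypothetical helper name.
bad_nodes() {
    awk '
        /^[^ \t]/ { node = $1 }
        /^[ \t]+state = / && ($3 ~ /down|offline/) { print node }
    '
}

# Possible use:
#   pbsnodes -a | bad_nodes
```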

<--/twistyPlugin-->

<--/twistyPlugin twikiMakeVisibleInline-->

maui settings

In order to reserve a job slot for test jobs, you need to apply some settings in the maui configuration (/var/spool/maui/maui.cfg)

Suppose you have enabled the test VOs (ops, dteam and infngrid) on the "cert" queue and that you have 8 job slots available. Add the following lines in the /var/spool/maui/maui.cfg file:

CLASSWEIGHT 1
QOSWEIGHT 1

QOSCFG[normal] MAXJOB=7

CLASSCFG[prod] QDEF=normal
CLASSCFG[cert] PRIORITY=5000

After the modification restart maui.

To prevent yaim from overwriting this file during host reconfiguration, set:

CONFIG_MAUI="no"

in your site-info.def (the first time you launch the yaim script, it has to be set to "yes").
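
The value MAXJOB=7 in the example comes from 8 total slots minus 1 slot kept free for test jobs. The sketch below reproduces the same lines for other sizes; maui_reserve_lines is a hypothetical helper and it only prints text, leaving the edit of maui.cfg to the administrator.

```shell
# Print the maui.cfg lines from the example above, with MAXJOB computed as
# total slots minus the slots kept free for the "cert" queue.
# $1 = total job slots, $2 = slots to keep free.
# maui_reserve_lines is a hypothetical helper name.
maui_reserve_lines() {
    maxjob=$(( $1 - $2 ))
    printf 'CLASSWEIGHT 1\nQOSWEIGHT 1\n\n'
    printf 'QOSCFG[normal] MAXJOB=%s\n\n' "$maxjob"
    printf 'CLASSCFG[prod] QDEF=normal\nCLASSCFG[cert] PRIORITY=5000\n'
}

# Possible use (8 slots, 1 reserved, giving MAXJOB=7 as in the text):
#   maui_reserve_lines 8 1
```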

<--/twistyPlugin-->

Revisions

| Date | Comment | By |
| 2012-05-25 | First draft | Paolo Veronesi |

-- PaoloVeronesi - 2012-05-25

 