Notes about Installation and Configuration of a CREAM Computing Element using an external Torque server as batch system and ARGUS as authorization method

  • These notes are provided by site admins on a best-effort basis as a contribution to the IGI communities and MUST NOT be considered a substitute for the official IGI documentation.
  • This document is addressed to site administrators responsible for middleware installation and configuration.
  • The goal of this page is to provide some hints and examples on how to install and configure an IGI CREAM CE service based on EMI middleware, in no-cluster mode, with TORQUE as batch system on a different host and an external ARGUS server for user authorization.

References

  1. About IGI - Italian Grid infrastructure
  2. About IGI Release
  3. IGI Official Installation and Configuration guide
  4. EMI CREAM System Administrator Guide
  5. Yaim Guide
  6. site-info.def yaim variables
  7. CREAM yaim variables
  8. TORQUE Yaim variables
  9. CREAM v.1.13
  10. CREAM TORQUE module v. 1.0.0-1
  11. Troubleshooting Guide for Operational Errors on EGI Sites
  12. Grid Administration FAQs page

Service installation

O.S. and Repos

  • Start from a fresh installation of Scientific Linux 5.x (x86_64).
# cat /etc/redhat-release 
Scientific Linux SL release 5.7 (Boron) 

  • Install the additional repositories: EPEL, Certification Authority, UMD

# yum install yum-priorities yum-protectbase
# cd /etc/yum.repos.d/
# rpm -ivh http://mirror.switch.ch/ftp/mirror/epel//5/x86_64/epel-release-5-4.noarch.rpm
# wget http://repo-pd.italiangrid.it/mrepo/repos/egi-trustanchors.repo
# rpm -ivh http://repo-pd.italiangrid.it/mrepo/EMI/1/sl5/x86_64/updates/emi-release-1.0.1-1.sl5.noarch.rpm
# wget http://repo-pd.italiangrid.it/mrepo/repos/igi/sl5/x86_64/igi-emi.repo

  • Be sure that SELinux is disabled (or permissive). Check the current mode with getenforce:

# getenforce 
Disabled

  • Check the repos list (sl-*.repo are the repos of the O.S. and they should be present by default).

# ls /etc/yum.repos.d/
egi-trustanchors.repo  
emi1-third-party.repo emi1-base.repo emi1-updates.repo
igi-emi.repo
epel.repo epel-testing.repo  
sl-contrib.repo sl-fastbugs.repo sl-security.repo sl-testing.repo sl-debuginfo.repo sl.repo sl-srpms.repo
IMPORTANT: remove the dag repository if present
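
The repository check above can also be scripted. The following is only a minimal sketch: the function name check_repos is hypothetical, the list of expected files is taken from the listing above (the sl-*.repo files are not checked), and it assumes that a leftover dag repository would match dag*.repo.

```shell
# Report missing repo files and a leftover dag repository in a yum repo dir.
# Assumption: the dag repo file, if present, is named dag*.repo.
check_repos() (
    dir=${1:-/etc/yum.repos.d}
    rc=0
    for f in egi-trustanchors.repo emi1-base.repo emi1-updates.repo \
             emi1-third-party.repo igi-emi.repo epel.repo; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    for f in "$dir"/dag*.repo; do
        if [ -f "$f" ]; then
            echo "remove: $f"
            rc=1
        fi
    done
    exit "$rc"
)
```

Calling check_repos without arguments inspects /etc/yum.repos.d; it prints nothing and returns 0 when the directory looks as expected.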

yum install

# yum clean all
Loaded plugins: downloadonly, kernel-module, priorities, protect-packages, protectbase, security, verify, versionlock
Cleaning up Everything

# yum install ca-policy-egi-core
# yum install xml-commons-apis
# yum install emi-cream-ce
# yum install emi-torque-utils
# yum install glite-dgas-common glite-dgas-hlr-clients glite-dgas-hlr-sensors glite-dgas-hlr-sensors-producers yaim-dgas
# yum install nfs-utils

see here for details

Service configuration

Copy the yaim example configuration files to another path, for example the root home directory, and edit them as described in the following sections:
# cp -r /opt/glite/yaim/examples/siteinfo/* .

vo.d directory

Create the vo.d directory for the VO configuration files (you can decide whether to keep the VO information in site-info.def or to put it in the vo.d directory):
# mkdir vo.d
Here is an example for some VOs.

Information about the individual VOs is available at the CENTRAL OPERATIONS PORTAL.

users and groups configuration

Here is an example of how to define pool accounts (ig-users.conf) and groups (ig-groups.conf) for several VOs.

wn-list.conf

List the WNs in this file, for example:

# less wn-list.conf 
wn01.cnaf.infn.it
wn02.cnaf.infn.it
wn03.cnaf.infn.it
wn04.cnaf.infn.it
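
A typo in this file is easy to miss, so a quick sanity check before running yaim may help. This is a sketch under assumptions: the function name check_wn_list is hypothetical, and "looks like a FQDN" is approximated as "contains a dot".

```shell
# Print problems found in a WN list: names without a domain part and
# duplicated entries. Blank lines are skipped.
# The exit status is non-zero if anything is reported.
check_wn_list() (
    problems=$(
        awk '
            /^[ \t]*$/ { next }
            $1 !~ /\./ { print "line " NR ": not a FQDN: " $1 }
            seen[$1]++ { print "line " NR ": duplicate: " $1 }
        ' "${1:-wn-list.conf}"
    )
    if [ -n "$problems" ]; then
        printf '%s\n' "$problems"
        exit 1
    fi
)
```

Run it as check_wn_list wn-list.conf from the directory holding the configuration files.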

site-info.def

SUGGESTION: you can use the same site-info.def used for the main CREAM computing element and for the WNs, with just a few changes:
CE_HOST=cremoso.$MY_DOMAIN
CE_PHYSCPU=0
CE_LOGCPU=0
BATCH_SERVER=cremino.cnaf.infn.it
For your convenience there is an explanation of each yaim variable. For more details look at [8, 9, 10].
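
Since the point of this setup is that the batch server runs on a different host than the CE, it can be worth checking the two variables after editing. A minimal sketch, assuming the file is a plain shell fragment that can be sourced on its own; the function name check_ce_vars is hypothetical:

```shell
# Source a site-info.def in a subshell and check that the CE and the batch
# server are both set and are different hosts (the batch system is external).
# Pass a path containing a slash, e.g. ./site-info.def.
check_ce_vars() (
    . "$1" || exit 1
    if [ -z "${CE_HOST:-}" ] || [ -z "${BATCH_SERVER:-}" ]; then
        echo "CE_HOST or BATCH_SERVER is not set"
        exit 1
    fi
    if [ "$CE_HOST" = "$BATCH_SERVER" ]; then
        echo "CE_HOST and BATCH_SERVER are the same host"
        exit 1
    fi
    echo "CE_HOST=$CE_HOST BATCH_SERVER=$BATCH_SERVER"
)
```

The subshell keeps the sourced variables out of the calling shell.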

services/glite-creamce

#
# YAIM creamCE specific variables
#

# LSF settings: path where lsf.conf is located
#BATCH_CONF_DIR=lsf_install_path/conf
#
# CE-monitor host (by default CE-monitor is installed on the same machine as 
# cream-CE)
CEMON_HOST=$CE_HOST
#
# CREAM database user
CREAM_DB_USER=*********
#
CREAM_DB_PASSWORD=*********
# Machine hosting the BLAH blparser.
# In this machine batch system logs must be accessible.
#BLPARSER_HOST=set_to_fully_qualified_host_name_of_machine_hosting_blparser_server
BLPARSER_HOST=$CE_HOST

services/dgas_sensors

#
# YAIM DGAS Sensors specific variables
#


################################
# DGAS configuration variables #
################################
# For any details about DGAS variables please refer to the guide:
# http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:dgas

# Reference Resource HLR for the site.
DGAS_HLR_RESOURCE="prod-hlr-01.pd.infn.it"

# Specify the type of job which the CE has to process.
# Set "all" on "the main CE" of the site, "grid" on the others.
# Default value: all
DGAS_JOBS_TO_PROCESS="grid"

# This parameter can be used to specify the list of VOs to publish.
# If the parameter is specified, the sensors (pushd) will forward
# to the Site HLR just records belonging to one of the specified VOs.
# Leave commented if you want to send records for ALL VOs
# Default value: parameter not specified
#DGAS_VO_TO_PROCESS="vo1;vo2;vo3..."


# Bound date on jobs backward processing.
# The backward processing does not consider jobs prior to that date.
# Default value: 2009-01-01.
#DGAS_IGNORE_JOBS_LOGGED_BEFORE="2011-11-01"

# Main CE of the site.
# ATTENTION: set this variable only in the case of a site with a "singleLRMS"
# in which there is more than one CE or local submission host (i.e. a host
# from which you may submit jobs directly to the batch system).
# In this case, the DGAS_USE_CE_HOSTNAME parameter must be set to the same value
# for all hosts sharing the LRMS, and this value can be arbitrarily chosen among
# these submitting hostnames (you may choose the best one).
# Otherwise leave it commented.
# we have 2 CEs, cremino is the main one
DGAS_USE_CE_HOSTNAME="cremino.cnaf.infn.it"

# Path for the batch-system log files.
# * for torque/pbs:
# DGAS_ACCT_DIR=/var/torque/server_priv/accounting
# * for LSF:
# DGAS_ACCT_DIR=lsf_install_path/work/cluster_name/logdir
# * for SGE:
# DGAS_ACCT_DIR=/opt/sge/default/common/
DGAS_ACCT_DIR=/var/torque/server_priv/accounting

# Full path to the 'condor_history' command, used to gather DGAS usage records
# when Condor is used as a batch system. Otherwise leave it commented.
#DGAS_CONDOR_HISTORY_COMMAND=""

host certificate

# ll /etc/grid-security/host*
-rw-r--r-- 1 root root 1440 Oct 18 09:31 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 18 09:31 /etc/grid-security/hostkey.pem
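
The modes in the listing above (644 for the certificate, 400 for the key) can be checked with a small helper. This is a sketch only: the function name check_mode is hypothetical and it assumes GNU stat, as shipped with Scientific Linux.

```shell
# Compare a file's octal mode with the expected one.
# Assumption: GNU stat, for the -c option.
check_mode() {
    actual=$(stat -c '%a' "$1") || return 1
    if [ "$actual" != "$2" ]; then
        echo "$1: mode $actual, expected $2"
        return 1
    fi
}

# Example use, with the modes shown in the listing above:
#   check_mode /etc/grid-security/hostcert.pem 644
#   check_mode /etc/grid-security/hostkey.pem 400
```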

authorization on the batch server

In order to allow the submission from the second CE, do the following actions on the "main CE / batch server":

  • edit the files /etc/hosts.equiv and /etc/ssh/shosts.equiv adding the FQDN of the second CE

  • define the parameter authorized_users in the pbs server:
# qmgr -c "set server authorized_users += *@cremoso.cnaf.infn.it"
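
Editing the two equiv files by hand works, but if the step is scripted it is better not to add the same FQDN twice. A minimal sketch; the helper name add_line_once is hypothetical:

```shell
# Append a line to a file only if that exact line is not already there.
add_line_once() {
    line=$1
    file=$2
    touch "$file" || return 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example use on the main CE / batch server:
#   for f in /etc/hosts.equiv /etc/ssh/shosts.equiv; do
#       add_line_once cremoso.cnaf.infn.it "$f"
#   done
```

The -x and -F options make grep match the whole line as a fixed string, so dots in the hostname are not treated as wildcards.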

munge configuration

IMPORTANT: unlike previous versions, the updated EPEL5 build of torque-2.5.7-1 enables munge as an inter-node authentication method.

  • verify that munge is correctly installed:
# rpm -qa | grep munge
munge-libs-0.5.8-8.el5
munge-0.5.8-8.el5
  • On one host (for example the batch server) generate a key by launching:
# /usr/sbin/create-munge-key

# ls -ltr /etc/munge/
total 4
-r-------- 1 munge munge 1024 Jan 13 14:32 munge.key
  • Copy the key /etc/munge/munge.key to every host of your cluster, adjusting the ownership:
# chown munge:munge /etc/munge/munge.key
  • Start the munge daemon on each node:
# service munge start
Starting MUNGE:                                            [  OK  ]

# chkconfig munge on
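
Copying the key to many hosts is easier to review as a generated list of commands. The sketch below only prints the commands and runs nothing; it assumes that the hosts are those in wn-list.conf and that root can scp to them, and the function name munge_copy_plan is hypothetical.

```shell
# Print (not run) one scp command per host listed in a WN list file, so the
# plan can be reviewed before the key is copied. Blank lines are skipped.
munge_copy_plan() {
    while read -r host; do
        [ -n "$host" ] || continue
        echo "scp -p /etc/munge/munge.key root@$host:/etc/munge/munge.key"
    done < "${1:-wn-list.conf}"
}
```

Once the daemon is running on both ends, a credential created with munge -n on one host and piped to unmunge on another should decode successfully if the keys match.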

ssh configuration for the cluster hosts

The ssh access of the second CE to the cluster isn't completely handled by yaim, so some manual steps are needed to configure it.

  • log in to your "main CE / batch server", ssh to the second CE, then exit:
[root@cremino ~]# ssh cremoso.cnaf.infn.it
The authenticity of host 'cremoso.cnaf.infn.it (131.154.101.48)' can't be established.
RSA key fingerprint is b6:5e:1f:aa:45:2f:5f:f0:73:d2:8f:9d:a1:86:bb:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'cremoso.cnaf.infn.it,131.154.101.48' (RSA) to the list of known hosts.
root@cremoso.cnaf.infn.it's password: 
Last login: Fri Feb 10 09:33:47 2012 from pcpaolini.cnaf.infn.it
 ___ _   _ _____ _   _        ____ _   _    _    _____
|_ _| \ | |  ___| \ | |      / ___| \ | |  / \  |  ___|
 | ||  \| | |_  |  \| |_____| |   |  \| | / _ \ | |_
 | || |\  |  _| | |\  |_____| |___| |\  |/ ___ \|  _|
|___|_| \_|_|   |_| \_|      \____|_| \_/_/   \_\_|
[root@cremoso ~]# exit
logout
Connection to cremoso.cnaf.infn.it closed.

  • The key produced and stored in /root/.ssh/known_hosts for the second CE should be added in the /etc/ssh/ssh_known_hosts file of the main CE and the WNs of the site.
# cat /root/.ssh/known_hosts |grep cremoso >> /etc/ssh/ssh_known_hosts

  • Assuming that the /etc/ssh/ssh_known_hosts file of the main CE contains the keys of all WNs, copy it to the second CE:
# scp /etc/ssh/ssh_known_hosts cremoso:/etc/ssh/
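
Appending with >> adds the same entry again every time the step is repeated. A possible idempotent variant is sketched below; the function name merge_host_keys is hypothetical and, like the grep above, it does not work with hashed known_hosts entries.

```shell
# Append to DST the known_hosts lines from SRC that mention HOST, skipping
# lines already present in DST, so the step can be repeated safely.
merge_host_keys() {
    host=$1 src=$2 dst=$3
    touch "$dst" || return 1
    grep -F -- "$host" "$src" | while IFS= read -r entry; do
        grep -qxF -- "$entry" "$dst" || printf '%s\n' "$entry" >> "$dst"
    done
}

# Example use on the main CE:
#   merge_host_keys cremoso /root/.ssh/known_hosts /etc/ssh/ssh_known_hosts
```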

import from main CE

You have to import several things from the main CE / batch server: the gridmapdir, the torque path, and the software area and tags.

On the main CE

First of all you have to export the proper directories from the main CE:

  • edit the file /etc/exports adding the following lines:
/opt/exp_soft/ *.cnaf.infn.it(rw,sync,no_root_squash)
/etc/grid-security/gridmapdir cremoso.cnaf.infn.it(rw,sync,no_root_squash)
/var/torque/ cremoso.cnaf.infn.it(rw,sync,no_root_squash)
/opt/edg/var/info/ cremoso.cnaf.infn.it(rw,sync,no_root_squash)
  • activate the changes by launching:
# exportfs -ra

On the second CE

  • Edit the file /etc/fstab by adding lines like the following:
cremino.cnaf.infn.it:/opt/exp_soft/ /opt/exp_soft/ nfs rw,defaults 0 0
cremino.cnaf.infn.it:/etc/grid-security/gridmapdir /etc/grid-security/gridmapdir nfs rw,defaults 0 0
cremino.cnaf.infn.it:/var/torque/ /var/torque/ nfs rw,defaults 0 0
cremino.cnaf.infn.it:/opt/edg/var/info/ /opt/edg/var/info/ nfs rw,defaults 0 0
Remember to create those directories if they don't exist yet.

  • check nfs and portmap status
# service nfs status
rpc.mountd is stopped
nfsd is stopped

# service portmap status
portmap is stopped

# service portmap start
Starting portmap:                                          [  OK  ]

# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]

# chkconfig nfs on
# chkconfig portmap on
  • after any modification in /etc/fstab launch:
# mount -a
  • verify the mount:
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       16G  2.8G   12G  19% /
/dev/vda1              99M   20M   75M  21% /boot
tmpfs                1006M     0 1006M   0% /dev/shm
cremino.cnaf.infn.it:/opt/exp_soft/
                       65G  4.2G   57G   7% /opt/exp_soft
cremino.cnaf.infn.it:/etc/grid-security/gridmapdir
                       65G  4.2G   57G   7% /etc/grid-security/gridmapdir
cremino.cnaf.infn.it:/var/torque/
                       65G  4.2G   57G   7% /var/torque
cremino.cnaf.infn.it:/opt/edg/var/info/
                       65G  4.2G   57G   7% /opt/edg/var/info
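
The mount points named in /etc/fstab have to exist before mount -a can succeed. Rather than creating them one by one, they can be read from the file itself. A minimal sketch; the function name nfs_mount_points is hypothetical:

```shell
# Print the mount points of all nfs entries in an fstab-style file,
# ignoring commented-out lines.
nfs_mount_points() {
    awk '$1 !~ /^#/ && $3 == "nfs" { print $2 }' "${1:-/etc/fstab}"
}

# Example use on the second CE, before running mount -a:
#   nfs_mount_points | while read -r dir; do mkdir -p "$dir"; done
```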

yaim check

Verify that all the yaim variables are set by launching:
# /opt/glite/yaim/bin/yaim -v -s site-info_cremoso.def -n creamCE -n TORQUE_utils -n DGAS_sensors

see details

yaim config

# /opt/glite/yaim/bin/yaim -c -s site-info_cremoso.def -n creamCE -n TORQUE_utils -n DGAS_sensors

see details
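
The check and the configuration can be chained so that the configuration starts only when the check succeeds. This is a sketch, not part of yaim: the wrapper name yaim_check_then_config and the YAIM override variable are hypothetical, while the yaim options are the ones used in the two commands above.

```shell
# Run the yaim check (-v) and, only if it succeeds, the configuration (-c).
# YAIM can be overridden; the default is the path used in these notes.
yaim_check_then_config() {
    siteinfo=$1
    yaim=${YAIM:-/opt/glite/yaim/bin/yaim}
    "$yaim" -v -s "$siteinfo" -n creamCE -n TORQUE_utils -n DGAS_sensors || {
        echo "yaim check failed, configuration not started" >&2
        return 1
    }
    "$yaim" -c -s "$siteinfo" -n creamCE -n TORQUE_utils -n DGAS_sensors
}

# Example use:
#   yaim_check_then_config site-info_cremoso.def
```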

Service Checks

  • After the service installation, to check whether everything was installed properly, have a look at the Service CREAM Reference Card.
  • You can also perform some checks after the installation and configuration of your CREAM CE:

TORQUE checks:

  • check if the interaction with the batch server is properly working, launching some pbs commands, for example:
# qstat -q

# pbsnodes -a
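
With many WNs the pbsnodes -a output gets long, so a filter that shows only the nodes worth a look can help. The sketch below assumes the usual layout, in which each node name starts in column one and is followed by an indented "state = ..." line; the function name nodes_needing_attention is hypothetical, and treating only free and job-exclusive as healthy is an arbitrary choice.

```shell
# Read `pbsnodes -a` output on stdin and print "node state" for every node
# whose state is not exactly "free" or "job-exclusive".
nodes_needing_attention() {
    awk '
        /^[^ \t]/ { node = $1 }
        $1 == "state" && $2 == "=" {
            if ($3 != "free" && $3 != "job-exclusive") print node, $3
        }
    '
}

# Example use on the second CE:
#   pbsnodes -a | nodes_needing_attention
```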

ssh checks

  • ssh should work passwordless from WNs to CE when using a pool account

Revisions

Date        Comment                        By
2012-02-15  installation notes completed   Alessandro Paolini
2012-02-09  First draft                    Alessandro Paolini

-- AlessandroPaolini - 2012-02-09

Topic revision: r8 - 2012-04-18 - AlessandroPaolini
 