---+!! Notes about Installation and Configuration of a stand alone Torque server

   * *These notes are provided by site admins on a best-effort basis as a contribution to the IGI communities and MUST NOT be considered a substitute for the [[http://wiki.italiangrid.it/twiki/bin/view/IGIRelease/IgiEmi][Official IGI documentation]].*
   * This document is addressed to site administrators responsible for middleware installation and configuration.
   * The goal of this page is to provide some hints and examples on how to install and configure a stand alone *EMI TORQUE* server.

%TOC%

---++ References

   1. [[http://www.italiangrid.it/][About IGI - Italian Grid infrastructure]]
   1. [[http://wiki.italiangrid.it/twiki/bin/view/IGIRelease/WebHome][About IGI Release]]
   1. [[http://wiki.italiangrid.it/twiki/bin/view/IGIRelease/IgiEmi][IGI Official Installation and Configuration guide]]
   1. [[https://twiki.cern.ch/twiki/bin/view/EMI/GenericInstallationConfigurationEMI1][Generic Installation & Configuration for EMI 1]]
   1. [[https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400][Yaim Guide]]
   1. [[https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#site_info_def][site-info.def yaim variables]]
   1. [[https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#MPI][MPI yaim variables]]
   1. [[https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#WN][WN yaim variables]]
   1. [[https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#TORQUE][TORQUE Yaim variables]]
   1. [[http://www.eu-emi.eu/products/-/asset_publisher/z2MT/content/emi-wn][EMI-WN v.1.0.0]]
   1. [[http://www.eu-emi.eu/products/-/asset_publisher/z2MT/content/glite-mpi][gLite-MPI v.1.0.0]]
   1. [[https://wiki.egi.eu/wiki/MAN03][MPI-Start Installation and Configuration]]
   1. [[https://wiki.egi.eu/wiki/Tools/Manuals/SiteProblemsFollowUp][Troubleshooting Guide for Operational Errors on EGI Sites]]
   1. [[https://wiki.egi.eu/wiki/Tools/Manuals/AdministrationFaq][Grid Administration FAQs page]]

---++ Service installation

%TWISTY{ mode="div" showlink=" *O.S. and Repos* " hidelink=" *O.S. and Repos* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ O.S. and Repos

   * Start from a fresh installation of Scientific Linux 5.x (x86_64).
<verbatim>
# cat /etc/redhat-release
Scientific Linux SL release 5.7 (Boron)
</verbatim>
   * Install the additional repositories: EPEL, Certification Authority, UMD
<verbatim>
# yum install yum-priorities yum-protectbase
# cd /etc/yum.repos.d/
# rpm -ivh http://mirror.switch.ch/ftp/mirror/epel//5/x86_64/epel-release-5-4.noarch.rpm
# wget http://repo-pd.italiangrid.it/mrepo/repos/egi-trustanchors.repo
# rpm -ivh http://repo-pd.italiangrid.it/mrepo/EMI/1/sl5/x86_64/updates/emi-release-1.0.1-1.sl5.noarch.rpm
# wget http://repo-pd.italiangrid.it/mrepo/repos/igi/sl5/x86_64/igi-emi.repo
</verbatim>
   * Be sure that SELinux is disabled (or permissive). Details on how to disable SELinux are [[http://fedoraproject.org/wiki/SELinux/setenforce][here]]:
<verbatim>
# getenforce
Disabled
</verbatim>
   * Check the repos list (sl-*.repo are the repos of the O.S. and they should be present by default).
<verbatim>
# ls /etc/yum.repos.d/
egi-trustanchors.repo  emi1-third-party.repo  emi1-base.repo  emi1-updates.repo
epel.repo  epel-testing.repo  igi-emi.repo  sl-contrib.repo  sl-fastbugs.repo
sl-security.repo  sl-testing.repo  sl-debuginfo.repo  sl.repo  sl-srpms.repo
</verbatim>

*IMPORTANT*: remove the dag repository if present.

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *yum install* " hidelink=" *yum install* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ yum install

<verbatim>
# yum clean all
Loaded plugins: downloadonly, kernel-module, priorities, protect-packages, protectbase, security, verify, versionlock
Cleaning up Everything
# yum install emi-torque-server emi-torque-utils
# yum install yaim-addons
# yum install nfs-utils
</verbatim>

See [[YumTorqueServer][here]] for details.

%ENDTWISTY%

---++ Service configuration

Copy the example configuration files to another path, for example root's home directory, and then edit them as described in the following sections:

<verbatim>
# cp -r /opt/glite/yaim/examples/siteinfo/* .
</verbatim>

%TWISTY{ mode="div" showlink=" *vo.d directory* " hidelink=" *vo.d directory* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ vo.d directory

Create the vo.d directory for the VO configuration files (you can either keep the VO information in site-info.def or put it in the vo.d directory):

<verbatim>
# mkdir vo.d
</verbatim>

[[VoDirContent][Here]] is an example for some VOs. Information about the individual VOs is available at the [[http://operations-portal.in2p3.fr/vo][CENTRAL OPERATIONS PORTAL]].
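
As a rough illustration of what a file in vo.d can look like, the sketch below writes one for a single VO. It assumes the usual YAIM convention that the file is named after the VO and uses the site-info.def variable names without the VO_&lt;NAME&gt;_ prefix. The function name =write_vo_file=, the VO name and the VOMS host passed to it are hypothetical placeholders, and VOMSES and VOMS_CA_DN are left out on purpose because they need the real port and certificate DNs of your VOMS server.

```shell
#!/bin/sh
# Sketch only: write a vo.d file for one VO.
# write_vo_file is a hypothetical helper, not part of yaim.
write_vo_file() {
    vo_dir="$1"      # target directory, e.g. ./vo.d
    vo_name="$2"     # VO name, also used as the file name
    voms_host="$3"   # FQDN of the VOMS server (placeholder)

    mkdir -p "$vo_dir" || return 1

    # \$ keeps the yaim variables unexpanded, so that yaim resolves
    # them from site-info.def when it reads the file.
    # Port 8443 is assumed for the VOMS admin endpoint.
    cat > "$vo_dir/$vo_name" <<EOF
SW_DIR=\$VO_SW_DIR/$vo_name
DEFAULT_SE=\$SE_HOST
STORAGE_DIR=\$CLASSIC_STORAGE_DIR/$vo_name
VOMS_SERVERS="'vomss://$voms_host:8443/voms/$vo_name?/$vo_name'"
EOF
}
```

For example, =write_vo_file ./vo.d myvo voms.example.org= would create =vo.d/myvo=. Compare the result with the VoDirContent example and add the VOMSES and VOMS_CA_DN values published for your VO before running yaim.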
%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *users and groups configuration* " hidelink=" *users and groups configuration* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ users and groups configuration

Here is an example of how to define pool accounts ([[https://forge.cnaf.infn.it/plugins/scmsvn/viewcvs.php/branches/BRANCH-4_0_X/ig-yaim/examples/ig-users.conf?rev=6195&root=igrelease&view=markup][ig-users.conf]]) and groups ([[https://forge.cnaf.infn.it/plugins/scmsvn/viewcvs.php/*checkout*/branches/BRANCH-4_0_X/ig-yaim/examples/ig-groups.conf?rev=6193&root=igrelease][ig-groups.conf]]) for several VOs.

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *wn-list.conf* " hidelink=" *wn-list.conf* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ wn-list.conf

List the WNs in this file, for example:

<verbatim>
# less wn-list.conf
wn05.cnaf.infn.it
wn06.cnaf.infn.it
</verbatim>

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *site-info.def* " hidelink=" *site-info.def* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ site-info.def

SUGGESTION: use the same [[SiteDefCreamWNMPI][site-info.def]] for CREAM and the WNs. For this reason the example file contains the yaim variables used by CREAM, TORQUE and emi-WN, and it also includes the settings of some VOs.

For your convenience there is an explanation of each yaim variable.
For more details look at [8, 9, 10].

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *host certificate* " hidelink=" *host certificate* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ host certificate

<verbatim>
# ll /etc/grid-security/host*
-rw-r--r-- 1 root root 1440 Oct 18 09:31 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 18 09:31 /etc/grid-security/hostkey.pem
</verbatim>

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *munge configuration* " hidelink=" *munge configuration* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ munge configuration

*IMPORTANT*: Unlike previous versions, the updated EPEL5 build of torque-2.5.7-1 enables munge as an inter-node authentication method.

   * Verify that munge is correctly installed:
<verbatim>
# rpm -qa | grep munge
munge-libs-0.5.8-8.el5
munge-0.5.8-8.el5
</verbatim>
   * On one host (for example the batch server) generate a key by launching:
<verbatim>
# /usr/sbin/create-munge-key
# ls -ltr /etc/munge/
total 4
-r-------- 1 munge munge 1024 Jan 13 14:32 munge.key
</verbatim>
   * Copy the key, /etc/munge/munge.key, to every host of your cluster, adjusting the ownership:
<verbatim>
# chown munge:munge /etc/munge/munge.key
</verbatim>
   * Start the munge daemon on each node:
<verbatim>
# service munge start
Starting MUNGE: [ OK ]
# chkconfig munge on
</verbatim>

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *yaim check* " hidelink=" *yaim check* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

Verify that all the yaim variables are set by launching:

<verbatim>
# /opt/glite/yaim/bin/yaim -v -s site-info_batch.def -n TORQUE_server -n TORQUE_utils
</verbatim>

See [[TorqueServerYaimVerConf#YAIM_Verification][details]].

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *yaim config* " hidelink=" *yaim config* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

<verbatim>
# /opt/glite/yaim/bin/yaim -c -s site-info_batch.def -n TORQUE_server -n TORQUE_utils
</verbatim>

See [[TorqueServerYaimVerConf#YAIM_Configuration][details]].

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *tomcat and ldap users* " hidelink=" *tomcat and ldap users* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ tomcat and ldap users

The tomcat and ldap users must exist on the torque server, otherwise the computing elements will fail to connect to it. When those users do not exist on the server, you will see errors like the following on the CE:

<verbatim>
2012-04-24 15:37:29 lcg-info-dynamic-scheduler: LRMS backend command returned nonzero exit status
2012-04-24 15:37:29 lcg-info-dynamic-scheduler: Exiting without output, GIP will use static values
Can not obtain pbs version from host [...]
</verbatim>

while on the torque server you will see:

<verbatim>
04/24/2012 14:00:46;0080;PBS_Server;Req;req_reject;Reject reply code=15021(Invalid credential), aux=0, type=StatusJob, from tomcat@cream-01.cnaf.infn.it
04/24/2012 14:01:02;0080;PBS_Server;Req;req_reject;Reject reply code=15021(Invalid credential), aux=0, type=StatusJob, from ldap@cream-01.cnaf.infn.it
</verbatim>

*Solution*: since these users exist only on the CREAM CE host, add the tomcat and ldap users and groups to the torque host and then restart pbs_server.
<verbatim>
# echo 'tomcat:x:91:91:Tomcat:/usr/share/tomcat5:/bin/sh' >> /etc/passwd
# echo 'ldap:x:55:55:LDAP User:/var/lib/ldap:/bin/false' >> /etc/passwd
# echo 'tomcat:x:91:' >> /etc/group
# echo 'ldap:x:55:' >> /etc/group
</verbatim>

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *Software Area settings* " hidelink=" *Software Area settings* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ Software Area settings

If the Software Area is hosted on your CE, you have to create it and export it to the WNs. In site-info.def we set:

<verbatim>
VO_SW_DIR=/opt/exp_soft
</verbatim>

   * Create the directory:
<verbatim>
# mkdir /opt/exp_soft/
</verbatim>
   * Edit /etc/exports, adding a line like the following:
<verbatim>
/opt/exp_soft/ *.cnaf.infn.it(rw,sync,no_root_squash)
</verbatim>
   * Check the nfs and portmap status, start both services and enable them at boot:
<verbatim>
# service nfs status
rpc.mountd is stopped
nfsd is stopped
# service portmap status
portmap is stopped
# service portmap start
Starting portmap: [ OK ]
# service nfs start
Starting NFS services: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]
Starting RPC idmapd: [ OK ]
# chkconfig nfs on
# chkconfig portmap on
</verbatim>
   * After any modification of /etc/exports you can launch
<verbatim>
# exportfs -ra
</verbatim>
   or simply restart the nfs daemon.

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *walltime workaround* " hidelink=" *walltime workaround* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ walltime workaround

If the CE publishes the following value for its queues:

<verbatim>
GlueCEStateWaitingJobs: 444444
</verbatim>

and in the log /var/log/bdii/bdii-update.log on the CE you notice errors like the following:

<verbatim>
Traceback (most recent call last):
  File "/usr/libexec/lcg-info-dynamic-scheduler", line 435, in ?
    wrt = qwt * nwait
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
</verbatim>

then the queues probably have no "resources_default.walltime" parameter configured. Define it for each queue by launching, for example:

<verbatim>
# qmgr -c "set queue prod resources_default.walltime = 01:00:00"
# qmgr -c "set queue cert resources_default.walltime = 01:00:00"
# qmgr -c "set queue cloudtf resources_default.walltime = 01:00:00"
</verbatim>

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *adding a second CE* " hidelink=" *adding a second CE* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ adding a second CE

In order to allow submission from a second CE, do the following on the *batch server*:

   * Edit the files */etc/hosts.equiv* and */etc/ssh/shosts.equiv*, adding the FQDN of the second CE.
   * Define the parameter authorized_users in the pbs server:
<verbatim>
# qmgr -c "set server authorized_users += *@cream-02.cnaf.infn.it"
</verbatim>

Regarding the *ssh configuration*, have a look at [[NotesAboutInstallationAndConfigurationOfCREAMForTORQUE]].

%ENDTWISTY%

---++ Service Checks

%TWISTY{ mode="div" showlink=" *checks* " hidelink=" *checks* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

   * After the installation, to check that everything was installed properly, have a look at the [[http://wiki.italiangrid.it/twiki/bin/view/CREAM/ServiceReferenceCard][Service CREAM Reference Card]].
   * You can also perform some [[http://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide#1_Checks_to_be_done_after_instal][checks]] after the installation and configuration of your CREAM.

---+++ TORQUE checks:

   * Check the pbs settings:
<verbatim>
# qmgr -c 'p s'
</verbatim>
   * Check the WNs state:
<verbatim>
# pbsnodes -a
</verbatim>

%ENDTWISTY%

%TWISTY{ mode="div" showlink=" *maui settings* " hidelink=" *maui settings* " remember="off" firststart="hide" showimgright="%ICONURLPATH{toggleopen}%" hideimgright="%ICONURLPATH{toggleclose}%" }%

---+++ maui settings

In order to reserve a job slot for test jobs, you need to apply some settings in the maui configuration (/var/spool/maui/maui.cfg).

Suppose you have enabled the test VOs (ops, dteam and infngrid) on the "cert" queue and that you have 8 job slots available. Add the following lines to the maui.cfg file:

<verbatim>
CLASSWEIGHT 1
QOSWEIGHT 1
QOSCFG[normal] MAXJOB=7
CLASSCFG[prod] QDEF=normal
CLASSCFG[cert] PRIORITY=5000
</verbatim>

After the modification restart maui.

To prevent yaim from overwriting this file when the host is reconfigured, set:

<verbatim>
CONFIG_MAUI="no"
</verbatim>

in your site-info.def (the first time you launch the yaim script, it has to be set to "yes").

%ENDTWISTY%

---++ Revisions

| *Date* | *Comment* | *By* |
| 2012-05-31 | installation notes completed | Alessandro Paolini |
| 2012-05-25 | First draft | Alessandro Paolini |

-- Main.AlessandroPaolini - 2012-05-25
Topic revision: r2 - 2012-06-13 - AlessandroPaolini
Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.