System Administrator Guide for CREAM for EMI-2 release

0.0.0.1 Installation
If the CREAM CE also has to act as the Torque server, install the emi-torque-server metapackage:
emi-torque-utils metapackage:
0.0.0.1 Yaim Configuration
Set your site-info.def file, which is the input file used by yaim.
The CREAM CE Torque integration is then configured by running YAIM:
0.0.1 LSF

0.0.1.1 Requirements
You have to install and configure the LSF batch system software before installing and configuring the CREAM software.

0.0.1.2 Installation
If you are running LSF, install the emi-lsf-utils metapackage:
0.0.1.3 Yaim Configuration
Set your site-info.def file, which is the input file used by yaim.
The CREAM CE LSF integration is then configured by running YAIM:
0.0.2 Grid Engine

0.0.2.1 Requirements
You have to install and configure the GE batch system software before installing and configuring the CREAM software. The CREAM CE integration was tested with GE 6.2u5, but it should work with any forked version of the original GE software. Support for the GE batch system software (or any of its forked versions) is out of the scope of this activity. Before proceeding, please take note of the following remarks:
0.0.2.2 Integration plugins
The GE integration with CREAM CE consists of deploying specific BLAH plugins and configuring them to properly interoperate with the Grid Engine batch system. The following GE BLAH plugins are deployed with the CREAM CE installation: BUpdaterSGE, sge_hold.sh, sge_submit.sh, sge_resume.sh, sge_status.sh and sge_cancel.sh.

0.0.2.3 Installation
If you are running GE, install the emi-ge-utils metapackage:
0.0.2.4 Yaim Configuration
Set your site-info.def file, which is the input file used by yaim. Documentation about yaim variables relevant for CREAM CE and GE is available at
SGE_SHARED_INSTALL=yes in your site-info.def, otherwise YAIM may change your setup according to the definitions in your site-info.def.
The CREAM CE GE integration is then configured by running YAIM:
0.0.2.5 Important notes

0.0.2.5.1 File transfers
Besides the input/output sandbox files (transferred via GFTP), there are some other files that need to be transferred from/to the CREAM sandbox directory on the CE node to/from the Worker Node, namely:
# diff -Nua sge_filestaging.modified sge_filestaging.orig
--- sge_filestaging.modified 2010-03-25 19:38:11.000000000 +0000
+++ sge_filestaging.orig 2010-03-25 19:05:43.000000000 +0000
my $remotefile = $3;
if ( $STAGEIN ) {
- system( 'cp', $remotefile, $localfile );
+ system( 'scp', "$remotemachine:$remotefile", $localfile );
} else {
- system( 'cp', $localfile, $remotefile );
+ system( 'scp', $localfile, "$remotemachine:$remotefile" );
}
}
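For readers who want the staging decision in isolation, the Python sketch below reproduces the choice made by the patch: a plain cp when the CREAM sandbox area is reachable locally on the WN, scp to/from the CE otherwise. The function names and the shared_sandbox flag are hypothetical; only the cp/scp argument order is taken from the diff above.

```python
"""Sketch of the stage-in/stage-out copy choice made by the sge_filestaging patch.

Hypothetical helpers: parameter names mirror the Perl variables $STAGEIN,
$localfile, $remotemachine and $remotefile; this is not part of BLAH or GE.
"""
import subprocess
from typing import List


def staging_command(stage_in: bool, local_file: str, remote_machine: str,
                    remote_file: str, shared_sandbox: bool) -> List[str]:
    """Return the argv list that copies one sandbox file.

    With a shared CREAM sandbox area the remote path is visible locally,
    so cp is enough; otherwise scp to/from remote_machine is used.
    """
    if shared_sandbox:
        tool, remote = "cp", remote_file
    else:
        tool, remote = "scp", f"{remote_machine}:{remote_file}"
    # Stage-in copies remote -> local, stage-out copies local -> remote.
    src, dst = (remote, local_file) if stage_in else (local_file, remote)
    return [tool, src, dst]


def stage(stage_in: bool, local_file: str, remote_machine: str,
          remote_file: str, shared_sandbox: bool) -> None:
    """Run the copy, raising CalledProcessError if it fails."""
    cmd = staging_command(stage_in, local_file, remote_machine,
                          remote_file, shared_sandbox)
    subprocess.run(cmd, check=True)
```

Unlike the Perl system() call in the patch, which ignores the exit code, subprocess.run(check=True) raises on a failed copy; whether a failed transfer should abort the job is a design choice left to the site.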
0.0.0.0.1 GE accounting file
BUpdaterSGE needs to consult the GE accounting file to determine how a given job ended. Therefore, the GE accounting file must be shared between the GE SERVER / QMASTER and the CREAM CE. Moreover, to guarantee that the accounting file is updated on the fly, the GE configuration should be tuned (using qconf -mconf) by adding the following definitions under reporting_params:

accounting=true accounting_flush_time=00:00:00
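To verify the two settings after editing the configuration, a check along these lines could be run on the QMASTER. It is a sketch that assumes the layout printed by qconf -sconf (the attribute name followed by space-separated key=value pairs, with a trailing backslash on continued lines); the function names are hypothetical.

```python
"""Check that GE reporting_params enable on-the-fly accounting for BUpdaterSGE.

Sketch only: parses text in the layout printed by `qconf -sconf`.
"""
from typing import Dict

# Definitions required by the CREAM CE, as stated in this guide.
REQUIRED = {"accounting": "true", "accounting_flush_time": "00:00:00"}


def reporting_params(sconf_text: str) -> Dict[str, str]:
    """Extract the key=value pairs of the reporting_params attribute."""
    # Join backslash-continued lines into single logical lines.
    logical = sconf_text.replace("\\\n", " ")
    for line in logical.splitlines():
        fields = line.split()
        if fields and fields[0] == "reporting_params":
            pairs = (f.split("=", 1) for f in fields[1:] if "=" in f)
            return {k: v for k, v in pairs}
    return {}


def missing_settings(sconf_text: str) -> Dict[str, str]:
    """Return the required settings that are absent or set to another value."""
    params = reporting_params(sconf_text)
    return {k: v for k, v in REQUIRED.items() if params.get(k) != v}
```

For example, missing_settings(subprocess.run(["qconf", "-sconf"], capture_output=True, text=True).stdout) should return an empty dict once both definitions are in place.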
0.0.0.0.2 GE SERVER (QMASTER) tuning
The following suggestions should be implemented to achieve better performance when integrating with the CREAM CE:
1 Post-configuration
Have a look at the Known issue page.

2 Operating the system

2.1 Tomcat configuration guidelines
In /etc/tomcat5/tomcat5.conf there are some heap-related settings, in the JAVA_OPTS setting (see -Xms and -Xmx).
It is suggested to customize these settings according to the amount of physical memory available, as indicated in the following table (which refers to 64-bit architectures):
2.2 MySQL database configuration guidelines
Default values of some MySQL settings are likely to be suboptimal, especially for large machines. In particular, some parameters can improve the overall performance if carefully tuned. In this context, one relevant parameter is innodb_buffer_pool_size, which specifies the size of the buffer pool (the default value is 8MB).
The main benefits of a proper value for this parameter are an appreciable performance improvement and a reduced amount of disk I/O needed to access the data in the tables. The optimal value depends on the amount of physical memory and on the CPU architecture of the host machine.
The maximum value depends on the CPU architecture, 32-bit or 64-bit. For 32-bit systems, the CPU architecture and operating system sometimes impose a lower practical maximum size. The larger this value, the less disk I/O is needed to access data in tables. On a dedicated database server, it is possible to set this to up to 80% of the machine's physical memory size. Scale back this value if any of the following issues occur:
In /etc/my.cnf, within the [mysqld] section, it is suggested to customize the innodb_buffer_pool_size parameter taking into account how much physical memory is available.
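As a rough aid for choosing the value, the sketch below reads the host's physical memory and turns a fraction of it into a my.cnf line. The 80% ceiling is the one quoted above for a dedicated database server; when MySQL shares the node with other services (for instance tomcat, if the database runs on the CREAM CE node itself), a smaller fraction is advisable, and the exact figure is an assumption to adapt. The function names are hypothetical.

```python
"""Compute a value for innodb_buffer_pool_size from the physical memory.

Sketch: the 0.8 ceiling comes from the dedicated-server rule of thumb;
the fraction to use on a shared host is a site decision.
"""
import os


def physical_memory_bytes() -> int:
    """Total physical memory as reported by sysconf (Linux/Unix only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def buffer_pool_line(mem_bytes: int, fraction: float = 0.8) -> str:
    """Return a my.cnf line sizing the pool to `fraction` of mem_bytes, in MB."""
    if not 0 < fraction <= 0.8:
        raise ValueError("fraction should be in (0, 0.8]")
    size_mb = int(mem_bytes * fraction) // (1024 * 1024)
    return f"innodb_buffer_pool_size={size_mb}M"
```

For instance, buffer_pool_line(physical_memory_bytes(), 0.5) gives a line to paste under [mysqld]; the M suffix is the form MySQL accepts for sizes.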
Example:
[mysqld]
innodb_buffer_pool_size=512M

After that, it is necessary to restart the mysql service to apply the change:

/etc/init.d/mysqld restart

Finally, the following SQL command (root rights are needed) can be used to check whether the new value was applied successfully:

SHOW VARIABLES like 'innodb_buffer_pool_size';

2.3 MySQL database: How to resize Innodb log files
If the following error occurs (see the mysql log file /var/log/mysqld.log):

InnoDB: ERROR: the age of the last checkpoint is ,
InnoDB: which exceeds the log group capacity .
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

then you must resize the InnoDB log files. Follow these steps:
SHOW VARIABLES like "innodb_log_file_size";
service mysqld stop
[mysqld]
innodb_log_file_size=64M
mv /var/lib/mysql/ib_logfile* /tmp
service mysqld start
ls -lrth /var/lib/mysql/ib_logfile*

2.4 How to start the CREAM service
A site admin can start the CREAM service simply by starting the CREAM container. For sl5_x86_64:

/etc/init.d/tomcat5 start

If the new BLAH blparser is used, this also starts it (if not already running). If for some reason it is necessary to start the new BLAH blparser explicitly, the following command can be used:

/etc/init.d/glite-ce-blah-parser start

If instead the old BLAH blparser is used, it must be started on the BLPARSER_HOST before starting tomcat, using the command:

/etc/init.d/glite-ce-blah-parser start

To stop the CREAM service, it is sufficient to stop the CREAM container. For sl5_x86_64:

/etc/init.d/tomcat5 stop

2.5 Daemons
Information about daemons running in the CREAM CE is available in TBC

2.6 Init scripts
Information about init scripts in the CREAM CE is available in TBC

2.7 Configuration files
Information about configuration files in the CREAM CE is available in TBC

2.8 Log files
Information about log files in the CREAM CE is available in TBC

2.9 Network ports
Information about ports used in the CREAM CE is available in TBC

2.10 Cron jobs
Information about cron jobs used in the CREAM CE is available in TBC

2.11 Security related operations

2.11.1 How to enable a certain VO for a certain CREAM CE in Argus
Let's consider a CREAM CE that has been configured to use ARGUS as its authorization system, and suppose that http://pd.infn.it/cream-18 was chosen as the id of the CREAM CE (i.e. the yaim variable CREAM_PEPC_RESOURCEID is http://pd.infn.it/cream-18).
To enable the VO XYZ, the following policy must be defined on the ARGUS box (identified by the yaim variable ARGUS_PEPD_ENDPOINTS):

resource "http://pd.infn.it/cream-18" {
    obligation "http://glite.org/xacml/obligation/local-environment-map" {}
    action ".*" {
        rule permit { vo = "XYZ" }
    }
}

2.11.2 Security recommendations
Security recommendations relevant for the CREAM CE are available in TBC

2.11.3 How to block/ban a user
Information about how to ban users is available in TBC

2.11.4 How to block/ban a VO
To ban a VO, it is suggested to reconfigure the service via yaim without that VO in the site-info.def file.
2.11.5 How to define a CREAM administrator
A CREAM administrator (aka super-user) can also manage (e.g. cancel, check the status of) jobs submitted by other people. Moreover, he/she can issue some privileged operations, in particular those to disable new job submissions (glite-ce-disable-submission) and then to re-enable them (glite-ce-enable-submission).
To define a CREAM CE administrator for a specific CREAM CE, the DN of this person must be specified in the /etc/grid-security/admin-list file of this CREAM CE node, e.g.:

"/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Massimo Sgaravatto"

Please note that enclosing the DN in double quotes (") is important.

2.12 Input and Output Sandbox files transfer between the CREAM CE and the WN
The input and output sandbox files (unless they have to be copied from/to remote servers) are copied between the CREAM CE node and the Worker Node. These file transfers can be done in two possible ways:
SANDBOX_TRANSFER_METHOD_BETWEEN_CE_WN . Possible values are:
2.13 Sharing of the CREAM sandbox area between the CREAM CE and the WN
Besides the input/output sandbox files, there are some other files that need to be transferred from/to the CREAM sandbox directory on the CE node to/from the Worker Node:

2.13.1 Sharing of the CREAM sandbox area between the CREAM CE and the WN for Torque
When Torque is used as batch system, to share the CREAM sandbox area between the CREAM CE node and the WNs:
$usecp <CE node>:/var/cream_sandbox /cream_sandbox

This $usecp line means that every time Torque has to copy a file from/to the cream_sandbox directory on the CE (which is the case during the stage-in/stage-out phase), it will use a cp from /cream_sandbox instead.
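The substitution performed by $usecp is a prefix rewrite, which can be sketched in Python as follows. This illustrates the idea and is not Torque's implementation: the sketch matches the host name exactly (Torque also accepts wildcards such as *.domain) and only whole path components, and the host ce.example.org used in the usage note is hypothetical.

```python
"""Illustrate how a Torque $usecp directive rewrites a staging path.

Sketch of the prefix substitution only; not Torque code.
"""
from typing import Optional, Tuple


def parse_usecp(line: str) -> Tuple[str, str, str]:
    """Split '$usecp host:/prefix /local_prefix' into (host, prefix, local)."""
    keyword, remote, local = line.split()
    if keyword != "$usecp":
        raise ValueError("not a $usecp line")
    host, prefix = remote.split(":", 1)
    return host, prefix, local


def local_path(usecp_line: str, host: str, path: str) -> Optional[str]:
    """Return the local path cp should use, or None if a remote copy is needed."""
    rule_host, prefix, local = parse_usecp(usecp_line)
    prefix, local = prefix.rstrip("/"), local.rstrip("/")
    if host != rule_host:
        return None
    # This sketch matches whole path components: /var/cream_sandbox2 is not
    # treated as being under /var/cream_sandbox.
    if path != prefix and not path.startswith(prefix + "/"):
        return None
    return (local + path[len(prefix):]) or "/"
```

With the line "$usecp ce.example.org:/var/cream_sandbox /cream_sandbox", the CE path /var/cream_sandbox/dteam/job1/in.txt is read on the WN as /cream_sandbox/dteam/job1/in.txt.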
2.14 Self-limiting CREAM behavior
CREAM is able to protect itself if the load, memory usage, etc. are too high. It does so by disabling new job submissions, while the other commands are still allowed. This is implemented via a limiter script (/usr/bin/glite_cream_load_monitor), very similar to the one used in the WMS.
Basically, this limiter script checks the values of some system and CREAM-specific parameters and compares them against some thresholds. If one or more thresholds are exceeded, new job submissions are disabled. If a new submission is attempted while submissions are disabled, an error message is returned, e.g.:
TBC
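The threshold rule described above for the limiter script can be sketched as follows. The metric names (load averages from os.getloadavg) and the threshold values are hypothetical and are not the parameters that glite_cream_load_monitor actually checks; the sketch only shows that a single exceeded threshold is enough to disable new submissions.

```python
"""Sketch of the threshold check performed by a self-limiting load monitor.

Hypothetical metrics and thresholds; not the logic of glite_cream_load_monitor.
"""
import os
from typing import Dict, List


def exceeded(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of the metrics whose value is above its threshold.

    A metric missing from `metrics` is treated as 0, i.e. not exceeded.
    """
    return sorted(name for name, limit in thresholds.items()
                  if metrics.get(name, 0.0) > limit)


def submissions_allowed(metrics: Dict[str, float],
                        thresholds: Dict[str, float]) -> bool:
    """New submissions stay enabled only if no threshold is exceeded."""
    return not exceeded(metrics, thresholds)


def current_metrics() -> Dict[str, float]:
    """Collect the 1, 5 and 15 minute load averages (Unix only)."""
    load1, load5, load15 = os.getloadavg()
    return {"load1": load1, "load5": load5, "load15": load15}
```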
2.15 How to drain a CREAM CE
The administrator of a CREAM CE can decide to drain it, that is, to disable new job submissions while still allowing the other commands. This can be useful, for example, before a scheduled shutdown of the CREAM CE. This can be achieved via the glite-ce-disable-submission command (provided by the CREAM CLI package installed on the UI), which can be issued only by a CREAM CE administrator, i.e. a person whose DN is listed in the /etc/grid-security/admin-list file of the CE.
If new job submissions are attempted, users will get an error message such as:
TBC
2.16 How to trace a specific job
To trace a specific job, first of all get the CREAM jobid. If the job was submitted through the WMS, you can get its CREAM jobid in the following way:

glite-wms-job-logging-info -v 2 <gridjobid> | grep "Dest jobid"

If the job is not yours and you are not an LB admin, but you have access to the CREAM logs, you can get the CREAM jobid corresponding to that gridjobid with:

grep <gridjobid> /var/log/cream/glite-ce-cream.log*

Then grep the "last part" of the CREAM jobid in the CREAM log files (e.g. if the CREAM jobid is https://cream-07.pd.infn.it:8443/CREAM383606450):

grep CREAM383606450 /var/log/cream/glite-ce-cream.log*

This returns all the information relevant to this job.

2.17 How to check if you are using the old or the new blparser
If you want to quickly check whether you are using the old or the new BLAH blparser, run grep registry blah.config. If you see something like:
# grep registry blah.config
job_registry=/var/blah/user_blah_job_registry.bjr

you are using the new BLAH blparser. Otherwise you are using the old one.

2.18 Job purging
Purging a CREAM job means removing it from the CREAM database and removing from the CREAM CE any information relevant to that job (e.g. the job sandbox area). Once a job has been purged, it can no longer be managed (e.g. its status can no longer be checked). A job can be purged:
/etc/glite_wms.conf) the attribute purge_jobs in the ICE section is set to false.
2.18.1 Automatic job purging
The automatic CREAM job purger is responsible for purging old, forgotten jobs, according to a policy specified in the CREAM configuration file (/etc/glite-ce-cream/cream-config.xml).
This policy is specified by the attribute JOB_PURGE_POLICY.
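The policy value is a ';'-separated list of '<STATUS> <N> days' entries, as in the example below. A minimal Python sketch of how such a value can be interpreted follows; the function names are hypothetical and the code mirrors the documented policy, not the purger's source.

```python
"""Parse a JOB_PURGE_POLICY value and decide whether a job is purgeable.

Sketch assuming entries of the form '<STATUS> <N> days' separated by ';'.
"""
from datetime import datetime, timedelta
from typing import Dict


def parse_purge_policy(value: str) -> Dict[str, timedelta]:
    """Map each job status to the time after which such jobs are purged."""
    policy: Dict[str, timedelta] = {}
    for entry in value.split(";"):
        fields = entry.split()
        if not fields:
            continue  # skip the empty item after the trailing ';'
        status, amount, unit = fields
        if unit not in ("day", "days"):
            raise ValueError(f"unsupported unit in {entry!r}")
        policy[status] = timedelta(days=int(amount))
    return policy


def is_purgeable(policy: Dict[str, timedelta], status: str,
                 in_status_since: datetime, now: datetime) -> bool:
    """True if the job has been in `status` longer than the policy allows."""
    limit = policy.get(status)
    return limit is not None and now - in_status_since > limit
```

Statuses not listed in the policy (e.g. RUNNING) are never selected, consistent with purging being allowed only for jobs in a terminal status.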
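For example, if JOB_PURGE_POLICY is the following:

<parameter name="JOB_PURGE_POLICY" value="ABORTED 1 days; CANCELLED 2 days; DONE-OK 3 days; DONE-FAILED 4 days; REGISTERED 5 days;" />

then the job purger will purge jobs which have been in ABORTED status for more than 1 day, in CANCELLED status for more than 2 days, in DONE-OK status for more than 3 days, in DONE-FAILED status for more than 4 days, or in REGISTERED status for more than 5 days.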
2.18.2 Purging jobs in a non terminal status
The (manual or automatic) purge operation can be issued only for jobs which are in a terminal status. If it is necessary to purge a job that has actually terminated but that CREAM still reports in a non-terminal status (e.g. RUNNING, REALLY_RUNNING) because of some bug or problem, a specific utility (JobAdminPurger) provided with the glite-ce-cream package can be used.
JobAdminPurger allows purging jobs based on their CREAM jobids and/or their status (considering how long the job has been in that status).
TBC
2.19 Proxy purging
Expired delegation proxies are automatically purged:
/etc/glite-ce-cream/cream-config.xml) there is a property called delegation_purge_rate, which defines how often the proxy purger is run. The default value is 720 (720 minutes, that is 12 hours).
If the value is changed, it is then necessary to restart tomcat.
Setting that value to -1 means disabling the proxy purger.
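The semantics of delegation_purge_rate (a period in minutes, with -1 meaning disabled) can be captured in a few lines. This is a sketch: the function name is hypothetical, and rejecting 0 and other negative values is an assumption rather than documented behavior.

```python
"""Interpret delegation_purge_rate: minutes between runs, -1 disables the purger."""
from datetime import timedelta
from typing import Optional


def proxy_purge_interval(rate_minutes: int) -> Optional[timedelta]:
    """Return how often the proxy purger runs, or None if it is disabled."""
    if rate_minutes == -1:
        return None
    if rate_minutes <= 0:
        # Assumption: only positive periods (or -1) are meaningful.
        raise ValueError("delegation_purge_rate must be positive or -1")
    return timedelta(minutes=rate_minutes)
```

With the default value, proxy_purge_interval(720) is a 12 hour period.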
2.20 Job wrapper management

2.20.1 Customization points
The CREAM JobWrapper running on the WN executes some scripts (to be provided by the local administrators), if they exist. These are called customization points.
There are 3 customization points:
2.20.2 Customization of the CREAM Job wrapper
To customize the CREAM job wrapper, it is sufficient to edit the template file /etc/glite-ce-cream/jobwrapper.tpl as appropriate.
When done, tomcat must be restarted.
2.20.3 Customization of the Input/Output Sandbox file transfers
The CREAM job wrapper, besides running the user payload, is also responsible for other operations, such as the transfer of the input and output sandbox files from/to remote gridftp servers. If such a transfer fails, the operation is retried after a while. The sleep time between the first attempt and the second one is the “initial wait time” specified in the CREAM configuration file; at every subsequent attempt the sleep time is doubled. In the CREAM configuration file (/etc/glite-ce-cream/cream-config.xml) it is possible to set:

<parameter name="JOB_WRAPPER_COPY_RETRY_COUNT_ISB" value="2" />
<parameter name="JOB_WRAPPER_COPY_RETRY_FIRST_WAIT_ISB" value="60" /> <!-- sec. -->
<parameter name="JOB_WRAPPER_COPY_RETRY_COUNT_OSB" value="6" />
<parameter name="JOB_WRAPPER_COPY_RETRY_FIRST_WAIT_OSB" value="300" /> <!-- sec. -->

If one or more of these values are changed, tomcat must then be restarted.

2.21 Managing the forwarding of requirements to the batch system
The CREAM CE allows requirements to be forwarded to the batch system via the BLAH component. From a site administrator's point of view, this requires creating and properly filling some scripts (/usr/bin/xxx_local_submit_attributes.sh).
The relevant documentation is available at TBC
2.22 Querying the CREAM Database

2.22.1 Check how many jobs are stored in the CREAM database
The following mysql query can be used to check how many jobs (along with their status) are reported in the CREAM database:

mysql> select jstd.name, count(*) from job, job_status_type_description jstd, job_status AS status LEFT OUTER JOIN job_status AS latest ON latest.jobId=status.jobId AND status.id < latest.id WHERE latest.id IS null and job.id=status.jobId and jstd.type=status.type group by jstd.name;
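The query keeps, for each job, only the job_status row with the highest id (the LEFT OUTER JOIN finds no later row for it, so latest.id is null), and then counts jobs per status name. The sketch below runs the same query against a toy in-memory SQLite database; the three tables contain only the columns the query uses and the rows are invented for illustration, so this is not the real CREAM schema.

```python
"""Reproduce the 'latest status per job' query on a toy SQLite database.

Sketch: minimal made-up tables; the real CREAM database is MySQL with a
richer schema.
"""
import sqlite3
from typing import Dict

QUERY = """
SELECT jstd.name, COUNT(*)
FROM job, job_status_type_description jstd, job_status AS status
LEFT OUTER JOIN job_status AS latest
  ON latest.jobId = status.jobId AND status.id < latest.id
WHERE latest.id IS NULL
  AND job.id = status.jobId
  AND jstd.type = status.type
GROUP BY jstd.name
"""


def count_jobs_by_status(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count jobs by their most recent status (the row with the highest id)."""
    return dict(conn.execute(QUERY).fetchall())


def demo_db() -> sqlite3.Connection:
    """Build an in-memory database with four jobs and their status history."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE job (id TEXT PRIMARY KEY);
        CREATE TABLE job_status_type_description (type INTEGER, name TEXT);
        CREATE TABLE job_status (id INTEGER PRIMARY KEY, jobId TEXT, type INTEGER);
        INSERT INTO job VALUES ('CREAM1'), ('CREAM2'), ('CREAM3'), ('CREAM4');
        INSERT INTO job_status_type_description VALUES
            (0, 'REGISTERED'), (1, 'RUNNING'), (2, 'DONE-OK');
        INSERT INTO job_status (jobId, type) VALUES
            ('CREAM1', 0), ('CREAM1', 1), ('CREAM1', 2),
            ('CREAM2', 0), ('CREAM2', 1),
            ('CREAM3', 0),
            ('CREAM4', 0), ('CREAM4', 1);
    """)
    return conn
```

Here count_jobs_by_status(demo_db()) returns a dict equal to {"DONE-OK": 1, "RUNNING": 2, "REGISTERED": 1}: earlier status rows of each job are ignored.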
-- MassimoSgaravatto - 2011-12-20