Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
Line: 124 to 124 | ||||||||
JOB Submission: | ||||||||
Added: | ||||||||
> > | First Test (simple job): | |||||||
From an EMI UI, after creating a proxy, submit the job:
-bash-3.2$ glite-ce-job-submit -r cert-09.pd.infn.it:8443/cream-pbs-cert -a testCream.jdl | ||||||||
Line: 147 to 148 | ||||||||
**** JobID=[https://cert-09.pd.infn.it:8443/CREAM043342708] Status = [DONE-OK] ExitCode = [0] | ||||||||
Added: | ||||||||
> > |
First Test : PASSED | |||||||
Changed: | ||||||||
< < | ||||||||
> > | Second Test (simple MPI job with 2 cores): | |||||||
-bash-3.2$ glite-ce-job-submit -r cert-09.pd.infn.it:8443/cream-pbs-cert -a mpi-start-wrapper_Cream.jdl
https://cert-09.pd.infn.it:8443/CREAM986827158 | ||||||||
Line: 261 to 263 | ||||||||
(1329228578) | ||||||||
Changed: | ||||||||
< < | Test submission : PASSED | |||||||
> > | Second Test : PASSED | |||||||
Added: | ||||||||
> > | Third Test: (MPI, 4 cores required) | |||||||
Test submission with 4 cores required:
| ||||||||
Line: 277 to 280 | ||||||||
ExitCode = [] FailureReason = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes-) N/A (jobId = CREAM888211702)] | ||||||||
Changed: | ||||||||
< < | Test submission : NOT PASSED | |||||||
> > | Third Test : NOT PASSED

Third Test (bis): (MPI, 4 cores required)

Run the third test again, submitting a job that uses 4 cores.

Modified one YAIM variable in the services/glite-mpi_ce file:

MPI_SUBMIT_FILTER=${MPI_SUBMIT_FILTER:-"yes"}

and reconfigured the CE.

-bash-3.2$ glite-ce-job-submit -r cert-09.pd.infn.it:8443/cream-pbs-cert -a mpi-start-wrapper_Cream.jdl
https://cert-09.pd.infn.it:8443/CREAM115768488

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM115768488
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM115768488]
Status = [IDLE]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM115768488
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM115768488]
Status = [RUNNING]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM115768488
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM115768488]
Status = [REALLY-RUNNING]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM115768488
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM115768488]
Status = [DONE-OK]
ExitCode = [0]

-bash-3.2$ glite-ce-job-status -L 2 https://cert-09.pd.infn.it:8443/CREAM115768488
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM115768488]
Current Status = [DONE-OK]
Working Dir = [[reserved]]
ExitCode = [0]
Grid JobID = [N/A]
LRMS Abs JobID = [[reserved]]
LRMS JobID = [[reserved]]
Deleg Proxy ID = [e1444f9cc9df997f65b2b6d247d1dc582814c451]
DelegProxyInfo = [Valid From : 2/15/12 2:12 PM (GMT)
Valid To : 2/15/12 9:26 PM (GMT)
Holder Subject : /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi
Holder CA : /C=IT/O=INFN/CN=INFN CA
VO : dteam
AC Issuer : CN=voms2.hellasgrid.gr,OU=hellasgrid.gr,O=HellasGrid,C=GR
Attribute : /dteam/Role=NULL/Capability=NULL
/dteam/NGI_IT/Role=NULL/Capability=NULL
]
Worker Node = [cert-wn64-08.pn.pd.infn.it]
Local User = [dteam009]
CREAM ISB URI = [gsiftp://cert-09.pd.infn.it/var/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sergio_Traldi_dteam_Role_NULL_Capability_NULL_dteam009/11/CREAM115768488/ISB]
CREAM OSB URI = [gsiftp://cert-09.pd.infn.it/var/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sergio_Traldi_dteam_Role_NULL_Capability_NULL_dteam009/11/CREAM115768488/OSB]
JDL = [[ Arguments = "name_mpi OPENMPI"; QueueName = "cert"; JobType = "Normal"; Executable = "mpi-start-wrapper.sh"; VirtualOrganisation = "dteam"; InputSandbox = { "/home/traldi/JOB_MPI/mpi-start-wrapper.sh","/home/traldi/JOB_MPI/mpi-hooks.sh","/home/traldi/JOB_MPI/name_mpi.c" }; CPUNumber = 4; StdOutput = "std.out"; Type = "Job"; OutputSandboxBaseDestUri = "gsiftp://prod-se-01.pd.infn.it/tmp"; StdError = "std.err"; BatchSystem = "pbs"; OutputSandbox = { "std.err","std.out" } ]]
Type = [Normal]

Job status changes:
-------------------
Status = [REGISTERED] - [Wed 15 Feb 2012 15:18:00] (1329315480)
Status = [PENDING] - [Wed 15 Feb 2012 15:18:04] (1329315484)
Status = [IDLE] - [Wed 15 Feb 2012 15:18:04] (1329315484)
Status = [RUNNING] - [Wed 15 Feb 2012 15:18:11] (1329315491)
Status = [REALLY-RUNNING] - [Wed 15 Feb 2012 15:18:14] (1329315494)
Status = [DONE-OK] - [Wed 15 Feb 2012 15:23:24] (1329315804)

Issued Commands:
-------------------
*** Command Name = [JOB_REGISTER]
Command Category = [JOB_MANAGEMENT]
Command Status = [SUCCESSFULL]
Creation Time = [Wed 15 Feb 2012 15:17:59] (1329315479)
Start Scheduling Time = [Wed 15 Feb 2012 15:17:59] (1329315479)
Start Processing Time = [Wed 15 Feb 2012 15:17:59] (1329315479)
Execution Completed Time = [Wed 15 Feb 2012 15:18:01] (1329315481)

*** Command Name = [JOB_START]
Command Category = [JOB_MANAGEMENT]
Command Status = [SUCCESSFULL]
Creation Time = [Wed 15 Feb 2012 15:18:04] (1329315484)
Start Scheduling Time = [Wed 15 Feb 2012 15:18:04] (1329315484)
Start Processing Time = [Wed 15 Feb 2012 15:18:04] (1329315484)
Execution Completed Time = [Wed 15 Feb 2012 15:18:11] (1329315491)

Inside CE:

[root@cert-09 ~]# qstat -n

cert-09.pd.infn.it:
                                                               Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5.cert-09.pd.inf     dteam009 cert     cream_115768488  29356  2     4   --     --    R 00:02
   cert-wn64-08+cert-wn64-08+cert-wn64-07+cert-wn64-07

Third Test (bis) : PASSED | |||||||
-- SergioTraldi - 2012-02-15 |
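
The repeated `glite-ce-job-status` calls in the tests above could be scripted. The following is only a minimal sketch, not part of the original work log: the helper names `parse_status`, `is_terminal` and `wait_for_job`, and the variables `STATUS_CMD`, `POLL_INTERVAL` and `MAX_POLLS`, are hypothetical. Only the `glite-ce-job-status` command and the `Status = [...]` output format are taken from the log. It assumes the final CREAM states are DONE-OK, DONE-FAILED, ABORTED and CANCELLED.

```shell
#!/bin/sh
# Hypothetical helpers; only glite-ce-job-status itself comes from the log.

# Print the first "Status = [...]" value found on stdin.
parse_status() {
    sed -n 's/.*Status *= *\[\([A-Z-]*\)\].*/\1/p' | head -n 1
}

# Succeed if the given CREAM state is a final one (assumed list).
is_terminal() {
    case "$1" in
        DONE-OK|DONE-FAILED|ABORTED|CANCELLED) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll a job until it reaches a final state, then print that state.
# Gives up with exit code 1 after MAX_POLLS attempts (default 60).
wait_for_job() {
    job_id=$1
    tries=${MAX_POLLS:-60}
    while [ "$tries" -gt 0 ]; do
        state=$("${STATUS_CMD:-glite-ce-job-status}" "$job_id" | parse_status)
        if is_terminal "$state"; then
            printf '%s\n' "$state"
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-10}"
    done
    return 1
}
```

For example, `wait_for_job https://cert-09.pd.infn.it:8443/CREAM115768488` would print the final state once the job leaves IDLE, RUNNING or REALLY-RUNNING. The exit code of the job still has to be read separately from the status output.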
Line: 1 to 1 | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Added: | |||||||||||||||
> > |
EMI-CREAM-Torque installation using the latest Torque (2.5.7-7) and Maui 3.3-4 with MPI (CE and WN)

On CE Host (also BATCH Master PBS/Torque)

Steps to install the latest CREAM CE using the Torque Staged-Rollout release in the ig/gLite distribution.

INSTALLATION:

Repository settings:

cd /etc/yum.repos.d/
mv dag.repo dag.repo.orig
wget http://repo-pd.italiangrid.it/mrepo/repos/egi-trustanchors.repo
wget http://repo-pd.italiangrid.it/mrepo/repos/igi/sl5/x86_64/igi-cert-emi.repo
cd /root/
wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/sl5/x86_64/updates/emi-release-1.0.1-1.sl5.noarch.rpm

Packages installation (CA, epel, emi, cream, torque,...):

yum install ca-policy-egi-core
yum localinstall *.rpm
yum install xml-commons-apis
yum install emi-cream-ce
yum install emi-torque-server emi-torque-utils
yum install mpi-start glite-yaim-mpi

Munge configuration:

/usr/sbin/create-munge-key
service munge start
chkconfig munge on

Starting PBS:

/etc/init.d/pbs_server start

File LOG:

Installation File on CE - cert-09.pd.infn.it
Work Log

CONFIGURATION:

/opt/glite/yaim/bin/yaim -c -d 6 -s /usr/local/nfs/cert-3_2/rtc_mpi/rtc-site-info.def -n MPI_CE -n creamCE -n TORQUE_server -n TORQUE_utils 2>&1 | tee /root/conf_EMI_CREAM_Torque_MPI.`hostname -s`.`date +%Y%m%d-%H%M%S`.log

SSH Customization:

Modify the file /etc/ssh/sshd_config as the example attached here
Modify the file /etc/ssh/shosts.equiv as the example attached here

service sshd restart

File LOG:

Yaim Configuration File on CE cert-09.pd.infn.it

On WN Hosts:

INSTALLATION:

Repository settings:

cd /etc/yum.repos.d/
mv dag.repo dag.repo.orig
wget http://repo-pd.italiangrid.it/mrepo/repos/egi-trustanchors.repo
wget http://repo-pd.italiangrid.it/mrepo/repos/igi/sl5/x86_64/igi-cert-emi.repo
cd /root/
wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/sl5/x86_64/updates/emi-release-1.0.1-1.sl5.noarch.rpm

Packages installation (CA, epel, emi, cream, torque,...):

yum install ca-policy-egi-core
yum localinstall *.rpm
yum install igi-wn_torque_noafs
yum install mpi-start glite-yaim-mpi
yum install openmpi openmpi-devel mpich2

Munge configuration:

scp cert-09:/etc/munge/munge.key /etc/munge/
chown munge.munge /etc/munge/munge.key
service munge start

File LOG:

Installation File on WN - cert-wn64-08.pn.pd.infn.it
Work Log

CONFIGURATION:

/opt/glite/yaim/bin/yaim -c -d 6 -s /usr/local/nfs/cert-3_2/rtc_mpi/rtc-site-info.def -n MPI_WN -n WN_torque_noafs 2>&1 | tee /root/conf_WN_Torque_MPI.`hostname -s`.`date +%Y%m%d-%H%M%S`.log

SSH Customization:

Modify the file /etc/ssh/sshd_config as the example attached here
Modify the file /etc/ssh/shosts.equiv as the example attached here

service sshd restart

File LOG:

Yaim Configuration File on WN cert-wn64-08.pn.pd.infn.it

TESTING:

JOB Submission:

From an EMI UI, after creating a proxy, submit the job:

-bash-3.2$ glite-ce-job-submit -r cert-09.pd.infn.it:8443/cream-pbs-cert -a testCream.jdl
https://cert-09.pd.infn.it:8443/CREAM043342708

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM043342708
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM043342708]
Status = [RUNNING]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM043342708
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM043342708]
Status = [REALLY-RUNNING]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM043342708
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM043342708]
Status = [DONE-OK]
ExitCode = [0]

-bash-3.2$ glite-ce-job-submit -r cert-09.pd.infn.it:8443/cream-pbs-cert -a mpi-start-wrapper_Cream.jdl
https://cert-09.pd.infn.it:8443/CREAM986827158

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM986827158
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM986827158]
Status = [IDLE]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM986827158
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM986827158]
Status = [RUNNING]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM986827158
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM986827158]
Status = [REALLY-RUNNING]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM986827158
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM986827158]
Status = [REALLY-RUNNING]

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM986827158
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM986827158]
Status = [DONE-OK]
ExitCode = [1]

-bash-3.2$ glite-ce-job-status -L 2 https://cert-09.pd.infn.it:8443/CREAM986827158
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM986827158]
Current Status = [DONE-OK]
Working Dir = [[reserved]]
ExitCode = [1]
Grid JobID = [N/A]
LRMS Abs JobID = [[reserved]]
LRMS JobID = [[reserved]]
Deleg Proxy ID = [613f3558926a0ba8642c51cfcaeb019b88a5b791]
DelegProxyInfo = [Valid From : 2/14/12 2:04 PM (GMT)
Valid To : 2/14/12 11:14 PM (GMT)
Holder Subject : /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi
Holder CA : /C=IT/O=INFN/CN=INFN CA
VO : dteam
AC Issuer : CN=voms2.hellasgrid.gr,OU=hellasgrid.gr,O=HellasGrid,C=GR
Attribute : /dteam/Role=NULL/Capability=NULL
/dteam/NGI_IT/Role=NULL/Capability=NULL
]
Worker Node = [cert-wn64-08.pn.pd.infn.it]
Local User = [dteam009]
CREAM ISB URI = [gsiftp://cert-09.pd.infn.it/var/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sergio_Traldi_dteam_Role_NULL_Capability_NULL_dteam009/98/CREAM986827158/ISB]
CREAM OSB URI = [gsiftp://cert-09.pd.infn.it/var/cream_sandbox/dteam/_C_IT_O_INFN_OU_Personal_Certificate_L_Padova_CN_Sergio_Traldi_dteam_Role_NULL_Capability_NULL_dteam009/98/CREAM986827158/OSB]
JDL = [[ Arguments = "name_mpi OPENMPI"; QueueName = "cert"; JobType = "Normal"; Executable = "mpi-start-wrapper.sh"; VirtualOrganisation = "dteam"; InputSandbox = { "/home/traldi/JOB_MPI/mpi-start-wrapper.sh","/home/traldi/JOB_MPI/mpi-hooks.sh","/home/traldi/JOB_MPI/name_mpi.c" }; CPUNumber = 2; StdOutput = "std.out"; Type = "Job"; OutputSandboxBaseDestUri = "gsiftp://prod-se-01.pd.infn.it/tmp"; StdError = "std.err"; BatchSystem = "pbs"; OutputSandbox = { "std.err","std.out" } ]]
Type = [Normal]

Job status changes:
-------------------
Status = [REGISTERED] - [Tue 14 Feb 2012 15:09:29] (1329228569)
Status = [PENDING] - [Tue 14 Feb 2012 15:09:31] (1329228571)
Status = [IDLE] - [Tue 14 Feb 2012 15:09:31] (1329228571)
Status = [RUNNING] - [Tue 14 Feb 2012 15:09:36] (1329228576)
Status = [REALLY-RUNNING] - [Tue 14 Feb 2012 15:09:40] (1329228580)
Status = [DONE-OK] - [Tue 14 Feb 2012 15:09:43] (1329228583)

Issued Commands:
-------------------
*** Command Name = [JOB_REGISTER]
Command Category = [JOB_MANAGEMENT]
Command Status = [SUCCESSFULL]
Creation Time = [Tue 14 Feb 2012 15:09:29] (1329228569)
Start Scheduling Time = [Tue 14 Feb 2012 15:09:29] (1329228569)
Start Processing Time = [Tue 14 Feb 2012 15:09:29] (1329228569)
Execution Completed Time = [Tue 14 Feb 2012 15:09:29] (1329228569)

*** Command Name = [JOB_START]
Command Category = [JOB_MANAGEMENT]
Command Status = [SUCCESSFULL]
Creation Time = [Tue 14 Feb 2012 15:09:31] (1329228571)
Start Scheduling Time = [Tue 14 Feb 2012 15:09:31] (1329228571)
Start Processing Time = [Tue 14 Feb 2012 15:09:31] (1329228571)
Execution Completed Time = [Tue 14 Feb 2012 15:09:38] (1329228578)

Test submission : PASSED

Test submission with 4 cores required:

-bash-3.2$ glite-ce-job-submit -r cert-09.pd.infn.it:8443/cream-pbs-cert -a mpi-start-wrapper_Cream.jdl
https://cert-09.pd.infn.it:8443/CREAM888211702

-bash-3.2$ glite-ce-job-status https://cert-09.pd.infn.it:8443/CREAM888211702
****** JobID=[https://cert-09.pd.infn.it:8443/CREAM888211702]
Status = [ABORTED]
ExitCode = []
FailureReason = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes-) N/A (jobId = CREAM888211702)]

Test submission : NOT PASSED

-- SergioTraldi - 2012-02-15
|
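
The contents of `mpi-start-wrapper_Cream.jdl` are not shown in this log, but the JDL echoed by `glite-ce-job-status -L 2` suggests what the file may have looked like. The sketch below is a reconstruction under that assumption, not the author's actual file. The function name `write_mpi_jdl` is hypothetical, and the attributes `QueueName`, `BatchSystem` and `VirtualOrganisation` are left out on the assumption that they were filled in at submission time.

```shell
#!/bin/sh
# Hypothetical helper: write an MPI job JDL modelled on the JDL echoed
# in the status output. $1 = output file, $2 = number of cores (default 2).
write_mpi_jdl() {
    cat > "$1" <<EOF
[
  Type = "Job";
  JobType = "Normal";
  Executable = "mpi-start-wrapper.sh";
  Arguments = "name_mpi OPENMPI";
  CPUNumber = ${2:-2};
  StdOutput = "std.out";
  StdError = "std.err";
  InputSandbox = { "/home/traldi/JOB_MPI/mpi-start-wrapper.sh","/home/traldi/JOB_MPI/mpi-hooks.sh","/home/traldi/JOB_MPI/name_mpi.c" };
  OutputSandboxBaseDestUri = "gsiftp://prod-se-01.pd.infn.it/tmp";
  OutputSandbox = { "std.err","std.out" }
]
EOF
}
```

Calling `write_mpi_jdl mpi-start-wrapper_Cream.jdl 4` would produce a file requesting 4 cores, which could then be passed to `glite-ce-job-submit -a` as in the tests above. The only difference between the 2-core and 4-core submissions in the echoed JDL is the `CPUNumber` value.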