Glue2 support in CREAM
The CREAM CE provided with EMI-1 offers some initial support for Glue2, which still needs to be finalized.
1 Introduction
The Glue 2.0 specification document is available
here.
The Glue 2.0 specification can also be found
here.
The GLUE v. 2.0 – Reference Implementation of an LDAP Schema document can be found
here.
Glue 2.0 schema in SVN
2 Target scenario
2.1 Objectclasses
This section reports the most significant information concerning the implementation of the Glue 2 objectclasses with respect to the CREAM CE.
- GLUE2ServiceID is the value of ComputingServiceId in /etc/glite-ce-glue2/glite-ce-glue2.conf. This value is the one specified by the yaim variable COMPUTING_SERVICE_ID, if specified (this variable is mandatory in cluster mode, see below). Otherwise it is ${CE_HOST}_ComputingElement
- GLUE2ServiceCapability is executionmanagement.jobexecution
- GLUE2ServiceType is org.glite.ce.CREAM
- GLUE2ServiceAdminDomainForeignKey is the value of SiteId in /etc/glite-ce-glue2/glite-ce-glue2.conf. This value is the one specified by the yaim variable SITE_NAME
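The GLUE2ServiceID fallback rule above can be sketched as follows. This is an illustrative function, not the actual yaim or info-provider code; the parameter names mirror the yaim variables COMPUTING_SERVICE_ID and CE_HOST from this document.

```python
def computing_service_id(ce_host, computing_service_id=None, cluster_mode=False):
    """Illustrative sketch of the GLUE2ServiceID derivation rule.

    computing_service_id mirrors the yaim variable COMPUTING_SERVICE_ID,
    ce_host mirrors CE_HOST.
    """
    if computing_service_id:
        # An explicitly configured COMPUTING_SERVICE_ID always wins.
        return computing_service_id
    if cluster_mode:
        # The document states this variable is mandatory in cluster mode.
        raise ValueError("COMPUTING_SERVICE_ID is mandatory in cluster mode")
    # Default: ${CE_HOST}_ComputingElement
    return "%s_ComputingElement" % ce_host
```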
TBD
A ComputingShare corresponds to a Glue1 VOView.
If needed, besides the VOViews, we will also represent batch system queues as ComputingShares (this will have some impact on the WMS matchmaker).
TBC
TBD
2.1.5 Benchmark
TBD
TBD
TBD
TBD
We do not implement the ComputingActivity objectclass, since we do not publish information about jobs.
TBD
2.2 CREAM CE in no cluster mode
We assume that a CREAM CE is configured in no cluster mode only if it is the only CREAM CE available at the site. That is, sites with multiple CREAM CEs (submitting to the same batch system) should
always have a cluster node and therefore be configured in cluster mode.
If a CREAM CE is configured in no cluster mode, all the Glue2 object classes are published by the resource BDII running on the CREAM CE.
These objectclasses are:
- ComputingService (done)
- ComputingEndPoint (done)
- AccessPolicy (done)
- ComputingManager (done)
- ComputingShare (done)
- MappingPolicy (done)
- ExecutionEnvironment (done)
- Benchmark (done)
- ToStorageService (done)
- ApplicationEnvironment (todo)
- EndPoint for RTEPublisher (done)
- "Child" of ComputingService
- EndPoint for CEMon (done)
- "Child" of ComputingService
- Published only if CEMon is deployed
The ComputingServiceId is the one specified by the yaim variable COMPUTING_SERVICE_ID (if specified). Otherwise it is ${CE_HOST}_ComputingElement.
The EndpointId is hostname + "_org.glite.ce.CREAM".
2.3 CREAM CE in cluster mode
Sites with multiple CREAM CEs (submitting to the same batch system) should
always have a cluster node and therefore be configured in cluster mode.
If a CREAM CE is configured in cluster mode:
- The resource BDII running on the CREAM CE publishes just the following objectclasses:
- ComputingEndpoint (done)
- AccessPolicy (done)
- All the other objectclasses are published by the resource BDII running on the gLite-CLUSTER node:
- ComputingService (done)
- ComputingManager (done)
- ComputingShare (todo)
- MappingPolicy (todo)
- ExecutionEnvironment (done)
- Benchmark (done)
- ToStorageService (done)
- ApplicationEnvironment (todo)
- EndPoint for RTEPublisher (done)
- "Child" of ComputingService
- The ServiceId is the one specified by the yaim variable COMPUTING_SERVICE_ID, which in this scenario is mandatory. This variable must have the same value on all the relevant nodes (the cluster node and all the CREAM CEs)
The EndpointId is hostname + "_org.glite.ce.CREAM".
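The split of objectclasses between the CREAM CE and the gLite-CLUSTER node described in sections 2.2 and 2.3 can be summarised in a small sketch. This is illustrative only (the real publication is done by the resource BDII info providers); the class names come from the lists above.

```python
# Glue2 objectclasses involved in CREAM CE publication (from the lists above).
ALL_CLASSES = {
    "ComputingService", "ComputingEndpoint", "AccessPolicy",
    "ComputingManager", "ComputingShare", "MappingPolicy",
    "ExecutionEnvironment", "Benchmark", "ToStorageService",
    "ApplicationEnvironment",
}

def published_on_ce(cluster_mode):
    """Objectclasses published by the resource BDII on the CREAM CE itself."""
    if cluster_mode:
        # In cluster mode the CE publishes only its endpoint and access policy.
        return {"ComputingEndpoint", "AccessPolicy"}
    # In no cluster mode the CE publishes everything.
    return set(ALL_CLASSES)

def published_on_cluster_node(cluster_mode):
    """Objectclasses delegated to the resource BDII on the gLite-CLUSTER node."""
    return ALL_CLASSES - published_on_ce(cluster_mode)
```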
3 Batch system dynamic information
3.1 Torque/PBS
The PBS dynamic plugin for Glue1 publishes for each batch system queue something like:
dn: GlueCEUniqueID=cream-38.pd.infn.it:8443/cream-pbs-creamtest1,mds-vo-name=resource,o=grid
GlueCEInfoLRMSVersion: 2.5.7
GlueCEInfoTotalCPUs: 5
GlueCEPolicyAssignedJobSlots: 5
GlueCEStateFreeCPUs: 5
GlueCEPolicyMaxCPUTime: 2880
GlueCEPolicyMaxWallClockTime: 4320
GlueCEStateStatus: Production
The generic dynamic scheduler plugin for Glue1 publishes for each VOView something like:
dn: GlueVOViewLocalID=alice,GlueCEUniqueID=cream-38.pd.infn.it:8443/cream-pbs-creamtest1,mds-vo-name=resource,o=grid
GlueVOViewLocalID: alice
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 5
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0
and for each queue publishes something like:
dn: GlueCEUniqueID=cream-38.pd.infn.it:8443/cream-pbs-creamtest1,mds-vo-name=resource,o=grid
GlueCEStateFreeJobSlots: 5
GlueCEStateFreeCPUs: 5
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0
3.2 LSF
The LSF dynamic plugin for Glue1 publishes for each batch system queue something like:
dn: GlueCEUniqueID=cream-29.pd.infn.it:8443/cream-lsf-creamcert2,mds-vo-name=resource,o=grid
GlueCEInfoLRMSVersion: 7 Update 5
GlueCEInfoTotalCPUs: 216
GlueCEPolicyAssignedJobSlots: 216
GlueCEPolicyMaxRunningJobs: 216
GlueCEPolicyMaxCPUTime: 9999999999
GlueCEPolicyMaxWallClockTime: 9999999999
GlueCEPolicyPriority: -20
GlueCEStateFreeCPUs: 6
GlueCEStateFreeJobSlots: 216
GlueCEStateStatus: Production
The generic dynamic scheduler plugin for Glue1 publishes for each VOView something like:
dn: GlueVOViewLocalID=alice,GlueCEUniqueID=cream-29.pd.infn.it:8443/cream-lsf-creamcert1,mds-vo-name=resource,o=grid
GlueVOViewLocalID: alice
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 216
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0
and for each queue publishes something like:
dn: GlueCEUniqueID=cream-29.pd.infn.it:8443/cream-lsf-creamcert1,mds-vo-name=resource,o=grid
GlueCEStateFreeJobSlots: 216
GlueCEStateFreeCPUs: 216
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0
3.3 SGE
The SGE dynamic plugin for Glue1 publishes for each batch system queue something like:
dn: GlueCEUniqueID=sa3-ce.egee.cesga.es:8443/cream-sge-ops,mds-vo-name=resource,o=grid
GlueCEInfoLRMSVersion: 6.1u3
GlueCEPolicyAssignedJobSlots: 1
GlueCEPolicyMaxRunningJobs: 1
GlueCEInfoTotalCPUs: 1
GlueCEStateFreeJobSlots: 1
GlueCEStateFreeCPUs: 1
GlueCEPolicyMaxCPUTime: 4320
GlueCEPolicyMaxWallClockTime: 9000
GlueCEStateStatus: Production
The generic dynamic scheduler plugin for Glue1 publishes for each VOView something like:
xyz
and for each queue publishes something like:
xyz
TBC
3.4 Work to be done to support Glue2 publication
3.4.1 Work to be done in the PBS/Torque information provider
- The value published in Glue1 as GlueCEInfoLRMSVersion should be published in Glue2 as GLUE2ManagerProductVersion (ComputingManager objectclass)
- The value published in Glue1 as GlueCEPolicyMaxCPUTime should be published in Glue2 as GLUE2ComputingShareMaxCPUTime (ComputingShare objectclass)
  - For all the ComputingShares referring to that batch system queue
- The value published in Glue1 as GlueCEPolicyMaxWallClockTime should be published in Glue2 as GLUE2ComputingShareMaxWallTime (ComputingShare objectclass)
  - For all the ComputingShares referring to that batch system queue
- The value published in Glue1 as GlueCEStateStatus should be published in Glue2 as GLUE2ComputingShareServingState (ComputingShare objectclass)
  - For all the ComputingShares referring to that batch system queue
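The changes above amount to a plain renaming of attributes. A sketch of the mapping for the PBS/Torque provider follows; the attribute names are taken from this document, while the function itself is illustrative, not the real info-provider code.

```python
# Glue1 -> Glue2 attribute name mapping for the PBS/Torque provider (from the
# list above). GlueCEInfoLRMSVersion ends up in the ComputingManager
# objectclass; the others go to every ComputingShare referring to the queue.
PBS_GLUE1_TO_GLUE2 = {
    "GlueCEInfoLRMSVersion": "GLUE2ManagerProductVersion",
    "GlueCEPolicyMaxCPUTime": "GLUE2ComputingShareMaxCPUTime",
    "GlueCEPolicyMaxWallClockTime": "GLUE2ComputingShareMaxWallTime",
    "GlueCEStateStatus": "GLUE2ComputingShareServingState",
}

def to_glue2(glue1_record):
    """Translate a dict of Glue1 attributes into their Glue2 names,
    dropping attributes that have no Glue2 counterpart in the mapping."""
    return {PBS_GLUE1_TO_GLUE2[k]: v
            for k, v in glue1_record.items() if k in PBS_GLUE1_TO_GLUE2}
```

For example, the Glue1 record shown in section 3.1 would yield GLUE2ComputingShareMaxCPUTime: 2880 and GLUE2ComputingShareServingState: Production.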
3.4.2 Work to be done in the LSF information provider
- The value published in Glue1 as GlueCEInfoLRMSVersion should be published in Glue2 as GLUE2ManagerProductVersion (ComputingManager objectclass)
- The value published in Glue1 as GlueCEPolicyMaxCPUTime should be published in Glue2 as GLUE2ComputingShareMaxCPUTime (ComputingShare objectclass)
  - For all the ComputingShares referring to that batch system queue
- The value published in Glue1 as GlueCEPolicyMaxWallClockTime should be published in Glue2 as GLUE2ComputingShareMaxWallTime (ComputingShare objectclass)
  - For all the ComputingShares referring to that batch system queue
- The value published in Glue1 as GlueCEPolicyMaxRunningJobs should be published in Glue2 as GLUE2ComputingShareMaxRunningJobs (ComputingShare objectclass)
  - For all the ComputingShares referring to that batch system queue
- The value published in Glue1 as GlueCEStateStatus should be published in Glue2 as GLUE2ComputingShareServingState (ComputingShare objectclass)
  - For all the ComputingShares referring to that batch system queue
3.4.3 Work to be done in the generic dynamic scheduler
- The value published in Glue1 as GlueCEStateRunningJobs for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareRunningJobs (ComputingShare objectclass)
- The value published in Glue1 as GlueCEStateWaitingJobs for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareWaitingJobs (ComputingShare objectclass)
- The value published in Glue1 as GlueCEStateTotalJobs for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareTotalJobs (ComputingShare objectclass)
- The value published in Glue1 as GlueCEStateFreeJobSlots for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareFreeSlots (ComputingShare objectclass)
- The value published in Glue1 as GlueCEStateEstimatedResponseTime for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareEstimatedAverageWaitingTime (ComputingShare objectclass)
- The value published in Glue1 as GlueCEStateWorstResponseTime for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareEstimatedWorstWaitingTime (ComputingShare objectclass)
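As for the batch-system providers, the generic dynamic scheduler changes are a renaming of VOView attributes to their ComputingShare counterparts. The dict below reflects the list above; the translation function is an illustrative sketch, not the actual plugin code, and here it keeps unmapped attributes unchanged.

```python
# Glue1 VOView -> Glue2 ComputingShare attribute name mapping for the generic
# dynamic scheduler (from the list above).
SCHEDULER_GLUE1_TO_GLUE2 = {
    "GlueCEStateRunningJobs": "GLUE2ComputingShareRunningJobs",
    "GlueCEStateWaitingJobs": "GLUE2ComputingShareWaitingJobs",
    "GlueCEStateTotalJobs": "GLUE2ComputingShareTotalJobs",
    "GlueCEStateFreeJobSlots": "GLUE2ComputingShareFreeSlots",
    "GlueCEStateEstimatedResponseTime": "GLUE2ComputingShareEstimatedAverageWaitingTime",
    "GlueCEStateWorstResponseTime": "GLUE2ComputingShareEstimatedWorstWaitingTime",
}

def voview_to_computing_share(voview):
    """Rename the Glue1 VOView attributes of one record to their Glue2 names,
    leaving attributes without a mapping untouched."""
    return {SCHEDULER_GLUE1_TO_GLUE2.get(k, k): v for k, v in voview.items()}
```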
3.4.4 Work to be done in the SGE information provider
TBD
4 Relevant RFCs
5 Testbed
The following machines are being used for testing:
- cream-38.pd.infn.it (Torque)
- cream-29.pd.infn.it (LSF)
--
MassimoSgaravatto - 2011-06-21