
Glue2 support in CREAM

The CREAM CE shipped with EMI-1 provides some initial support for Glue2, which still needs to be finalized.

1 Introduction

The Glue 2.0 specification document is available here.

The Glue 2.0 specification can also be found here.

The GLUE v. 2.0 – Reference Implementation of an LDAP Schema document can be found here.

Glue 2.0 schema in SVN

2 Target scenario

2.1 Objectclasses

This section reports the most significant information concerning the implementation of the Glue 2 objectclasses with respect to the CREAM CE.

2.1.1 ComputingService

  • GLUE2ServiceID is the value of ComputingServiceId in /etc/glite-ce-glue2/glite-ce-glue2.conf. This value is the one specified by the yaim variable COMPUTING_SERVICE_ID, if specified (this variable is mandatory in cluster mode, see below). Otherwise it is ${CE_HOST}_ComputingElement
  • GLUE2EntityCreationTime is the timestamp when the ldif file was created
  • GLUE2EntityName is "Computing Service on <host>"
  • GLUE2EntityOtherInfo includes information concerning the info provider
  • GLUE2ServiceCapability is executionmanagement.jobexecution
  • GLUE2ServiceType is org.glite.ce.CREAM
  • GLUE2ServiceQualityLevel is production
  • GLUE2ServiceComplexity indicates the number of endpoints (2 or 3: one for CREAM, one for the RTEPublisher and one for CEMon, if deployed), the number of shares, and the number of resources (i.e. the number of Execution Environments)
  • GLUE2ServiceAdminDomainForeignKey is the value of SiteId in /etc/glite-ce-glue2/glite-ce-glue2.conf. This value is the one specified by the yaim variable SITE_NAME
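
The fallback rule for GLUE2ServiceID can be sketched as follows. This is only an illustration of the rule stated above, not the actual yaim or info provider code; the function name and the use of a plain dictionary for the yaim variables are assumptions.

```python
def computing_service_id(yaim_vars):
    """Return the GLUE2ServiceID for a CREAM CE.

    yaim_vars is a hypothetical dict of yaim variables. If
    COMPUTING_SERVICE_ID is set it is used as is, otherwise the ID
    falls back to ${CE_HOST}_ComputingElement.
    """
    explicit = yaim_vars.get("COMPUTING_SERVICE_ID")
    if explicit:
        return explicit
    return f"{yaim_vars['CE_HOST']}_ComputingElement"


if __name__ == "__main__":
    print(computing_service_id({"CE_HOST": "cream-38.pd.infn.it"}))
```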

2.1.2 ComputingEndpoint

  • GLUE2EndpointID is <hostname> + "_org.glite.ce.CREAM"
  • GLUE2EntityCreationTime is the timestamp when the ldif file was created
  • GLUE2EntityName is the EndPointId
  • GLUE2EntityOtherInfo includes the host DN, the EMI middleware version, and information concerning the info provider
  • GLUE2EndpointURL is the endpoint URL of the CREAM CE, that is: "https://" + <host> + ":8443/ce-cream/services"
  • GLUE2EndpointCapability is executionmanagement.jobexecution
  • GLUE2EndpointTechnology is webservice
  • GLUE2EndpointInterfaceName is org.glite.ce.CREAM
  • GLUE2EndpointInterfaceVersion is read from the CREAM configuration file (attribute cream_interface_version)
  • GLUE2EndpointWSDL is obtained from the service itself, i.e. "https://" + <host> + ":8443/ce-cream/services/CREAM2?wsdl"
  • GLUE2EndpointSupportedProfile is http://www.ws-i.org/Profiles/BasicProfile-1.0.html
  • GLUE2EndpointSemantics is the link to the CREAM user guide
  • GLUE2EndpointImplementor is gLite
  • GLUE2EndpointImplementationName is CREAM
  • GLUE2EndpointImplementationVersion is read from the CREAM configuration file (attribute cream_service_version)
  • GLUE2EndpointQualityLevel is production
  • GLUE2EndpointHealthState is unknown in the ldif static file; it is overwritten by the glite-ce-glue2-endpoint-dynamic plugin (which checks the glite-info-service-test output and the status of the tomcat service)
  • GLUE2EndpointHealthStateInfo is N/A in the ldif static file; it is overwritten by the glite-ce-glue2-endpoint-dynamic plugin (which checks the glite-info-service-test output)
  • GLUE2EndpointServingState
    • Now: it is a static value, read from /etc/glite-ce-glue2/glite-ce-glue2.conf (attribute ServingState). This value is the one specified by the yaim variable CREAM_CE_STATE
    • Target scenario: the value is provided by a dynamic plugin, which checks whether submissions are disabled (by the limiter, or explicitly by the admin). If so, draining is published (see also http://savannah.cern.ch/bugs/?69854). Otherwise the value is read from /etc/glite-ce-glue2/glite-ce-glue2.conf (attribute ServingState), which is initially filled by yaim from the yaim variable CREAM_CE_STATE. The admin can edit this file to publish a specific value without reconfiguring.
  • GLUE2EndpointIssuerCA is found by running "openssl x509 -issuer -noout -in" on the host certificate
  • GLUE2EndpointTrustedCA is IGTF
  • GLUE2EndpointDownTimeInfo is "See the GOC DB for downtimes: https://goc.egi.eu"
  • GLUE2ComputingEndpointStaging is staginginout
  • GLUE2ComputingEndpointJobDescription is glite:jdl
Todo: start time ?
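
As an illustration only, the host-derived endpoint attributes and the target-scenario rule for GLUE2EndpointServingState could be computed as sketched below. The function names are hypothetical, and the sketch assumes that the port is separated from the host by a colon in both URLs (as in the WSDL URL). How the plugin detects that submissions are disabled is not covered here; it is passed in as a boolean.

```python
def endpoint_attributes(host):
    """Build the host-derived attributes of the CREAM ComputingEndpoint."""
    base_url = f"https://{host}:8443/ce-cream/services"
    return {
        "GLUE2EndpointID": f"{host}_org.glite.ce.CREAM",
        "GLUE2EndpointURL": base_url,
        "GLUE2EndpointWSDL": f"{base_url}/CREAM2?wsdl",
    }


def serving_state(submissions_disabled, configured_state):
    """Target scenario: publish 'draining' when submissions are disabled
    (by the limiter or by the admin), otherwise publish the ServingState
    value read from glite-ce-glue2.conf."""
    return "draining" if submissions_disabled else configured_state


if __name__ == "__main__":
    for name, value in endpoint_attributes("cream-38.pd.infn.it").items():
        print(f"{name}: {value}")
    print(serving_state(True, "production"))
```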

2.1.2.1 Policy for the Computing EndPoint

For each ComputingEndpoint, there are as many policies as there are policy rules to be defined

  • GLUE2EntityCreationTime is the timestamp when the ldif file was created
  • GLUE2EntityName is "Access control rules for Endpoint EndPointId"
  • GLUE2EntityOtherInfo includes information concerning the info provider
  • GLUE2PolicyScheme is org.glite.standard
  • GLUE2PolicyRule is an element of ACBR in /etc/glite-ce-glue2/glite-ce-glue2.conf (e.g. VO:cms). It is ALL if there are no policies
  • GLUE2PolicyUserDomainForeignKey is an element of Owner in /etc/glite-ce-glue2/glite-ce-glue2.conf (e.g. cms)
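
A minimal sketch of the "one policy per rule" scheme follows. It assumes the ACBR entries have already been read from the configuration file into a Python list (the file parsing is not shown, and the function name is hypothetical). GLUE2PolicyUserDomainForeignKey is left out because the pairing between ACBR and Owner elements is not spelled out above.

```python
def access_policy_entries(endpoint_id, acbr_rules):
    """Return one policy entry (as a dict) per access control rule.

    When no rule is defined, a single policy with rule ALL is produced.
    """
    rules = list(acbr_rules) or ["ALL"]
    return [
        {
            "GLUE2EntityName": f"Access control rules for Endpoint {endpoint_id}",
            "GLUE2PolicyScheme": "org.glite.standard",
            "GLUE2PolicyRule": rule,
        }
        for rule in rules
    ]


if __name__ == "__main__":
    endpoint = "cream-38.pd.infn.it_org.glite.ce.CREAM"
    for entry in access_policy_entries(endpoint, ["VO:cms"]):
        print(entry)
```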

2.1.3 ComputingShare

A ComputingShare corresponds to a Glue1 VOView.

If needed, besides the VOViews we will also represent batch system queues as ComputingShares (this will have some impact on the WMS matchmaker)

  • GLUE2ShareID is the concatenation of queue name + owner + ServiceId
  • GLUE2EntityCreationTime is the timestamp when the ldif file was created
  • GLUE2EntityOtherInfo includes the CEId and information concerning the info provider
  • GLUE2ShareDescription is "Share of <queuename> for <VO>"
  • GLUE2ComputingShareMappingQueue is the batch system queue name
  • GLUE2ComputingShareMaxWallTime is 999999999 in the ldif static file; it is supposed to be overwritten by the batch system specific dynamic plugin
  • GLUE2ComputingShareMaxCPUTime is 999999999 in the ldif static file; it is supposed to be overwritten by the batch system specific dynamic plugin
  • GLUE2ComputingShareMaxRunningJobs is 999999999 in the ldif static file; it is supposed to be overwritten by the batch system specific dynamic plugin
  • GLUE2ComputingShareServingState is production in the ldif static file; it is supposed to be overwritten by the batch system specific dynamic plugin
  • GLUE2ComputingShareTotalJobs is 0 in the ldif static file; it is supposed to be overwritten by the generic dynamic scheduler plugin
  • GLUE2ComputingShareRunningJobs is 0 in the ldif static file; it is supposed to be overwritten by the generic dynamic scheduler plugin
  • GLUE2ComputingShareWaitingJobs is 444444 in the ldif static file; it is supposed to be overwritten by the generic dynamic scheduler plugin
  • GLUE2ComputingShareEstimatedAverageWaitingTime is 2146660842 in the ldif static file; it is supposed to be overwritten by the generic dynamic scheduler plugin
  • GLUE2ComputingShareEstimatedWorstWaitingTime is 2146660842 in the ldif static file; it is supposed to be overwritten by the generic dynamic scheduler plugin
  • GLUE2ComputingShareFreeSlots is 0 in the ldif static file; it is supposed to be overwritten by the generic dynamic scheduler plugin
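
The relation between the static placeholders and the dynamic plugins can be pictured with the sketch below. The placeholder values are the ones listed above; the "_" separator used for GLUE2ShareID and the function names are assumptions made for the example, and the real merge is performed by the information system, not by code like this.

```python
# Placeholder values written in the static ldif file for a ComputingShare.
STATIC_SHARE_PLACEHOLDERS = {
    "GLUE2ComputingShareMaxWallTime": "999999999",
    "GLUE2ComputingShareMaxCPUTime": "999999999",
    "GLUE2ComputingShareMaxRunningJobs": "999999999",
    "GLUE2ComputingShareServingState": "production",
    "GLUE2ComputingShareTotalJobs": "0",
    "GLUE2ComputingShareRunningJobs": "0",
    "GLUE2ComputingShareWaitingJobs": "444444",
    "GLUE2ComputingShareEstimatedAverageWaitingTime": "2146660842",
    "GLUE2ComputingShareEstimatedWorstWaitingTime": "2146660842",
    "GLUE2ComputingShareFreeSlots": "0",
}


def share_id(queue, owner, service_id, sep="_"):
    """Concatenate queue name, owner and ServiceId (separator is assumed)."""
    return sep.join([queue, owner, service_id])


def apply_dynamic(static_attrs, dynamic_attrs):
    """Return the static attributes with dynamic plugin values on top."""
    merged = dict(static_attrs)
    merged.update(dynamic_attrs)
    return merged


if __name__ == "__main__":
    print(share_id("creamtest1", "alice", "cream-38.pd.infn.it_ComputingElement"))
    merged = apply_dynamic(STATIC_SHARE_PLACEHOLDERS,
                           {"GLUE2ComputingShareWaitingJobs": "3"})
    print(merged["GLUE2ComputingShareWaitingJobs"])
```

An attribute that still shows its placeholder value in the published output suggests that the corresponding dynamic plugin did not provide it.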

2.1.4 ComputingManager

  • GLUE2ManagerID is ServiceId + "_Manager"
  • GLUE2EntityCreationTime is the timestamp when the ldif file was created
  • GLUE2EntityName is: "Computing Manager on <host>"
  • GLUE2EntityOtherInfo includes information concerning the info provider
  • GLUE2ManagerProductName is the value of CE_BATCH_SYS in /etc/glite-ce-glue2/glite-ce-glue2.conf. This value is the one specified by the yaim variable CE_BATCH_SYS
  • GLUE2ManagerProductVersion in the ldif file is the value BATCH_VERSION in /etc/glite-ce-glue2/glite-ce-glue2.conf. This value is the one specified by the yaim variable BATCH_VERSION. It is supposed to be overwritten by the batch system specific dynamic plugin

2.1.5 Benchmark

For each ExecutionEnvironment, a Benchmark objectclass is created for each benchmark that must be represented.

In /etc/glite-ce-glue2/glite-ce-glue2.conf (filled by yaim) the following is defined:

ExecutionEnvironment_<ExecutionEnvironmentId>_Benchmarks = (Benchmark1, Benchmark2, .., Benchmarkn)

where the format of Benchmarki is: (Type Value)

This is then used to produce the ldif file with the Benchmark objectclasses.

The benchmarks currently represented are:

  • specfp2000 (using the yaim variable CE_SF00)
  • specint2000 (using the yaim variable CE_SI00)
  • HEP-SPEC06 (if the yaim variable CE_OTHERDESCR reports the value for this benchmark)

  • GLUE2BenchmarkID is the concatenation of ResourceId and type of benchmark
  • GLUE2EntityCreationTime is the timestamp when the ldif file was created
  • GLUE2EntityName is "Benchmark" + the type of benchmark
  • GLUE2EntityOtherInfo includes information concerning the info provider
  • GLUE2BenchmarkType is the type of benchmark (specfp2000, specint2000, ..)
  • GLUE2BenchmarkValue is the value for that benchmark
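
A possible way to turn the Benchmarks configuration value into Benchmark entries is sketched below. It assumes the value is a parenthesised, comma-separated list of (Type Value) pairs as described above; the "_" separator in GLUE2BenchmarkID, the function names and the numeric values in the usage example are illustrative assumptions.

```python
import re

# Matches one "(Type Value)" pair; the enclosing outer parentheses are skipped
# because a pair must start with a non-parenthesis token.
_PAIR = re.compile(r"\(\s*([^\s()]+)\s+([^\s()]+)\s*\)")


def parse_benchmarks(value):
    """Return a list of (type, value) tuples from the configuration value."""
    return _PAIR.findall(value)


def benchmark_entries(resource_id, value, sep="_"):
    """Build one Benchmark entry (as a dict) per configured benchmark."""
    return [
        {
            "GLUE2BenchmarkID": f"{resource_id}{sep}{btype}",
            "GLUE2BenchmarkType": btype,
            "GLUE2BenchmarkValue": bvalue,
        }
        for btype, bvalue in parse_benchmarks(value)
    ]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    for entry in benchmark_entries("resource1", "((specfp2000 1500), (specint2000 1200))"):
        print(entry)
```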

2.1.6 ExecutionEnvironment

TBD

2.1.7 ApplicationEnvironment

TBD

2.1.8 ApplicationHandle

TBD

2.1.9 ComputingActivity

We don't implement the ComputingActivity objectclass, since we don't publish information regarding jobs

2.1.10 ToStorageService

TBD

2.2 CREAM CE in no cluster mode

We assume that a CREAM CE is configured in no cluster mode only if it is the only CREAM CE available at the site. That is, sites with multiple CREAM CEs (submitting to the same batch system) should always have a cluster node and therefore be configured in cluster mode.

If a CREAM CE is configured in no cluster mode, all the Glue2 object classes are published by the resource BDII running on the CREAM CE.

These objectclasses are:

  • ComputingService (done)
  • ComputingEndPoint (done)
  • AccessPolicy (done)
  • ComputingManager (done)
  • ComputingShare (done)
  • MappingPolicy (done)
  • ExecutionEnvironment (done)
  • Benchmark (done)
  • ToStorageService (done)
  • ApplicationEnvironment (todo)
  • EndPoint for RTEPublisher (done)
    • "Child" of ComputingService
  • EndPoint for CEMon (done)
    • "Child" of ComputingService
    • Published only if CEMon is deployed
The ComputingServiceId is the one specified by the yaim variable COMPUTING_SERVICE_ID (if specified). Otherwise it is ${CE_HOST}_ComputingElement.

The EndpointId is hostname + "_org.glite.ce.CREAM".

2.3 CREAM CE in cluster mode

Sites with multiple CREAM CEs (submitting to the same batch system) should always have a cluster node and therefore be configured in cluster mode.

If a CREAM CE is configured in cluster mode:

  • The resource BDII running on the CREAM CE publishes just the following objectclasses:
    • ComputingEndpoint (done)
    • AccessPolicy (done)
  • All the other objectclasses are published by the resource BDII running on the gLite-CLUSTER node:
    • ComputingService (done)
    • ComputingManager (done)
    • ComputingShare (todo)
    • MappingPolicy (todo)
    • ExecutionEnvironment (done)
    • Benchmark (done)
    • ToStorageService (done)
    • ApplicationEnvironment (todo)
    • EndPoint for RTEPublisher (done)
      • "Child" of ComputingService

  • The ServiceId is the one specified by the yaim variable COMPUTING_SERVICE_ID, which in this scenario is mandatory. This variable must have the same value on all the relevant nodes (the cluster node and all the CREAM CEs)
The EndpointId is hostname + "_org.glite.ce.CREAM".
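
The split of objectclasses between the two resource BDIIs, and the stricter ServiceId rule in cluster mode, can be summarised with the following sketch. The names are hypothetical; the lists simply restate the two scenarios described in sections 2.2 and 2.3.

```python
ENDPOINT_SIDE = ["ComputingEndpoint", "AccessPolicy"]
SERVICE_SIDE = [
    "ComputingService", "ComputingManager", "ComputingShare", "MappingPolicy",
    "ExecutionEnvironment", "Benchmark", "ToStorageService",
    "ApplicationEnvironment", "Endpoint (RTEPublisher)",
]


def published_objectclasses(cluster_mode, cemon_deployed=False):
    """Return which node's resource BDII publishes which objectclasses."""
    if cluster_mode:
        # No CEMon endpoint is listed for cluster mode, so none is added here.
        return {"cream_ce": list(ENDPOINT_SIDE), "cluster_node": list(SERVICE_SIDE)}
    on_ce = ENDPOINT_SIDE + SERVICE_SIDE
    if cemon_deployed:
        on_ce = on_ce + ["Endpoint (CEMon)"]
    return {"cream_ce": on_ce, "cluster_node": []}


def resolve_service_id(cluster_mode, yaim_vars):
    """COMPUTING_SERVICE_ID is mandatory in cluster mode, optional otherwise."""
    explicit = yaim_vars.get("COMPUTING_SERVICE_ID")
    if explicit:
        return explicit
    if cluster_mode:
        raise ValueError("COMPUTING_SERVICE_ID is mandatory in cluster mode")
    return f"{yaim_vars['CE_HOST']}_ComputingElement"


if __name__ == "__main__":
    print(published_objectclasses(cluster_mode=True)["cream_ce"])
```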

3 Batch system dynamic information

3.1 Torque/PBS

The PBS dynamic plugin for Glue1 publishes for each batch system queue something like:

dn: GlueCEUniqueID=cream-38.pd.infn.it:8443/cream-pbs-creamtest1,mds-vo-name=resource,o=grid
GlueCEInfoLRMSVersion: 2.5.7
GlueCEInfoTotalCPUs: 5
GlueCEPolicyAssignedJobSlots: 5
GlueCEStateFreeCPUs: 5
GlueCEPolicyMaxCPUTime: 2880
GlueCEPolicyMaxWallClockTime: 4320
GlueCEStateStatus: Production

The generic dynamic scheduler plugin for Glue1 publishes for each VOView something like:

dn: GlueVOViewLocalID=alice,GlueCEUniqueID=cream-38.pd.infn.it:8443/cream-pbs-creamtest1,mds-vo-name=resource,o=grid
GlueVOViewLocalID: alice
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 5
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0

and for each queue publishes something like:

dn: GlueCEUniqueID=cream-38.pd.infn.it:8443/cream-pbs-creamtest1,mds-vo-name=resource,o=grid
GlueCEStateFreeJobSlots: 5
GlueCEStateFreeCPUs: 5
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0

3.2 LSF

The LSF dynamic plugin for Glue1 publishes for each batch system queue something like:

dn: GlueCEUniqueID=cream-29.pd.infn.it:8443/cream-lsf-creamcert2,mds-vo-name=resource,o=grid
GlueCEInfoLRMSVersion: 7 Update 5
GlueCEInfoTotalCPUs: 216
GlueCEPolicyAssignedJobSlots: 216
GlueCEPolicyMaxRunningJobs: 216
GlueCEPolicyMaxCPUTime: 9999999999
GlueCEPolicyMaxWallClockTime: 9999999999
GlueCEPolicyPriority: -20
GlueCEStateFreeCPUs: 6
GlueCEStateFreeJobSlots: 216
GlueCEStateStatus: Production

The generic dynamic scheduler plugin for Glue1 publishes for each VOView something like:

dn: GlueVOViewLocalID=alice,GlueCEUniqueID=cream-29.pd.infn.it:8443/cream-lsf-creamcert1,mds-vo-name=resource,o=grid
GlueVOViewLocalID: alice
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 216
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0

and for each queue publishes something like:

dn: GlueCEUniqueID=cream-29.pd.infn.it:8443/cream-lsf-creamcert1,mds-vo-name=resource,o=grid
GlueCEStateFreeJobSlots: 216
GlueCEStateFreeCPUs: 216
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0

3.3 SGE

The SGE dynamic plugin for Glue1 publishes for each batch system queue something like:

dn: GlueCEUniqueID=sa3-ce.egee.cesga.es:8443/cream-sge-ops,mds-vo-name=resource,o=grid
GlueCEInfoLRMSVersion: 6.1u3
GlueCEPolicyAssignedJobSlots: 1
GlueCEPolicyMaxRunningJobs: 1
GlueCEInfoTotalCPUs: 1
GlueCEStateFreeJobSlots: 1
GlueCEStateFreeCPUs: 1
GlueCEPolicyMaxCPUTime: 4320
GlueCEPolicyMaxWallClockTime: 9000
GlueCEStateStatus: Production

The generic dynamic scheduler plugin for Glue1 publishes for each VOView something like:

dn: GlueVOViewLocalID=biomed,GlueCEUniqueID=sa3-ce.egee.cesga.es:8443/cream-sge-biomed,mds-vo-name=resource,o=grid
GlueVOViewLocalID: biomed
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 1
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0

and for each queue publishes something like:

dn: GlueCEUniqueID=sa3-ce.egee.cesga.es:8443/cream-sge-biomed,mds-vo-name=resource,o=grid
GlueCEStateFreeJobSlots: 1
GlueCEStateFreeCPUs: 1
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0

3.4 Work to be done to support Glue2 publication

3.4.1 Work to be done in the PBS/Torque information provider

  • The value published in Glue1 as GlueCEInfoLRMSVersion should be published in Glue2 as GLUE2ManagerProductVersion ( ComputingManager objectclass)
  • The value published in Glue1 as GlueCEPolicyMaxCPUTime should be published in Glue2 as GLUE2ComputingShareMaxCPUTime ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEPolicyMaxWallClockTime should be published in Glue2 as GLUE2ComputingShareMaxWallTime ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEStateStatus should be published in Glue2 as GLUE2ComputingShareServingState ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue

3.4.2 Work to be done in the LSF information provider

  • The value published in Glue1 as GlueCEInfoLRMSVersion should be published in Glue2 as GLUE2ManagerProductVersion ( ComputingManager objectclass)
  • The value published in Glue1 as GlueCEPolicyMaxCPUTime should be published in Glue2 as GLUE2ComputingShareMaxCPUTime ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEPolicyMaxWallClockTime should be published in Glue2 as GLUE2ComputingShareMaxWallTime ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEPolicyMaxRunningJobs should be published in Glue2 as GLUE2ComputingShareMaxRunningJobs ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEStateStatus should be published in Glue2 as GLUE2ComputingShareServingState ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue

3.4.3 Work to be done in the SGE information provider

  • The value published in Glue1 as GlueCEInfoLRMSVersion should be published in Glue2 as GLUE2ManagerProductVersion ( ComputingManager objectclass)
  • The value published in Glue1 as GlueCEPolicyMaxCPUTime should be published in Glue2 as GLUE2ComputingShareMaxCPUTime ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEPolicyMaxWallClockTime should be published in Glue2 as GLUE2ComputingShareMaxWallTime ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEPolicyMaxRunningJobs should be published in Glue2 as GLUE2ComputingShareMaxRunningJobs ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
  • The value published in Glue1 as GlueCEStateStatus should be published in Glue2 as GLUE2ComputingShareServingState ( ComputingShare objectclass)
    • For all the ComputingShares referring to that batch system queue
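
The attribute mappings listed in 3.4.1 to 3.4.3 can be condensed into a lookup table, as in the sketch below. This is not the information provider code: the function and table names are assumptions, and values are copied unchanged (any unit or case conversion between Glue1 and Glue2 is outside the scope of this sketch).

```python
MANAGER_MAP = {"GlueCEInfoLRMSVersion": "GLUE2ManagerProductVersion"}

SHARE_MAP_COMMON = {
    "GlueCEPolicyMaxCPUTime": "GLUE2ComputingShareMaxCPUTime",
    "GlueCEPolicyMaxWallClockTime": "GLUE2ComputingShareMaxWallTime",
    "GlueCEStateStatus": "GLUE2ComputingShareServingState",
}

# MaxRunningJobs is listed above for LSF and SGE only.
_MAX_RUNNING = {"GlueCEPolicyMaxRunningJobs": "GLUE2ComputingShareMaxRunningJobs"}
SHARE_MAP_EXTRA = {"pbs": {}, "lsf": _MAX_RUNNING, "sge": _MAX_RUNNING}


def translate_queue(batch_system, glue1_attrs):
    """Split the Glue1 queue attributes into ComputingManager and
    ComputingShare attributes, renamed to Glue2."""
    share_map = {**SHARE_MAP_COMMON, **SHARE_MAP_EXTRA[batch_system]}
    manager = {MANAGER_MAP[k]: v for k, v in glue1_attrs.items() if k in MANAGER_MAP}
    share = {share_map[k]: v for k, v in glue1_attrs.items() if k in share_map}
    return manager, share


if __name__ == "__main__":
    manager, share = translate_queue("pbs", {
        "GlueCEInfoLRMSVersion": "2.5.7",
        "GlueCEPolicyMaxCPUTime": "2880",
        "GlueCEPolicyMaxWallClockTime": "4320",
        "GlueCEStateStatus": "Production",
    })
    print(manager)
    print(share)
```

The share attributes returned for a queue would then be applied to all the ComputingShares referring to that batch system queue.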

3.4.4 Work to be done in the generic dynamic scheduler

  • The value published in Glue1 as GlueCEStateRunningJobs for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareRunningJobs ( ComputingShare objectclass)
  • The value published in Glue1 as GlueCEStateWaitingJobs for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareWaitingJobs ( ComputingShare objectclass)
  • The value published in Glue1 as GlueCEStateTotalJobs for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareTotalJobs ( ComputingShare objectclass)
  • The value published in Glue1 as GlueCEStateFreeJobSlots for the VOView objectclass should be published in Glue2 as GLUE2ComputingShareFreeSlots ( ComputingShare objectclass)
  • The value published in Glue1 as GlueCEStateEstimatedResponseTime for the VOView should be published in Glue2 as GLUE2ComputingShareEstimatedAverageWaitingTime ( ComputingShare objectclass)
  • The value published in Glue1 as GlueCEStateWorstResponseTime for the VOView should be published in Glue2 as GLUE2ComputingShareEstimatedWorstWaitingTime ( ComputingShare objectclass)
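
As a rough illustration of the renaming listed above, the following sketch reads the "attribute: value" lines that the generic dynamic scheduler publishes for a Glue1 VOView and emits the corresponding Glue2 ComputingShare lines. The function name is hypothetical, and the sketch does not produce the dn of the Glue2 entry.

```python
SCHEDULER_MAP = {
    "GlueCEStateRunningJobs": "GLUE2ComputingShareRunningJobs",
    "GlueCEStateWaitingJobs": "GLUE2ComputingShareWaitingJobs",
    "GlueCEStateTotalJobs": "GLUE2ComputingShareTotalJobs",
    "GlueCEStateFreeJobSlots": "GLUE2ComputingShareFreeSlots",
    "GlueCEStateEstimatedResponseTime": "GLUE2ComputingShareEstimatedAverageWaitingTime",
    "GlueCEStateWorstResponseTime": "GLUE2ComputingShareEstimatedWorstWaitingTime",
}


def translate_voview(ldif_text):
    """Return Glue2 'attribute: value' lines for the mapped Glue1 attributes.

    Lines whose attribute is not in SCHEDULER_MAP (dn, GlueVOViewLocalID, ...)
    are skipped.
    """
    out = []
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in SCHEDULER_MAP:
            out.append(f"{SCHEDULER_MAP[name]}: {value.strip()}")
    return out


if __name__ == "__main__":
    sample = """\
GlueVOViewLocalID: alice
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 5
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0
"""
    print("\n".join(translate_voview(sample)))
```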

4 Relevant RFCs

5 Testbed

The following machines are being used for testing

  • cream-38.pd.infn.it (Torque)
  • cream-29.pd.infn.it (LSF)

-- MassimoSgaravatto - 2011-06-21

Topic revision: r25 - 2011-09-23 - MassimoSgaravatto