Savannah patch 2597/3044 represents a significant redesign of the gLite WMS, which is largely (though by no means only) the end result of the code restructuring activity that took place during EGEE-II. It includes a redesign of the core architecture, the implementation of a parallel match-making algorithm, a restructuring of the WMProxy interface and a revised ISM structure that allows for an optimized algorithm for match-making with data.

Among the major problems revealed by the previous releases of the WMS were the core architecture and the match-making performance, with the consequent difficulty of recovering from complex situations involving pending jobs, resubmissions, denials of service, timeouts and so on. Of course, the WMS has to cover a wide range of submission use-cases and, even after the positive introduction of bulk submission and match-making for collections (as required by the experiments), both WMS stability and performance continued to be affected by an internal architecture that was prone to race conditions and needed a very high lock throughput.

In this respect, the new architecture implemented in this release can serve a wider range of submission use-cases. Prior to the present release, single job processing and match-making were done almost serially, because of "big" locks on concurrently accessed data structures. This redesign also allows for faster operations on the ISM, such as dumping or updating (deleting expired entries and so on). The so-called Task Queue is replaced by a prioritized event queue model where requests queue up, waiting to be processed in as stateless a fashion as possible. Two different queues are needed to accommodate this new model. Periodic activities, handled as "timed" events, re-schedule themselves to show up at a given time in the "ready" queue. The number of threads is kept under control: each and every request for processing (a "submit" as well as a "purchase from BDII" request) is processed within the same thread pool as a generic functor. The WM core essentially becomes a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction, which also used to require synchronization (locks) on several data structures.
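The queue model described above can be illustrated with a minimal Python sketch. It is not the WM implementation (which is C++): the class `EventQueues`, the functions `worker` and `run_demo`, the priority values and the delays are all hypothetical names and numbers chosen for illustration. It only shows the idea of a "timed" queue feeding a prioritized "ready" queue, with a fixed pool of workers executing generic functors and a periodic activity re-scheduling itself.

```python
import heapq
import itertools
import threading
import time


class EventQueues:
    """Hypothetical model: 'timed' events wait for their due time,
    'ready' events wait for a free worker (lower number = higher priority)."""

    def __init__(self):
        self._ready = []   # heap of (priority, seq, functor)
        self._timed = []   # heap of (due_time, seq, priority, functor)
        self._seq = itertools.count()  # unique tie-breaker, keeps FIFO order
        self._cond = threading.Condition()
        self._stopped = False

    def push_ready(self, functor, priority=10):
        with self._cond:
            heapq.heappush(self._ready, (priority, next(self._seq), functor))
            self._cond.notify()

    def push_timed(self, functor, delay, priority=0):
        with self._cond:
            due = time.monotonic() + delay
            heapq.heappush(self._timed, (due, next(self._seq), priority, functor))
            self._cond.notify()

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()

    def pop(self):
        """Block until a ready functor is available; return None once stopped."""
        with self._cond:
            while True:
                if self._stopped:
                    return None
                now = time.monotonic()
                # Timed events that are due show up in the ready queue.
                while self._timed and self._timed[0][0] <= now:
                    _, seq, priority, functor = heapq.heappop(self._timed)
                    heapq.heappush(self._ready, (priority, seq, functor))
                if self._ready:
                    return heapq.heappop(self._ready)[2]
                timeout = self._timed[0][0] - now if self._timed else None
                self._cond.wait(timeout)


def worker(queues):
    """Every request, whatever its kind, is executed as a generic functor."""
    while True:
        functor = queues.pop()
        if functor is None:
            return
        functor()


def run_demo(n_workers=2, n_jobs=3, n_purchases=2):
    queues = EventQueues()
    log, lock, done = [], threading.Lock(), threading.Event()
    expected = n_jobs + n_purchases

    def record(entry):
        with lock:
            log.append(entry)
            if len(log) == expected:
                done.set()

    def make_submit(job_id):
        return lambda: record(("submit", job_id))

    def purchase(remaining=n_purchases):
        # A periodic activity re-schedules itself as a new timed event.
        if remaining > 1:
            queues.push_timed(lambda: purchase(remaining - 1), delay=0.01)
        record(("purchase", remaining))

    threads = [threading.Thread(target=worker, args=(queues,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job_id in range(n_jobs):
        queues.push_ready(make_submit(job_id))
    queues.push_timed(purchase, delay=0.01)
    done.wait(timeout=5)
    queues.stop()
    for t in threads:
        t.join()
    return log
```

Calling `run_demo()` returns the list of processed events: three "submit" entries and two "purchase" entries, in whatever order the workers happened to execute them. The only shared state that needs a lock in this sketch is the pair of queues, which is the point of the model.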

Situations like the following, with basically serialized match-making, are now prevented by design:


09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: (0/6042 [12.67, 16.6] )

09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: (0/6042 [10.84, 15.47] )

09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: (0/6042 [12.39, 19.95] )

09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: (2/6042 [16.76, 20.33] )

09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: (0/6042 [19.25, 26.89] )

09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: (0/6042 [21.81, 28.56] )

09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: (0/6042 [25.48, 29.72] )


This patch also provides an update of ICE, improving the performance and scalability of submissions to CREAM-based CEs via the WMS. Some scalability issues remain, which appear when many (thousands of) active jobs are being managed by ICE.

Newly introduced features

  • parallel match-making

  • ISM: restructured algorithm for matchmaking in case of data requirements specified in the JDL.

The new algorithm performs a reverse search starting from data resolution, reducing the search space to only those computing resources that mount the storage providing the specified files. The restructured data structure is integrated within the broker/helper/brokerinfo modules, allowing faster collection of all the data relevant to the construction of the brokerinfo file while the match-making is performed. The brokerinfo file can therefore be generated without any further query to the ISM to extract storage information.
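The reverse search can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the function `match_with_data`, the dictionary layouts, the file names and the `free_slots` attribute are hypothetical and do not reflect the actual ISM data structures. The sketch starts from the requested files, resolves them to storage elements, and evaluates the job requirements only on the computing elements that mount those storage elements, collecting the storage information along the way.

```python
def match_with_data(required_files, file_to_storage, storage_to_ces, ce_info, requirement):
    """Return {ce_id: [storage elements holding required files]} for matching CEs.

    All argument layouts are assumptions made for this sketch.
    """
    candidates = {}
    # Step 1: data resolution: which storage elements hold the requested files?
    for lfn in required_files:
        for se in file_to_storage.get(lfn, ()):
            # Step 2: only CEs mounting that storage enter the search space.
            for ce in storage_to_ces.get(se, ()):
                candidates.setdefault(ce, set()).add(se)
    # Step 3: evaluate the job requirements on the reduced set only.
    # The storage list gathered here is what a brokerinfo writer would need.
    return {
        ce: sorted(ses)
        for ce, ses in candidates.items()
        if requirement(ce_info[ce])
    }


# Hypothetical data for illustration.
file_to_storage = {
    "lfn:/grid/vo/a.dat": ["se1", "se2"],
    "lfn:/grid/vo/b.dat": ["se2"],
}
storage_to_ces = {"se1": ["ce1"], "se2": ["ce2", "ce3"]}
ce_info = {
    "ce1": {"free_slots": 4},
    "ce2": {"free_slots": 0},
    "ce3": {"free_slots": 8},
    "ce4": {"free_slots": 100},  # mounts no relevant storage, never examined
}

matches = match_with_data(
    ["lfn:/grid/vo/a.dat", "lfn:/grid/vo/b.dat"],
    file_to_storage,
    storage_to_ces,
    ce_info,
    requirement=lambda ad: ad["free_slots"] > 0,
)
```

In this example `ce4` has the most free slots but is never evaluated, because no storage element holding the requested files is mounted on it; `ce2` is examined and rejected by the requirement.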

  • added support for IPv6

  • improved error reporting for DAGs

  • cancel implemented also for collections

  • run-time selection of LB type: server or proxy.

Typically for small VOs (but not only, as we will see later), it can make sense to install both WMS and LB on a single machine. In such circumstances, the use of LB proxy (a cache of jobs under processing) is discouraged, to avoid storing the same events twice (this will change with the advent of LB 2.0 wherever LBserver and LBProxy are colocated). Also, configuring the whole system to work with LBserver instead of LB proxy needs to be done once, applying to each and every component, for job states to be computed correctly. Previous versions (incorrectly) allowed mixing the use of LBserver and LB proxy. See the "configuration changes" section for more details.

  • the jobwrapper template is now cached at each WM start (restart to re-load any change)

  • restructured jobwrapper (also removed perl dependencies)

  • dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

  • this code baseline is ready to support GridSite Delegation 2, which is disabled at the moment because, for backward-compatibility needs coming from external packages, the project has been built against GridSite 1.1.18
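The ISM dump request mentioned in the list above is just a small file dropped into the WM input jobdir. The Python sketch below shows one way such a file could be written. It is hedged on several assumptions that are not stated in these notes: that the jobdir follows a maildir-like layout with `tmp` and `new` subdirectories, that a request becomes visible once it is moved into `new`, and that file names only need to be unique. The function name `write_jobdir_request` and the naming scheme are hypothetical; the real jobdir location depends on the local WM configuration.

```python
import os
import time
import uuid

# The request text is taken verbatim from the release notes.
ISM_DUMP_REQUEST = '[ command = "ism_dump"; ]'


def write_jobdir_request(base_dir, classad_text):
    """Write a request file into an assumed maildir-like jobdir.

    The tmp/ + new/ layout and the file naming are assumptions of this sketch.
    """
    tmp_dir = os.path.join(base_dir, "tmp")
    new_dir = os.path.join(base_dir, "new")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)

    # Hypothetical unique name: timestamp in microseconds plus a random suffix.
    name = "%d_%s" % (time.time() * 1e6, uuid.uuid4().hex)
    tmp_path = os.path.join(tmp_dir, name)
    new_path = os.path.join(new_dir, name)

    # Write the full content first, then rename, so that a reader scanning
    # new/ never sees a partially written request.
    with open(tmp_path, "w") as handle:
        handle.write(classad_text)
    os.rename(tmp_path, new_path)
    return new_path
```

A call such as `write_jobdir_request(jobdir_path, ISM_DUMP_REQUEST)`, with `jobdir_path` pointing at the WM input jobdir of the local installation, would then leave a single request file for the WM to pick up.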

Configuration changes

LBProxy (= true) [Common]: moved to the Common section from the [WorkloadManagerProxy] section (see above).

RuntimeMalloc [WorkloadManager]: allows the use of an alternative malloc library (e.g. nedmalloc, Google performance tools and many more), redirected at run time with LD_PRELOAD. Possible values are, for example, RuntimeMalloc = "/usr/lib/" if you use Google malloc.

IsmThreads (= true) [WMConfiguration]: The new WM core processes each computation/request in a thread pool (WorkerThreads being the number of threads). Among many other benefits, this keeps the number of threads under control, so that WorkerThreads can ideally be set to match the number of physical cores. This is what happens with IsmThreads set to false. Now, think of all the WorkerThreads threads being busy submitting, say, collections of 1000 nodes. ISM requests would then have to wait, even though they sit at the top of the queue (they have the highest priority). To set up a safer configuration for such use-cases, the WM can be told to handle ISM requests in separate threads (not traversing any queue and not being part of the pool), hence IsmThreads = true.
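The effect of IsmThreads can be reproduced in miniature with the Python sketch below. It is an illustration, not the WM code: `dispatch` and `ism_served_while_pool_busy` are hypothetical names, and Python's `ThreadPoolExecutor` with its plain FIFO queue stands in for the WM worker pool and its prioritized queue. Priority does not help in this situation anyway, because a queued request can only run once a worker becomes free.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def dispatch(pool, request, is_ism, ism_threads):
    """Run ISM requests on a dedicated thread when ism_threads is set,
    otherwise hand every request to the shared worker pool."""
    if is_ism and ism_threads:
        thread = threading.Thread(target=request)
        thread.start()
        return thread
    return pool.submit(request)


def ism_served_while_pool_busy(ism_threads, worker_threads=2):
    """Return True if an ISM request completes while all workers are busy."""
    release = threading.Event()
    ism_done = threading.Event()
    started = threading.Barrier(worker_threads + 1)

    def long_submission():
        started.wait()   # signal that this worker is now occupied
        release.wait()   # stay busy until the experiment is over

    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        for _ in range(worker_threads):
            dispatch(pool, long_submission, is_ism=False, ism_threads=ism_threads)
        started.wait()   # every worker in the pool is busy from here on
        dispatch(pool, ism_done.set, is_ism=True, ism_threads=ism_threads)
        served = ism_done.wait(timeout=0.5)
        release.set()    # let the long submissions finish
    return served
```

With `ism_threads=True` the function returns `True`: the ISM request runs on its own thread although both workers are occupied. With `ism_threads=False` it returns `False`: the request sits in the queue until a worker is released.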

QueueSize (= 1000) [WorkloadManager]: Size of the queue of events "ready" to be managed by the worker thread pool.

Some remarks:

the workaround "EnableZippedISB = false" in the JDL, previously in place for WMS 3.1 (patch 2562), can now be removed

given the increased performance of the WM, backlogs on the JC/ICE side may be experienced. For the JC this basically stems from Condor not keeping up the pace (in the way Condor is used within our design). What can be done to avoid this effect:

1) Make sure jobdir is enabled as the JC input

2) Consider using a machine with two disks and two separate controllers. Use one for the JC/Condor stuff

3) A different physical layout, distributing the load equally between the WMS and LB, is suggested. It still requires two machines for a WMS+LB node, but with the services distributed differently. Check out here

Known Issues

- bug #49844: WMProxy does not catch signal 25

- bug #50009: wmproxy.gacl person record allows anyone to pass

- bug #54144: WMS 3.2 Workload Manager memory leak? **Those who are experiencing this problem are strongly encouraged to use the google malloc**

- bug #53714: WMS PURGER SHOULD NOT directly FORCE PURGE OF jobs when its DN is not authorized on LB server

- bug #55709: problems with glite-wms-wm restart in WMS 3.2

Topic revision: r29 - 2009-09-29 - MarcoCecchi