Difference: ReleaseNotes2597 (1 vs. 29)

Revision 29 (2009-09-29) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Introduction
Line: 87 to 87
 **Those who are experiencing this problem are strongly encouraged to use the google malloc**
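
A minimal sketch of how to do that, assuming the ClassAd-style layout of the WMS configuration file and using the example value given for the RuntimeMalloc parameter further down in these notes (the path must match the local tcmalloc installation):

    WorkloadManager = [
        // run-time allocator redirection via LD_PRELOAD; the path is only an example
        RuntimeMalloc = "/usr/lib/libtcmalloc_minimal.so";
    ];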

- bug #53714: WMS PURGER SHOULD NOT directly FORCE PURGE OF jobs when its DN is not authorized on LB server

Added:
>
>
- bug #55709: problems with glite-wms-wm restart in WMS 3.2
 \ No newline at end of file

Revision 28 (2009-08-31) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
Brief explanation
>
>
Introduction
  Savannah patch 2597/3044 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
Line: 67 to 67
  Some remarks:
Changed:
<
<
workaround "EnableZippedISB = false" in the JDL can be removed
>
>
the workaround "EnableZippedISB = false" in the JDL, previously in place for WMS 3.1 (patch 2562), can now be removed
  given the increased performance of the WM, backlogs on the JC/ICE side could be experienced. For the JC this basically stems from Condor not keeping up the pace (in the way Condor is used within our design). What can be done to avoid this effect:
Line: 84 to 84
 - bug #50009: wmproxy.gacl person record allows anyone to pass

- bug #54144: WMS 3.2 Workload Manager memory leak?

Changed:
<
<
**Whoever is experiencing this problem is strongly encouraged to use the google malloc**
>
>
**Those who are experiencing this problem are strongly encouraged to use the google malloc**
  - bug #53714:WMS PURGER SHOULD NOT directly FORCE PURGE OF jobs when its DN is not authorized on LB server \ No newline at end of file

Revision 27 (2009-08-27) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Brief explanation

Revision 26 (2009-08-20) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Added:
>
>
Brief explanation
 Savannah patch 2597/3044 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.

Among the major problems revealed by the previous releases of the WMS were the core architecture and the match-making performance, with the subsequent difficulty to recover from complex situations involving pending jobs, resubmissions, denials of service, timeouts and so on. Of course, the WMS has to cover a wide range of submission use-cases, and, even after the positive introduction of bulk submission and match-making for collections (as required by the experiments), both WMS stability and performance continued to be affected by an internal architecture prone to race conditions and needing a huge lock throughput.

Line: 22 to 24
  This patch also provides an update of ICE, improving the performance and scalability of submissions to CREAM-based CEs via the WMS, although there are still some other scalability issues, which appear when there are many (thousands of) active jobs being managed by ICE.
Changed:
<
<
Newly introduced features:
>
>
Newly introduced features
 
  • parallel match-making
Line: 34 to 36
 
  • improved error reporting for DAGs
Added:
>
>
  • cancel implemented also for collections
 
  • run-time selection of LB type: server or proxy.

Typically for small VOs (but not only, as we will see later), it can make sense to install both WMS and LB on a single machine. In such circumstances, the use of LB proxy (a cache of jobs under processing) is discouraged to avoid storing the same events twice (this will change with the advent of LB 2.0 wherever LBserver and LBProxy are colocated). Also, configuring the whole system to work with LBserver instead of LB proxy needs to be done once for each and every component for correct job state computation. The previous versions (incorrectly) allowed mixing the use of LBserver and proxy. See the "configuration changes" section for more details.
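
As a sketch, assuming the ClassAd-style layout of the WMS configuration file, a colocated WMS+LB installation that talks directly to the LB server would simply set, once for all components:

    Common = [
        LBProxy = false;   // use the LB server directly instead of the LB proxy
    ];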

Line: 46 to 50
 
  • this code baseline is ready to support GridSite Delegation 2, at the moment disabled given that, for backward-compatibility needs coming from external packages, the project has been built against GridSite 1.1.18
Changed:
<
<
Configuration changes
>
>
Configuration changes
  LBProxy (= true) in the Common section; it has been moved from the [WorkloadManagerProxy] section (see above).
Line: 67 to 71
  3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always requires two machines for a WMS+LB node, but differently distributed. Check out https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS
Changed:
<
<
Known Issues
>
>
Known Issues
  - bug #49844: WMProxy does not catch signal 25

- bug #50009: wmproxy.gacl person record allows anyone to pass \ No newline at end of file

Added:
>
>
- bug #54144: WMS 3.2 Workload Manager memory leak? **Whoever is experiencing this problem is strongly encouraged to use the google malloc**

- bug #53714:WMS PURGER SHOULD NOT directly FORCE PURGE OF jobs when its DN is not authorized on LB server

Revision 25 (2009-07-22) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Savannah patch 2597/3044 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
Line: 72 to 72
 - bug #49844: WMProxy does not catch signal 25

- bug #50009: wmproxy.gacl person record allows anyone to pass

Deleted:
<
<
- bug #51039: If bulk matchmaking is enabled, submitting a collection with data requirements cause a crash in the WorkloadManager
 \ No newline at end of file

Revision 24 (2009-06-22) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
Savannah patch 2597 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
>
>
Savannah patch 2597/3044 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
  Among the major problems revealed by the previous releases of the WMS was the core architecture and the match-making performance, which the subsequent difficulty to recover from complex situations involving pending jobs, resubmissions, denials of service, timeouts and so on. Of course, the WMS has to cover a wide range of submission use-cases, and, even after the positive introduction of bulk submission and match-making for collections (as required by the experiments), both WMS stability and performance continued to be affected by an internal architecture prone to race conditions and needing huge locks throughput.

Revision 23 (2009-06-08) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Savannah patch 2597 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
Line: 75 to 75
  - bug #50009: wmproxy.gacl person record allows anyone to pass
Deleted:
<
<
- but #51039: If bulk matchmaking is enabled, submitting a collection with data requirements cause a crash in the WorkloadManager
 \ No newline at end of file
Added:
>
>
- bug #51039: If bulk matchmaking is enabled, submitting a collection with data requirements cause a crash in the WorkloadManager
 \ No newline at end of file

Revision 22 (2009-05-29) - SalvatoreMonforte

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Savannah patch 2597 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
Line: 75 to 75
  - bug #50009: wmproxy.gacl person record allows anyone to pass
Deleted:
<
<
- specifying DataAccessProtocol in a JDL (i.e. DataAccessProtocol = {"file","gsiftp","gridftp","rfio"}; ) without setting neither InputData nor DataRequirements causes the WM to crash.
 \ No newline at end of file
Added:
>
>
- but #51039: If bulk matchmaking is enabled, submitting a collection with data requirements cause a crash in the WorkloadManager
 \ No newline at end of file

Revision 21 (2009-05-29) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Savannah patch 2597 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
Line: 71 to 71
 Known Issues
Changed:
<
<
bug #49844: WMProxy does not catch signal 25
>
>
- bug #49844: WMProxy does not catch signal 25
 
Changed:
<
<
bug #50009: wmproxy.gacl person record allows anyone to pass
>
>
- bug #50009: wmproxy.gacl person record allows anyone to pass

- specifying DataAccessProtocol in a JDL (i.e. DataAccessProtocol = {"file","gsiftp","gridftp","rfio"}; ) without setting either InputData or DataRequirements causes the WM to crash.
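
For clarity, a minimal hypothetical JDL that triggers this crash, since it lists access protocols without any InputData or DataRequirements:

    [
        Executable = "/bin/hostname";
        DataAccessProtocol = {"file", "gsiftp", "gridftp", "rfio"};
        // neither InputData nor DataRequirements is set: this is the combination that crashes the WM
    ]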

 \ No newline at end of file

Revision 20 (2009-05-19) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Deleted:
<
<
 Savannah patch 2597 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.

Among the major problems revealed by the previous releases of the WMS was the core architecture and the match-making performance, which the subsequent difficulty to recover from complex situations involving pending jobs, resubmissions, denials of service, timeouts and so on. Of course, the WMS has to cover a wide range of submission use-cases, and, even after the positive introduction of bulk submission and match-making for collections (as required by the experiments), both WMS stability and performance continued to be affected by an internal architecture prone to race conditions and needing huge locks throughput.

Line: 67 to 66
 2) Consider using a machine with two disks and two separate controllers. Use one for the JC/Condor stuff

3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distributed.Check out here https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS

Added:
>
>

Known Issues

bug #49844: WMProxy does not catch signal 25

bug #50009: wmproxy.gacl person record allows anyone to pass

Revision 19 (2009-04-14) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Deleted:
<
<
Savannah patch 2597 stands for a significant redesign of the WMS. That's basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
 
Changed:
<
<
Among the major problems shown by the previous releases of the gLite WMS was the match-making performance. Bulk match-making for collections was an effective feature, but the WMS performance is affected by also single jobs, among which resubmissions and pending are always found even when relying on collections for one's computing model. In this respect, the new architecture can be utilized in a wider scenario of submission use-cases. Prior to the present release single job processing was done almost serially, because of "big" locks on concurrently accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating (deleting expired entries and so on). The so-called Task Queue is replaced by a prioritized event queue model where requests queues up waiting to be processed in a as stateless as possible fashion. Periodic activities, handled as "timed" events, will re-schedule themselves to show-up at a given time in the "ready" queue. The number of threads is kept controlled, in such a way that each and every request for processing - a "submit" as well a "purchase from BDII" request - is processed within the same thread pool as a generic functor. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction which also used to require syncronization (locks) on several data structures.
>
>
Savannah patch 2597 stands for a significant redesign of the gLite WMS which is basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.

Among the major problems revealed by the previous releases of the WMS were the core architecture and the match-making performance, with the subsequent difficulty to recover from complex situations involving pending jobs, resubmissions, denials of service, timeouts and so on. Of course, the WMS has to cover a wide range of submission use-cases, and, even after the positive introduction of bulk submission and match-making for collections (as required by the experiments), both WMS stability and performance continued to be affected by an internal architecture prone to race conditions and needing a huge lock throughput.

In this respect, the new architecture implemented in this release can be utilized in a wider scenario of submission use-cases. Prior to the present release single job processing & match-making was done almost serially, because of "big" locks on concurrently accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating (deleting expired entries and so on). The so-called Task Queue is replaced by a prioritized event queue model where requests queue up waiting to be processed in an as-stateless-as-possible fashion. Two different queues are needed to accommodate this new model. Periodic activities, handled as "timed" events, will re-schedule themselves to show up at a given time in the "ready" queue. The number of threads is kept controlled, in such a way that each and every request for processing - a "submit" as well as a "purchase from BDII" request - is processed within the same thread pool as a generic functor. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction which also used to require synchronization (locks) on several data structures.

  Situations like the following, with basically serialized match-making, are now prevented by design:
Added:
>
>
...
 09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] ) 09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] ) 09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
Line: 37 to 41
 
  • the jobwrapper template is now cached at each WM start (restart to re-load any change)
Changed:
<
<
  • restructured jobwrapper
>
>
  • restructured jobwrapper (also removed perl dependencies)
 
  • dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

  • this code baseline is ready to support GridSite Delegation 2, at the moment disabled given that, for backward-compatibility needs coming from external packages, the project has been built against GridSite 1.1.18
Deleted:
<
<
 Configuration changes

LBProxy (= true) in Common section, has been moved from [WorkloadManageProxy] section (see above).

Line: 64 to 66
  2) Consider using a machine with two disks and two separate controllers. Use one for the JC/Condor stuff
Changed:
<
<
3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distributed.Check out here https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS
>
>
3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distributed.Check out here https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS

Revision 18 (2009-04-08) - MassimoSgaravatto

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Savannah patch 2597 stands for a significant redesign of the WMS. That's basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
Line: 16 to 16
  ...
Added:
>
>
This patch also provides an update of ICE, improving the performance and the scalability of submissions to CREAM based CE via the WMS, even if there are still some other scalability issues, which appear when there are many (thousands) active jobs being managed by ICE
 Newly introduced features:

  • parallel match-making
Line: 40 to 43
 
  • this code baseline is ready to support Grid Site Delegtion 2, at the moment disabled given that for backward-compatibility needs coming from external packages the project has been build agains Gridsite 1.1.18
Added:
>
>
 Configuration changes

LBProxy (= true) in Common section, has been moved from [WorkloadManageProxy] section (see above).

Revision 17 (2009-04-03) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
Savannah patch 2597 stands for a significant redesign of the WMS. That's basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
In a few words, this release in target at providing a more
performant, responsive and lightweight WMS. 
>
>
Savannah patch 2597 stands for a significant redesign of the WMS. That's basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, implementation of a parallel match-making algorithm, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
 
Changed:
<
<
Among the major problems shown by the previous releases of the gLite WMS was the match-making performance. Bulk match-making for collections  was an effective feature, but the WMS performance is affected by also single jobs, among which resubmissions and pending are always found even when relying on collections for one's computing model. In this respect, the new architecture can be utilized in a wider scenario of submission use-cases.
Prior to the present release single job processing was done almost serially, because of "big" locks on concurrently accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating (deleting expired entries and so on).
The so-called Task Queue is replaced by a prioritized event queue model where requests queues up waiting to be processed in a as stateless as possible fashion. Periodic activities, handled as "timed" events, will re-schedule themselves to show-up at a given time in the "ready" queue. The number of threads is kept controlled, in such a way that each and every request for processing - a "submit" request as well a "purchase from BDII" request -, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction which alose required syncronization on several data structures.
>
>
Among the major problems shown by the previous releases of the gLite WMS was the match-making performance. Bulk match-making for collections was an effective feature, but the WMS performance is affected by also single jobs, among which resubmissions and pending are always found even when relying on collections for one's computing model. In this respect, the new architecture can be utilized in a wider scenario of submission use-cases. Prior to the present release single job processing was done almost serially, because of "big" locks on concurrently accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating (deleting expired entries and so on). The so-called Task Queue is replaced by a prioritized event queue model where requests queues up waiting to be processed in a as stateless as possible fashion. Periodic activities, handled as "timed" events, will re-schedule themselves to show-up at a given time in the "ready" queue. The number of threads is kept controlled, in such a way that each and every request for processing - a "submit" as well a "purchase from BDII" request - is processed within the same thread pool as a generic functor. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction which also used to require syncronization (locks) on several data structures.
 
Changed:
<
<
Situations like the following, with serialized match-making, are unfortunately known to everyone who has ever dealt with the WMS:
>
>
Situations like the following, with basically serialized match-making, are now prevented by design:
 
Changed:
<
<
09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/jg_p9HyHKGaTUB0qIV1_Ww (2/6042 [16.76, 20.33] )
09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/fQEtbp4kaF9qTicc-0bTig (0/6042 [19.25, 26.89] )
09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/lvHFKWrE5YkG2EWAUOe0jw (0/6042 [21.81, 28.56] )
09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/HRFXvdB2iGA9ylbvvAUmLA (0/6042 [25.48, 29.72] )
09 Oct, 19:41:58 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/PIcmoaE8q9a7oLcRDSFUgQ (2/6042 [19.52, 27.09] )
09 Oct, 19:42:02 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/M03ebqXLm_8blgrHOa9Jrw (2/6042 [20.88, 25.28] )
>
>
09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/jg_p9HyHKGaTUB0qIV1_Ww (2/6042 [16.76, 20.33] )
09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/fQEtbp4kaF9qTicc-0bTig (0/6042 [19.25, 26.89] )
09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/lvHFKWrE5YkG2EWAUOe0jw (0/6042 [21.81, 28.56] )
09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/HRFXvdB2iGA9ylbvvAUmLA (0/6042 [25.48, 29.72] )
  ...
Changed:
<
<
the introduction of parallel match-making together with the removal of previously required huge-scoped locks definitely fixes such patterns, allowing full exploitation of multi-core architectures.
>
>
Newly introduced features:
 
Changed:
<
<
Newly introduced features:
>
>
  • parallel match-making
 
Changed:
<
<
   * run-time selection of LB type: server or proxy.
>
>
  • ISM: restructured algorithm for matchmaking in case of data requirements specified in the JDL.
 
Changed:
<
<
Typically for small VOs (but not only as we will see later), it can make sense to install both WMS and LB on a single machine.  In such circumstances, the use of LB proxy (a cache of jobs under processing) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0 wherever LBserver and LBProxy are colocated). Also, configuring the whole system to work with LBserver instead of LB proxy needs to be done once for each and every component for correct job state computation. The previous versions allowed (not correctly) to mix up use of LBserver and proxy. See the "configuration changes" section for more details.
>
>
The new algorithm performs a reverse search starting from data resolution, reducing the size of the search space to only those computing resources mounting the storage providing the specified files. Integration of the new restructured data structure within the broker/helper/brokerinfo modules allows a faster collection of all the data relevant to the construction of the brokerinfo file while performing the MM. This allows the generation of the brokerinfo file without any further query to the ISM for extracting storage information.
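
As a hedged illustration (the logical file name and the protocol list are invented for the example), a request exercising this data-driven match-making would carry JDL attributes such as:

    [
        Executable = "/bin/hostname";
        InputData = {"lfn:/grid/somevo/example/file.dat"};   // hypothetical LFN
        DataAccessProtocol = {"gsiftp", "rfio"};
        // the reverse search starts from these data requirements and keeps only the CEs
        // close to an SE actually providing the listed files
    ]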
 
Changed:
<
<
   * highest match-making performance thanks to a review of the Match-making algorithm
>
>
  • added support for IPv6
 
Changed:
<
<
   * parallel match-making
>
>
  • improved error reporting for DAGs
 
Changed:
<
<
   * ISM: restructured algorithm for matchmaking in case of data requirements specified in the JDL.
>
>
  • run-time selection of LB type: server or proxy.
 
Changed:
<
<
The new algorithm performs a reverse search starting from data resolution reducing the size of the search space to only those computing resources mounting the storage providing the specified files. Integration of the new restructured data structure within the broker/helper/brokerinfo modules allowing a faster collection of all the data relevant to the construction of the brokerinfo file while performing the MM. This allows the generation of the brokerinfo file without any further query to the ISM for extracting storage infomation.    * added support for IPv6
>
>
Typically for small VOs (but not only as we will see later), it can make sense to install both WMS and LB on a single machine. In such circumstances, the use of LB proxy (a cache of jobs under processing) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0 wherever LBserver and LBProxy are colocated). Also, configuring the whole system to work with LBserver instead of LB proxy needs to be done once for each and every component for correct job state computation. The previous versions allowed (not correctly) to mix up use of LBserver and proxy. See the "configuration changes" section for more details.
 
Changed:
<
<
   * cancel for collections
>
>
  • the jobwrapper template is now cached at each WM start (restart to re-load any change)
 
Changed:
<
<
   * improved error reporting for DAGs
>
>
  • restructured jobwrapper
 
Changed:
<
<
   * jobdir enabled also as JC/ICE input
>
>
  • dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]
 
Changed:
<
<
   * the jobwrapper template is now cached at each WM start (restart to re-read it)
>
>
  • this code baseline is ready to support Grid Site Delegtion 2, at the moment disabled given that for backward-compatibility needs coming from external packages the project has been build agains Gridsite 1.1.18
 
Changed:
<
<
   * restructured  jobwrapper
>
>
Configuration changes
 
Changed:
<
<
   * dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]
>
>
LBProxy (= true) in Common section, has been moved from [WorkloadManageProxy] section (see above).
 
Changed:
<
<
This code baseline is ready to support Grid Site DELEGATION 2.
>
>
RuntimeMalloc [WorkloadManager]: allows the use of an alternative malloc library (e.g. nedmalloc, google performance tools and many more), redirecting the allocator at run-time with LD_PRELOAD. Possible values are, for example, RuntimeMalloc = "/usr/lib/libtcmalloc_minimal.so" if you use Google malloc.

IsmThreads (= true) [WMConfiguration]: the new WM core processes each computation/request in a thread pool (WorkerThreads being the number of threads). Among many other benefits, this also helps keep the number of threads under control, so that it can possibly be set to match the number of physical cores. This is what happens with IsmThreads set to false. Now, think of all the WorkerThreads threads being busy at, say, submitting 1000-node collections. It would happen that ISM requests would have to wait, even if on top of the queue (they have the highest priority). To set up a safer configuration for similar use-cases, one can configure the WM to handle ISM requests as separate threads (not traversing any queue and not being part of a pool), hence IsmThreads = true.
 
Changed:
<
<
Configuration changes

LBProxy (= true) in Common section. See above.

RuntimeMalloc [WMConfiguration] TODO
IsmThreads (= true) [WMConfiguration] TODO
QueueSize (= 1000) [WMConfiguration] TODO

>
>
QueueSize (= 1000) [WorkloadManager]: size of the queue of events "ready" to be managed by the worker thread pool.
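
Putting the above together, a sketch of the relevant configuration fragment (section and attribute names as used in this text; the WorkerThreads value is only a placeholder to be matched to the number of physical cores):

    Common = [
        LBProxy = true;
    ];

    WorkloadManager = [
        WorkerThreads = 4;       // placeholder: match the number of physical cores
        IsmThreads    = true;    // handle ISM purchasing requests outside the worker pool
        QueueSize     = 1000;    // size of the "ready" event queue served by the pool
        RuntimeMalloc = "/usr/lib/libtcmalloc_minimal.so";   // optional, LD_PRELOADed at run time
    ];
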
  Some remarks:

workaround "EnableZippedISB = false" in the JDL can be removed

Changed:
<
<
Given the increased performance of the WM, the next bottleneck could be represented by huge backlogs to JC, hence to Condor not keeping up the pace (in the way Condor is used within our design)). What can be done to avoid this effect:
>
>
given the increased performance of the WM, backlogs on the JC/ICE side could be experienced. For the JC this basically stems from Condor not keeping up the pace (in the way Condor is used within our design). What can be done to avoid this effect:
  1) Make sure jobdir is enabled as the JC input

2) Consider using a machine with two disks and two separate controllers. Use one for the JC/Condor stuff

Changed:
<
<
3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distributed.Check out here https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS

-- MarcoCecchi - 03 Nov 2008

>
>
3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distributed.Check out here https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS

Revision 16 (2009-03-28) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
This patch implements a significant redesign of the WMS core. The remainder of the code restructuring activity which took place during EGEE-II is finally present in this release.
>
>
Savannah patch 2597 stands for a significant redesign of the WMS. That's basically (but absolutely not only) the end result of the code restructuring activity which took place during EGEE-II. It includes a redesign of the core architecture, restructuring of the WMProxy interface and a reviewed ISM structure which allows for an optimized algorithm for match-making with data.
In a few words, this release in target at providing a more
performant, responsive and lightweight WMS. 
 
Changed:
<
<
Among the major problems shown by the previous releases of the gLite WMS, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found.
Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating.
The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed).
Periodic activities, handled as timed events, will re-schedule themselves to show-up at a given time in the ready queue. The number of threads is kept controlled, in such a way that each and every request, whatever kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction.
>
>
Among the major problems shown by the previous releases of the gLite WMS was the match-making performance. Bulk match-making for collections  was an effective feature, but the WMS performance is affected by also single jobs, among which resubmissions and pending are always found even when relying on collections for one's computing model. In this respect, the new architecture can be utilized in a wider scenario of submission use-cases.
Prior to the present release single job processing was done almost serially, because of "big" locks on concurrently accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating (deleting expired entries and so on).
The so-called Task Queue is replaced by a prioritized event queue model where requests queues up waiting to be processed in a as stateless as possible fashion. Periodic activities, handled as "timed" events, will re-schedule themselves to show-up at a given time in the "ready" queue. The number of threads is kept controlled, in such a way that each and every request for processing - a "submit" request as well a "purchase from BDII" request -, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction which alose required syncronization on several data structures.
 
Changed:
<
<
...
>
>
Situations like the following, with serialized match-making, are unfortunately known to everyone who has ever dealt with the WMS:

09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/jg_p9HyHKGaTUB0qIV1_Ww (2/6042 [16.76, 20.33] )
09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/fQEtbp4kaF9qTicc-0bTig (0/6042 [19.25, 26.89] )
09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/lvHFKWrE5YkG2EWAUOe0jw (0/6042 [21.81, 28.56] )
09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/HRFXvdB2iGA9ylbvvAUmLA (0/6042 [25.48, 29.72] )
09 Oct, 19:41:58 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/PIcmoaE8q9a7oLcRDSFUgQ (2/6042 [19.52, 27.09] )
09 Oct, 19:42:02 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/M03ebqXLm_8blgrHOa9Jrw (2/6042 [20.88, 25.28] )

 
Changed:
<
<
In short to avoid situations like the following:
>
>
...
 
Changed:
<
<
09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/jg_p9HyHKGaTUB0qIV1_Ww (2/6042 [16.76, 20.33] )
09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/fQEtbp4kaF9qTicc-0bTig (0/6042 [19.25, 26.89] )
09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/lvHFKWrE5YkG2EWAUOe0jw (0/6042 [21.81, 28.56] )
09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/HRFXvdB2iGA9ylbvvAUmLA (0/6042 [25.48, 29.72] )
09 Oct, 19:41:58 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/PIcmoaE8q9a7oLcRDSFUgQ (2/6042 [19.52, 27.09] )
09 Oct, 19:42:02 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/M03ebqXLm_8blgrHOa9Jrw (2/6042 [20.88, 25.28] )
09 Oct, 19:42:06 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/LoGgBGlLe58Hg1PEv5yYvg (2/6042 [14.32, 17.78] )
09 Oct, 19:42:13 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/8FxXoeHMpC3voxUz2JL-jA (2/6042 [7.53, 15.15] )
>
>
the introduction of parallel match-making together with the removal of previously required huge-scoped locks definitely fixes such patterns, allowing full exploitation of multi-core architectures.
 

Newly introduced features:

Changed:
<
<
   * run-time selection of LB/LBproxy (this has to be unified given that component cannot work with different settings not to break job status machine state computation). The attribute is LBProxy in the "Common" section of the configuration file, so it has moved from section WMProxy to Common.
   * Hiighest Match-making performance factor thanks to: 1) faster Match-making algorithm 2) parallel Match-making
   * added support for IPv6
   * added support for Grid site DELEGATION 2,  API bindings???
>
>
   * run-time selection of LB type: server or proxy.
 
Added:
>
>
Typically for small VOs (but not only as we will see later), it can make sense to install both WMS and LB on a single machine.  In such circumstances, the use of LB proxy (a cache of jobs under processing) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0 wherever LBserver and LBProxy are colocated). Also, configuring the whole system to work with LBserver instead of LB proxy needs to be done once for each and every component for correct job state computation. The previous versions allowed (not correctly) to mix up use of LBserver and proxy. See the "configuration changes" section for more details.

   * highest match-making performance thanks to a review of the Match-making algorithm

   * parallel match-making

   * ISM: restructured algorithm for matchmaking in case of data requirements specified in the JDL.

The new algorithm performs a reverse search starting from data resolution reducing the size of the search space to only those computing resources mounting the storage providing the specified files. Integration of the new restructured data structure within the broker/helper/brokerinfo modules allowing a faster collection of all the data relevant to the construction of the brokerinfo file while performing the MM. This allows the generation of the brokerinfo file without any further query to the ISM for extracting storage infomation.    * added support for IPv6

  

 * cancel for collections
Changed:
<
<
   * improved error reporting for DAGs
   * dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]
   * <p />removed perl code from the jobwrapper
   * <p />jobdir enabled as JC/ICE input -
   * The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements
- removed perl code from the jobwrapper
- jobdir enabled as JC/ICE input
  - ISM, restructured algorithm for matchmaking in case of data requirements specified in the JDL. The new algorithm performs a reverse search starting from data resolution reducing the size of the search space to only those computing resources mounting the storage providing the specified files. Integration of the new restructured data structure within the broker/helper/brokerinfo modules allowing a faster collection of all the data relevant to the construction of the brokerinfo file while performing the MM. This allow the generation of the brokerinfo without any further query to the ISM for extracting storage infomation.
   * ...
>
>
   * improved error reporting for DAGs

   * jobdir enabled also as JC/ICE input

   * the jobwrapper template is now cached at each WM start (restart to re-read it)

   * restructured  jobwrapper

   * dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

This code baseline is ready to support Grid Site DELEGATION 2.

Configuration changes

 
Changed:
<
<
Newly introduced configuration parameters: LBProxy (= true) in Common section.
For small VOs, it can make sense to install both WMS and LB on a single machine.  In such circumstances, the use of LB proxy (a cache of jobs under processing - at the moment used only by the WM) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0). Also, configuring the system with LBserver instead of LB proxy has to be done for each and every component, for correct job state computation.
>
>
LBProxy (= true) in Common section. See above.
  RuntimeMalloc [WMConfiguration] TODO
IsmThreads (= true) [WMConfiguration] TODO
QueueSize (= 1000) [WMConfiguration] TODO
Line: 33 to 71
  1) Make sure jobdir is enabled as the JC input
Changed:
<
<
2) Consider using a machine with two disks and two separate controllers. Use one for the LB database.
>
>
2) Consider using a machine with two disks and two separate controllers. Use one for the JC/Condor stuff
  3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distributed.Check out here https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS

Revision 15 (2009-03-27) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
This patch implements a significant redesign of the WMS core. The remainder of the code restructuring activity which took place during EGEE-II is finally present in this release.
Line: 9 to 9
 In short to avoid situations like the following:

09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/jg_p9HyHKGaTUB0qIV1_Ww (2/6042 [16.76, 20.33] )
09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/fQEtbp4kaF9qTicc-0bTig (0/6042 [19.25, 26.89] )
09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/lvHFKWrE5YkG2EWAUOe0jw (0/6042 [21.81, 28.56] )
09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/HRFXvdB2iGA9ylbvvAUmLA (0/6042 [25.48, 29.72] )
09 Oct, 19:41:58 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/PIcmoaE8q9a7oLcRDSFUgQ (2/6042 [19.52, 27.09] )
09 Oct, 19:42:02 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/M03ebqXLm_8blgrHOa9Jrw (2/6042 [20.88, 25.28] )
09 Oct, 19:42:06 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/LoGgBGlLe58Hg1PEv5yYvg (2/6042 [14.32, 17.78] )
09 Oct, 19:42:13 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/8FxXoeHMpC3voxUz2JL-jA (2/6042 [7.53, 15.15] )

Changed:
<
<
Newly introduced features:
>
>
 
Changed:
<
<
   * run-time selection of LB/LBproxy (this has to be unified given that component cannot work with different settings not to break job status machine state computation). The attribute is LBProxy in the "Common" section of the configuration file, so it has moved from section WMProxy to Common.
   * 10x Match-making performance factor thanks to: 1) faster Match-making algorithm 2) parallel Match-making
   * added support for IPv6
   * added support for Grid site DELEGATION 2,  API bindings???*****
>
>
Newly introduced features:
 
Changed:
<
<
* cancel for collections (untested)
>
>
   * run-time selection of LB/LBproxy (this has to be unified given that component cannot work with different settings not to break job status machine state computation). The attribute is LBProxy in the "Common" section of the configuration file, so it has moved from section WMProxy to Common.
   * Hiighest Match-making performance factor thanks to: 1) faster Match-making algorithm 2) parallel Match-making
   * added support for IPv6
   * added support for Grid site DELEGATION 2,  API bindings???

* cancel for collections

     * improved error reporting for DAGs
   * dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]
   * <p />removed perl code from the jobwrapper
   * <p />jobdir enabled as JC/ICE input -
   * The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements
- removed perl code from the jobwrapper
- jobdir enabled as JC/ICE input
  - ISM, restructured algorithm for matchmaking in case of data requirements specified in the JDL. The new algorithm performs a reverse search starting from data resolution reducing the size of the search space to only those computing resources mounting the storage providing the specified files. Integration of the new restructured data structure within the broker/helper/brokerinfo modules allowing a faster collection of all the data relevant to the construction of the brokerinfo file while performing the MM. This allow the generation of the brokerinfo without any further query to the ISM for extracting storage infomation.
   * ...

Newly introduced configuration parameters: LBProxy (= true) in Common section.
For small VOs, it can make sense to install both WMS and LB on a single machine.  In such circumstances, the use of LB proxy (a cache of jobs under processing - at the moment used only by the WM) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0). Also, configuring the system with LBserver instead of LB proxy has to be done for each and every component, for correct job state computation.

Changed:
<
<
RuntimeMalloc [WMConfiguration]
IsmThreads (= true) [WMConfiguration]
QueueSize (= 1000) [WMConfiguration]
>
>
RuntimeMalloc [WMConfiguration] TODO
IsmThreads (= true) [WMConfiguration] TODO
QueueSize (= 1000) [WMConfiguration] TODO
  Some remarks:
Line: 30 to 33
  1) Make sure jobdir is enabled as the JC input
Changed:
<
<
2)
>
>
2) Consider using a machine with two disks and two separate controllers. Use one for the LB database.
 
Changed:
<
<
3)A different physical layout for equally distributing the load
between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distibuted.
>
>
3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distributed.Check out here https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS
  -- MarcoCecchi - 03 Nov 2008 \ No newline at end of file

Revision 14 (2009-03-26) - SalvatoreMonforte

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
This patch implements a significant redesign of the WMS core. The remainder of the code restructuring activity which took place during EGEE-II is finally present in this release.
Line: 15 to 15
  * cancel for collections (untested)
Changed:
<
<
   * improved error reporting for DAGs
   * dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]
   * <p />removed perl code from the jobwrapper
   * <p />jobdir enabled as JC/ICE input -
   * The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements
- removed perl code from the jobwrapper
- jobdir enabled as JC/ICE input
  - ISM,  ... Salvo to comment on brokerinfo generation TODO
   * ...
>
>
   * improved error reporting for DAGs
   * dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]
   * <p />removed perl code from the jobwrapper
   * <p />jobdir enabled as JC/ICE input -
   * The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements
- removed perl code from the jobwrapper
- jobdir enabled as JC/ICE input
  - ISM, restructured algorithm for matchmaking in case of data requirements specified in the JDL. The new algorithm performs a reverse search starting from data resolution reducing the size of the search space to only those computing resources mounting the storage providing the specified files. Integration of the new restructured data structure within the broker/helper/brokerinfo modules allowing a faster collection of all the data relevant to the construction of the brokerinfo file while performing the MM. This allow the generation of the brokerinfo without any further query to the ISM for extracting storage infomation.
   * ...
  Newly introduced configuration parameters: LBProxy (= true) in Common section.
For small VOs, it can make sense to install both WMS and LB on a single machine.  In such circumstances, the use of LB proxy (a cache of jobs under processing - at the moment used only by the WM) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0). Also, configuring the system with LBserver instead of LB proxy has to be done for each and every component, for correct job state computation.

Revision 13 (2009-03-26) - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
This patch implements a significant redesign of the WMS core. The remainder of the code restructuring activity which took place during EGEE-II is finally present in this release.
Changed:
<
<
Among the major problems shown by the previous releases of the gLite WMS, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed). Periodic activities, handled as timed events, will re-schedule themselves to show-up at a given time in the ready queue. The number of threads is kept controlled, in such a way that each and every request, whatever kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction.

- WMPROXY refactoring; Luca to comment timeout list-match delegation code restructuring and bug fixing

Newly introduced features:

  • parallel Match-making
  • faster Match-making algorithm
  • added support for IPv6

  • added support for Grid site DELEGATION 2
  • improved error reporting for DAGs
  • Dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

  • run-time selection of LB/LBproxy for each service: attribute LBProxy in the "Common" section of the configuration file

  • removed perl code from the jobwrapper

  • jobdir enabled as JC/ICE input -

  • The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements - LDAP querying restructuring - ISM, ... Salvo to comment TODO
  • ...

Newly introduced configuration parameters: LBProxy (= true) in Common section. For small VOs, it can make sense to install both WMS and LB on a single machine. In such circumstances, the use of LB proxy (a cache of jobs under processing - at the moment used only by the WM) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0). Also, configuring the system with LBserver instead of LB proxy has to be done for each and every component, for correct job state computation.

RuntimeMalloc [WMConfiguration] IsmThreads (= true) [WMConfiguration] QueueSize (= 1000) [WMConfiguration]

>
>
Among the major problems shown by the previous releases of the gLite WMS was that the efficiency of the match-making, and of the WM in general, needs to benefit not only large collections but also single jobs, which is where resubmissions and pending requests are typically found.
Prior to the present release, single-job processing was done almost serially, because of "big" locks on widely accessed data structures. This redesign also allows for faster operations on the ISM, such as dumping or updating.
The so-called Task Queue is replaced by a prioritized event queue where requests queue up waiting to be processed in as stateless a fashion as possible (so that several locks could consequently be removed).
Periodic activities, handled as timed events, re-schedule themselves to show up at a given time in the ready queue. The number of threads is kept under control, in such a way that each and every request, whatever its kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the former clear-cut dispatcher/request-handler distinction.
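
As a purely illustrative sketch of this scheme (not the WM implementation): a prioritized event queue served by a fixed thread pool, with periodic activities that re-post themselves. Because every worker in the pool can pick up a match-making request, requests for different jobs are naturally processed in parallel. All class and function names below are invented for the example.

# Toy illustration of the new core design (not the actual WM code).

import heapq
import itertools
import threading
import time

class EventExecutor:
    def __init__(self, workers=5):
        self._queue = []                       # heap of (due, priority, seq, fn)
        self._seq = itertools.count()          # tie-breaker, keeps heap stable
        self._cv = threading.Condition()
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def post(self, fn, priority=10, delay=0.0):
        """Queue a request; items become eligible at their due time, and a
        lower priority value is served first among equally due items."""
        with self._cv:
            heapq.heappush(self._queue,
                           (time.time() + delay, priority, next(self._seq), fn))
            self._cv.notify()

    def _worker(self):
        while True:
            with self._cv:
                while not self._queue or self._queue[0][0] > time.time():
                    timeout = (self._queue[0][0] - time.time()
                               if self._queue else None)
                    self._cv.wait(timeout)
                _, _, _, fn = heapq.heappop(self._queue)
            # The request runs outside the lock: match-making requests for
            # different jobs are therefore processed in parallel by the pool.
            fn()

def make_periodic(executor, fn, period):
    """Wrap a periodic activity so that it re-schedules itself as a timed
    event after each run, as described above."""
    def run():
        fn()
        executor.post(run, priority=5, delay=period)
    return run
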

...

In short, the aim is to avoid situations like the following, where single jobs are match-made one at a time, several seconds apart:

09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/jg_p9HyHKGaTUB0qIV1_Ww (2/6042 [16.76, 20.33] )
09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/fQEtbp4kaF9qTicc-0bTig (0/6042 [19.25, 26.89] )
09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/lvHFKWrE5YkG2EWAUOe0jw (0/6042 [21.81, 28.56] )
09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/HRFXvdB2iGA9ylbvvAUmLA (0/6042 [25.48, 29.72] )
09 Oct, 19:41:58 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/PIcmoaE8q9a7oLcRDSFUgQ (2/6042 [19.52, 27.09] )
09 Oct, 19:42:02 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/M03ebqXLm_8blgrHOa9Jrw (2/6042 [20.88, 25.28] )
09 Oct, 19:42:06 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/LoGgBGlLe58Hg1PEv5yYvg (2/6042 [14.32, 17.78] )
09 Oct, 19:42:13 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/8FxXoeHMpC3voxUz2JL-jA (2/6042 [7.53, 15.15] )

Newly introduced features:

   * run-time selection of LB/LBproxy (the setting has to be the same for every component: components cannot work with different settings without breaking job state machine computation). The attribute is LBProxy in the "Common" section of the configuration file, so it has moved from section WMProxy to Common.
   * 10x Match-making performance factor thanks to: 1) faster Match-making algorithm 2) parallel Match-making
   * added support for IPv6
   * added support for Grid site DELEGATION 2 (API bindings: to be confirmed)

* cancel for collections (untested)

   * improved error reporting for DAGs
   * dumping the ISM can be done more often and at a lower cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ] (see the sketch after this list)
   * removed perl code from the jobwrapper
   * jobdir enabled as JC/ICE input
   * The jw template is now cached at each WM start (restart to re-read it)
   * CE forward requirements
   * ISM ... (Salvo to comment on brokerinfo generation; TODO)
   * ...
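
As a hedged example of the ism_dump request mentioned in the list above: the jobdir is a plain filesystem queue, so posting a request only means writing a small file and moving it, atomically, into the directory the WM watches. The path and the tmp/new sub-directory layout below are assumptions about a typical installation; use the jobdir location configured for the WorkloadManager on your node.

# Hedged example: posting an "ism_dump" request through the WM jobdir.
# The jobdir path and the tmp/new sub-directories are assumptions; adjust
# them to the jobdir configured on your WMS.

import os
import tempfile

JOBDIR = "/var/glite/workload_manager/jobdir"   # assumed location, adjust

def post_jobdir_request(body, jobdir=JOBDIR):
    """Write the request into the staging area, then atomically rename it
    into the directory the WM polls, so a half-written file is never read."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.join(jobdir, "tmp"))
    with os.fdopen(fd, "w") as handle:
        handle.write(body + "\n")
    os.rename(tmp_path,
              os.path.join(jobdir, "new", os.path.basename(tmp_path)))

if __name__ == "__main__":
    post_jobdir_request('[ command = "ism_dump"; ]')
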

Newly introduced configuration parameters: LBProxy (= true) in Common section.
For small VOs, it can make sense to install both WMS and LB on a single machine.  In such circumstances, the use of LB proxy (a cache of jobs under processing - at the moment used only by the WM) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0). Also, configuring the system with LBserver instead of LB proxy has to be done for each and every component, for correct job state computation.

RuntimeMalloc [WMConfiguration]
IsmThreads (= true) [WMConfiguration]
QueueSize (= 1000) [WMConfiguration]

  Some remarks:
Line: 45 to 32
  2)
Changed:
<
<
3)A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distibuted.
>
>
3) A different physical layout for equally distributing the load between the WMS and LB is suggested. It still requires two machines for a WMS+LB node, but with the services distributed differently.
  -- MarcoCecchi - 03 Nov 2008

Revision 12 - 2009-03-23 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Added:
>
>
This patch implements a significant redesign of the WMS core. The remainder of the code restructuring activity which took place during EGEE-II is finally present in this release.
 Among the major problems shown by the previous releases of the gLite WMS, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed).
Line: 13 to 15
 Newly introduced features:
  • parallel Match-making
  • faster Match-making algorithm
Changed:
<
<
  • added support for IPv6

>
>
  • added support for IPv6

 
  • added support for Grid site DELEGATION 2
  • improved error reporting for DAGs
  • Dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

  • run-time selection of LB/LBproxy for each service: attribute LBProxy in the "Common" section of the configuration file

Changed:
<
<
  • removed perl code from the jobwrapper

>
>
  • removed perl code from the jobwrapper

 
  • jobdir enabled as JC/ICE input -

  • The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements - LDAP querying restructuring
Line: 37 to 39
  workaround "EnableZippedISB = false" in the JDL can be removed
Added:
>
>
Given the increased performance of the WM, the next bottleneck could be represented by huge backlogs to JC, hence to Condor not keeping up the pace (in the way Condor is used within our design)). What can be done to avoid this effect:

1) Make sure jobdir is enabled as the JC input

 
Added:
>
>
2)
 
Added:
>
>
3)A different physical layout for equally distributing the load between the WMS and LB is suggested. It always require two machines for a WMS+LB node, but differently distibuted.
 
Deleted:
<
<
-- MarcoCecchi - 03 Nov 2008
 \ No newline at end of file
Added:
>
>
-- MarcoCecchi - 03 Nov 2008

Revision 11 - 2009-03-19 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Among the major problems shown by the previous releases of the gLite WMS, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating.
Line: 6 to 6
 Periodic activities, handled as timed events, will re-schedule themselves to show-up at a given time in the ready queue. The number of threads is kept controlled, in such a way that each and every request, whatever kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction.

- WMPROXY refactoring; Luca to comment

Added:
>
>
timeout list-match delegation code restructuring and bug fixing
  Newly introduced features:
Added:
>
>
  • parallel Match-making
  • faster Match-making algorithm
 
  • added support for IPv6

  • added support for Grid site DELEGATION 2
  • improved error reporting for DAGs
Line: 15 to 20
 
  • run-time selection of LB/LBproxy for each service: attribute LBProxy in the "Common" section of the configuration file

  • removed perl code from the jobwrapper

  • jobdir enabled as JC/ICE input -

Deleted:
<
<
* sandbox transport protocol as a jdl parameter
 
  • The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements
Changed:
<
<
- ISM ... Salvo to comment TODO
>
>
- LDAP querying restructuring - ISM, ... Salvo to comment TODO
 
  • ...

Newly introduced configuration parameters: LBProxy (= true) in Common section. For small VOs, it can make sense to install both WMS and LB on a single machine. In such circumstances, the use of LB proxy (a cache of jobs under processing - at the moment used only by the WM) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0). Also, configuring the system with LBserver instead of LB proxy has to be done for each and every component, for correct job state computation.

Changed:
<
<
IsmThreads (= true) QueueSize (= 1000)
>
>
RuntimeMalloc [WMConfiguration] IsmThreads (= true) [WMConfiguration] QueueSize (= 1000) [WMConfiguration]

Some remarks:

workaround "EnableZippedISB = false" in the JDL can be removed

 

Revision 10 - 2009-03-14 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating.
>
>
Among the major problems shown by the previous releases of the gLite WMS, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating.
 The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed). Periodic activities, handled as timed events, will re-schedule themselves to show-up at a given time in the ready queue. The number of threads is kept controlled, in such a way that each and every request, whatever kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction.
Changed:
<
<
- WMPROXY refactoring; TODO
>
>
- WMPROXY refactoring; Luca to comment
  Newly introduced features:
  • added support for IPv6

Line: 19 to 20
  - ISM ... Salvo to comment TODO
  • ...
Deleted:
<
<
 Newly introduced configuration parameters:
Changed:
<
<
LBProxy (= true) in Common section. For correct job state computation use or LB proxy and server cannot be mixed up. To avoid storing events twice in the same DB when WMS and LB are installed on the same machine, it would be helpful using LBServer instead of LBProxy.
>
>
LBProxy (= true) in Common section. For small VOs, it can make sense to install both WMS and LB on a single machine. In such circumstances, the use of LB proxy (a cache of jobs under processing - at the moment used only by the WM) is discouraged to avoid storing twice the same events (this will change with the advent of LB 2.0). Also, configuring the system with LBserver instead of LB proxy has to be done for each and every component, for correct job state computation.
 IsmThreads (= true) QueueSize (= 1000)

Revision 9 - 2009-02-23 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed).
Line: 20 to 20
 
  • ...
Added:
>
>
Newly introduced configuration parameters: LBProxy (= true) in Common section. For correct job state computation use or LB proxy and server cannot be mixed up. To avoid storing events twice in the same DB when WMS and LB are installed on the same machine, it would be helpful using LBServer instead of LBProxy. IsmThreads (= true) QueueSize (= 1000)

  -- MarcoCecchi - 03 Nov 2008 \ No newline at end of file

Revision 8 - 2009-02-20 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed).
Line: 11 to 11
 
  • added support for Grid site DELEGATION 2
  • improved error reporting for DAGs
  • Dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

Changed:
<
<
  • run-time selection of LB/LBproxy for each service *TODO

>
>
  • run-time selection of LB/LBproxy for each service: attribute LBProxy in the "Common" section of the configuration file

 
  • removed perl code from the jobwrapper

  • jobdir enabled as JC/ICE input - * sandbox transport protocol as a jdl parameter

Revision 7 - 2009-02-19 - MarcoCecchi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed).
Line: 16 to 16
 
  • jobdir enabled as JC/ICE input - * sandbox transport protocol as a jdl parameter

  • The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements
Changed:
<
<
- ISM ... Salvo - curl TODO
>
>
- ISM ... Salvo to comment TODO
 
  • ...
Deleted:
<
<
proxy renewal bug - terminated collections still registered
 
Deleted:
<
<
Bug fixing:

  • Fix for bug #39903: Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
    committed to 3_1_0, 3_2_0 and HEAD
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug #39903: Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
    committed to 3_1_0, 3_2_0 and HEAD
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug #45389: wms proxy certificate expires while the purger is using it
  • Fix for bug #41720: command glite-brokerinfo without specify option crash
  • glite-wms-brokerinfo-access_R_3_1_4_1

  • Fix for bug #43368: Long Nordugrid ARC Jobs go into the HELD state and get resubmitted
  • committed to 3_1_0 and 3_2_0

  • Fix for bug #43370: grid job names don't get translated from gLite to Nordugrid ARC submission
  • Fix for bug #43545: messed up user DN logged by WMProxy
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug bug #33103: Request for adding an feature to select only specific VO resources via an additional LDAP filter
  • Fix for bug bug# 43498 WMS needs more ISM logging
  • Fix for bug bug #44321 OutputData in the job wrapper should be implemented in terms of lcg-utils
  • -- MarcoCecchi - 03 Nov 2008

  • Fix for bug #45389: wms proxy certificate expires while the purger is using it
  • Fix for bug #41720: command glite-brokerinfo without specify option crash
  • glite-wms-brokerinfo-access_R_3_1_4_1

  • Fix for bug #43368: Long Nordugrid ARC Jobs go into the HELD state and get resubmitted
  • committed to 3_1_0 and 3_2_0

  • Fix for bug #43370: grid job names don't get translated from gLite to Nordugrid ARC submission
  • Fix for bug #43545: messed up user DN logged by WMProxy
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug bug #33103: Request for adding an feature to select only specific VO resources via an additional LDAP filter
  • Fix for bug bug# 43498 WMS needs more ISM logging
  • Fix for bug bug #44321 OutputData in the job wrapper should be implemented in terms of lcg-utils
  •   -- MarcoCecchi - 03 Nov 2008

Revision 6 - 2009-02-10 - MarcoCecchi

    Line: 1 to 1
     
    META TOPICPARENT name="WebHome"
    After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed).
    Line: 6 to 6
      - WMPROXY refactoring; TODO
    Changed:
    <
    <
    Newly introduced features:
    • added support for IPv6

    >
    >
    Newly introduced features:
    • added support for IPv6

     
    • added support for Grid site DELEGATION 2
    • improved error reporting for DAGs
    Changed:
    <
    <
    • Dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

    • run-time selection of LB/LBproxy for each service *TODO

    • removed perl code from the jobwrapper

    • jobdir enabled as JC/ICE input - as a parameter jdl sandbox transport protocol 

    • The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements - ISM ... Salvo - curl TODO

    • ...
    >
    >
    • Dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

    • run-time selection of LB/LBproxy for each service *TODO

    • removed perl code from the jobwrapper

    • jobdir enabled as JC/ICE input - * sandbox transport protocol as a jdl parameter

    • The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements - ISM ... Salvo - curl TODO
    • ...
     
    Deleted:
    <
    <
    for me: - some not catched excections: open bug
     proxy renewal bug - terminated collections still registered

    Bug fixing:

Revision 5 - 2009-01-14 - MarcoCecchi

    Line: 1 to 1
     
    META TOPICPARENT name="WebHome"
    Changed:
    <
    <
    After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesing will also allow for faster operations on the ISM, such as dumping or updating. .. This has required rewriting the core, as part of the restructuring done during EGEE-II. .. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed). In fact, the former approach required handling a state machine for managing internal states... dispatcher/request, approach quite prone to race conditions and needing big locks throughout. Periodic activities (timed events) will re-schedule themselves to show-up at a given time in the priority queue. The number of threads is kept controlled, in such a way that each and every request, whatever kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction.
    >
    >
    After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesign will also allow for faster operations on the ISM, such as dumping or updating. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed). Periodic activities, handled as timed events, will re-schedule themselves to show-up at a given time in the ready queue. The number of threads is kept controlled, in such a way that each and every request, whatever kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction.
      - WMPROXY refactoring; TODO

Revision 4 - 2009-01-13 - MarcoCecchi

    Line: 1 to 1
     
    META TOPICPARENT name="WebHome"
    Added:
    >
    >
    After the introduction of collections and bulk match-making, which represented an effective optimisation on real-life use-cases, the efficacy of the match-making and the WM in general needs to affect not only large collections but also single jobs, among which resubmissions and pending are typically found. Prior to the present release single job processing was done almost serially, because of "big" locks on wide accessed data structures. This redesing will also allow for faster operations on the ISM, such as dumping or updating. .. This has required rewriting the core, as part of the restructuring done during EGEE-II. .. The so-called Task Queue is replaced by a prioritized event queue where requests queues up waiting to be processed in a as stateless as possible fashion (so that several locks have been consequently removed). In fact, the former approach required handling a state machine for managing internal states... dispatcher/request, approach quite prone to race conditions and needing big locks throughout. Periodic activities (timed events) will re-schedule themselves to show-up at a given time in the priority queue. The number of threads is kept controlled, in such a way that each and every request, whatever kind, is processed by the same thread pool. The WM core becomes essentially a multi-threaded function executor, without the clear-cut dispatcher/request handler distinction.

    - WMPROXY refactoring; TODO

    Newly introduced features:

    • added support for IPv6

    • added support for Grid site DELEGATION 2
    • improved error reporting for DAGs
    • Dumping the ISM can be done more often at a lesser cost, simply by creating a jobdir request (basically a file) like this one: [ command = "ism_dump"; ]

    • run-time selection of LB/LBproxy for each service *TODO

    • removed perl code from the jobwrapper

    • jobdir enabled as JC/ICE input - as a parameter jdl sandbox transport protocol 

    • The jw template is now cached at each WM start (restart to re-read it) - CE forward requirements - ISM ... Salvo - curl TODO

    • ...

    for me: - some not catched excections: open bug proxy renewal bug - terminated collections still registered

    Bug fixing:

  • Fix for bug #39903: Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
    committed to 3_1_0, 3_2_0 and HEAD
    glite-wms-wmproxy_R_3_1_40_1
  •  
  • Fix for bug #39903: Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
    committed to 3_1_0, 3_2_0 and HEAD
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug #45389: wms proxy certificate expires while the purger is using it
  • Fix for bug #41720: command glite-brokerinfo without specify option crash
  • Changed:
    <
    <
    glite-wms-brokerinfo-access_R_3_1_4_1
    >
    >
    glite-wms-brokerinfo-access_R_3_1_4_1
     
  • Fix for bug #43368: Long Nordugrid ARC Jobs go into the HELD state and get resubmitted
  • Changed:
    <
    <
    committed to 3_1_0 and 3_2_0
    >
    >
    committed to 3_1_0 and 3_2_0
     
  • Fix for bug #43370: grid job names don't get translated from gLite to Nordugrid ARC submission
  • Line: 19 to 49
     
  • Fix for bug bug# 43498 WMS needs more ISM logging
  • Changed:
    <
    <
  • Fix for bug bug #44321 OutputData in the job wrapper should be implemented in terms of lcg-utils
  • >
    >
  • Fix for bug bug #44321 OutputData in the job wrapper should be implemented in terms of lcg-utils
  •   -- MarcoCecchi - 03 Nov 2008
    Added:
    >
    >
  • Fix for bug #45389: wms proxy certificate expires while the purger is using it
  • Fix for bug #41720: command glite-brokerinfo without specify option crash
  • glite-wms-brokerinfo-access_R_3_1_4_1

  • Fix for bug #43368: Long Nordugrid ARC Jobs go into the HELD state and get resubmitted
  • committed to 3_1_0 and 3_2_0

  • Fix for bug #43370: grid job names don't get translated from gLite to Nordugrid ARC submission
  • Fix for bug #43545: messed up user DN logged by WMProxy
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug bug #33103: Request for adding an feature to select only specific VO resources via an additional LDAP filter
  • Fix for bug bug# 43498 WMS needs more ISM logging
  • Fix for bug bug #44321 OutputData in the job wrapper should be implemented in terms of lcg-utils
  • -- MarcoCecchi - 03 Nov 2008

Revision 3 - 2008-12-16 - MarcoCecchi

    Line: 1 to 1
     
    META TOPICPARENT name="WebHome"
    Deleted:
    <
    <
     
  • Fix for bug #39903: Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
    committed to 3_1_0, 3_2_0 and HEAD
    glite-wms-wmproxy_R_3_1_40_1
  • Line: 4 to 3
     
  • Fix for bug #39903: Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
    committed to 3_1_0, 3_2_0 and HEAD
    glite-wms-wmproxy_R_3_1_40_1
  • Added:
    >
    >
  • Fix for bug #45389: wms proxy certificate expires while the purger is using it
  •  
  • Fix for bug #41720: command glite-brokerinfo without specify option crash
  • glite-wms-brokerinfo-access_R_3_1_4_1

Revision 2 - 2008-12-02 - MarcoCecchi

    Line: 1 to 1
     
    META TOPICPARENT name="WebHome"
    Added:
    >
    >
  • Fix for bug #39903: Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
    committed to 3_1_0, 3_2_0 and HEAD
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug #41720: command glite-brokerinfo without specify option crash
  • glite-wms-brokerinfo-access_R_3_1_4_1

  • Fix for bug #43368: Long Nordugrid ARC Jobs go into the HELD state and get resubmitted
  • committed to 3_1_0 and 3_2_0

  • Fix for bug #43370: grid job names don't get translated from gLite to Nordugrid ARC submission
  • Fix for bug #43545: messed up user DN logged by WMProxy
    glite-wms-wmproxy_R_3_1_40_1
  • Fix for bug bug #33103: Request for adding an feature to select only specific VO resources via an additional LDAP filter
  • Fix for bug bug# 43498 WMS needs more ISM logging
  • Fix for bug bug #44321 OutputData in the job wrapper should be implemented in terms of lcg-utils
  •  
    Deleted:
    <
    <
    Hello, world.
      -- MarcoCecchi - 03 Nov 2008 \ No newline at end of file

Revision 1 - 2008-11-03 - MarcoCecchi

    Line: 1 to 1
    Added:
    >
    >
    META TOPICPARENT name="WebHome"

    Hello, world.

    -- MarcoCecchi - 03 Nov 2008

     