Workflow Management System
The Workflow Management System (WfMS) project is developed within the
CoreGRID European project. The main goal of this project is to provide an effective solution for running
complex scientific workflows modeled with
Petri Nets taking full advantage of the
distributed and
heterogeneous nature of the Grid. One of the design principles is
neutrality towards the underlying mechanism for task execution, in order not to compromise
interoperability with multiple infrastructures.
The project also aims at
language interoperability, paying attention to workflow description languages and introducing
language translators. The internal representation of a workflow is based on the High Level Petri Nets (HLPN) formalism due to its formal semantics and the existence of several analysis tools. The reference language of the WfMS is the
Grid Workflow Description Language (GWorkflowDL) which is based on the HLPN formalism.
The first prototype of the system has been tested with workflows accessing resources available on the Grid provided by the
EGEE project, a large and relatively mature infrastructure. In particular, Grid jobs are executed by relying on the gLite
Workload Management System (WMS) through its
Web Service interface (WMProxy).
Papers
- CGW07
- Simone Pellegrini, Francesco Giacomini, Antonia Ghiselli: A Practical Approach for a Workflow Management System. In Proceedings of the CoreGRID Workshop 2007, Dresden, 2007.
Petri Nets in workflow modeling
Recent studies have demonstrated that the modeling capabilities of Petri Nets outperform those of other formalisms thanks to the following properties:
- the formal semantics despite the graphical nature,
- state-based structure (as opposed to the event-based one),
- the availability of many analysis techniques.
A small tutorial about Petri Nets and their use in workflow management can be found
here.
The High Level Petri Nets (HLPN) formalism is even better suited to workflow modeling than basic Petri Nets. The term HLPN covers many Petri Net formalisms that extend the basic P/T (place/transition) net formalism. These include
coloured Petri Nets,
hierarchical Petri Nets, and
timed Petri Nets. The Grid Workflow Description Language is based on HLPN.
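To make the state-based view concrete, the following minimal sketch plays the token game of a basic P/T net in Python. It is an illustration only, not code from the WfMS: the function names `enabled` and `fire` and the place names are assumptions chosen for this example. The state of the net is its marking (how many tokens each place holds), and firing a transition moves the net from one marking to the next.

```python
from collections import Counter

def enabled(marking, inputs):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[place] >= needed for place, needed in inputs.items())

def fire(marking, inputs, outputs):
    """Return the marking reached by firing a transition.

    The original marking is left untouched, so each state stays inspectable.
    """
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = Counter(marking)
    new_marking.subtract(inputs)   # consume tokens from the input places
    new_marking.update(outputs)    # produce tokens in the output places
    return +new_marking            # unary plus drops places left with zero tokens

# Hypothetical net: one transition with input places p1, p2 and output place q0.
marking = Counter({"p1": 1, "p2": 1})
t_inputs = {"p1": 1, "p2": 1}
t_outputs = {"q0": 1}

after = fire(marking, t_inputs, t_outputs)
```

In a high-level net the tokens would additionally carry values (colours), which a plain token count like this one cannot express.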
The Grid Workflow Description Language
The GWorkflowDL consists of two parts:
- a generic part, used to define the structure of the workflow, reflecting the data and control flow in the application,
- a middleware-specific part (extensions) that defines how the workflow should be executed in the context of a specific Grid computing middleware.
Considering the generic Petri Net depicted in the following figure:
The generic part of the workflow can be represented in the GWorkflowDL language as:
<workflow>
<place ID="p1">
<token><data><t1 xsd:type="xs:int">3</t1></data></token>
</place>
<place ID="p2">
<token><data><t2 xsd:type="xs:int">2</t2></data></token>
</place>
<place ID="q0" />
<transition ID="sum">
<inputPlace placeID="p1" edgeExpression="a1"/>
<inputPlace placeID="p2" edgeExpression="a2"/>
<outputPlace placeID="q0" edgeExpression="b"/>
<operation /> <!-- generic operation -->
</transition>
</workflow>
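As a rough illustration of how such a description can be read programmatically, the sketch below loads the generic workflow with Python's standard `xml.etree.ElementTree` module and extracts its places, tokens, and arcs. It is not part of the WfMS. The function name `load_workflow` is hypothetical, and the `xmlns:xsd` declaration with a placeholder URI is an assumption added here only because the listing above declares no namespace and a namespace-aware parser rejects an unbound prefix.

```python
import xml.etree.ElementTree as ET

# The generic workflow from the listing above. The xmlns:xsd declaration and
# its URI are placeholders added for this sketch, not part of the original.
GENERIC_WORKFLOW = """
<workflow xmlns:xsd="urn:example:xsd-placeholder">
  <place ID="p1">
    <token><data><t1 xsd:type="xs:int">3</t1></data></token>
  </place>
  <place ID="p2">
    <token><data><t2 xsd:type="xs:int">2</t2></data></token>
  </place>
  <place ID="q0" />
  <transition ID="sum">
    <inputPlace placeID="p1" edgeExpression="a1"/>
    <inputPlace placeID="p2" edgeExpression="a2"/>
    <outputPlace placeID="q0" edgeExpression="b"/>
    <operation />
  </transition>
</workflow>
"""

def load_workflow(xml_text):
    """Return (places, transitions) extracted from a workflow document."""
    root = ET.fromstring(xml_text.strip())
    places = {}
    for place in root.findall("place"):
        # Each <data> element wraps one value element; keep its text as the token.
        places[place.get("ID")] = [
            data[0].text for data in place.iter("data") if len(data)
        ]
    transitions = {}
    for tr in root.findall("transition"):
        transitions[tr.get("ID")] = {
            # Map each connected place to the edge expression on its arc.
            "inputs": {e.get("placeID"): e.get("edgeExpression")
                       for e in tr.findall("inputPlace")},
            "outputs": {e.get("placeID"): e.get("edgeExpression")
                        for e in tr.findall("outputPlace")},
        }
    return places, transitions

places, transitions = load_workflow(GENERIC_WORKFLOW)
```

Token values are kept as strings here; converting them according to the declared type is left out to keep the sketch short.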
The workflow contains no information about the operation associated with the Petri Net transition
T. More detailed information is provided by the
concrete part of the description. A concrete operation could be the invocation of a
Web Service, the
remote execution of a program, or the invocation of a
local routine.
The enactment process, performed by the
WfMS, is responsible for mapping abstract operations, associated with Petri Net transitions, to concrete operations. For example, the
plus operation represented in the above workflow can be mapped to a Web Service invocation as described in the following concrete workflow:
<workflow>
<place ID="p1">
<token><data><t1 xsd:type="xs:int">3</t1></data></token>
</place>
<place ID="p2">
<token><data><t2 xsd:type="xs:int">2</t2></data></token>
</place>
<place ID="q0" />
<transition ID="sum">
<inputPlace placeID="p1" edgeExpression="a1"/>
<inputPlace placeID="p2" edgeExpression="a2"/>
<outputPlace placeID="q0" edgeExpression="b"/>
<operation>
<wsOperation wsdl="http://localhost/plus?wsdl" operationName="plus">
<in>n1</in>
<in>n2</in>
<out>q1</out>
</wsOperation>
</operation>
</transition>
</workflow>
Alternatively, it can be mapped to a local method invocation, as described in the following concrete workflow:
<workflow>
<place ID="p1">
<token><data><t1 xsd:type="xs:int">3</t1></data></token>
</place>
<place ID="p2">
<token><data><t2 xsd:type="xs:int">2</t2></data></token>
</place>
<place ID="q0" />
<transition ID="sum">
<inputPlace placeID="p1" edgeExpression="a1"/>
<inputPlace placeID="p2" edgeExpression="a2"/>
<outputPlace placeID="q0" edgeExpression="b"/>
<operation>
<pyOperation operation="b = a1 + a2" />
</operation>
</transition>
</workflow>
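The following sketch suggests how a transition carrying a local Python operation might be fired. It is a simplified illustration under stated assumptions, not the enactment engine of the WfMS: the function `fire_py_transition` and its dictionary-based data layout are hypothetical, and the operation text is assumed to be a trusted Python statement. One token is consumed from each input place and bound to the name given by its edge expression, the statement is executed, and the value bound to the output edge expression becomes a new token.

```python
def fire_py_transition(places, inputs, outputs, operation):
    """Fire one transition whose operation is a Python statement.

    places:    place ID -> list of token values
    inputs:    place ID -> variable name bound to the consumed token
    outputs:   place ID -> variable name whose value becomes the new token
    operation: statement text, e.g. "b = a1 + a2"
    """
    if any(not places[p] for p in inputs):
        raise ValueError("transition is not enabled: an input place is empty")
    # Consume one token per input place and bind it to its edge expression.
    scope = {var: places[p].pop(0) for p, var in inputs.items()}
    # exec is used only because the statement is trusted in this sketch;
    # a real engine would need a safer evaluation strategy.
    exec(operation, {"__builtins__": {}}, scope)
    # Produce one token per output place from the computed variables.
    for p, var in outputs.items():
        places[p].append(scope[var])
    return places

# The marking and arcs of the "sum" transition from the workflow above,
# with the token values already converted to integers.
state = {"p1": [3], "p2": [2], "q0": []}
fire_py_transition(
    state,
    inputs={"p1": "a1", "p2": "a2"},
    outputs={"q0": "b"},
    operation="b = a1 + a2",
)
```

After the call, `p1` and `p2` are empty and `q0` holds a single token with the value 5.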
The WfMS and the EGEE/gLite Grid Middleware
The first prototype of the WfMS has been tested with workflows accessing the resources provided by the EGEE/gLite middleware. The gLite middleware takes care of finding the best available resources according to a set of user requirements and preferences (such as CPU architecture, OS, and current load), relieving the
WfMS of resource management. The following picture provides a high-level view of the system:
--
SimonePellegrini - 19 Dec 2007