Interoperability of heterogeneous large-scale scientific workflows and data resources

Kukla, T. 2011. Interoperability of heterogeneous large-scale scientific workflows and data resources. PhD thesis, University of Westminster, School of Electronics and Computer Science.

Title: Interoperability of heterogeneous large-scale scientific workflows and data resources
Type: PhD thesis
Authors: Kukla, T.
Abstract

Workflows allow e-Scientists to express their experimental processes in a structured way and provide a glue to integrate remote applications. Since the Grid provides an enormous amount of data and computational resources, executing workflows on the Grid results in significant performance improvement. Several workflow management systems, which are widely used by different scientific communities, were developed for various purposes. Therefore, they differ in several aspects.

This thesis outlines two major problems of existing workflow systems: workflow interoperability and data access. On the one hand, existing workflow systems are based on different technologies. Therefore, achieving interoperability between their workflows at any level is a challenging task. Although there is a clear demand for interoperable workflows, for example, to enable scientists to share workflows, to leverage the existing work of others, and to create multi-disciplinary workflows, only limited, ad-hoc workflow interoperability solutions are currently available to scientists. Existing solutions only realise workflow interoperability between a small set of workflow systems and do not consider performance issues that arise in the case of large-scale (computation and/or data intensive) scientific workflows. Scientific workflows are typically computation and/or data intensive and are executed in a distributed environment to reduce their execution time. Therefore, their performance is a key issue. In most scenarios, existing interoperability solutions create a bottleneck in the communication between workflows, dramatically increasing execution time.

On the other hand, many scientific computational experiments are based on data that reside in data resources which can be of different types and vendors. Many workflow systems support access to only limited subsets of such data resources, preventing data-level workflow interoperation between different systems. Therefore, there is a demand for a general solution that provides access to a wide range of data resources of different types and vendors. If such a solution is general, in the sense that it can be adopted by several workflow systems, then it also enables workflows of different systems to access the same data resources and therefore interoperate at data level. Note that data semantics are out of the scope of this work. For the same reasons as described above, the performance characteristics of such a solution are important. Although, in terms of functionality, there are solutions that could be adopted by workflow systems for this purpose, they provide poor performance. For that reason, they did not gain wide acceptance in the scientific workflow community.

To address these issues, a set of architectures is proposed to realise heterogeneous data access and heterogeneous workflow execution solutions. The primary goal was to investigate how such solutions can be implemented and integrated with workflow systems. The secondary aim was to analyse how such solutions can be implemented and utilised by single applications.

Year: 2011
File: Tamas_KUKLA.pdf
Publication dates
Completed: 2011

Related outputs

Enabling Scientific Workflow Sharing through Coarse-Grained Interoperability
Terstyanszky, G., Kukla, T., Kiss, T., Kacsuk, P., Balasko, A. and Farkas, Z. 2014. Enabling Scientific Workflow Sharing through Coarse-Grained Interoperability. Future Generation Computer Systems: The International Journal of Grid Computing and eScience. 37, pp. 46-59.

Exploring workflow interoperability for neuroimage analysis on the SHIWA platform
Korkhov, V., Krefting, D., Kukla, T., Terstyanszky, G., Caan, M.W.A. and Olabarriaga, S.D. 2013. Exploring workflow interoperability for neuroimage analysis on the SHIWA platform. Journal of Grid Computing. 11 (3), pp. 505-522.

Application repository and science gateway for running molecular docking and dynamics simulations
Terstyanszky, G., Kiss, T., Kukla, T., Lichtenberger, Z., Winter, S., Greenwell, P., McEldowney, S. and Heindl, H. 2012. Application repository and science gateway for running molecular docking and dynamics simulations. in: Gesing, S., Glatard, T., Kruger, J., Delgado Olabarriaga, S., Solomonides, T., Silverstein, J.C., Montagnat, J., Gaignard, A. and Krefting, D. (eds.) Healthgrid applications and technologies meet science gateways for life sciences. IOS Press.

Achieving interoperation of grid data resources via workflow level integration
Kiss, T. and Kukla, T. 2009. Achieving interoperation of grid data resources via workflow level integration. Journal of Grid Computing. 7 (3), pp. 355-374.

Integrating Open Grid Services Architecture Data Access and Integration with computational Grid workflows
Kukla, T., Kiss, T., Kacsuk, P. and Terstyanszky, G. 2009. Integrating Open Grid Services Architecture Data Access and Integration with computational Grid workflows. Philosophical Transactions of the Royal Society A: Mathematical, Physical & Engineering Sciences. 367 (1897), pp. 2521-2532.

A general and scalable solution for heterogeneous workflow invocation and nesting
Kukla, T., Kiss, T., Terstyanszky, G. and Kacsuk, P. 2008. A general and scalable solution for heterogeneous workflow invocation and nesting. in: Proceedings of the 3rd Workshop on Workflows in Support of Large-Scale Science, in conjunction with SC 2008, Austin, TX, USA, November 17 2008. IEEE. pp. 1-8.

Towards Grid data interoperation: OGSA-DAI data resources in computational Grid workflows
Kiss, T., Kukla, T., Terstyanszky, G., Kacsuk, P. and Sipos, G. 2008. Towards Grid data interoperation: OGSA-DAI data resources in computational Grid workflows. in: Proc. of the CoreGRID Workshop “Integrated Research in Grid Computing”, Heraklion-Crete, Greece, 2-4 April 2008 Crete University Press.

High-level user interface for accessing database resources on the Grid
Kiss, T. and Kukla, T. 2008. High-level user interface for accessing database resources on the Grid. in: Kacsuk, P., Lovas, R. and Nemeth, Z. (eds.) Distributed and parallel systems: in focus: desktop grid computing. Boston, MA: Springer. pp. 155-163.

Permalink - https://westminsterresearch.westminster.ac.uk/item/8zyyx/interoperability-of-heterogeneous-large-scale-scientific-workflows-and-data-resources

