Checkpointing of parallel applications in a Grid environment

Sajadah, K. 2011. Checkpointing of parallel applications in a Grid environment. MPhil thesis, University of Westminster, School of Electronics and Computer Science. https://doi.org/10.34737/90017

Title: Checkpointing of parallel applications in a Grid environment
Type: MPhil thesis
Authors: Sajadah, K.
Abstract

The Grid environment is generic, heterogeneous, and dynamic, with many unreliable resources, which leaves it highly exposed to failures. It is unreliable because it is geographically dispersed, spans multiple autonomous administrative domains, and is composed of a large number of components. Failures in the Grid environment include application crashes, Grid node crashes, network failures, and Grid system component failures. Such failures can affect the execution of parallel and distributed applications in the Grid environment, so protection against them is crucial, and efficient fault-tolerance mechanisms are essential if users are to execute Grid applications successfully. One of the research challenges in Grid computing is to develop a fault-tolerant solution that ensures Grid applications execute reliably with minimal overhead.
While checkpointing is the most common method of achieving fault tolerance, much work remains to improve the efficiency of the mechanism. This thesis provides an in-depth description of a novel solution for checkpointing parallel applications executed on a Grid. The implemented checkpointing mechanism allows an application to be checkpointed at regions where no interprocess communication is involved, thereby reducing both the checkpointing overhead and the checkpoint size.

Year: 2011
Publisher: University of Westminster
Publication dates
Published: 2011
Digital Object Identifier (DOI): https://doi.org/10.34737/90017

Related outputs

A checkpointing mechanism for the Grid environment
Sajadah, K., Terstyanszky, G., Winter, S. and Kacsuk, P. 2008. A checkpointing mechanism for the Grid environment. in: Proceedings of the UK e-Science All Hands Meeting 2008, Edinburgh, UK, 8th - 11th September 2008 Edinburgh National e-Science Centre.

Checkpointing of parallel applications in a grid environment
Sajadah, K., Terstyanszky, G., Winter, S. and Kacsuk, P. 2008. Checkpointing of parallel applications in a grid environment. in: Kacsuk, P., Lovas, R. and Nemeth, Z. (eds.) Distributed and parallel systems: in focus: desktop grid computing Boston, MA Springer. pp. 179-187

Security mechanisms for legacy code applications in GT3 environment
Terstyanszky, G., Delaitre, T., Goyeneche, A., Kiss, T., Sajadah, K., Winter, S. and Kacsuk, P. 2005. Security mechanisms for legacy code applications in GT3 environment. in: 13th Euromicro Conference on Parallel, Distributed, and Network-Based Processing proceedings: Lugano, Switzerland, February 9-11, 2005 Los Alamitos, USA IEEE. pp. 220-226

Deploying application on a GT3 Grid
Kiss, T., Delaitre, T., Goyeneche, A., Winter, S., Kacsuk, P., Terstyanszky, G., Igbe, D., Maselino, P., Sajadah, K. and Weingarten, N. 2004. Deploying application on a GT3 Grid. London, UK University of Westminster.

Experiences with publishing and executing parallel legacy code using an OGSI grid service
Delaitre, T., Goyeneche, A., Kiss, T., Terstyanszky, G., Winter, S., Kacsuk, P., Igbe, D., Maselino, P., Sajadah, K. and Weingarten, N. 2004. Experiences with publishing and executing parallel legacy code using an OGSI grid service. in: Proceedings of the UK E-Science All Hands Meeting, 31st Aug - 3rd Sep, 2004, Nottingham, UK EPSRC. pp. 999-1002

Permalink - https://westminsterresearch.westminster.ac.uk/item/90017/checkpointing-of-parallel-applications-in-a-grid-environment

