Checkpointing of parallel applications in a Grid environment

Sajadah, K. 2011. Checkpointing of parallel applications in a Grid environment. Thesis. University of Westminster, School of Electronics and Computer Science.

Title: Checkpointing of parallel applications in a Grid environment
Authors: Sajadah, K.
Abstract

The Grid environment is generic, heterogeneous, and dynamic, with many unreliable resources, which leaves it highly exposed to failures. The environment is unreliable because it is geographically dispersed, spans multiple autonomous administrative domains, and is composed of a large number of components. Examples of failures in the Grid environment include application crashes, Grid node crashes, network failures, and Grid system component failures. These failures can affect the execution of parallel and distributed applications in the Grid environment, so protection against them is crucial. It is therefore essential to develop efficient fault-tolerance mechanisms that allow users to execute Grid applications successfully. One of the research challenges in Grid computing is to develop a fault-tolerant solution that ensures Grid applications are executed reliably with minimal overhead.

While checkpointing is the most common method of achieving fault tolerance, much work remains to improve the efficiency of the mechanism. This thesis provides an in-depth description of a novel solution for checkpointing parallel applications executed on a Grid. The implemented checkpointing mechanism checkpoints an application at regions where no interprocess communication is involved, thereby reducing both the checkpointing overhead and the checkpoint size.

Year: 2011
File: Kreeteeraj_SAJADAH_2011.pdf
Publication dates
Completed: 2011

Related outputs

A checkpointing mechanism for the Grid environment
Sajadah, K., Terstyanszky, G., Winter, S. and Kacsuk, P. 2008. A checkpointing mechanism for the Grid environment. in: Proceedings of the UK e-Science All Hands Meeting 2008, Edinburgh, UK, 8th - 11th September 2008. Edinburgh: National e-Science Centre.

Checkpointing of parallel applications in a grid environment
Sajadah, K., Terstyanszky, G., Winter, S. and Kacsuk, P. 2008. Checkpointing of parallel applications in a grid environment. in: Kacsuk, P., Lovas, R. and Nemeth, Z. (eds.) Distributed and parallel systems: in focus: desktop grid computing. Boston, MA: Springer. pp. 179-187

Security mechanisms for legacy code applications in GT3 environment
Terstyanszky, G., Delaitre, T., Goyeneche, A., Kiss, T., Sajadah, K., Winter, S. and Kacsuk, P. 2005. Security mechanisms for legacy code applications in GT3 environment. in: 13th Euromicro Conference on Parallel, Distributed, and Network-Based Processing proceedings: Lugano, Switzerland, February 9-11, 2005. Los Alamitos, USA: IEEE. pp. 220-226

Deploying application on a GT3 Grid
Kiss, T., Delaitre, T., Goyeneche, A., Winter, S., Kacsuk, P., Terstyanszky, G., Igbe, D., Maselino, P., Sajadah, K. and Weingarten, N. 2004. Deploying application on a GT3 Grid. London, UK: University of Westminster.

Experiences with publishing and executing parallel legacy code using an OGSI grid service
Delaitre, T., Goyeneche, A., Kiss, T., Terstyanszky, G., Winter, S., Kacsuk, P., Igbe, D., Maselino, P., Sajadah, K. and Weingarten, N. 2004. Experiences with publishing and executing parallel legacy code using an OGSI grid service. in: Proceedings of the UK E-Science All Hands Meeting, 31st Aug - 3rd Sep, 2004, Nottingham, UK. EPSRC. pp. 999-1002

Permalink - https://westminsterresearch.westminster.ac.uk/item/90017/checkpointing-of-parallel-applications-in-a-grid-environment

