|Chapter title||Checkpointing of parallel applications in a grid environment|
|Authors||Sajadah, K., Terstyanszky, G., Winter, S. and Kacsuk, P.|
|Editors||Kacsuk, P., Lovas, R. and Nemeth, Z.|
Jobs in Grid workflows are exposed to different types of failure, so fault-tolerant mechanisms are needed to ensure a good level of reliability during the execution of Grid jobs. While checkpointing is the most common method of achieving fault tolerance, considerable work remains to improve its efficiency. The paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in a Grid environment. The mechanism improves on the PGRADE checkpointing solution.
|Keywords||Checkpointing, First Order Approximation, Natural Synchronisation Points, Critical Region|
|Book title||Distributed and parallel systems: in focus: desktop grid computing|
|Place of publication||Boston, MA|
|Digital Object Identifier (DOI)||https://doi.org/10.1007/978-0-387-79448-8|