Chapter title | Checkpointing of parallel applications in a grid environment |
---|
Authors | Sajadah, K., Terstyanszky, G., Winter, S. and Kacsuk, P. |
---|
Editors | Kacsuk, P., Lovas, R. and Nemeth, Z. |
---|
Abstract | Jobs in Grid workflows are exposed to different types of failure. It is important to develop fault-tolerant mechanisms to ensure a good level of reliability during the execution of Grid jobs. While checkpointing is the most common method of achieving fault tolerance, there is still a lot of work to be done to improve the efficiency of the mechanism. The paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in the Grid environment. The checkpointing mechanism is an improvement on the PGRADE checkpointing solution. |
---|
Keywords | Checkpointing, First Order Approximation, Natural Synchronisation Points, Critical Region |
---|
Book title | Distributed and parallel systems: in focus: desktop grid computing |
---|
Page range | 179-187 |
---|
Year | 2008 |
---|
Publisher | Springer |
---|
Publication dates |
---|
Published | 2008 |
---|
Place of publication | Boston, MA |
---|
ISBN | 9780387794471 |
---|
Digital Object Identifier (DOI) | https://doi.org/10.1007/978-0-387-79448-8 |
---|