The paper describes a parallel program checkpointing mechanism and its potential application in Grid systems for migrating applications among Grid sites. The checkpointing mechanism automatically supports, without user interaction, generic PVM programs created with the PGRADE Grid programming environment. It is general enough to be used by any Grid job manager, but the current implementation is integrated with Condor. As a result, the integrated Condor/PGRADE system can guarantee the execution of any PVM program in the Grid, whereas Condor on its own can guarantee the execution of sequential jobs only. Integrating the Grid migration framework with the Mercury Grid monitor yields an observable Grid execution environment in which performance monitoring and visualization of PVM applications remain supported even when an application migrates within the Grid.

The work presented in this paper has been supported by the Hungarian Chemistrygrid OMFB-00580/2003 project, the Hungarian Supergrid OMFB-00728/2002 project, the Hungarian IHM 4671/1/2003 project and the Hungarian Research Fund No. T042459.