Abstract | Scientific workflows have emerged in the past decade as a new solution for representing complex scientific experiments. They are generally data- and compute-intensive applications and may need high-performance computing infrastructures (clusters, grids, and clouds) to execute. Recently, cloud services have gained widespread availability and popularity owing to their rapid elasticity and resource pooling, which are well suited to scientific applications that may experience variable demand and occasional spikes in resource usage. In this paper we investigate dynamic execution capabilities, focusing on fault-tolerance behavior, in the Occopus framework, which was developed by SZTAKI and is designed to provide automatic features for configuring and orchestrating distributed applications (so-called virtual infrastructures) on single- or multi-cloud systems. |
---|