Developing Data-intensive Scientific Workflow Scheduling Model on Heterogeneous Cloud Computing Resources

Authors

Kenesa, Bedasa

Publisher

ASTU

Abstract

Cloud computing is a powerful, distributed IT architecture that allows cloud providers to supply computing services to users efficiently and on demand, and it is quickly becoming the preferred platform for large-scale, complex computation. Because local servers often lack suitable computing facilities, and because experiments produce and consume ever-growing volumes of data, many scientific experiments are being moved to the cloud. Several existing large-scale, simulation-based scientific workflows produce very large files and datasets and require huge amounts of computing resources to execute. The CyberShake workflow, for example, processes over 200 TB of data generated during execution. The Montage workflow creates data files several GB in size, and only 30% of its CPU time is spent on computation while the remaining 70% goes to I/O operations. This overhead should therefore be reduced by eliminating unnecessary data movement and minimizing data transfers as far as possible, which shortens the total workflow execution time. The proposed work tackles two problems: the transfer of data files, an enormously time-consuming part of workflow execution, and the scheduling of tasks within a workflow. We propose a score-based, data-intensive scientific workflow scheduling algorithm for heterogeneous cloud resources that allocates resources to workflow tasks based on VM performance attributes: processor speed, number of processing cores, secondary storage capacity, bandwidth, and RAM size. To evaluate our work, we used WorkflowSim 1.0, a workflow simulation toolkit that extends CloudSim. We compared our algorithm with three standard scheduling heuristics: Round Robin, MinMin, and MaxMin. In terms of makespan, the proposed method outperforms all three.
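The abstract names the VM attributes that the score is built from, but not how they are combined or how tasks are then placed. The Python sketch below shows one way a score-based scheduler of this kind could work. Every specific here is a hypothetical assumption, not the thesis's algorithm: the weights in `WEIGHTS`, normalizing each attribute by its maximum across VMs, the cost model in which a task of length L takes L / score time units, and the largest-task-first greedy placement. Data transfer costs are ignored. A Round Robin baseline is included to mirror the makespan comparison.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    mips: float            # processor speed
    cores: int             # processing cores
    storage_gb: float      # secondary storage capacity
    bandwidth_mbps: float  # network bandwidth
    ram_gb: float          # RAM size


# Hypothetical weights: the abstract lists the attributes, not how they are combined.
WEIGHTS = {"mips": 0.3, "cores": 0.2, "storage_gb": 0.1,
           "bandwidth_mbps": 0.3, "ram_gb": 0.1}


def vm_scores(vms):
    """Weighted sum of each attribute divided by its maximum over all VMs.
    With positive attributes and weights summing to 1, scores fall in (0, 1]."""
    maxima = {attr: max(getattr(vm, attr) for vm in vms) for attr in WEIGHTS}
    return {vm.name: sum(w * getattr(vm, attr) / maxima[attr] for attr, w in WEIGHTS.items())
            for vm in vms}


def schedule(task_lengths, vms):
    """Largest task first; each goes to the VM with the earliest estimated finish time.
    Assumed cost model: a task of length L runs for L / score time units on a VM."""
    scores = vm_scores(vms)
    ready = {vm.name: 0.0 for vm in vms}   # time at which each VM becomes free
    assignment = {}
    for task, length in sorted(task_lengths.items(), key=lambda kv: kv[1], reverse=True):
        best = min(ready, key=lambda name: ready[name] + length / scores[name])
        ready[best] += length / scores[best]
        assignment[task] = best
    return assignment, max(ready.values())  # makespan = latest VM finish time


def round_robin(task_lengths, vms):
    """Baseline: cycle through VMs in list order, ignoring scores when placing tasks."""
    scores = vm_scores(vms)
    ready = {vm.name: 0.0 for vm in vms}
    for i, length in enumerate(task_lengths.values()):
        name = vms[i % len(vms)].name
        ready[name] += length / scores[name]
    return max(ready.values())


if __name__ == "__main__":
    vms = [VM("vm-small", 1000, 1, 100, 100, 2),
           VM("vm-large", 2000, 4, 500, 1000, 8)]
    tasks = {"t1": 10.0, "t2": 5.0, "t3": 2.0}  # hypothetical task lengths
    assignment, makespan = schedule(tasks, vms)
    print(assignment, round(makespan, 2), round(round_robin(tasks, vms), 2))
```

With these made-up inputs, vm-small scores about 0.275 and vm-large about 1.0. The greedy scheduler places t1 and t2 on vm-large and t3 on vm-small, for a makespan of about 15 time units. Round Robin puts t1 and t3 on the weak VM and finishes at roughly 43.6. The gap illustrates why ranking heterogeneous VMs by capability can shorten makespan. It says nothing about the magnitude of the thesis's actual results.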
