Developing Data-intensive Scientific Workflow Scheduling Model on Heterogeneous Cloud Computing Resources
Authors
Kenesa, Bedasa
Publisher
ASTU
Abstract
Cloud computing is a distributed and powerful IT architecture that allows cloud providers to
efficiently supply computing services to users on demand. Cloud computing is quickly becoming
the preferred method for large-scale and complicated computation. Because local servers lack
suitable computing facilities and the volume of data produced and consumed by experiments
keeps growing, a number of scientific experiments are being moved to the cloud. Several
existing large-scale scientific workflows based on simulations produce very large files and
datasets and require huge amounts of computing resources to execute. The CyberShake workflow,
for example, processes over 200 TB of data files generated during execution. The Montage
workflow creates data files that are several GB in size, but only 30% of its CPU time is
assigned to computation and the remaining 70% is used by I/O operations.
Consequently, unnecessary data movement should be eliminated, or the probability of
transferring data minimized as far as possible, to shorten the total workflow execution time.
This work tackles both the data file transfer problem, which is an enormously time-consuming
part of workflow execution, and the scheduling of tasks within a workflow. We propose a
score-based data-intensive scientific workflow scheduling algorithm for heterogeneous cloud
resources that allocates resources to workflow application tasks based on VM performance
characteristics such as processor speed, number of processing cores, storage capacity
(secondary storage), bandwidth, and RAM size. For the evaluation of our work, we used
WorkflowSim 1.0, a workflow simulation toolkit that extends CloudSim. We compared our
scheduling algorithm with three standard scheduling algorithms: Round Robin, MinMin, and
MaxMin. In terms of makespan, the proposed scheduling method outperforms all three heuristics.
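
The abstract does not give the scoring formula, so the following is only an illustrative
sketch of how a score-based ranking of heterogeneous VMs might look. The VM class, the
attribute names, the equal default weights, and the normalize-by-maximum scheme are all
assumptions made for this example and are not taken from the thesis.

```python
from dataclasses import dataclass

# Hypothetical VM description; field names are assumptions for illustration.
@dataclass(frozen=True)
class VM:
    name: str
    mips: float            # processor speed
    cores: int             # number of processing cores
    storage_gb: float      # secondary storage capacity
    bandwidth_mbps: float  # network bandwidth
    ram_gb: float          # RAM size

ATTRIBUTES = ("mips", "cores", "storage_gb", "bandwidth_mbps", "ram_gb")

def score_vms(vms, weights=None):
    """Return {vm.name: score}.

    Each attribute is divided by the largest value of that attribute among
    the given VMs, then combined as a weighted sum. Equal weights are an
    assumption, not the thesis's choice. All attributes must be positive.
    """
    if weights is None:
        weights = {a: 1.0 / len(ATTRIBUTES) for a in ATTRIBUTES}
    maxima = {a: max(getattr(vm, a) for vm in vms) for a in ATTRIBUTES}
    return {
        vm.name: sum(weights[a] * getattr(vm, a) / maxima[a] for a in ATTRIBUTES)
        for vm in vms
    }

def rank_vms(vms, weights=None):
    """Return the VMs sorted from highest to lowest score."""
    scores = score_vms(vms, weights)
    return sorted(vms, key=lambda vm: scores[vm.name], reverse=True)

# Made-up example fleet.
vms = [
    VM("small", 1000, 1, 100, 100, 2),
    VM("medium", 2000, 2, 200, 500, 4),
    VM("large", 4000, 8, 500, 1000, 16),
]
ranked = rank_vms(vms)
```

A scheduler built on such a ranking could, for instance, place the most data-heavy tasks on
the highest-scoring VMs; how the thesis actually maps tasks to scores is not stated here.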
