Access Type

WSU Access

Date of Award

January 2023

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Shiyong Lu

Abstract

The workflow engine is the central module of a modern scientific workflow management system (SWfMS) and comprises both the workflow planner and the workflow executor. The workflow planner takes the workflow specification and the relevant constraints as input and generates an optimized workflow schedule. For the workflow scheduling problem, list-based scheduling algorithms have proven effective at generating feasible schedules, with shorter response times than meta-heuristic algorithms. In a cloud-based environment, workflow scheduling algorithms need to consider the specific characteristics of the underlying cloud platform, such as on-demand resource provisioning strategies, virtually unlimited compute capacity, virtual machine booting times, homogeneous networks, and a pay-as-you-go pricing model, to produce an optimal scheduling solution that meets the deadline constraints of the workflow. The workflow executor, in collaboration with the task executors, then executes the workflow in the workflow management system.

In this dissertation, we first introduced a path-based scheduling algorithm, denoted LPOD1, which generates optimized workflow schedules for deadline-constrained workflows in cloud computing environments. We then employed a dynamic programming strategy to reduce its time complexity and established six theorems that prove the correctness of the recurrence relation. This optimized version, referred to as LPOD2, was integrated into the workflow executor of DATAVIEW, a popular open-source big data workflow management system. LPOD2 is designed to directly access provenance data collected by the task executor, such as task execution times and data transfer times between consecutive tasks, and to use this information to achieve optimal workflow schedules. We conducted a series of case studies using synthetic and real-life workflows in DATAVIEW. Experimental results demonstrated that the proposed algorithm is efficient and scales to large numbers of tasks and large datasets, exhibiting superior workflow scheduling quality and performance compared to state-of-the-art algorithms such as IC-PCP and an untuned genetic algorithm (GA), a meta-heuristic. Finally, we implemented the proposed architecture and designed distributed algorithms for the workflow executor and task executors, incorporating optimizations in data movement, task movement, and communication among different subsystems. This resulted in a new version of DATAVIEW that encompasses the designed architecture, algorithms, and optimization strategies.
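As a rough illustration of the kind of computation a path-based scheduler relies on, the sketch below finds the longest path through a workflow graph with dynamic programming over a topological order. It is not LPOD1 or LPOD2, whose recurrence and theorems are given in the dissertation itself. The function name `longest_path`, the input dictionaries, and the four-task example workflow are all hypothetical, chosen only to mirror the two kinds of provenance data named above: task execution times and data transfer times between consecutive tasks.

```python
from collections import deque


def longest_path(exec_time, transfer_time):
    """Return (length, path) of the longest path in a workflow DAG.

    exec_time:     dict mapping task -> execution time (hypothetical input)
    transfer_time: dict mapping (u, v) -> data transfer time on edge u -> v
    """
    succs = {t: [] for t in exec_time}
    indeg = {t: 0 for t in exec_time}
    for (u, v) in transfer_time:
        succs[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: produce a topological order of the tasks.
    queue = deque(t for t in exec_time if indeg[t] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(exec_time):
        raise ValueError("workflow graph contains a cycle")

    # finish[t] is the length of the longest path ending at t.
    # Recurrence: finish[v] = exec_time[v] + max over predecessors u of
    #             (finish[u] + transfer_time[(u, v)]), or exec_time[v] alone.
    finish = dict(exec_time)
    pred = {t: None for t in exec_time}
    for u in order:
        for v in succs[u]:
            candidate = finish[u] + transfer_time[(u, v)] + exec_time[v]
            if candidate > finish[v]:
                finish[v] = candidate
                pred[v] = u

    # Walk predecessor links back from the task with the largest value.
    end = max(finish, key=finish.get)
    path = []
    while end is not None:
        path.append(end)
        end = pred[end]
    path.reverse()
    return finish[path[-1]], path


# Hypothetical four-task workflow: A feeds B and C, which both feed D.
exec_time = {"A": 3, "B": 2, "C": 4, "D": 1}
transfer_time = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 1}
length, path = longest_path(exec_time, transfer_time)
print(length, path)  # 11 ['A', 'C', 'D']
```

Each task and edge is visited once, so the sketch runs in time linear in the size of the graph. Under these assumptions, a deadline shorter than the returned length cannot be met without reducing execution or transfer times along that path.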
