Access Type

WSU Access

Date of Award

January 2019

Degree Type


Degree Name



Computer Science

First Advisor

Shiyong Lu


Big data workflows have been identified as the latest generation of data-centric workflow technologies, addressing the five key challenges of big data: variety, volume, velocity, veracity, and value. A workflow is a computerized model that automates the processing of tasks while preserving the execution order imposed by their dependencies. As data volumes and workflow sizes grow, a single machine can no longer handle the overall execution; big data therefore demands a scalable management framework that can be executed in the cloud. Big data workflow management systems (BDWFMSs) have recently emerged as popular data analysis platforms for various communities, including machine learning and security, to perform large-scale data analytics on grids and clouds. First, leveraging the automation that a workflow provides, we developed a diagnosis recommendation workflow. Diagnosis recommendation is an extremely important problem in healthcare, in which the clinician infers an optimal diagnosis for a patient, and it has a significant impact on patient quality of life. Existing techniques for solving this problem need a large number of labeled instances, which are not readily available. To overcome this limitation, we describe a semi-supervised diagnosis recommendation system based on frequent pattern mining and clustering. Second, to execute a large workflow in the cloud within a given deadline, we need to reduce the operating cost. In addition, some tasks in the workflow require confidentiality and integrity of the runtime environment. Here, we use Intel Software Guard Extensions (SGX) and AMD Secure Encrypted Virtualization (SEV) as Trusted Execution Environments (TEEs) to support the integrity and confidentiality of individual workflow tasks. Based on this, we propose a deadline-constrained and SGX-aware workflow scheduling algorithm, called SGX-E2C2D (Efficient Cost-Effective Deadline Constrained algorithms for IaaS clouds), to address these two challenges. SGX-E2C2D features several heuristics, including exploiting the longest critical paths and reusing extra time in existing virtual machine instances. Third, protecting data confidentiality and securing the execution of workflow applications remains an important and challenging problem. To address these security challenges, we introduced and implemented SecDATAVIEW, an enhanced version of the DATAVIEW system that protects the confidentiality and integrity of workflows in remote clouds. Finally, we evaluated the performance of the SecDATAVIEW system and report the results.
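
For readers unfamiliar with the terms, the following is a minimal sketch of what "preserving the execution order of dependencies" and "longest critical path" mean for a workflow modeled as a directed acyclic graph. It is not the SGX-E2C2D algorithm, which the abstract does not specify in detail. The task names, runtimes, deadline, and function names are hypothetical and chosen only for illustration.

```python
from collections import deque


def topological_order(deps):
    """Return tasks in an order that respects dependencies (Kahn's algorithm).

    deps maps each task to the list of tasks it depends on.
    """
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("workflow contains a cycle")
    return order


def critical_path(deps, runtime):
    """Return (length, path) of the longest runtime-weighted path."""
    finish = {}       # earliest finish time of each task
    best_parent = {}  # predecessor on the longest path into each task
    for task in topological_order(deps):
        parent = max(deps[task], key=lambda p: finish[p], default=None)
        start = finish[parent] if parent is not None else 0
        finish[task] = start + runtime[task]
        best_parent[task] = parent

    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = best_parent[end]
    return length, path[::-1]


# Hypothetical five-task workflow; runtimes are made-up time units.
deps = {
    "load": [],
    "clean": ["load"],
    "mine": ["clean"],
    "cluster": ["clean"],
    "recommend": ["mine", "cluster"],
}
runtime = {"load": 2, "clean": 3, "mine": 5, "cluster": 4, "recommend": 1}

length, path = critical_path(deps, runtime)
deadline = 12  # hypothetical deadline
meets_deadline = length <= deadline
```

The critical path length is a lower bound on the workflow's completion time regardless of how many machines are available, which is why deadline-constrained schedulers commonly start from it.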
