Access Type

Open Access Dissertation

Date of Award

January 2022

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Shiyong Lu

Second Advisor

Fengwei Zhang

Abstract

Big data analytics is an essential tool that helps businesses, healthcare providers, and societal decision-makers make important strategic choices that benefit their business, patients, and people. Processing big data requires massive computing and storage resources because of the inherent characteristics of big data. Cloud providers, with their low-cost, elastic, and enormous hardware and software resources, are a suitable platform for deploying big data analytics systems. In addition, the increasing demand for GPU accelerator instances from big data and machine learning applications encourages cloud providers to invest in GPU servers in order to offer General-Purpose Graphics Processing Unit (GPGPU) services. However, a cloud architecture often has a massive hardware and software Trusted Computing Base (TCB), the portion of the system's software and hardware whose security is critical and must be ensured for the rest of the system to behave soundly. The massive size of the cloud TCB makes it vulnerable to attacks that exploit bugs or vulnerabilities in it, allowing attackers to access sensitive information in the cloud. These security vulnerabilities could decrease the willingness of cloud users to deploy security-sensitive big data tasks in cloud environments. In this dissertation, to overcome these security challenges, we mainly used hardware-assisted Trusted Execution Environments (TEEs), combined with other security primitives, to decrease the size of the TCB and thus secure big data analytics platforms. First, we empirically investigated state-of-the-art hardware-assisted TEEs for the x86 architecture, such as Intel Software Guard eXtensions (SGX) and AMD Secure Encrypted Virtualization (SEV), and reported our findings regarding their performance, use cases, and security implications for cloud environments.
Then, we surveyed prior research and efforts to decrease the security risk of deploying big data analytics systems in the cloud, including both TEE-based and non-TEE approaches. Next, we designed, implemented, and evaluated a prototype of a secure big data scientific workflow management system, based on the DATAVIEW system, that protects user code and data with Intel SGX TEE support. Our prototype provides the smallest TCB size for the DATAVIEW scientific workflow management system inside a cloud environment with SGX support, and it has been tested on both a real SGX server and the Amazon AWS cloud with simulated SGX support. We also reported the performance results of our design. We then extended and matured our security architecture and designed, implemented, and released the SecDATAVIEW system, a secure big data scientific workflow management platform with heterogeneous cloud computing support. SecDATAVIEW employs state-of-the-art TEEs, such as Intel SGX and AMD SEV, available in the cloud's hardware resources and guarantees the confidentiality and integrity of a workflow's code, data, and results while executing them in a public cloud environment. SecDATAVIEW provides a robust security guarantee for a workflow's code, data, and results both at runtime and at rest, and it is compatible with different cloud hardware in the x86 architecture. We also reported the performance results of the SecDATAVIEW system. The SecDATAVIEW source code and binaries were released on SecDATAVIEW's GitHub for public access and benefit. Further, we extended the SecDATAVIEW architecture with more security layers to provide a more robust and secure platform for big data analytics. In particular, the extended version of SecDATAVIEW is enhanced with real-time platform attestation mechanisms that guarantee the platform's integrity, and it includes solutions that protect sensitive user data and results from attackers who initiate their attacks when users leave the SecDATAVIEW platform.
In addition, we evaluated the extended version of SecDATAVIEW with a new set of experiments, including neural network and heavily connected workflows, to show the behavior and usability of the system. We likewise released the source code of the extended version of the SecDATAVIEW system on SecDATAVIEW's GitHub for public access and benefit. Finally, as complementary research, we designed, implemented, and evaluated the SecGPU platform, a GPGPU trusted execution environment for commodity clouds. SecGPU is a new system that addresses important research challenges in providing secure GPGPU access in commodity clouds with the help of state-of-the-art TEEs such as Intel TDX and AMD SEV; it addresses both the lack of GPGPU support inside the cloud's TEEs and the lack of interposition support for GPGPU access in commodity clouds. As a result, any GPGPU application, including big data analytics applications, can use SecGPU to execute GPGPU tasks securely in public cloud environments.
