Date of Award
The need for real-time, large-scale data processing has led to the development of frameworks for distributed stream processing in clouds. To provide fast, scalable, and fault-tolerant stream processing, recent Distributed Stream Processing Systems (DSPS) treat streaming workloads as a series of batch jobs instead of a series of records. Batch-based stream processing systems can process data at a high rate, but batching also leads to large end-to-end latency. In this thesis we concentrate on minimizing the end-to-end latency of batched streaming systems by adaptively tuning batch size and execution parallelism. We propose DyBBS, a heuristic algorithm integrated with isotonic regression that automatically learns and adjusts batch size and execution parallelism according to workloads and operating conditions, without any workload-specific prior knowledge. The experimental results show that our algorithm significantly reduces the end-to-end latency for two representative streaming workloads: i) for the Reduce workload, latency is reduced by 34.97% and 48.02% for sinusoidal and Markov chain data input rates, respectively; and ii) for the Join workload, the latency reductions are 63.28% and 67.51% for sinusoidal and Markov chain data input rates, respectively.
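To make the role of isotonic regression concrete, the following is a minimal Python sketch and not the DyBBS algorithm itself, whose details are not given in this abstract. It assumes that observed processing time is roughly non-decreasing in the batch interval, and that a batch interval is acceptable when the fitted processing time does not exceed it. The function names, the `slack` parameter, and the sample measurements are all hypothetical.

```python
def isotonic_fit(ys, weights=None):
    """Pool Adjacent Violators: least-squares non-decreasing fit to ys."""
    if weights is None:
        weights = [1.0] * len(ys)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for y, w in zip(ys, weights):
        blocks.append([float(y), float(w), 1])
        # Merge adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def smallest_stable_interval(intervals, proc_times, slack=1.0):
    """Smallest interval whose fitted processing time is <= slack * interval.

    Assumption (not from the thesis): a batch interval is treated as stable
    when a batch can be processed before the next one arrives.
    Returns None if no observed interval satisfies the condition.
    """
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    xs = [intervals[i] for i in order]
    fitted = isotonic_fit([proc_times[i] for i in order])
    for x, t in zip(xs, fitted):
        if t <= slack * x:
            return x
    return None


if __name__ == "__main__":
    # Hypothetical noisy measurements (milliseconds), for illustration only.
    batch_intervals = [100, 200, 300, 400]
    processing_times = [180, 210, 205, 230]
    print(isotonic_fit(processing_times))
    print(smallest_stable_interval(batch_intervals, processing_times))
```

The monotone fit smooths out noisy measurements (here the dip at the third sample) before the comparison, so a single outlier is less likely to trigger a change in batch size. A real controller would also have to account for execution parallelism and changing input rates, which this sketch ignores.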
Zhang, Quan, "Adaptive Block And Batch Sizing For Batched Stream Processing System" (2017). Wayne State University Theses. 598.