Open Access Thesis
Date of Award
The need for real-time, large-scale data processing has led to the development of frameworks for distributed stream processing in clouds. To provide fast, scalable, and fault-tolerant stream processing, recent Distributed Stream Processing Systems (DSPS) have proposed treating streaming workloads as a series of batch jobs instead of a series of records. Batch-based stream processing systems can process data at a high rate, but batching also leads to large end-to-end latency. In this thesis we concentrate on minimizing the end-to-end latency of batched streaming systems by adaptively tuning batch size and execution parallelism. We propose DyBBS, a heuristic algorithm integrated with isotonic regression that automatically learns and adjusts batch size and execution parallelism according to workloads and operating conditions, without any workload-specific prior knowledge. The experimental results show that our algorithm significantly reduces the end-to-end latency for two representative streaming workloads: i) for the Reduce workload, latency is reduced by 34.97% and 48.02% for sinusoidal and Markov chain data input rates, respectively; and ii) for the Join workload, the latency reductions are 63.28% and 67.51% for sinusoidal and Markov chain data input rates, respectively.
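The abstract names isotonic regression but does not describe how DyBBS uses it, so the following is only a minimal sketch of the general technique, not the thesis's algorithm. It fits a non-decreasing curve to noisy processing-time measurements with the pool-adjacent-violators method, then picks the smallest batch interval whose fitted processing time does not exceed the interval. That stability condition, the function names, and the sample measurements are all assumptions made for illustration.

```python
def isotonic_fit(ys):
    """Least-squares non-decreasing fit to ys (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def smallest_stable_interval(intervals, proc_times):
    """Return the smallest interval whose fitted processing time fits inside it.

    `intervals` must be sorted ascending; returns None if no interval qualifies.
    Assumption: a batch interval is "stable" when processing time <= interval.
    """
    fitted = isotonic_fit(proc_times)
    for interval, fitted_time in zip(intervals, fitted):
        if fitted_time <= interval:
            return interval
    return None


# Hypothetical measurements in milliseconds (not data from the thesis).
intervals = [100, 200, 300, 400, 500]
proc_times = [180, 260, 240, 330, 350]

print(isotonic_fit(proc_times))
print(smallest_stable_interval(intervals, proc_times))
```

In this made-up data the measurement at 300 ms dips below the one at 200 ms; the fit pools the two into a shared value so the estimated curve never decreases as the batch interval grows.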
Zhang, Quan, "Adaptive Block And Batch Sizing For Batched Stream Processing System" (2017). Wayne State University Theses. 598.