"Comparing Performance Of Spark And Flink On Batch And Streaming Data" by Niranjan Jadhav

Wayne State University Theses

Title

Comparing Performance Of Spark And Flink On Batch And Streaming Data

Author

Niranjan Jadhav, Wayne State University

Access Type

WSU Access

Date of Award

January 2017

Degree Type

Thesis

Degree Name

M.S.

Department

Computer Science

First Advisor

Alexander Kotov

Abstract

In recent years, the amount of data created across the globe has grown rapidly. This data comes in a variety of formats, driven by a growing sense of its usefulness in numerous industries such as automotive, social media, retail, e-commerce, and healthcare. Collectively, simple activities performed by individuals or machines create a variety of high-volume data at a fast pace. Predictions have been made that the total data volume in the digital world will grow 1000 times more than the volume of data present in the year 2016. The extraordinary growth in the three V’s (Volume, Variety, and Velocity) creates new challenges in handling Big Data. This increasing demand to efficiently handle ever-growing data gave birth to distributed data storage and processing frameworks such as Hadoop, Spark, Mahout, Storm, and Flink. However, with the increased number of options available for processing big data comes a new challenge: selecting the right tool for the requirements. Selecting an optimal tool to perform specific operations on a specific type of data requires a great deal of effort in researching the working principles of the tool. In this study, we provide detailed information about the underlying processing methodology and component stack used by both Spark and Flink, along with the key differences between them. We perform an extensive set of experiments on multiple problems, including batch jobs, iterative and machine learning jobs, and streaming jobs. Finally, we provide a detailed analysis of the performance comparison and the reasoning behind it.

Recommended Citation

Jadhav, Niranjan, "Comparing Performance Of Spark And Flink On Batch And Streaming Data" (2017). Wayne State University Theses. 621.
https://digitalcommons.wayne.edu/oa_theses/621