"Performance Comparison Of Two Data Mining Algorithms On Big Data Platforms" by Md Rajiur Rahman Raju

Wayne State University Theses

Title

Performance Comparison Of Two Data Mining Algorithms On Big Data Platforms

Author

Md Rajiur Rahman Raju, Wayne State University

Access Type

Open Access Thesis

Date of Award

January 2015

Degree Type

Thesis

Degree Name

M.S.

Department

Computer Science

First Advisor

Chandan K. Reddy

Abstract

In this big data era, the need for performing large-scale computations is evident, and a better understanding of which platforms can run these computations most efficiently is needed. In this thesis, we compare four such big data platforms, namely Hadoop, Spark, GPU, and Multicore CPU. We compare these platforms using two prominent data mining algorithms, namely K-means clustering and K-nearest neighbour classification, and discuss specific implementation-level details. We provide several insights into the best possible implementations of these algorithms and systematically compare the benefits and drawbacks of each platform. We conduct experiments by varying the data size and algorithm parameters to measure the runtime and scalability of these platforms. Our experiments show that GPU and Multicore CPU are faster but have certain limitations. On the other hand, Hadoop and Spark are able to handle large-scale datasets. We also observe that Spark performs better than Hadoop for both iterative and non-iterative jobs. In summary, we have examined different characteristics of four big data platforms and provided a comparative analysis for two algorithms. Since many other data mining algorithms use these two methods either during pre-processing or as an integral component, we hope that our analysis will have an impact on many applications and algorithms beyond those reported in this thesis.
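
To illustrate why the abstract separates iterative from non-iterative jobs, here is a minimal single-machine Python sketch of the two algorithms. It is not the thesis's Hadoop, Spark, GPU, or Multicore CPU implementation. The function names `kmeans` and `knn_classify`, the squared Euclidean distance, and the convergence test are illustrative assumptions. K-means repeats a full pass over the data until the centroids stop moving, while K-nearest neighbour classifies a query in a single pass over the training set.

```python
from collections import Counter


def squared_distance(a, b):
    # Squared Euclidean distance (an assumed metric for this sketch).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iters=100):
    """Lloyd-style K-means: repeat the assign/update steps until the
    centroids no longer change. Each iteration is a full pass over the
    data, which makes K-means an iterative workload."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda j: squared_distance(p, centroids[j]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids


def knn_classify(train, labels, query, k):
    """K-nearest neighbour: one pass computing the distance from the query
    to every training point, then a majority vote among the k closest."""
    nearest = sorted(range(len(train)),
                     key=lambda i: squared_distance(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
    print(kmeans(points, [[1.0, 1.0], [8.0, 8.0]]))               # [[1.25, 1.5], [8.5, 8.25]]
    print(knn_classify(points, ["a", "a", "b", "b"], [2.0, 2.0], k=3))  # a
```

The repeated data passes in `kmeans` are where keeping data in memory between iterations can pay off. `knn_classify` reads the training data only once per query.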

Recommended Citation

Raju, Md Rajiur Rahman, "Performance Comparison Of Two Data Mining Algorithms On Big Data Platforms" (2015). Wayne State University Theses. 476.
https://digitalcommons.wayne.edu/oa_theses/476

Included in

Computer Sciences Commons

DigitalCommons@WayneState