Access Type

WSU Access

Date of Award

January 2011

Degree Type

Thesis

Degree Name

M.S.

Department

Computer Science

First Advisor

Chandan K. Reddy

Abstract

In this era of data abundance, it has become critical to process large volumes of data at much faster rates than ever before. Boosting is a powerful predictive modeling technique that has been successfully used in many real-world applications. However, due to its inherently sequential nature, achieving scalability for boosting is not trivial and demands the development of new parallelized versions that can efficiently handle large-scale data. In this work, we propose two parallel boosting algorithms, AdaBoost.PL and LogitBoost.PL, which facilitate simultaneous participation of multiple computing nodes to construct a boosted ensemble classifier. The proposed algorithms are competitive with the corresponding serial versions in terms of generalization performance. In addition, our algorithms achieve significant speedup since our approach does not require individual computing nodes to communicate with each other to share their data. Hence, they are also applicable to, and robust in, privacy-preserving computation. We used the Map-Reduce framework to implement our algorithms and demonstrated their performance in terms of classification accuracy, speedup, and scaleup using a wide variety of synthetic and real-world data sets.