Date of Award
Advisor
Chandan K. Reddy
In this era of data abundance, it has become critical to process large volumes of data at much faster rates than ever before. Boosting is a powerful predictive modeling technique that has been used successfully in many real-world applications. However, due to its inherently sequential nature, achieving scalability for boosting is not trivial and demands the development of new parallelized versions that can efficiently handle large-scale data. In this work, we propose two parallel boosting algorithms, AdaBoost.PL and LogitBoost.PL, which allow multiple computing nodes to participate simultaneously in constructing a boosted ensemble classifier. The proposed algorithms are competitive with the corresponding serial versions in terms of generalization performance. In addition, our algorithms achieve significant speedup because our approach does not require individual computing nodes to communicate with each other to share their data. Because no data is exchanged between nodes, the algorithms are also applicable to, and robust in, privacy-preserving computation. We implemented our algorithms using the MapReduce framework and evaluated them in terms of classification accuracy, speedup, and scaleup on a wide variety of synthetic and real-world data sets.
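The general idea of boosting on separate data partitions without inter-node communication can be illustrated with a minimal sketch. This is not the thesis's AdaBoost.PL or LogitBoost.PL: it assumes decision stumps as weak learners, runs the "map" step as an ordinary loop rather than on a MapReduce cluster, and uses a simple averaged weighted vote as the "reduce" step, chosen only for illustration. All function names (`train_stump`, `adaboost`, `merge_ensembles`) and the toy data are hypothetical.

```python
import math

def stump_predict(stump, x):
    """Predict +1/-1 with a decision stump (feature, threshold, polarity)."""
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost(X, y, rounds):
    """Serial AdaBoost on one partition; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def merge_ensembles(local_ensembles):
    """Illustrative 'reduce' step: pool all stumps, scaling each worker's vote equally."""
    k = len(local_ensembles)
    return [(alpha / k, stump) for ens in local_ensembles for alpha, stump in ens]

def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumption): the label depends only on the first feature.
X = [(i, (i * 37) % 100) for i in range(100)]
y = [1 if x[0] >= 50 else -1 for x in X]

# "Map": each of 5 workers boosts on its own partition and never sees the others' data.
num_workers = 5
partitions = [([X[i] for i in range(k, 100, num_workers)],
               [y[i] for i in range(k, 100, num_workers)])
              for k in range(num_workers)]
local_ensembles = [adaboost(Xp, yp, rounds=3) for Xp, yp in partitions]

# "Reduce": only the trained classifiers are combined, not the raw data.
merged = merge_ensembles(local_ensembles)
accuracy = sum(ensemble_predict(merged, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

In this sketch the only objects that leave a worker are its trained weak classifiers and their weights, which is what makes the communication-free and privacy-related claims in the abstract plausible. How the thesis actually merges the per-node ensembles is not stated in the abstract.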
Palit, Indranil, "A scalable and parallel boosting framework" (2011). Wayne State University Theses. Paper 78.