Access Type

Open Access Thesis

Date of Award

January 2015

Degree Type

Thesis

Degree Name

M.S.

Department

Computer Science

First Advisor

Xuewen Chen

Abstract

Feature learning and object classification in machine learning have become very active research areas in recent decades. Identifying good features has various benefits for object classification in respect to reducing the computational cost and increasing the classification accuracy. In addition, many research studies have focused on the use of Graphics Processing Units (GPUs) to improve the training time for machine learning algorithms. In this study, the use of an alternative platform, called High Performance Computing Cluster (HPCC), to handle unsupervised feature learning, image and speech classification and improve the computational cost is proposed.

HPCC is a Big Data processing and massively parallel processing (MPP) computing platform used for solving Big Data problems. Algorithms are implemented in HPCC with a language called Enterprise Control Language (ECL) which is a declarative, data-centric programming language. It is a powerful, high-level, parallel programming language ideal for Big Data intensive applications.

In this study, various databases are explored, such as the CALTECH-101 and AR databases, and a subset of wild PubFig83 data to which multimedia content is added. Unsupervised learning algorithms are applied to extract low-level image features from unlabeled data using HPCC. A new object identification framework that works in a multimodal learning and classification process is proposed.

Coates et al. discovered that K-Means clustering method out-performed various deep learning methods such as sparse autoencoder for image classification. K-Means implemented in HPCC with various classifiers is compared with Coates et al. classification results.

Detailed results on image classification in HPCC using Naive Bayes, Random Forest, and C4.5 Decision Tree are performed and presented. The highest recognition rates are achieved using C4.5 Decision Tree classifier in HPCC systems. For example, the classification accuracy result of Coates et al. is improved from 74.3% to 85.2% using C4.5 Decision Tree classifier in HPCC. It is observed that the deeper the decision tree, the fitter the model, resulting in a higher accuracy.

The most important contribution of this study is the exploration of image classification problems in HPCC platform.

Share

COinS