Access Type

Open Access Dissertation

Date of Award

January 2013

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Electrical and Computer Engineering

First Advisor

Abhilash Pandya

Second Advisor

Chandan K. Reddy

Abstract

As machine learning methods extend to a more complex and diverse set of problems, situations arise where the complexity or limited availability of data means the information source is not "adequate" to generate a representative hypothesis. Learning from multiple sources of data is a promising research direction as researchers leverage ever more diverse sources of information. Since data is not readily available, knowledge has to be transferred from other sources, and new methods (both supervised and unsupervised) have to be developed to selectively share and transfer knowledge. In this dissertation, we present both supervised and unsupervised techniques to tackle problems where learning algorithms cannot generalize and require an extension to leverage knowledge from different sources of data. Knowledge transfer is a difficult problem: diverse sources of data can overwhelm each individual dataset's distribution, and a careful set of transformations has to be applied to increase the relevant knowledge, at the risk of biasing a dataset's distribution and inducing negative transfer that can degrade a learner's performance.

We give an overview of the issues encountered when the learning dataset does not have a sufficient supply of training examples. We categorize the structure of small datasets and highlight the need for further research. We present an instance-transfer supervised classification algorithm to improve classification performance in a target dataset via knowledge transfer from an auxiliary dataset. The improved classification performance of our algorithm is demonstrated with several real-world experiments. We extend the instance-transfer paradigm to supervised classification with "Absolute Rarity", where a dataset has an insufficient supply of training examples and a skewed class distribution. We present one solution based on a transfer learning approach and another based on an imbalanced learning approach, and demonstrate the effectiveness of our algorithms on several real-world text and demographics classification problems (among others). Finally, we present an unsupervised multi-task clustering algorithm in which several small datasets are clustered simultaneously and knowledge is transferred between the datasets to improve clustering performance on each individual dataset. We demonstrate the improved clustering performance with an extensive set of experiments.

Recommended Citation

Al-Stouhi, Samir, "Learning With An Insufficient Supply Of Data Via Knowledge Transfer And Sharing" (2013). Wayne State University Dissertations. 829.
https://digitalcommons.wayne.edu/oa_dissertations/829

Included in

Computer Engineering Commons, Computer Sciences Commons

DigitalCommons@WayneState

Wayne State University Dissertations

Learning With An Insufficient Supply Of Data Via Knowledge Transfer And Sharing
