Open Access Dissertation
Date of Award
Electrical and Computer Engineering
Chandan K. Reddy
As machine learning methods extend to more complex and diverse set of problems, situations arise where the complexity and availability of data presents a situation where the information source is not "adequate" to generate a representative hypothesis. Learning from multiple sources of data is a promising research direction as researchers leverage ever more diverse sources of information. Since data is not readily available, knowledge has to be transferred from other sources and new methods (both supervised and un-supervised) have to be developed to selectively share and transfer knowledge. In this dissertation, we present both supervised and un-supervised techniques to tackle a problem where learning algorithms cannot generalize and require an extension to leverage knowledge from different sources of data. Knowledge transfer is a difficult problem as diverse sources of data can overwhelm each individual dataset's distribution and a careful set of transformations has to be applied to increase the relevant knowledge at the risk of biasing a dataset's distribution and inducing negative transfer that can degrade a learner's performance.
We give an overview of the issues encountered when the learning dataset does not have a sufficient supply of training examples. We categorize the structure of small datasets and highlight the need for further research. We present an instance-transfer supervised classification algorithm to improve classification performance in a target dataset via knowledge transfer from an auxiliary dataset. The improved classification performance of our algorithm is demonstrated with several real-world experiments. We extend the instance-transfer paradigm to supervised classification with "Absolute Rarity'", where a dataset has an insufficient supply of training examples and a skewed class distribution. We demonstrate one solution with a transfer learning approach and another with an imbalanced learning approach and demonstrate the effectiveness of our algorithms with several real world text and demographics classification problems (among others). We present an unsupervised multi-task clustering algorithm where several small datasets are simultaneously clustered and knowledge is transferred between the datasets to improve clustering performance on each individual dataset and we demonstrate the improved clustering performance with an extensive set of experiments.
Al-Stouhi, Samir, "Learning With An Insufficient Supply Of Data Via Knowledge Transfer And Sharing" (2013). Wayne State University Dissertations. 829.