Access Type

Open Access Dissertation

Date of Award

January 2019

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Dongxiao Zhu

Abstract

Data that exhibit both heterogeneity and homogeneity are now ubiquitous, owing to the proliferation of data collection techniques. To model both properties, we focus on unsupervised and supervised learning approaches. In unsupervised learning, we develop three clustering frameworks for over-dispersed data of three different types: alphabetic, network, and mixed-feature data. Each framework maximizes heterogeneity among data sub-groups and homogeneity within each sub-group. In supervised learning, traditional approaches either build a single global model for the whole group, which fails to account for heterogeneity among sub-groups, or learn one model for each sub-group independently, which ignores the homogeneity and relatedness among sub-groups. To overcome these limitations and exploit both heterogeneity and homogeneity, we implement a multi-task learning (MTL) framework to conduct risk factor analysis and survival analysis for different sub-groups.
