Access Type

Open Access Dissertation

Date of Award

January 2016

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Chandan K. Reddy

Abstract

Predicting time-to-event from longitudinal data, where different events occur at different time points, is an important problem in domains such as healthcare, economics, social networks and seismology. A unique challenge in this problem is building predictive models from right censored data (also called survival data). Right censoring occurs when the event of interest for an instance is not observed within the given observation time window; such instances are considered right censored. Effective models that accurately predict time-to-event labels from such right censored data can have a significant impact in these domains.

However, existing methods in the literature cannot capture various complexities present in real-world survival data, such as feature groups and intra-event and inter-event correlations. To address these challenges, this dissertation makes the following major contributions: (i) modeling intra-event correlations in survival data using structured sparsity-based regularizers, (ii) learning novel representations for survival data by inferring inter-event and intra-event correlations, (iii) extending linear regression-based methods to learn predictive models from right censored data and (iv) using active learning to identify the censored instances and events that contribute most to learning a model, so that fewer training instances are needed. We present optimization-based algorithms for each of these contributions, utilizing diverse techniques such as regularization, representation learning and active learning. Our methods are tested on real-world longitudinal datasets such as electronic health records (EHRs), crowdfunding data and gene-expression data, as well as several publicly available synthetic survival datasets. The results demonstrate the effectiveness of these methods when compared to state-of-the-art survival analysis, classification and regression methods from the literature.
