Access Type

Open Access Dissertation

Date of Award

January 2016

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Chandan K. Reddy

Abstract

Survival analysis aims to predict the occurrence of specific events of interest at future time points. The presence of incomplete observations due to censoring brings unique challenges in this domain and differentiates survival analysis techniques from other standard regression methods. In this thesis, we propose four models for high-dimensional survival analysis. First, we propose a regularized linear regression model with weighted least squares to handle survival prediction in the presence of censored instances. We employ the elastic net penalty to induce sparsity in the linear model so that it can effectively handle high-dimensional data. In contrast to existing censored linear models, parameter estimation in our model does not require any prior estimation of the survival times of censored instances. The second model we propose is a unified model for regularized parametric survival regression with an arbitrary survival distribution. We employ a generalized linear model to approximate the negative log-likelihood and use the elastic net as a sparsity-inducing penalty to effectively deal with high-dimensional data. The proposed model is then formulated as a penalized iteratively reweighted least squares problem and solved using a cyclical coordinate descent-based method. Because popular survival analysis methods, such as the Cox proportional hazards model and parametric survival regression, rely on strict assumptions that are not realistic in many real-world applications, the third model reformulates survival analysis as a multi-task learning problem that predicts the survival time by estimating the survival status at each time interval during the study.
We propose an indicator matrix that enables the multi-task learning algorithm to handle censored instances, and we incorporate important characteristics of survival problems, such as the non-negative, non-increasing list structure, into our model through max-heap projection. The proposed formulation is solved via an algorithm based on the Alternating Direction Method of Multipliers (ADMM). Besides the above three methods, which address the standard survival prediction problem, we also propose a transfer learning model for survival analysis. During our study, we noticed that obtaining sufficient labeled training instances for learning a robust prediction model is a very time-consuming process and can be extremely difficult in practice. We therefore propose a Cox-based model that uses the L2,1-norm penalty to encourage the source and target predictors to share similar sparsity patterns, and hence learns a shared representation across the source and target domains to improve model performance on the target task. We demonstrate the performance of the proposed models on several real-world, high-dimensional biomedical benchmark datasets. Our experimental results indicate that the proposed models outperform other state-of-the-art competing methods and attain very competitive performance on various datasets.
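
As a rough illustration of the multi-task reformulation described in the abstract, the following sketch builds a per-interval survival-status matrix and a matching indicator matrix from observed times and event flags. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function name `build_survival_targets`, the 0/1 encoding, and the convention that a time `t` means "observed event-free through interval `t`" are all hypothetical choices made for this example.

```python
import numpy as np

def build_survival_targets(times, events, n_intervals):
    """Hypothetical helper: encode survival data as per-interval targets.

    times       -- last interval (1-based) through which each subject was
                   observed event-free (assumed convention)
    events      -- 1 if the event was observed, 0 if the subject was censored
    n_intervals -- number of time intervals (one learning task per interval)
    """
    times = np.asarray(times)
    events = np.asarray(events, dtype=bool)
    grid = np.arange(1, n_intervals + 1)  # interval indices 1..n_intervals

    # Y[i, j] = 1 while subject i is still event-free at interval j, else 0.
    # Each row is therefore non-negative and non-increasing.
    Y = (grid[None, :] <= times[:, None]).astype(float)

    # W[i, j] = 0 where the status is unknown, i.e. subject i was censored
    # before interval j; such entries should be ignored by the loss.
    W = np.ones_like(Y)
    unknown = (~events)[:, None] & (grid[None, :] > times[:, None])
    W[unknown] = 0.0
    return Y, W

# Two subjects over four intervals: the first has an observed event after
# interval 3, the second is censored after interval 2.
Y, W = build_survival_targets(times=[3, 2], events=[1, 0], n_intervals=4)
print(Y)
print(W)
```

Under these assumptions, a masked loss such as `np.sum((W * (Y - predictions)) ** 2)` would skip the unknown entries of censored subjects, while the non-increasing row structure of `Y` is the property that the abstract says is enforced on the predictions.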
