Access Type

Open Access Embargo

Date of Award

January 2015

Degree Type


Degree Name



Computer Science

First Advisor

Chandan K. Reddy


Predicting event occurrence at an early stage of a longitudinal study is an important and challenging problem with high practical value. In standard classification and regression problems, a domain expert can label the data in a reasonably short period of time; in longitudinal studies, by contrast, training data can be obtained only by waiting for a sufficient number of events to occur. Survival analysis aims to find the underlying distribution of data that measure the length of time until an event occurs. However, it does not answer the open question of how to forecast whether a subject will experience the event by the end of the study, given only the event occurrence information available at an early stage. This problem poses two major challenges: 1) the absence of complete information about event occurrence (censoring) and 2) the availability of only a partial set of events, namely those that occurred during the initial phase of the study. The main objective of this work is therefore to predict which subjects in a study will experience the event in the future, based on the limited event information available at the initial stages of a longitudinal study.

In this thesis, we address the first challenge by introducing a new method for handling censored data using the Kaplan-Meier estimator. We tackle the second challenge by integrating Bayesian methods with an Accelerated Failure Time (AFT) model, which adapts the prior probability of event occurrence for future time points. In other words, we propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at the early stages of longitudinal studies. More specifically, we extend the Naive Bayes, Tree-Augmented Naive Bayes (TAN) and Bayesian Network methods based on the proposed framework and develop three algorithms, namely ESP-NB, ESP-TAN and ESP-BN, to predict event occurrence using training data obtained at an early stage of the study. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that the proposed ESP framework predicts future event occurrences more accurately than alternative prediction methods while using only a limited amount of training data.
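For readers unfamiliar with the Kaplan-Meier estimator mentioned above, the following minimal sketch computes the standard product-limit survival curve from right-censored data. It is not the censoring method proposed in the thesis, which this abstract does not describe in detail; the function name `kaplan_meier` and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Standard Kaplan-Meier (product-limit) estimate.

    times    -- follow-up time of each subject
    observed -- True if the event was seen at that time, False if censored
    Returns a list of (time, survival probability) at each distinct event time.
    Illustrative helper only; not the thesis's ESP method.
    """
    # Number of observed events at each time point.
    events = Counter(t for t, e in zip(times, observed) if e)
    # Number of subjects leaving the risk set (event or censoring) at each time.
    exits = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without changing the estimate.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_observed = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_times, toy_observed)
```

On the toy data, the estimate drops only at times where an event was observed (1, 2 and 3); the censored subjects at times 2 and 4 shrink the risk set without contributing a drop of their own.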