Access Type

Open Access Dissertation

Date of Award

January 2013

Degree Type


Degree Name



Computer Science

First Advisor

Sorin Draghici


The rate of acquiring biological data has greatly surpassed our ability to interpret it. At the same time, we have started to understand that the evolution of many diseases, such as cancer, is the result of the interplay between the disease itself and the immune system of the host. It is now well accepted that cancer is not a single disease, but a “complex collection of distinct genetic diseases united by common hallmarks”. Understanding the differences between such disease subtypes is key not only to providing adequate treatments for known subtypes but also to identifying new ones. These unforeseen disease subtypes are one of the main reasons high-profile clinical trials fail. To identify such cases, we proposed a classification technique, based on Support Vector Machines, that is able to automatically identify samples that are dissimilar from the classes used for training. We assessed the performance of this approach both with artificial data and with data from the UCI Machine Learning Repository. Moreover, we showed in a leukemia experiment that our method is able to identify 65% of the MLL patients when trained only on AML vs. ALL. In addition, to augment our ability to understand the disease mechanism in each subgroup, we proposed a systems biology approach able to consider all measured gene expression changes, thus eliminating the possibility that small but important gene changes (e.g., in transcription factors) are omitted from the analysis. We showed that this approach provides consistent results that do not depend on the choice of an arbitrary threshold for differential regulation. We also showed in a multiple sclerosis study that this approach is able to obtain consistent results across multiple experiments performed by different groups on different technologies, a consistency that could not be achieved using differential expression alone. The cut-off free impact analysis was released as part of the ROntoTools Bioconductor package.
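
The idea of flagging samples that are dissimilar from every training class can be illustrated with a small sketch. This is not the Support Vector Machine method of the dissertation, whose details are not given in the abstract; it swaps in a much simpler nearest-centroid classifier with a distance-based reject rule. The function names, the `slack` parameter, and the two-dimensional toy data are all hypothetical, and the class labels are borrowed from the leukemia example only for readability.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit(train):
    """train maps a class label to its list of feature vectors.

    For each class, store its centroid and the largest distance from
    that centroid to any of its training samples (the class radius).
    """
    model = {}
    for label, points in train.items():
        c = centroid(points)
        radius = max(math.dist(p, c) for p in points)
        model[label] = (c, radius)
    return model

def predict(model, x, slack=1.5):
    """Return the nearest class label, or "unknown" if x is too far away.

    slack is an assumed tuning knob: a sample is rejected when its
    distance to the nearest centroid exceeds slack times the class radius.
    """
    label, (c, radius) = min(
        model.items(), key=lambda kv: math.dist(x, kv[1][0])
    )
    if math.dist(x, c) > slack * radius:
        return "unknown"
    return label

# Toy data (made up): two well-separated training classes.
train = {
    "AML": [[0, 0], [1, 0], [0, 1], [1, 1]],
    "ALL": [[5, 5], [6, 5], [5, 6], [6, 6]],
}
model = fit(train)

in_class_a = predict(model, [0.4, 0.6])   # close to the AML samples
in_class_b = predict(model, [5.8, 5.2])   # close to the ALL samples
outlier = predict(model, [10, -5])        # far from both classes
```

In this sketch a sample from a class never seen during training is expected to fall outside both class radii and be reported as "unknown" rather than forced into one of the known classes, which is the behavior the abstract describes for MLL patients when training on AML vs. ALL only.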