Access Type

Open Access Dissertation

Date of Award

January 2012

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Chandan K. Reddy

Abstract

Capturing the changes between two biological phenotypes is a crucial task in understanding the mechanisms of various diseases. Most existing computational approaches test for changes in the expression level of each gene individually. In this work, we propose novel computational approaches to identify the differential genes between two phenotypes. These approaches aim to quantitatively characterize the differences between two phenotypes and can provide better insight into various diseases. The purpose of this thesis is three-fold. First, we review the state-of-the-art approaches for differential analysis of gene expression data.

Second, we propose a novel differential network analysis approach composed of two algorithms, DiffRank and DiffSubNet, which identify differential hubs and differential subnetworks, respectively. In this approach, the two datasets are represented as two networks, and the problem of identifying differential genes is transformed into the problem of comparing the two networks to identify the most differential network components. Studying such networks can provide valuable knowledge about the data. The DiffRank algorithm ranks the nodes of the two networks by their differential behavior using two novel differential measures computed for each node: differential connectivity and differential betweenness centrality. These measures are propagated through the network and optimized to capture the local and global structural changes between the two networks. We then integrate the results of this algorithm into the proposed differential subnetwork algorithm, DiffSubNet, which aims to identify sets of differentially connected nodes. We demonstrate the effectiveness of these algorithms on synthetic datasets and real-world applications and show that they identify meaningful and valuable information when compared with baseline methods that can be used for this task.
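The idea of scoring each node by how much its connections differ between two networks can be illustrated with a minimal sketch. This is not the DiffRank algorithm itself: it omits the propagation step and the betweenness-based measure, and the function names, the dictionary-of-dictionaries network representation, and the absolute-difference score are all assumptions made for illustration.

```python
def differential_connectivity(net_a, net_b):
    """Score each node by the summed absolute difference of its edge weights.

    Hypothetical, simplified measure: networks are dicts mapping a node to a
    dict of {neighbour: weight}; a missing edge counts as weight 0.
    """
    scores = {}
    for node in set(net_a) | set(net_b):
        nbrs_a = net_a.get(node, {})
        nbrs_b = net_b.get(node, {})
        neighbours = set(nbrs_a) | set(nbrs_b)
        scores[node] = sum(
            abs(nbrs_a.get(v, 0.0) - nbrs_b.get(v, 0.0)) for v in neighbours
        )
    return scores


def rank_nodes(scores):
    """Order nodes from most to least differential; ties broken by name."""
    return sorted(scores, key=lambda n: (-scores[n], n))


# Toy networks (illustrative data only): g3 is attached to g1 in the first
# network and to g2 in the second, so both of its edges change.
net_a = {"g1": {"g2": 1.0, "g3": 1.0}, "g2": {"g1": 1.0}, "g3": {"g1": 1.0}}
net_b = {"g1": {"g2": 1.0}, "g2": {"g1": 1.0, "g3": 1.0}, "g3": {"g2": 1.0}}

scores = differential_connectivity(net_a, net_b)
ranking = rank_nodes(scores)
```

In this toy case the node whose neighbourhood is completely rewired receives the highest score, which is the kind of local structural change a differential hub measure is meant to surface.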

Third, we propose a novel differential co-clustering approach to efficiently find arbitrarily positioned differential (or discriminative) co-clusters in large datasets. The goal of this approach is to discover a distinguishing set of gene patterns that are highly correlated in a subset of the samples (subspace co-expressions) in one phenotype but not in the other. This approach is useful when the biological samples are assumed to be heterogeneous or to have multiple subtypes. To achieve this goal, we propose a novel co-clustering algorithm, Ranking-based Arbitrarily Positioned Overlapping Co-Clustering (RAPOCC), to efficiently extract significant co-clusters. This algorithm optimizes a novel ranking-based objective function to find arbitrarily positioned co-clusters, and it can extract large and overlapping co-clusters containing both positively and negatively correlated genes. We then extend this algorithm to discover discriminative co-clusters by incorporating the class information into the co-cluster search process. The resulting algorithm, Discriminative RAPOCC (Di-RAPOCC), efficiently extracts discriminative co-clusters from labeled datasets. We also characterize the discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results show that they outperform several existing algorithms from the literature.
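The notion of a gene pattern that is correlated in one phenotype but not in the other can be made concrete with a small sketch. It only compares the Pearson correlation of a single gene pair within each class; it is not RAPOCC or Di-RAPOCC, performs no co-cluster search or sample-subset selection, and the function names and toy values are assumptions for illustration. Absolute correlations are used so that both positively and negatively correlated genes count as co-expressed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_gap(gene_x, gene_y, labels, class_a, class_b):
    """Hypothetical score: |corr in class_a| minus |corr in class_b| for one gene pair."""
    def within(cls):
        xs = [v for v, lab in zip(gene_x, labels) if lab == cls]
        ys = [v for v, lab in zip(gene_y, labels) if lab == cls]
        return abs(pearson(xs, ys))
    return within(class_a) - within(class_b)


# Toy expression values (illustrative only): the two genes move together in
# class "A" samples and show no linear relationship in class "B" samples.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
gene_x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_y = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 3.0, 1.0]

gap = correlation_gap(gene_x, gene_y, labels, "A", "B")
```

A gap near 1 indicates a pair that is strongly co-expressed in the first class only; a discriminative co-clustering method would look for whole sets of genes and sample subsets with this property rather than testing pairs one at a time.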

The shift from single-gene analysis to differential gene network analysis and differential co-clustering can play a crucial role in the future analysis of gene expression data and can help in understanding the mechanisms of various diseases.
