Access Type

Open Access Thesis

Date of Award

January 2016

Degree Type

Thesis

Degree Name

M.S.

Department

Computer Science

First Advisor

Chandan K. Reddy

Abstract

Building effective prediction models from high-dimensional data is an important problem in several domains, such as bioinformatics, healthcare analytics and general regression analysis. When such data contain many correlated features, feature groups must be extracted automatically so that regularizers such as the group lasso can exploit the recovered grouping structure to build effective prediction models. Elastic net, fused lasso and the Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR) are some of the popular feature grouping methods proposed in the literature that recover both sparsity and feature groups from the data. However, their predictive ability suffers when the regression coefficients of adjacent feature groups are similar but not exactly equal, because these methods erroneously merge such adjacent feature groups; this is widely known as the misfusion problem. To solve this problem, in this thesis we propose a weighted L1 norm-based approach that recovers feature groups effectively despite the proximity of the coefficients of adjacent feature groups, and that builds highly accurate prediction models. The resulting convex optimization problem is solved using the fast iterative shrinkage-thresholding algorithm (FISTA). We show on synthetic datasets that our approach is more successful at solving the misfusion problem than competing feature grouping methods such as the elastic net, fused lasso and OSCAR. We also compare the predictive performance of our algorithm against state-of-the-art non-convex feature grouping methods on a real-world breast cancer dataset, the 20-Newsgroups dataset and synthetic datasets.
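As a rough illustration of how FISTA handles a weighted L1 penalty, the sketch below minimizes a least-squares loss plus a per-coefficient weighted L1 term. The abstract does not state the thesis's exact objective or weighting scheme, so the objective, the function names (`soft_threshold`, `fista_weighted_l1`) and the parameters (`weights`, `lam`, `n_iter`) are all assumptions made for illustration, not the author's formulation.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding; t may be a scalar or a per-coordinate array."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fista_weighted_l1(X, y, weights, lam, n_iter=500):
    """Approximately minimize 0.5 * ||y - X b||^2 + lam * sum_j weights[j] * |b[j]|.

    Assumed objective for illustration only; the thesis may weight
    the L1 norm differently.
    """
    p = X.shape[1]
    # Lipschitz constant of the gradient of the smooth term:
    # the squared spectral norm of X.
    L = np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)   # current iterate
    z = b.copy()      # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y)
        # Proximal step: a gradient step followed by weighted soft-thresholding.
        b_next = soft_threshold(z - grad / L, lam * weights / L)
        # Standard FISTA momentum update.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = b_next + ((t - 1.0) / t_next) * (b_next - b)
        b, t = b_next, t_next
    return b
```

When `X` is the identity matrix, the minimizer has a closed form, `soft_threshold(y, lam * weights)`, which gives a quick sanity check for the routine. Larger entries in `weights` shrink the corresponding coefficients more aggressively.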
