Access Type

Open Access Dissertation

Date of Award

January 2010

Degree Type


Degree Name



Computer Science

First Advisor

Ming Dong


Clustering is the unsupervised classification of data objects into different groups (clusters) such that objects in one group are similar together and dissimilar from another group. Feature selection for unsupervised learning is a technique that chooses the best feature subset for clustering. In general, unsupervised feature selection algorithms conduct feature selection in a global sense by producing a common feature subset for all the clusters. This, however, can be invalid in clustering practice, where the local intrinsic property of data matters more, which implies that localized feature selection is more desirable.

In this dissertation, we focus on cluster-wise feature selection for unsupervised learning. We first propose a Cross-Projection method to achieve localized feature selection. The proposed algorithm computes adjusted and normalized scatter separability for individual clusters. A sequential backward search is then applied to find the optimal (perhaps local) feature subsets for each cluster. Our experimental results show the need for feature selection in clustering and the benefits of selecting features locally.

We also present another approach based on Maximal Likelihood with Gaussian mixture. We introduce a probabilistic model based on Gaussian mixture. The feature relevance for an individual cluster is treated as a probability, which is represented by localized feature saliency and estimated through Expectation Maximization (EM) algorithm during the clustering process. In addition, the number of clusters is determined by integrating a Minimum Message Length (MML) criterion. Experiments carried out on both synthetic and real-world datasets illustrate the performance of the approach in finding embedded clusters.

Another novel approach based on Bayesian framework is successfully implemented. We place prior distributions over the parameters of the Gaussian mixture model, and maximize the marginal log-likelihood given mixing co-efficient and feature saliency. The parameters are estimated by Bayesian Variational Learning. This approach computes the feature saliency for each cluster, and detects the number of clusters simultaneously.