Access Type

Open Access Dissertation

Date of Award

January 2012

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Ming Dong

Abstract

Today, digital data is accumulating faster than ever in science, engineering, biomedicine, and real-world sensing. Data mining provides an effective way to explore and analyze hidden patterns in these data for a broad spectrum of applications. These datasets usually share one prominent characteristic: they are tremendous in size, with tens of thousands of objects and features. In addition, the data is collected over a period of time, and the relationships between data points can change over that period as well. Moreover, knowledge is very sparsely encoded because the patterns are usually active only in a local area. The ubiquity of massive, dynamic, and sparse data poses considerable challenges for data mining research. Recently, techniques that expand the human ability to comprehend large-scale data have attracted significant attention in the research community.

In this dissertation, we present approaches to solve the problems of complex data analysis in various applications. Specifically, we have achieved the following: 1) we develop Exemplar-based low-rank sparse Matrix Decomposition (EMD), a novel method for fast clustering of large-scale data that incorporates low-rank approximations into matrix decomposition-based clustering; 2) we propose ECKF, a general model for large-scale Evolutionary Clustering based on low-rank Kernel matrix Factorization, which monitors the low-rank approximation error at every time step to determine whether the underlying structure of the data or the nature of the relationships between data points has changed, and on that basis decides either to inherit the previous clustering or to perform a new clustering; 3) we propose a Multi-level Low-rank Approximation (MLA) framework for fast spectral clustering, which is empirically shown to cluster large-scale data very efficiently; 4) we extend the MLA framework with a non-linear kernel and apply it to HD image segmentation, where, with sufficient data samples selected by a fast sampling strategy, our method shows superior performance compared with other leading approximate spectral clustering methods; 5) we develop a fast algorithm to detect abnormal crowd behavior in surveillance videos by employing low-rank matrix approximations to model crowd behavior patterns, and we demonstrate the effectiveness of our method through experiments on simulated crowd videos.
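The decision rule described for ECKF (keep the previous clustering while the low-rank approximation error stays small, recluster when it grows) can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes a Nystrom-style approximation built from a fixed set of exemplar points as a stand-in for the low-rank kernel factorization, and the function names, the RBF kernel, `gamma`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances, clipped at 0 to absorb round-off.
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def relative_lowrank_error(X, exemplars, gamma=1.0):
    """Relative Frobenius error of the approximation K ~ C W^+ C^T.

    K is the full kernel matrix of X, C the kernel between X and the
    exemplars, and W the kernel among the exemplars (assumed setup).
    """
    K = rbf_kernel(X, X, gamma)
    C = rbf_kernel(X, exemplars, gamma)
    W = rbf_kernel(exemplars, exemplars, gamma)
    K_hat = C @ np.linalg.pinv(W) @ C.T
    return np.linalg.norm(K - K_hat) / np.linalg.norm(K)

def should_recluster(X_new, exemplars, threshold=0.2, gamma=1.0):
    # Hypothetical rule: recluster when the exemplars from the previous
    # time step no longer approximate the new kernel matrix well.
    return relative_lowrank_error(X_new, exemplars, gamma) > threshold
```

In this sketch, data at a new time step that still lies near the old exemplars yields a small error and the previous clustering is kept, while data that has drifted away from them drives the relative error toward 1 and triggers a new clustering. Forming the full kernel matrix is only practical for small illustrations; a large-scale method would estimate the error without materializing it.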
