Conventional clustering algorithms are restricted to data measured on ratio or interval scales, because they rely on distances. Since social studies often yield only categorical data, the literature has been enriched with more elaborate clustering techniques and algorithms for categorical data. These techniques are based on similarity or dissimilarity matrices, and the algorithms use density-based or pattern-based approaches. A probabilistic view of the similarity structure is proposed. In hierarchical clustering, the entropy dissimilarity measure gives results comparable to those of simple matching dissimilarity. It avoids the increase in dimension that arises from binarizing the categorical data. The approach also works with clustering methods in which a priori information on the number of clusters is available.
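As a rough illustration of the baseline mentioned above, the following sketch computes simple matching dissimilarity (the share of attributes on which two records disagree) directly on categorical values, without binarization, and feeds it to a naive agglomerative procedure. The average-linkage rule, the function names, and the toy records are assumptions made for this example only; the joint entropy measure proposed in the article is not reproduced here.

```python
from itertools import combinations


def simple_matching_dissimilarity(a, b):
    """Fraction of attributes on which two categorical records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def dissimilarity_matrix(records):
    """Symmetric matrix of pairwise simple matching dissimilarities."""
    n = len(records)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = simple_matching_dissimilarity(records[i], records[j])
    return d


def average_linkage(records, n_clusters):
    """Naive agglomerative clustering (assumed linkage: average).

    Repeatedly merges the two clusters with the smallest mean pairwise
    dissimilarity until n_clusters remain. Returns lists of record indices.
    """
    d = dissimilarity_matrix(records)
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > n_clusters:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            total = sum(d[i][j] for i in clusters[p] for j in clusters[q])
            mean = total / (len(clusters[p]) * len(clusters[q]))
            if best is None or mean < best[0]:
                best = (mean, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical survey-style records with three categorical attributes.
records = [
    ("urban", "single", "renter"),
    ("urban", "single", "owner"),
    ("rural", "married", "owner"),
    ("rural", "married", "renter"),
]
clusters = average_linkage(records, n_clusters=2)
print(clusters)
```

Here the number of clusters is supplied in advance, which corresponds to the a priori setting mentioned in the abstract; cutting a full dendrogram at a chosen height would be the alternative when that number is unknown.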
Çilingtürk, A Mete and Ergüt, Özlem
"Hierarchical Clustering with Simple Matching and Joint Entropy Dissimilarity Measure,"
Journal of Modern Applied Statistical Methods:
1, Article 21.
Available at: http://digitalcommons.wayne.edu/jmasm/vol13/iss1/21