Document Type: Open Access Article

Surnames (family names) show distinctive geographical patterning and in many disciplines remain an underutilized source of information about population origins, migration and identity. This paper investigates the geographical structure of surnames, using a unique individual-level database assembled from registers and telephone directories from 16 European countries. We develop a novel combination of methods for exhaustively analyzing this multinational data set, based upon the Lasker Distance, consensus clustering and multidimensional scaling. Our analysis is both data-rich and computationally intensive, entailing the aggregation, clustering and mapping of 8 million surnames collected from 152 million individuals. The resulting regionalization has applications in developing our understanding of the social and cultural complexion of Europe, and offers potential insights into the long- and short-term dynamics of migration and residential mobility. The research also contributes a range of methodological insights for future studies concerning spatial clustering of surnames and population data more widely. In short, this paper further demonstrates the value of surnames in multinational population studies and also the increasing sophistication of techniques available to analyze them.
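
For readers unfamiliar with the Lasker Distance named above, the following is a minimal sketch of how it is commonly defined in the isonymy literature: the negative natural logarithm of twice Lasker's coefficient of relationship, where the coefficient is half the sum, over shared surnames, of the products of their relative frequencies in the two regions. The exact formulation used in this paper is not given in the abstract, so the definition here is an assumption, and the function names and the toy surname lists are hypothetical illustrations rather than the authors' code or data.

```python
import math
from collections import Counter


def relative_frequencies(surnames):
    """Map each surname to its share of all individuals in one region."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def lasker_distance(region_a, region_b):
    """Lasker distance between two regions, each given as a list of surnames.

    Assumed definition (not taken from the abstract):
        R = sum_s(p_a[s] * p_b[s]) / 2    (coefficient of relationship by isonymy)
        L = -ln(2 * R)
    Regions that share no surnames are treated as infinitely distant.
    """
    p_a = relative_frequencies(region_a)
    p_b = relative_frequencies(region_b)
    shared = p_a.keys() & p_b.keys()
    isonymy = sum(p_a[s] * p_b[s] for s in shared) / 2
    if isonymy == 0:
        return math.inf
    return -math.log(2 * isonymy)


# Hypothetical toy data: one list entry per individual.
region_a = ["Smith", "Smith", "Jones", "Brown"]
region_b = ["Smith", "Jones", "Jones", "Garcia"]
distance = lasker_distance(region_a, region_b)
```

In a study of this kind, a matrix of such pairwise distances between regions would then be the input to the clustering and multidimensional scaling steps; smaller values indicate regions whose surname compositions overlap more.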