Human Biology Open Access Pre-Prints

Document Type

Open Access Preprint


Surnames (family names) show distinctive geographical patterning and remain an underutilised source of information about population origins, migration and identity. In this paper we investigate the geographical structure of surnames in 16 European countries using the Lasker Distance, consensus clustering and multidimensional scaling. Our analysis is both data-rich and computationally intensive: it entails the aggregation, clustering and mapping of 8 million surnames collected from 152 million individuals. The resulting regionalisation demonstrates the utility of an innovative inductive approach to summarising and analysing large population datasets across cultural and geographic space. Its outcomes can provide a basis for generating hypotheses about social and cultural patterning and about the dynamics of migration and residential mobility in Europe. The research also contributes a range of methodological insights for future studies of the spatial clustering of surnames.
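
For readers unfamiliar with the Lasker Distance, the short sketch below may help. It assumes the form commonly used in the isonymy literature: the negative natural logarithm of the summed products of each surname's relative frequency in two areas. The abstract does not spell out the exact formulation used in the paper, so the function name `lasker_distance` and the toy surname counts are hypothetical and serve only as an illustration.

```python
import math


def lasker_distance(counts_a, counts_b):
    """Lasker distance between two areas, given surname -> count mappings.

    Assumed definition (not confirmed by the abstract):
        L = -ln( sum_k p_ka * p_kb )
    where p_ka is the relative frequency of surname k in area a.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())

    # Only surnames present in both areas contribute to the sum.
    shared = set(counts_a) & set(counts_b)
    isonymy = sum(
        (counts_a[name] / total_a) * (counts_b[name] / total_b)
        for name in shared
    )

    # Areas with no surnames in common are treated as infinitely distant.
    if isonymy == 0:
        return math.inf
    return -math.log(isonymy)


# Toy data: hypothetical counts, not drawn from the study.
region_a = {"Smith": 2, "Jones": 2}
region_b = {"Smith": 1, "Garcia": 3}
distance = lasker_distance(region_a, region_b)
```

A matrix of such pairwise distances between areas is the kind of input that clustering and multidimensional scaling methods typically take. Smaller values indicate that two areas share more of their surname stock.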