Open Access Article


We analyze the geographic distribution of 77,451 different Italian surnames (borne by 17,579,891 individuals), obtained from the 1993 lists of telephone subscribers.

Using a specific neural network technique, Self-Organizing Maps (SOMs), we automatically identify the geographic origin of 49,117 different surnames. To validate the methodology, we compare our results with those of a previous study that applied accurate supervised methods to the same database. The two sets of results overlap by 97%, indicating that the SOM methodology is highly reliable and accurately traces the geographic origin of surnames back to the time of their introduction (the Late Middle Ages/Renaissance in Italy).
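For readers unfamiliar with SOMs, the following is a minimal, pure-Python sketch of a rectangular Self-Organizing Map. It assumes, purely for illustration, that each surname is represented as a vector of the shares of its bearers across a few provinces; the input representation, grid size, learning-rate and neighbourhood schedules, and the placeholder surnames (`surname_a` and so on) are all hypothetical and are not taken from the authors' pipeline. It shows only the core idea: surnames with similar geographic distributions end up mapped to the same or nearby grid nodes, while surnames concentrated in different areas land on different nodes.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest (Euclidean) to x."""
    return min(
        range(len(weights)),
        key=lambda n: sum((wk - xk) ** 2 for wk, xk in zip(weights[n], x)),
    )


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on equal-length vectors; returns node weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid node, initialised uniformly at random.
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = divmod(best_matching_unit(weights, x), cols)
            for n, w in enumerate(weights):
                r, c = divmod(n, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical shares of bearers across four provinces (each row sums to 1).
surnames = {
    "surname_a": [0.80, 0.15, 0.03, 0.02],
    "surname_b": [0.75, 0.20, 0.03, 0.02],
    "surname_c": [0.02, 0.03, 0.15, 0.80],
    "surname_d": [0.03, 0.02, 0.20, 0.75],
}
weights = train_som(list(surnames.values()), rows=3, cols=3)
nodes = {name: best_matching_unit(weights, v) for name, v in surnames.items()}
print(nodes)
```

In a real application, the node to which a surname is mapped would then have to be linked to a geographic area; how that link is made in the study is not described here, so the sketch stops at the clustering step.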

The SOM results enable one to distinguish monophyletic surnames from polyphyletic ones, that is, surnames with a single geographic and historical origin from those that came into use, with identical spelling, in different locations (76.06% and 21.05% of the total, respectively). Since we are interested in geographic origins, polyphyletic surnames are excluded from further analyses.
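The monophyletic/polyphyletic distinction can be illustrated with a simple geographic heuristic. The sketch below is explicitly a swapped-in technique, not the SOM-based criterion used in the study: it assumes that a surname's bearers are given as shares per province, that a province adjacency graph is available, and that a province counts as a "core" of the surname when its share reaches a hypothetical `threshold`. Connected groups of core provinces are then counted with a breadth-first search: one group suggests a single origin (monophyletic), two or more separated groups suggest independent origins (polyphyletic). Province names `P1` to `P5` and both surnames are placeholders.

```python
from collections import deque


def origin_cores(shares, adjacency, threshold):
    """Group provinces whose share reaches `threshold` into connected cores."""
    hot = {p for p, s in shares.items() if s >= threshold}
    cores, seen = [], set()
    for start in sorted(hot):
        if start in seen:
            continue
        core, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first search restricted to "hot" provinces
            p = queue.popleft()
            core.add(p)
            for q in adjacency.get(p, ()):
                if q in hot and q not in seen:
                    seen.add(q)
                    queue.append(q)
        cores.append(core)
    return cores


def classify(shares, adjacency, threshold=0.15):
    """Hypothetical rule: one core -> monophyletic, several -> polyphyletic."""
    cores = origin_cores(shares, adjacency, threshold)
    if not cores:
        return "unclassified"  # no province reaches the threshold
    return "monophyletic" if len(cores) == 1 else "polyphyletic"


# Hypothetical chain of adjacent provinces: P1 - P2 - P3 - P4 - P5.
adjacency = {
    "P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"],
    "P4": ["P3", "P5"], "P5": ["P4"],
}
surname_x = {"P1": 0.50, "P2": 0.30, "P3": 0.10, "P4": 0.05, "P5": 0.05}
surname_y = {"P1": 0.40, "P2": 0.05, "P3": 0.05, "P4": 0.10, "P5": 0.40}
print(classify(surname_x, adjacency))  # one contiguous core (P1, P2)
print(classify(surname_y, adjacency))  # two separated cores (P1 and P5)
```

The threshold is the key design choice in such a heuristic: set too low, a single origin plus later migration can look like two cores; set too high, minor but genuine origins are missed.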

By comparing the present location of each monophyletic surname with its inferred geographic origin in the Late Middle Ages/Renaissance, we measure the extent of the migrations that have occurred in Italy since then. The percentage of individuals presently living in the very area where their surname came into use centuries ago varies widely across provinces (from 22.77% to 77.86%), which means that self-assessed regional identities seldom correspond to the "autochthony" they imply. For example, the upper part of the Tyrrhenian coast (Northern Latium, Tuscany) has a strong identity but few "autochthonous" inhabitants (28%), having long been a passageway from the North to the South of Italy.
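The per-province "autochthony" percentage can be expressed as a simple ratio: among the present residents of a province who bear monophyletic surnames, the share whose surname originated in that same province. The sketch below assumes, for illustration only, that origins are assigned at the province level and that counts of bearers are available per (surname, province) pair; the surnames `s1` to `s3`, provinces `A` and `B`, and all counts are hypothetical. Surnames without an inferred origin (for example polyphyletic ones) are skipped, mirroring their exclusion from the analysis.

```python
from collections import defaultdict


def autochthony_by_province(counts, origin):
    """Percentage of each province's residents whose surname originated there.

    counts: {(surname, province): number of bearers living there today}
    origin: {surname: inferred province of origin} (monophyletic surnames only)
    """
    total = defaultdict(int)  # residents with a monophyletic surname
    local = defaultdict(int)  # of those, residents in their surname's area of origin
    for (surname, province), n in counts.items():
        if surname not in origin:  # polyphyletic or unclassified: excluded
            continue
        total[province] += n
        if origin[surname] == province:
            local[province] += n
    return {p: 100.0 * local[p] / total[p] for p in total if total[p] > 0}


counts = {
    ("s1", "A"): 70, ("s1", "B"): 30,
    ("s2", "B"): 40, ("s2", "A"): 10,
    ("s3", "A"): 5,  # s3 has no inferred origin, so it is ignored
}
origin = {"s1": "A", "s2": "B"}
result = autochthony_by_province(counts, origin)
print(result)  # A: 70 of 80 residents (87.5%); B: 40 of 70 residents (about 57.1%)
```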