We have newly constructed an ethnohistorical database consisting of 3460 records of ethnic locations and movements in Europe since 2200 B.C. Using this database, we computed vectors of proportions that peoples speaking various language families contributed to the gene pools of 2216 1° × 1° land-based quadrats of Europe. From these vectors we computed ethnohistorical distances as arc distances between all pairs of quadrats. We used these distances as predictors of genetic distances, which we calculated independently from 26 genetic systems. We find significant partial correlations between ethnohistorical and genetic distances when geographic distance, a common causative factor, is held constant. Ethnohistorical distances explain a significant amount of the genetic variation observed in modern populations. These results are highly robust to simulated errors in and omissions from the ethnohistorical database. Randomization tests show that the historical sequence of the movements does not affect estimates of the ethnohistory-genetics correlation, but the geographic locations of movements do. We track the development of the ethnohistory-genetics correlation through time and show it to be gradual and cumulative over the past 4200 years.
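The two quantities named above, an arc distance between proportion vectors and a partial correlation with geographic distance held constant, can be illustrated with a minimal sketch. The abstract does not give formulas, so the following assumes the common arc-distance form, arccos of the sum of square roots of paired proportions, and the standard first-order partial correlation. The function names and toy inputs are hypothetical and not taken from the study.

```python
import math

def arc_distance(p, q):
    """Arc distance between two proportion vectors (each assumed to sum to 1).

    Assumed form: arccos(sum_i sqrt(p_i * q_i)); the abstract does not
    specify which arc-distance formula was used.
    """
    s = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return math.acos(min(1.0, max(0.0, s)))

def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y with z held constant (first-order partial)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy example: two quadrats described by contributions of three language families.
d = arc_distance([0.7, 0.2, 0.1], [0.1, 0.3, 0.6])
```

In this sketch x, y and z would hold the ethnohistorical, genetic and geographic distances for the same list of quadrat pairs. Because pairwise distances are not independent observations, significance would ordinarily be assessed by permutation of the distance matrices rather than by a standard t-test; that step is not shown here.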