Document Type


Open Access Pre-Print


Since Frank Livingstone proposed the idea that there are no races, only clines, in 1962, little has changed in how anthropologists study and, ultimately, estimate ancestry. How we talk about the study of human variation may have changed—shifting away from “racial” labels and toward those of supposed ancestral origins—but the methods we use to label and analyze groups, however termed, have remained the same. The author suggests a new theoretical approach to ancestry estimation that does not rely on group labels, using the Howells Craniometric Data Set as an example. In the suggested workflow, the data structures itself into natural clusters, referred to as “morphogroups,” without relying on a group label. Each morphogroup is explored for subgroups, and the process is repeated until no further distinctions can be made. At each level an individual is compared to the morphogroup in a descriptive manner, focusing on similarities and differences. Lastly, a multi-iteration classification procedure, using random forest modeling, classifies by morphogroup. In this test, hierarchical clustering identifies the optimal number of natural clusters within the data, and principal components analysis is used to explore morphogroups. (The author provides a markdown file of all code used, at https://rpubs.com/kenyhercz2/717620.) Using this suggested workflow, the author identifies three main morphogroups in the Howells data set, each with different numbers of subclusters ranging from 0 to 8. Morphogroup correct classifications are typically in the mid-90% range, and the accompanying sex estimations, between 93% and 100% correct. The author emphasizes that this is but one of myriad ways ancestry could be estimated. Human variation and identity are not static, and we should help one another rethink and redefine what is possible for our field.
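The workflow described above (unlabeled clustering into morphogroups, exploration with principal components, then random forest classification by morphogroup) can be illustrated with a minimal sketch. This is not the author's code, which is an R markdown file at the RPubs link above. It is a Python approximation using scikit-learn on synthetic data rather than the Howells measurements. Several choices are assumptions made only for illustration: the function names, Ward linkage, the use of average silhouette width to pick the number of clusters, and 5-fold cross-validation as a stand-in for the multi-iteration classification procedure.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score


def make_synthetic_measurements(seed=0):
    """Hypothetical stand-in for craniometric data: 3 blobs, 4 variables."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 0]], dtype=float)
    return np.vstack([rng.normal(c, 1.0, size=(60, 4)) for c in centers])


def find_morphogroups(X, max_k=8):
    """Cluster without group labels; keep the k with the best silhouette.

    Ward linkage and the silhouette criterion are assumptions of this
    sketch, not necessarily what the original analysis used.
    """
    best_k, best_score, best_labels = None, -1.0, None
    for k in range(2, max_k + 1):
        labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


def classify_by_morphogroup(X, labels, seed=0):
    """Mean cross-validated accuracy of a random forest predicting morphogroup."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    return cross_val_score(forest, X, labels, cv=5).mean()


X = make_synthetic_measurements()
k, morphogroups = find_morphogroups(X)

# Principal components give a low-dimensional view for describing how an
# individual resembles or differs from each morphogroup.
pc_scores = PCA(n_components=2).fit_transform(X)

accuracy = classify_by_morphogroup(X, morphogroups)
print(f"morphogroups found: {k}; cross-validated accuracy: {accuracy:.2f}")
```

To look for subgroups, `find_morphogroups` could be called again on the rows belonging to a single morphogroup. Because this sketch always returns at least two clusters, a real implementation would also need a stopping rule (for example, a minimum silhouette width) to decide when no further distinctions can be made.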