Human Biology Open Access Pre-Prints

Document Type

Open Access Preprint


The Combined DNA Index System (CODIS) loci are a standard microsatellite marker set widely used to distinguish among individuals in forensic DNA identity testing for medico-legal casework in the United States and other countries. In anthropological genetic research, CODIS markers have become an important tool whose uses extend beyond case investigations to quantifying ancestry proportions, revealing patterns of admixture, and tracing population histories. Such investigations are especially prevalent in studies of Latin American population structure. Nevertheless, the accuracy of ancestry estimates computed from the CODIS loci for highly admixed Latino populations has not been formally tested. It has long been argued that small ancestry panels, and the CODIS loci specifically, are unsuitable for ancestry inference in admixed populations because of the high heterozygosity and limited number of the loci used. Recent studies of ancestry inference using the CODIS loci suggest that these loci carry more information about population-level identifiability than is recognized in forensic genetic scholarship and by the medico-legal community. Here, we formally test the ability of CODIS and CODIS-Proxy (e.g., high-heterozygosity and individual-identifiability loci) marker panels to accurately estimate the admixture proportions of individuals, including a sample of Latinos with a wide range of ancestry proportions. To allow direct comparison of the outcomes, we produce ancestry estimates for the same individuals from (1) a small CODIS/CODIS-Proxy loci panel and (2) a robust, validated microsatellite ancestry-informative panel. We find evidence (e.g., ρ = 0.80 to 0.88) that supports the use of CODIS/CODIS-Proxy loci to capture the general ancestry estimation trends of a sample.
This finding is consistent with studies using CODIS on Latin American populations, in which the ancestry estimates generated from CODIS show trends supported by documented population histories (e.g., colonialism and population movements) and microevolutionary events (e.g., gene flow) in Latin America. However, the present study also highlights the limitations of CODIS for individual-level inferences of ancestry: the interval estimates at an acceptable level of statistical confidence (95%) are shown here to be too broad to support nuanced inferences about an individual’s actual ancestry composition.
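
As an illustration of the panel comparison summarized above, the following minimal sketch computes a rank correlation between ancestry proportions estimated for the same individuals from two marker panels. It rests on assumptions not stated in the abstract: that ρ denotes Spearman's rank correlation, and that each panel yields one ancestry proportion per individual for a given ancestral component. The function names and the numeric values are hypothetical and are not data or methods from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ancestry proportions for six individuals (illustrative only).
small_panel = [0.10, 0.35, 0.50, 0.20, 0.80, 0.65]      # e.g., a CODIS-like panel
reference_panel = [0.05, 0.30, 0.60, 0.25, 0.90, 0.55]  # e.g., a larger reference panel

print(f"rho = {spearman_rho(small_panel, reference_panel):.2f}")
```

A high ρ in such a comparison indicates only that the two panels rank individuals similarly across the sample. It says nothing about the width of any single individual's interval estimate, which is the distinction the abstract draws between sample-level trends and individual-level inference.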