"Validating Functional Mechanisms For Non-Coding Genetic Variants Associated With Comp . . ." by Cynthia Ann Kalita

Access Type

Open Access Dissertation

Date of Award

January 2018

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Molecular Biology and Genetics

First Advisor

Francesca Luca

Second Advisor

Roger Pique-Regi

Abstract

Genome-wide association studies (GWAS) have identified a large number of genetic variants associated with disease, as well as with normal phenotypic variation in complex traits. However, challenges remain in determining the functional relevance of human DNA sequence variants. Even after fine mapping, most variants are located in non-coding regions, making it difficult to infer the mechanisms linking individual genetic variants with the disease trait. In addition, we do not know under which environmental conditions these sequence variants have a functional impact, or whether they become one of many factors involved in complex phenotypes at the organismal level.

Chapter 1 describes computational methods to predict causal GWAS variants, the validation of a subset of these variants for ASE using a traditional reporter assay, and the development of a method to identify ASE in high-throughput assays. These methods improved the detection of enhancer activity and ASE, and this analysis pipeline will remain useful as more researchers adopt high-throughput assays to identify allelic effects. Chapter 2 builds on Chapter 1 by developing a new modification of STARR-seq that streamlines the assay and increases the power to detect ASE through the addition of a unique molecular identifier (UMI). Additionally, by integrating BiT-STARR-seq with a high-throughput allele-specific electrophoretic mobility shift assay (EMSA), we were able to identify the mechanism behind many ASE variants.

Studying gene-by-environment interactions (GxE) in humans is extremely difficult, so our approach of using an in vitro system and modeling molecular phenotypes offers a useful alternative. Chapter 3 investigates GxE in complex traits. Using GEMMA, we identified environments that were enriched for complex traits. With ATAC-seq data, we identified differentially accessible regions, transcription factor (TF) footprints, and differential TF footprints. By integrating these data with BiT-STARR-seq, we identified an enrichment of ASE in these differentially accessible chromatin regions.

Overall, these chapters show how computational predictions can be integrated with experimental validation to identify allelic effects. This design is a useful approach for validating the molecular mechanisms of specific transcription factors and linking them to the context of human health.
