The etiology of complex human diseases is commonly viewed as a process in which genetic and environmental factors act jointly, often in complicated ways. Interactions among genetic variants frequently play major roles in determining an individual's susceptibility to a particular disease. Statistical methods for modeling interactions between single genetic variants (e.g., single nucleotide polymorphisms, or SNPs) underlying complex diseases have been extensively studied. Recently, haplotype-based analysis has gained popularity in genetic association studies. When interactions among multiple sequences or haplotypes determine an individual's susceptibility to a disease, statistical modeling and testing of the interaction effects become daunting, largely because of the complexity of higher-order epistasis.
In this article, we propose a new strategy for modeling haplotype-haplotype interactions under a penalized logistic regression framework with an adaptive L1 penalty. We consider interactions of sequence variants between haplotype blocks. The adaptive L1 penalty allows simultaneous effect estimation and variable selection in a single model. We propose a new estimation method in which parameters are estimated and selected by a modified Gauss-Seidel method nested within the EM algorithm. Simulation studies show that the method has a low false positive rate and reasonable power to detect haplotype interactions. The method is applied to test haplotype interactions involving the maternal and offspring genomes in a small for gestational age (SGA) neonate data set, and significant interactions between the different genomes are detected.
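As a rough illustration of how an adaptive L1 penalty can estimate and select effects at the same time, the following Python sketch fits a weighted-L1 logistic regression by cyclic one-coefficient-at-a-time updates, in the Gauss-Seidel spirit. It is not the authors' R implementation. It assumes haplotypes are observed without phase ambiguity, so the EM step is omitted, and it uses a standard curvature-bound (majorization) update rather than the modified Gauss-Seidel scheme of the article. The function names, the toy data, the penalty level `lam`, and the weight rule 1/(|initial estimate| + 1e-4) are all illustrative assumptions.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    # Closed-form minimizer of 0.5 * (b - z)^2 + t * |b|.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_loss(X, y, b0, b, lam, weights):
    # Mean negative log-likelihood plus the weighted L1 penalty.
    nll = 0.0
    for xi, yi in zip(X, y):
        eta = b0 + sum(bj * xij for bj, xij in zip(b, xi))
        # log(1 + exp(eta)), computed stably, minus y * eta
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return nll / len(y) + lam * sum(w * abs(bj) for w, bj in zip(weights, b))


def fit_adaptive_l1_logistic(X, y, lam, weights, n_sweeps=200, tol=1e-8):
    # Hypothetical helper: cyclic coordinate-wise minimization of penalized_loss.
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    eta = [0.0] * n  # current linear predictor for each subject
    # Upper bound on each coordinate's curvature, because p(1 - p) <= 1/4.
    curv = [sum(xi[j] ** 2 for xi in X) / (4.0 * n) for j in range(p)]
    for _ in range(n_sweeps):
        # Intercept: unpenalized step using the same curvature bound.
        step = sum(sigmoid(e) - yi for e, yi in zip(eta, y)) / n / 0.25
        b0 -= step
        eta = [e - step for e in eta]
        largest = abs(step)
        for j in range(p):
            if curv[j] == 0.0:
                continue  # constant-zero column carries no information
            grad = sum(xi[j] * (sigmoid(e) - yi)
                       for xi, e, yi in zip(X, eta, y)) / n
            new_bj = soft_threshold(b[j] - grad / curv[j],
                                    lam * weights[j] / curv[j])
            delta = new_bj - b[j]
            if delta != 0.0:
                eta = [e + delta * xi[j] for e, xi in zip(eta, X)]
                b[j] = new_bj
                largest = max(largest, abs(delta))
        if largest < tol:
            break
    return b0, b


if __name__ == "__main__":
    random.seed(1)
    X, y = [], []
    for _ in range(200):
        # Toy design (assumed): two haplotype indicators, their product
        # (the interaction term), and one unrelated indicator.
        h1, h2 = random.randint(0, 1), random.randint(0, 1)
        X.append([h1, h2, h1 * h2, random.randint(0, 1)])
        y.append(1 if random.random() < sigmoid(-0.5 + 1.5 * h1 * h2) else 0)
    # Step 1: unpenalized fit (lam = 0) supplies the initial estimates.
    _, b_init = fit_adaptive_l1_logistic(X, y, 0.0, [1.0] * 4)
    # Step 2: adaptive weights penalize small initial estimates more heavily.
    weights = [1.0 / (abs(bj) + 1e-4) for bj in b_init]
    b0, b = fit_adaptive_l1_logistic(X, y, 0.02, weights)
    print("intercept:", round(b0, 3))
    print("coefficients:", [round(bj, 3) for bj in b])
```

Coefficients that come back exactly zero are the terms the penalty has dropped; the rest are estimated and selected together. In practice `lam` would be chosen by a criterion such as cross-validation or an information criterion, which this sketch does not cover.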
As demonstrated by the simulation studies and real data analysis, the proposed approach provides an efficient tool for modeling and testing haplotype interactions. An implementation of the method in R can be freely downloaded from http://www.stt.msu.edu/~cui/software.html.
Li et al.: Mapping Haplotype-haplotype Interactions with Adaptive LASSO. BMC Genetics 2010 11:79.