Multiple types of (epi)genetic measurements are involved in the development and progression of complex diseases. Different types of (epi)genetic measurements are interconnected, and modeling their associations can lead to a better understanding of disease biology and facilitate building clinically useful models. Such analysis is challenging in several respects. To fix notation, we use gene expression (GE) and copy number variation (CNV) as an example. Both GE and CNV measurements are high-dimensional. A single GE may be regulated by multiple CNVs, but the set of relevant CNVs is unknown. For a specific GE, the cis-acting CNV usually has the dominant effect and can behave differently from the trans-acting CNVs. In addition, GE measurements can have long-tailed distributions and contamination. Lastly, some CNVs are more tightly connected to each other than to the rest. In this study, a novel method is developed to more effectively model the associations between (epi)genetic measurements. For each GE, a partially linear model is assumed, with a nonlinear effect for the cis-acting CNV and linear effects for the remaining CNVs. A robust loss function is adopted to accommodate long-tailed distributions and data contamination. Penalization is adopted to accommodate the high dimensionality and to select relevant CNVs. A network structure is introduced to account for the interconnections among CNVs. We develop a computational algorithm and rigorously establish the consistency properties. Simulation shows that the proposed method outperforms alternatives. The analysis of a TCGA (The Cancer Genome Atlas) dataset demonstrates the practical applicability of the proposed method.
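To make the ingredients above concrete, the following is a minimal sketch of how such an objective could be assembled for one GE. It is not the speaker's formulation: the abstract does not name the loss, the basis, or the penalties, so the Huber loss, the polynomial basis for the cis-acting CNV, the lasso penalty, and the edge-difference network penalty are all assumptions chosen for illustration, as are the names `huber`, `cis_basis`, and `objective`.

```python
# Illustrative sketch only: every modeling choice below is an assumption,
# not the method presented in the talk.
from typing import List, Sequence


def huber(r: float, delta: float = 1.345) -> float:
    """Huber loss: quadratic near zero, linear in the tails (assumed robust loss)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def cis_basis(z: float, degree: int) -> List[float]:
    """Polynomial basis for the nonlinear cis-acting CNV effect (assumed basis)."""
    return [z ** k for k in range(1, degree + 1)]


def objective(
    y: Sequence[float],                   # GE values, one per sample
    z: Sequence[float],                   # cis-acting CNV, one per sample
    X: Sequence[Sequence[float]],         # trans-acting CNVs, n samples by p CNVs
    alpha: Sequence[float],               # basis coefficients for the cis effect
    beta: Sequence[float],                # linear coefficients for trans CNVs
    adjacency: Sequence[Sequence[float]], # p by p network weights among CNVs
    lam1: float,                          # sparsity tuning parameter
    lam2: float,                          # network tuning parameter
    delta: float = 1.345,
) -> float:
    n, p = len(y), len(beta)
    # Robust data-fit term for the partially linear model y = g(z) + x'beta + error
    fit = 0.0
    for i in range(n):
        g = sum(a * b for a, b in zip(alpha, cis_basis(z[i], len(alpha))))
        lin = sum(X[i][j] * beta[j] for j in range(p))
        fit += huber(y[i] - g - lin, delta)
    fit /= n
    # Sparsity penalty that selects relevant trans-acting CNVs (lasso, assumed)
    sparsity = lam1 * sum(abs(b) for b in beta)
    # Network penalty that pulls coefficients of connected CNVs together (assumed)
    network = lam2 * sum(
        adjacency[j][k] * (beta[j] - beta[k]) ** 2
        for j in range(p)
        for k in range(j + 1, p)
    )
    return fit + sparsity + network
```

Minimizing a function of this shape over `alpha` and `beta` would be the job of the computational algorithm mentioned above; the sketch only evaluates the objective and says nothing about how the actual method optimizes it or chooses its tuning parameters.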
This will be a joint Stat-Biostat seminar.