Jung Ae Lee

PhD Candidate, Statistics

Sample Integrity in High Dimensional Data

This dissertation consists of two parts on the topic of sample integrity in high dimensional data. The first part focuses on batch effects in gene expression data. Batch bias has been found in many microarray studies that involve multiple batches of samples. Currently available methods for batch effect removal are mainly based on gene-by-gene analysis. There has been relatively little development of multivariate approaches to batch adjustment, mainly because of the analytical difficulty that originates from the high dimensional nature of gene expression data.

Major Professor(s): 
Dr. Jeongyoun Ahn
Thursday, August 29, 2013 - 3:00pm
Poultry Science Building, Room 240

Eric Vance

Virginia Tech

LISA 2020: Creating A Network of Statistical Collaboration Laboratories

To celebrate the International Year of Statistics, and sponsored by a Google Research Award, LISA (the Laboratory for Interdisciplinary Statistical Analysis at Virginia Tech) is partnering with universities and individuals around the world to create a network of 20 new statistical collaboration laboratories in developing countries by 2020.

LISA and its partners will educate and train statisticians from developing countries to communicate and collaborate with non-statisticians and then support these statisticians to create statistical collaboration laboratories in their home countries to help researchers, government officials, local industries, and NGOs apply statistical thinking and data science to make better decisions through data.

Thursday, August 22, 2013 - 3:30pm

Xiaotong Shen

University of Minnesota

Personalized Information Filtering

Personalized information filtering extracts the information specifically relevant to a user, based on the opinions of users who think alike or the content of the items that a specific user prefers.  In this talk, we discuss latent models to utilize additional user-specific and content-specific predictors, for personalized prediction.  In particular, we factorize a user-over-item preference matrix into a product of two matrices, each having the same rank as the original matrix.
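
As a rough illustration of the factorization mentioned in the abstract, the sketch below splits a small preference matrix into a user factor and an item factor with a truncated SVD. This is not the speaker's latent model: the SVD is a stand-in technique, the matrix is synthetic and fully observed, and the names `factorize`, `R`, `A`, and `B` are hypothetical. It uses no user-specific or content-specific predictors.

```python
import numpy as np

def factorize(R, rank):
    """Factor R (users x items) as A @ B.T using a truncated SVD.

    Hypothetical helper for illustration only.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    root = np.sqrt(s[:rank])
    A = U[:, :rank] * root       # user factors, shape (n_users, rank)
    B = Vt[:rank, :].T * root    # item factors, shape (n_items, rank)
    return A, B

rng = np.random.default_rng(0)
# Synthetic, fully observed preference matrix of rank 2 (an assumption;
# real preference matrices usually have many missing entries).
R = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

A, B = factorize(R, rank=2)
max_error = float(np.max(np.abs(R - A @ B.T)))
```

When the chosen rank equals the rank of `R`, the product `A @ B.T` reproduces `R` up to floating-point error; with missing entries a different fitting procedure would be needed.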

Thursday, August 29, 2013 - 3:30pm

Yao Xie

Georgia Tech

High-Dimensional Change-Point Detection

Yao Xie joined Georgia Institute of Technology as an Assistant Professor in the H. Milton Stewart School of Industrial & Systems Engineering in August 2013. Prior to that, she worked as a Research Scientist at Duke University in the Department of Electrical and Computer Engineering, after receiving her Ph.D. in Electrical Engineering (minor in Mathematics) from Stanford University in 2011. She is interested in sequential statistical methods, statistical signal processing, big data analysis, compressed sensing, and optimization, and has been involved in applications to wireless communications, sensor networks, and medical and astronomical imaging.

How do we quickly detect small solar flares in a large video stream generated by NASA satellites? How do we improve detection by efficient representation of high-dimensional data that is time-varying? Besides astronomical imaging, high-dimensional change-point detection also arises in many other applications including computer network intrusion detection, sensor networks, medical imaging, and epidemiology.

Thursday, September 5, 2013 - 3:30pm

Eric Kolaczyk

Boston University

Estimating Network Degree Distributions from Sampled Networks: An Inverse Problem

Networks are a popular tool for representing elements in a system and their interconnectedness. Many observed networks can be viewed as only samples of some true underlying network. Such is frequently the case, for example, in the monitoring and study of massive, online social networks. We study the problem of how to estimate the degree distribution -- an object of fundamental interest -- of a true underlying network from its sampled network. In particular, we show that this problem can be formulated as an inverse problem.
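
One way to picture the inverse-problem formulation is a linear system in which the expected observed degree counts equal a sampling matrix times the true degree counts. The sketch below assumes a specific design that the abstract does not state: induced subgraph sampling in which each node is kept independently with probability `p`, so a kept node of true degree j shows a Binomial(j, p) observed degree. The function name, the degree counts, and the naive solve are all hypothetical illustrations, not the speaker's estimator.

```python
import numpy as np
from math import comb

def sampling_operator(max_degree, p):
    """Matrix P with P[i, j] = Pr(node kept and observed degree i | true degree j).

    Assumes independent node inclusion with probability p (an assumption).
    """
    size = max_degree + 1
    P = np.zeros((size, size))
    for j in range(size):
        for i in range(j + 1):
            P[i, j] = p * comb(j, i) * p**i * (1 - p) ** (j - i)
    return P

# Hypothetical true degree counts for degrees 0..4.
true_counts = np.array([0.0, 40.0, 30.0, 20.0, 10.0])

P = sampling_operator(max_degree=4, p=0.5)
expected_observed = P @ true_counts          # forward problem
recovered = np.linalg.solve(P, expected_observed)  # naive inversion
condition_number = np.linalg.cond(P)
```

With exact expected counts the naive solve recovers the true counts, but `P` becomes badly conditioned as the maximum degree grows or `p` shrinks, so noise in real observed counts would be amplified. That ill-posedness is what makes the inverse-problem view relevant.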

Thursday, September 12, 2013 - 3:30pm

Sun-Young Hwang

Sookmyung Women's University

Martingale Estimating Functions: Asymptotic Optimality

Various estimation methods in time series are reviewed in a unified framework via martingale estimating functions. In particular, maximum likelihood and quasi-likelihood are discussed in the context of asymptotic optimality within certain estimating functions. Both ergodic and non-ergodic processes are considered. To illustrate the main results, various parameter estimates for GARCH processes, bifurcating and explosive AR processes, conditionally linear autoregressive processes, and branching Markov processes are presented.

Thursday, September 26, 2013 - 3:30pm

Heping Zhang

Yale University

Genetic Studies of Comorbidity

In psychiatric and behavioral research, about six out of ten people with a substance use disorder suffer from another form of mental illness as well, making it necessary to consider multiple conditions as we study the etiologies of these conditions. The occurrence of multiple disorders in the same patient is referred to as comorbidity. Identifying the risk factors for comorbidity is an important yet difficult topic in psychiatric research. The effort to study the genetics of comorbidity can be traced back a century.

Thursday, October 3, 2013 - 3:30pm

Tianxi Cai

Harvard University

Systematic Approaches to Subgroup Treatment Selection

Clinical trials that evaluate treatment benefit focus primarily on estimating the average benefit. However, a treatment reported to be effective may not be beneficial to all patients. For example, the benefit of giving chemotherapy prior to hormone therapy with Tamoxifen in the adjuvant treatment of postmenopausal women with lymph node negative breast cancer depends on the ER-status. Due to the toxicity of chemotherapy, it is crucial to identify patients who will and will not benefit from chemotherapy. This gives rise to the need to accurately predict benefit based on important markers.

Thursday, October 10, 2013 - 3:30pm

Jianhua Hu

The University of Texas MD Anderson Cancer Center

Transformed Low-Rank ANOVA Models for High Dimensional Variable Selection

For high dimensional genetic data, an important problem is to search for associations between genetic variables and a phenotype, typically a discrete variable (diseased versus normal). A conventional solution is to characterize such relationships through regression models in which the phenotype is treated as the response variable and the genetic variables are treated as covariates. Not surprisingly, this approach runs into the challenging problem that the number of variables is much larger than the number of observations.

Thursday, October 17, 2013 - 3:30pm

