The Statistics Department hosts weekly colloquia on a variety of statistical subjects, bringing in speakers from around the world.
One situation that arises in the field of functional data analysis is the use of imaging data or other very high-dimensional data as predictors in regression models. A motivating example involves using baseline images of a patient's brain to predict the patient's clinical outcome. Interest lies both in making such patient-specific predictions and in understanding the relationship between the imaging data and the outcome. Obtaining meaningful fits in such problems requires some type of dimension reduction, but this must be done while taking into account the particular (spatial) structure o
The Dipsea is a 100-year-old, 8-mile running event that starts in Mill Valley, CA, and ends at the Pacific Ocean near Stinson Beach. What makes the event unique is its handicap system. Each age group for men and women receives a handicap time. For example, the slowest group, the AAA group, composed of men 74 and older, boys 6 and under, women 66 and older, and girls 7 and under, receives a 25-minute handicap. Each group starts ahead of the scratch group by that amount, so the first to leave, at 8:30 AM, is the AAA group.
This event has been canceled due to inclement weather. Please stay tuned for future updates.
Bayesian methods for statistical analyses require a different interpretation of probability than traditional “frequentist” methods. The use of Bayesian methods is increasingly common, and their flexibility has facilitated a wide range of scientific advances, especially in medicine.
The analysis of functional neuroimaging data often involves the simultaneous testing for activation at thousands of voxels, leading to a massive multiple testing problem. This is true whether the data analyzed are time courses observed at each voxel or a collection of summary statistics such as statistical parametric maps (SPMs). It is known that classical multiplicity corrections become strongly conservative in the presence of a massive number of tests.
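To make the conservativeness point concrete, the following is a minimal sketch on simulated data, not the method presented in the talk. It assumes Bonferroni as a representative classical correction and uses the Benjamini-Hochberg procedure purely as a point of comparison; the number of tests, the number of active voxels, and the effect size are all hypothetical.

```python
import math
import numpy as np


def one_sided_p(z):
    """Upper-tail p-value for a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bonferroni(pvals, alpha=0.05):
    """Reject where p <= alpha / m (controls the family-wise error rate)."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / pvals.size


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


# Hypothetical setup: 10,000 "voxels", of which the first 500 carry a signal.
rng = np.random.default_rng(0)
m, n_active = 10_000, 500
z = rng.normal(size=m)
z[:n_active] += 3.0  # assumed effect size, for illustration only
pvals = np.array([one_sided_p(v) for v in z])

bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("Bonferroni rejections:", int(bonf.sum()))
print("Benjamini-Hochberg rejections:", int(bh.sum()))
```

Because the Bonferroni threshold alpha / m shrinks as the number of tests grows, it rejects only the most extreme statistics. Every hypothesis Bonferroni rejects is also rejected by Benjamini-Hochberg at the same alpha, so the second count is never smaller than the first.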
This dissertation consists of two parts on the topic of sample integrity in high-dimensional data. The first part focuses on batch effects in gene expression data. Batch bias has been found in many microarray studies that involve multiple batches of samples. Currently available methods for batch effect removal are mainly based on gene-by-gene analysis. There has been relatively little development of multivariate approaches to batch adjustment, mainly because of the analytical difficulty that originates from the high-dimensional nature of gene expression data.
LISA and its partners will educate and train statisticians from developing countries to communicate and collaborate with non-statisticians and then support these statisticians to create statistical collaboration laboratories in their home countries to help researchers, government officials, local industries, and NGOs apply statistical thinking and data science to make better decisions through data.
Personalized information filtering extracts the information specifically relevant to a user, based on the opinions of users who think alike or the content of the items that a specific user prefers. In this talk, we discuss latent factor models that utilize additional user-specific and content-specific predictors for personalized prediction. In particular, we factorize a user-over-item preference matrix into a product of two low-rank matrices, one representing users' latent preferences and the other representing items' latent attributes.
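The factorization idea can be sketched as follows. This is an illustration under simplifying assumptions, not the method of the talk: the preference matrix is synthetic and fully observed, no user-specific or content-specific predictors are included, and a truncated SVD stands in for whatever estimation procedure the speakers use. The function name `low_rank_factorize` and the latent dimension `k` are hypothetical.

```python
import numpy as np


def low_rank_factorize(R, k):
    """Return U (users x k) and V (k x items) such that U @ V is the
    best rank-k approximation of R in the least-squares sense."""
    left, sing, right = np.linalg.svd(R, full_matrices=False)
    root = np.sqrt(sing[:k])
    U = left[:, :k] * root         # scale each latent column
    V = root[:, None] * right[:k]  # scale each latent row
    return U, V


# Synthetic user-over-item preference matrix built from 2 latent factors.
rng = np.random.default_rng(1)
n_users, n_items, k = 6, 5, 2
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(k, n_items))

U, V = low_rank_factorize(R, k)
print("user factors:", U.shape, "item factors:", V.shape)
print("max reconstruction error:", float(np.abs(U @ V - R).max()))
```

Each row of `U` summarizes one user and each column of `V` summarizes one item, so a predicted preference is the inner product of the two. In practice most entries of a preference matrix are unobserved, so the factors would be fitted to the observed entries only rather than computed by a full SVD.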
How do we quickly detect small solar flares in a large video stream generated by NASA satellites? How do we improve detection by efficient representation of high-dimensional data that is time-varying? Besides astronomical imaging, high-dimensional change-point detection also arises in many other applications including computer network intrusion detection, sensor networks, medical imaging, and epidemiology.