Statistical Analysis of Big Data and Structured Data with Application to Neuroscience
Haonan Wang

Colorado State University

Thursday, November 17, 2016 - 3:30pm

In this talk, we consider two types of data from neuroscience: neuromorphology data and neuron activity data. First, we focus on data extracted from rodent brain neurons and model each neuron as a data object with topological and geometric properties characterizing its branching structure, connectedness and orientation. We define the notions of topological and geometric medians, as well as quantiles, based on newly developed curve representations. In addition, we take a novel approach to defining Pareto medians and quantiles through a multi-objective optimization problem. In particular, we study two objective functions that measure topological variation and geometric variation, respectively. Analytical solutions are provided for the topological and geometric medians and quantiles; for Pareto medians and quantiles in general, a genetic algorithm is used. The proposed methods are demonstrated in a simulation study and applied to a real data set of pyramidal neurons from the hippocampus.
Next, we model neuron spiking activity through nonlinear dynamical systems. We adapt the Volterra series expansion of an analytic function to account for the point-process nature of the multiple inputs and single output (MISO) in a neural ensemble. Our model describes the transformed spiking probability of the output as the sum of kernel-weighted integrals of the inputs. The kernel functions need to be identified and estimated, and both local sparsity (a kernel function may be zero on part of its support) and global sparsity (some kernel functions may be identically zero) are of interest. The kernel functions are approximated by B-splines, and a penalized likelihood-based approach is proposed for estimation.
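The MISO spiking model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions of ours: time is discretized into bins, only first-order (single-input) Volterra terms are kept, the transformation of the spiking probability is taken to be the logit, each kernel spans 20 lags expanded in 6 cubic B-splines, and the penalty is a group lasso with one group per input. That penalty targets global sparsity only; local sparsity would need an additional penalty. The data are simulated (inputs 2 to 4 have identically zero kernels), the fit uses plain proximal gradient descent, and all function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(n_lags, n_basis, degree=3):
    """Clamped B-splines evaluated at lag indices 0..n_lags-1, shape (n_lags, n_basis)."""
    inner = np.linspace(0.0, n_lags - 1.0, n_basis - degree + 1)
    knots = np.concatenate([[inner[0]] * degree, inner, [inner[-1]] * degree])
    # An identity coefficient matrix evaluates every basis function at once.
    return BSpline(knots, np.eye(n_basis), degree)(np.arange(n_lags, dtype=float))


def lagged_design(inputs, basis):
    """Column (i, j) at bin t is sum_l basis[l, j] * inputs[i, t - 1 - l] (past spikes only)."""
    n_inputs, T = inputs.shape
    n_basis = basis.shape[1]
    Z = np.zeros((T, n_inputs * n_basis))
    for i in range(n_inputs):
        for j in range(n_basis):
            Z[1:, i * n_basis + j] = np.convolve(inputs[i], basis[:, j])[: T - 1]
    return Z


def fit_sparse_miso(Z, y, n_basis, lam, n_iter=1000):
    """Proximal gradient for mean Bernoulli NLL (logit link) + lam * sum_i ||w_i||_2."""
    T = len(y)
    Zc = Z - Z.mean(axis=0)  # centred columns, so the intercept is handled separately
    Za = np.column_stack([np.ones(T), Zc])
    step = 4.0 * T / np.linalg.norm(Za, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(Za.shape[1])
    w[0] = np.log(y.mean() / (1.0 - y.mean()))  # start at the marginal log-odds
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Za @ w)))
        w -= step * (Za.T @ (p - y)) / T
        for g in range(Zc.shape[1] // n_basis):  # group soft-thresholding, one group per input
            sl = slice(1 + g * n_basis, 1 + (g + 1) * n_basis)
            norm = np.linalg.norm(w[sl])
            w[sl] *= max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
    return w[0], w[1:].reshape(-1, n_basis)  # intercept refers to the centred design


# Simulated ensemble: 5 input neurons, only inputs 0 and 1 drive the output.
rng = np.random.default_rng(0)
T, n_inputs, n_lags, n_basis = 40_000, 5, 20, 6
lag = np.arange(1, n_lags + 1)
true_kernels = np.zeros((n_inputs, n_lags))
true_kernels[0] = 3.0 * np.exp(-lag / 5)   # excitatory input
true_kernels[1] = -3.0 * np.exp(-lag / 5)  # inhibitory input
x = (rng.random((n_inputs, T)) < 0.1).astype(float)
eta = -2.0 + lagged_design(x, np.eye(n_lags)) @ true_kernels.ravel()
y = (rng.random(T) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

basis = bspline_basis(n_lags, n_basis)
b0, W = fit_sparse_miso(lagged_design(x, basis), y, n_basis, lam=0.015)
est_kernels = W @ basis.T  # estimated kernels on the lag grid, shape (n_inputs, n_lags)
print(np.round(np.linalg.norm(W, axis=1), 3))
```

With a large enough penalty, the coefficient groups of the irrelevant inputs are typically set exactly to zero, which is how the group penalty expresses global sparsity. The value of `lam` was picked by hand for this simulation rather than tuned.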
Even for moderately complex brain functionality, the identification and estimation of this sparse functional dynamical model pose major computational challenges, which we address with big data techniques that can be implemented on a single multi-core server. The performance of the proposed method is demonstrated using neural recordings from the hippocampus of a rat performing open-field tasks.
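Returning to the neuromorphology part of the talk, the idea of a Pareto median can be illustrated with a small Python sketch. It differs from the approach described above in an important way: instead of a genetic algorithm over a general candidate space, it restricts the candidates to the observed objects and searches them exhaustively, keeping every object whose pair (total topological distance, total geometric distance) is not dominated by another object's. The distance matrices are assumed to be given; in the toy usage they come from random feature vectors standing in for the curve representations, with Euclidean distances as a further assumption. `pareto_medians` is a hypothetical name.

```python
import numpy as np


def pareto_medians(d_topo, d_geo):
    """Return indices of sample objects whose (total topological, total geometric)
    distance pair is not dominated by any other object, plus the objective values."""
    f = np.column_stack([d_topo.sum(axis=1), d_geo.sum(axis=1)])
    front = []
    for i in range(len(f)):
        # j dominates i if it is no worse in both objectives and strictly better in one.
        dominated = np.any(np.all(f <= f[i], axis=1) & np.any(f < f[i], axis=1))
        if not dominated:
            front.append(i)
    return front, f


# Toy data: random feature vectors stand in for the topological and geometric
# curve representations (an assumption for illustration only).
rng = np.random.default_rng(1)
topo = rng.normal(size=(30, 4))
geo = rng.normal(size=(30, 3))
d_topo = np.linalg.norm(topo[:, None, :] - topo[None, :, :], axis=-1)
d_geo = np.linalg.norm(geo[:, None, :] - geo[None, :, :], axis=-1)

front, f = pareto_medians(d_topo, d_geo)
print("Pareto medians (sample indices):", front)
```

When they are unique, the two single-objective minimizers (a topological medoid and a geometric medoid) always belong to the returned set; the other members trade topological against geometric centrality.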

(This is joint work with Dr. Sienkiewicz, Professor Breidt and Professor Song.)

http://www.stat.colostate.edu/~wanghn/

Room 306, Statistics Building 1130