Bayesian Clustering of Functional Data Using Local Features

North Carolina State University

Thursday, March 3, 2016 - 3:30pm

Functional data arise frequently in diverse contexts, especially in today's big data regime: patient monitoring during medical treatment, weather analysis and, in general, any process that produces observations nearly continuous in time. Clustering is a fundamental tool for understanding similarities and dissimilarities between units in the data. Bayesian methods for clustering functional data use models which imply that some observations are realizations of signal-plus-noise models with identical underlying signal functions, which seems an overly simplistic assumption. We employ a model for clustering functional data that does not assume any of the signal functions are truly identical, but allows them to share many of their local features, represented by coefficients in a multiresolution wavelet basis expansion. We cluster each wavelet coefficient of the signal functions using conditionally independent Dirichlet process priors. We describe efficient Markov chain Monte Carlo techniques for computing the posterior distribution, and a method for identifying the posterior expected cluster. The clustering method can be viewed as a procedure for selecting one among uncountably many models, so studying its frequentist properties requires new yardsticks, different from the traditional optimality formulation of model selection. We show that, under a suitable asymptotic regime, the posterior probabilities of neighborhoods of the true model with respect to the product topology converge to one. We demonstrate the proposed method on an electroencephalography dataset on seizure activity and on the popular Canadian weather data.
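To make the idea of clustering each wavelet coefficient separately more concrete, here is a minimal generative sketch of such a prior, not the speakers' actual model. It assumes an orthonormal Haar basis, the Chinese restaurant process representation of each coefficient's Dirichlet process partition with concentration `alpha`, a Gaussian base measure N(0, `tau`^2) shared across all resolution levels, and Gaussian noise with standard deviation `sigma`. All of these choices, and the function names `haar_forward`, `haar_inverse`, `crp_partition` and `simulate_curves`, are hypothetical illustrations. Posterior computation by MCMC is not shown.

```python
import math
import random


def haar_forward(signal):
    """Orthonormal Haar DWT of a length-2**J list.

    Returns [approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs, current = [], list(signal)
    while len(current) > 1:
        half = len(current) // 2
        avg = [(current[2 * i] + current[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(current[2 * i] - current[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs = det + coeffs  # coarser details go in front of finer ones
        current = avg
    return current + coeffs


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal one resolution level at a time."""
    current, pos = coeffs[:1], 1
    while pos < len(coeffs):
        det = coeffs[pos:pos + len(current)]
        pos += len(current)
        nxt = []
        for a, d in zip(current, det):
            nxt.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
        current = nxt
    return current


def crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    labels, counts = [], []
    for _ in range(n):
        # Join an existing cluster with prob. proportional to its size,
        # or open a new one with prob. proportional to alpha.
        k = rng.choices(range(len(counts) + 1), weights=counts + [alpha])[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def simulate_curves(n_curves, length, alpha, tau, sigma, seed=0):
    """Simulate curves whose wavelet coefficients are clustered independently."""
    rng = random.Random(seed)
    coeffs = [[0.0] * length for _ in range(n_curves)]
    partitions = []
    for k in range(length):
        labels = crp_partition(n_curves, alpha, rng)  # separate partition per coefficient
        atoms = {}  # cluster label -> shared coefficient value
        for i, lab in enumerate(labels):
            if lab not in atoms:
                atoms[lab] = rng.gauss(0.0, tau)
            coeffs[i][k] = atoms[lab]
        partitions.append(labels)
    signals = [haar_inverse(c) for c in coeffs]
    observed = [[s + rng.gauss(0.0, sigma) for s in sig] for sig in signals]
    return partitions, signals, observed


if __name__ == "__main__":
    parts, sigs, obs = simulate_curves(n_curves=4, length=8, alpha=1.0, tau=1.0, sigma=0.1)
    for k, p in enumerate(parts):
        print(f"coefficient {k}: cluster labels {p}")
```

In this sketch two curves share their k-th coefficient exactly when the k-th partition places them in the same cluster. Curves can therefore agree on some local features and differ on others, unlike a model that forces entire signal functions to be identical. A more realistic prior would probably shrink finer-scale coefficients more strongly than a single `tau` does.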

This talk is based on joint work with Adam Suarez.