Bayesian Variable Selection in the Presence of Multicollinearity

The University of Iowa

Thursday, January 16, 2014 - 3:30pm

In this talk we will give a quick overview of some of the strengths and challenges of Bayesian variable selection as it has evolved over the last two decades. We will then discuss two specific problems in linear regression with strong multicollinearity among the covariates.

A variety of Markov chain Monte Carlo algorithms have been proposed in the literature for Bayesian variable selection in linear regression. The computation for these algorithms can be daunting for a large number of covariates. Ghosh and Clyde proposed an orthogonalization method to exploit the properties of orthogonal design matrices. Their algorithm can scale up the computation tremendously and provide estimates of quantities of interest for the original non-orthogonal problem. In this talk we introduce a class of new "sandwich" algorithms for Bayesian variable selection which are theoretically guaranteed to be at least as good as the algorithm of Ghosh and Clyde. We illustrate via simulation studies and real data analysis that this new class of algorithms can offer substantial gains when there is strong linear dependence among the covariates.
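To fix ideas, here is a minimal sketch of one standard MCMC scheme for Bayesian variable selection: a single-flip Metropolis sampler over inclusion indicators, scored with Zellner's g-prior Bayes factors and a uniform prior over models. This is a generic textbook sampler on hypothetical synthetic data, not the orthogonalization or sandwich algorithms discussed in the talk; all data and parameter choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic data: p covariates, of which only the first two are active.
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + rng.normal(size=n)

# Center so the intercept drops out of the Bayes factors.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
g = float(n)  # unit-information choice g = n (an assumption for this sketch)

def log_post(gamma):
    """Unnormalized log posterior of an inclusion vector gamma under
    Zellner's g-prior (fixed g) and a uniform prior over models."""
    k = int(gamma.sum())
    if k == 0:
        return 0.0  # null model is the Bayes-factor reference
    Xg = Xc[:, gamma.astype(bool)]
    beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xg @ beta) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

# Single-flip Metropolis over {0,1}^p: each model evaluation requires a
# least-squares fit, which is why the cost grows quickly with p.
gamma = np.zeros(p)
cur = log_post(gamma)
counts = np.zeros(p)
n_iter = 5000
for _ in range(n_iter):
    j = rng.integers(p)              # propose flipping one indicator
    prop = gamma.copy()
    prop[j] = 1 - prop[j]
    new = log_post(prop)
    if np.log(rng.uniform()) < new - cur:
        gamma, cur = prop, new       # accept the flip
    counts += gamma

incl = counts / n_iter
print("estimated marginal inclusion probabilities:", incl.round(2))
```

The per-iteration least-squares fit is what makes naive samplers like this one expensive when the number of covariates is large, which motivates methods that exploit orthogonal designs.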

The second part of the talk focuses on the median probability model (MPM) of Barbieri and Berger. The MPM includes all covariates with posterior marginal inclusion probabilities greater than or equal to 0.5, and it is the optimal predictive model under certain conditions. We use toy examples with Zellner's g-prior to gain some theoretical insight into the behavior of the posterior distribution over models when the design matrix exhibits high multicollinearity. The results suggest that if there are three or more important, strongly correlated covariates, the MPM could potentially discard all of them. Using simulation studies and real data, we illustrate that several other popular priors may also be adversely affected by multicollinearity. However, a routine examination of the joint inclusion probabilities for correlated covariates can help practitioners cope with the problem.


More information on Joyee Ghosh may be found at