In this talk, I discuss two current projects that are loosely connected under the umbrella of regression.
The first part of the talk investigates informative missingness in the framework of recommender systems. For example, from 2006 to 2009, Netflix ran a $1M prize competition to improve its algorithm for recommending movies to its viewers. In this setting, we can imagine a potential rating for every object-user pair; for Netflix, the object is a movie. However, the vast majority of these ratings are missing. The goal of a recommender system is to predict the missing ratings in order to recommend objects that the user is likely to rate highly. A typically overlooked point is that the ratings are not missing at random: a relationship between users' ratings and their viewing histories is expected, since users naturally seek out and watch movies that they anticipate enjoying. We model this informative missingness, placing the recommender system in a shared-variable regression framework, and show that taking this additional information into account can improve prediction quality.
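To make the selection-bias intuition concrete, here is a minimal simulation (a hypothetical illustration, not the talk's actual model): each user-movie pair has a latent rating, and the probability that the rating is observed increases with the rating itself, mimicking users seeking out movies they expect to enjoy. All numbers (matrix size, logistic link, its offset) are assumptions for illustration only.

```python
import numpy as np

# Hypothetical illustration of informative missingness in a ratings matrix.
rng = np.random.default_rng(0)
n_users, n_movies = 500, 200

# Latent ("potential") ratings for every user-movie pair.
true_ratings = rng.normal(loc=3.0, scale=1.0, size=(n_users, n_movies))

# Observation probability grows with the latent rating (logistic link):
# pairs the user expects to enjoy are more likely to be rated at all.
obs_prob = 1.0 / (1.0 + np.exp(-(true_ratings - 4.0)))
observed = rng.random((n_users, n_movies)) < obs_prob

mean_all = true_ratings.mean()
mean_observed = true_ratings[observed].mean()

print(f"mean of all ratings:      {mean_all:.2f}")
print(f"mean of observed ratings: {mean_observed:.2f}")
# The observed ratings are biased upward: a method that ignores the
# missingness mechanism would overestimate how much users like a typical
# movie, which is the kind of bias informative-missingness modeling targets.
```

Under this mechanism the observed mean is noticeably higher than the mean over all pairs, so any predictor trained naively on observed entries inherits that bias.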
The second part of the talk concerns a new class of prior distributions for linear regression, particularly in the high-dimensional case, where the choice of prior distribution is notoriously difficult. Instead of placing a prior on the coefficients themselves, we place a prior on the regression R-squared, which is then distributed to the regression coefficients by decomposing it via a Dirichlet distribution. It is more natural to elicit a prior from a scientist through knowledge of R-squared values in previous studies than through knowledge of each individual regression coefficient. We call the new prior R2-D2, in light of its R-squared Dirichlet Decomposition. Beyond its use in prior elicitation, we show that the R2-D2 prior can outperform existing shrinkage priors in the high-dimensional case, both in theory and in practice. In particular, compared to state-of-the-art shrinkage priors, it can simultaneously achieve both higher prior concentration at zero and heavier tails. These two properties combine to provide stronger shrinkage of the irrelevant coefficients along with less bias in estimating the larger signals.
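The construction described above can be sketched as a generative recipe: draw R-squared from a Beta distribution, convert it to a total signal-to-noise ratio, split that total across the coefficients with a Dirichlet draw, and give each coefficient a variance proportional to its share. The sketch below uses a Gaussian simplification for the coefficient draws (the full prior uses a scale mixture), and all hyperparameter values are assumptions for illustration.

```python
import numpy as np

# A minimal sketch of sampling from an R2-D2-style prior construction.
rng = np.random.default_rng(1)
p = 100           # number of predictors (hypothetical)
sigma2 = 1.0      # error variance (assumed known here for simplicity)
a, b = 1.0, 1.0   # Beta hyperparameters on R-squared (assumed values)
a_pi = 0.5        # Dirichlet concentration; small values favor sparsity

r2 = rng.beta(a, b)                    # 1. prior draw of R-squared
w = r2 / (1.0 - r2)                    # 2. total signal-to-noise ratio
phi = rng.dirichlet(np.full(p, a_pi))  # 3. split w among the p coefficients
beta = rng.normal(0.0, np.sqrt(sigma2 * w * phi))  # 4. coefficient draws
                                       #    (Gaussian simplification)

# Consistency check: with standardized predictors, the total prior signal
# variance sigma2 * w implies exactly the R-squared that was drawn.
signal_var = sigma2 * w * phi
implied_r2 = signal_var.sum() / (signal_var.sum() + sigma2)
print(f"drawn R-squared:   {r2:.3f}")
print(f"implied R-squared: {implied_r2:.3f}")
```

Because the Dirichlet weights sum to one, the decomposition preserves the drawn R-squared by construction, which is what makes elicitation in terms of R-squared coherent.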