Coauthorship and Citation Networks for Statisticians

The University of Georgia

Thursday, October 2, 2014 - 3:30pm

We collect the coauthor and citation data for all research papers published in four of the top journals in statistics between 2003 and 2012, analyze the data from several different perspectives (e.g., patterns, trends, community structures), and present an array of interesting findings. (1) Both the average number of papers per author published in these journals and the fraction of self-citations have been decreasing, but the proportion of distant citations has been increasing. These findings suggest that the statistics community has become increasingly collaborative and competitive, and that access to published work has become increasingly easy, driven by the boom of online resources and search engines. (2) The analysis suggests “Variable Selection”, “Large-Scale Multiple Testing”, and “Covariance Matrix Estimation” as three of the “hot areas” in statistics, and also identifies the most prolific and collaborative authors, as well as the most highly cited authors and papers. (3) We also identify a handful of meaningful communities, including large communities such as “High-Dimensional Data” and “Large-Scale Multiple Testing”, as well as small communities such as “Dimension Reduction”, “Objective Bayes”, and “Theoretical Machine Learning”. Our findings shed light on the research habits, trends, and topological patterns of statisticians, and our data sets provide fertile ground for future research on or related to the social networks of statisticians.

Joint work with Jiashun Jin at CMU.

More information about Pengsheng Ji may be found at