Motivation: Statistical tests for detecting differentially expressed genes produce a large collection of p-values, one for each gene comparison. Without further adjustment, these p-values may lead to a large number of false positives, simply because the number of genes tested is huge, and following up on them can waste laboratory resources. To account for multiple hypotheses, the p-values are typically adjusted using a single-step or a step-down method in order to control an overall error rate (the so-called familywise error rate). In many applications, this strategy is overly conservative and flags too few genes.
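To make the distinction between a single-step and a step-down adjustment concrete, the following is a minimal sketch in Python. It uses the Bonferroni and Holm procedures as illustrative stand-ins; the report does not say which adjustments the authors use, and the function names and the example p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Single-step adjustment: multiply every p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm(pvalues):
    """Step-down (Holm) adjustment.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone in the ordering.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values, one per gene (illustration only, not real data).
pvals = [0.001, 0.012, 0.03, 0.04, 0.5]
alpha = 0.05

flagged_bonferroni = [p <= alpha for p in bonferroni(pvals)]
flagged_holm = [p <= alpha for p in holm(pvals)]
print(sum(flagged_bonferroni), sum(flagged_holm))
```

On these made-up values the single-step adjustment flags one gene at level 0.05 and the step-down adjustment flags two. Both control the familywise error rate, and both become stringent when the number of genes is large, which is the conservativeness described above.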

Results: In this paper we use an empirical Bayes estimation technique to screen a large number of p-values. In effect, each case borrows strength from an overall picture of the alternative hypotheses computed from all the p-values, while the entire procedure is calibrated so that the familywise error rate under the complete null hypothesis is still controlled by a step-down method. We show that the empirical Bayes screening has substantially higher sensitivity than the standard step-down approach, at the cost of a small increase in the false discovery rate (FDR). The procedure is therefore particularly useful in situations where it is important to identify all potentially significant cases, which can then be subjected to further confirmatory testing to eliminate the false positives. We illustrate the screening procedure using a dataset on human colorectal cancer.

This novel empirical Bayes procedure has the following advantages over our earlier empirical Bayes adjustments: (i) since it applies to the p-values, the tests do not have to be t-tests; in particular, they could be F-tests, which might arise in certain ANOVA formulations with expression data, or even nonparametric tests; (ii) the empirical Bayes adjustment uses nonparametric function estimation techniques to estimate the marginal density of the transformed p-values rather than a parametric model for the prior distribution, and is therefore robust against model misspecification; (iii) since the null (marginal) distribution of each p-value is uniform, a simple resampling scheme can be proposed for the step-down procedure.

Availability: R code for the empirical Bayes screening of multiple p-values is available from the authors upon request.

TR Number: 2004-04
Susmita Datta and Somnath Datta

To request a copy of this report, please email us. We will send you a pdf copy if one is available.