Massive increases over the past two decades in the amount of data scientists are able to acquire and analyze have driven the development of new statistical tools that can better deal with the challenges of “big data.” One such set of tools consists of methods for controlling the “false discovery rate” (FDR) in a set of statistical tests. FDR is simply the mean proportion of statistically significant test results that are really false positives. As you may recall from your introductory statistics course, when you perform multiple statistical tests, the probability of false positive results rapidly increases. For example, if one were to perform a single test using an alpha level of 5% and there truly is no effect of the factor being tested, there is only a 5% chance of a false positive result. However, if one were to perform 10 independent tests using an alpha level of 5% for each test, there is a roughly 40% chance of one or more false positive results (again assuming that the factor being analyzed has no effect). FDR control, like Bonferroni correction, reduces the probability of false positive results by using a more conservative alpha level for each test. The advantage of FDR control over Bonferroni correction is that it is generally more powerful (i.e., better at detecting true effects). This stems from the fact that Bonferroni correction is effectively designed to prevent any false positives, whereas FDR control is designed only to keep false positives from making up a large proportion of the significant results, and so can afford to be less conservative. In other words, FDR control simply attempts to ensure that the great majority of statistically significant results are accurate but generally lets in a small proportion of false positives in the process.
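The 5% and 40% figures above follow from the formula 1 - (1 - alpha)^m for the chance of at least one false positive among m independent tests when there is no true effect. Here is a minimal Python sketch of that arithmetic, assuming the tests are independent (the function name `familywise_error_rate` is mine, purely for illustration). It also shows how Bonferroni's per-test alpha of alpha/m pulls that chance back under 5%:

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests independent
    tests, each run at level alpha, when no test has a true effect."""
    return 1.0 - (1.0 - alpha) ** n_tests


# One test at alpha = 5%.
single = familywise_error_rate(0.05, 1)
# Ten uncorrected tests, each at alpha = 5%.
ten_uncorrected = familywise_error_rate(0.05, 10)
# Ten tests with Bonferroni correction: each test is run at 5% / 10.
ten_bonferroni = familywise_error_rate(0.05 / 10, 10)

print(f"1 test, uncorrected:    {single:.3f}")           # 0.050
print(f"10 tests, uncorrected:  {ten_uncorrected:.3f}")  # 0.401
print(f"10 tests, Bonferroni:   {ten_bonferroni:.3f}")   # 0.049
```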
Because of FDR control’s relatively good statistical power and its ease of application, it has rapidly become commonplace in neuroscience. However, from talking to other researchers, I fear that many people who use FDR control do not understand the following simple but important facts:
1. There are multiple FDR control algorithms: Although some papers refer to FDR control as if there were only one procedure for controlling FDR, there are actually several different algorithms with different strengths and weaknesses (for reviews see Farcomeni, 2007; Groppe, Urbach, & Kutas, 2011a; Romano & Shaikh, 2006). The most popular is the algorithm derived by Benjamini and Hochberg (1995), which is relatively easy to implement and statistically powerful. When you use FDR control, be sure to state and cite which algorithm you’re using.
2. The most popular FDR algorithm is not generally guaranteed to work: Benjamini and Hochberg’s FDR control algorithm (BH-FDR) is only guaranteed to control the false discovery rate when the statistical tests it is applied to are independent or exhibit “positive regression dependency.” When the data being analyzed are normally distributed, positive regression dependency means that none of the variables tested are negatively correlated. Note that although BH-FDR is not generally guaranteed to work, Clarke and Hall (2009) have shown that for normally distributed data BH-FDR will accurately control FDR as the number of tests it is applied to grows. Indeed, several studies using simulated data that violate BH-FDR’s assumptions have shown that, in practice, it still works quite well (e.g., Groppe, Urbach, & Kutas, 2011b). That being said, there is at least one FDR control algorithm that is always guaranteed to work. It was derived by Benjamini and Yekutieli (2001), but it is not commonly used because it is much more conservative than the BH-FDR procedure.
Note that the BH-FDR algorithm is not the only FDR control algorithm that is not always guaranteed to work. Storey’s (2002) “positive FDR” algorithm (pFDR), implemented by MATLAB’s mafdr.m function, can fail when there are dependencies among the variables being tested. Indeed, one set of simulation studies found that while BH-FDR was robust to such dependencies, pFDR produced too many false positives (Kim & van de Wiel, 2008).
3. FDR control provides the same degree of assurance as Bonferroni correction that there is indeed some effect: If, after performing an appropriate FDR control procedure, you obtain some statistically significant results, you can be as certain as if you had performed Bonferroni correction that there is indeed at least one true positive in your set of tests. In other words, if you control FDR at an alpha level of 5% and in truth there is no effect of the factor you’re analyzing at any variable, there is only a 5% chance that you will get any erroneously significant results. This is because, when no effect exists anywhere, every significant result is a false discovery, so keeping the false discovery rate at 5% is the same as keeping the chance of any significant result at 5%. Given that FDR control is usually more powerful than Bonferroni correction, this means that FDR control is a much better tool for simply determining whether there is an effect or not. However, the disadvantage of FDR control is that because some false positives are allowed, you cannot be certain that any single significant result is accurate. If it is important to establish the significance of every single test result, a technique like Bonferroni correction or a permutation test that provides “strong control of the family-wise error rate” is necessary.
4. Increasing the number of tests in your analysis can actually produce more significant results: Intuitively, the more individual tests in your set of analyses, the greater the chances of getting a false positive test result and, thus, the more stringently you should correct for multiple comparisons. This is the way Bonferroni correction works, but it is not necessarily true of FDR control. Adding more tests into your FDR control procedure can actually make it less conservative if the added tests exhibit an effect (i.e., the added tests are likely to have small p values). This is because when tests are added that are very likely true discoveries, FDR control can let in more false discoveries since it is simply controlling the proportion of significant results that are false (see Table 1 for a concrete example). Consequently, one should exclude known effects from an analysis when using FDR control, as the known effects will make you less certain about the presence of any other effects.
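To make points 2 and 4 concrete, here is a minimal Python sketch of the Benjamini and Hochberg (1995) step-up rule and the more conservative Benjamini and Yekutieli (2001) variant. The function name `fdr_reject` and all of the p values are hypothetical, made up purely for illustration; they are not taken from real data or from Table 1, and a vetted statistics package is a better choice for real analyses.

```python
def fdr_reject(p_values, q=0.05, method="bh"):
    """Return one boolean per test: True if it is declared significant.

    method="bh": Benjamini & Hochberg (1995) step-up procedure.
    method="by": Benjamini & Yekutieli (2001) variant, which divides q
                 by the harmonic sum 1 + 1/2 + ... + 1/m.
    """
    m = len(p_values)
    if m == 0:
        return []
    if method == "by":
        q = q / sum(1.0 / i for i in range(1, m + 1))
    elif method != "bh":
        raise ValueError("method must be 'bh' or 'by'")

    # Find the largest sorted p value p_(k) with p_(k) <= (k / m) * q.
    threshold = -1.0  # stays negative if no test passes
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * q:
            threshold = p

    # Every test at or below that p value is declared significant.
    return [p <= threshold for p in p_values]


# Hypothetical p values for four tests of uncertain effects.
original = [0.020, 0.030, 0.040, 0.600]
# Hypothetical p values for four added tests of strong, known effects.
added = [0.0001, 0.0002, 0.0003, 0.0004]

print(fdr_reject(original))                     # none significant
print(fdr_reject(original + added))             # first three now significant
print(fdr_reject(original + added, method="by"))  # only the added four
```

With these made-up numbers, BH-FDR finds nothing among the four original tests, yet declares three of them significant once the four known effects are added. Bonferroni correction moves the other way: its per-test cutoff drops from 0.05/4 to 0.05/8. The Benjamini and Yekutieli variant, being more conservative, flags only the four added tests.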
P.S. If you want to know more about false discovery rate control and other contemporary techniques for correcting for multiple comparisons (e.g., permutation tests), I have a review paper that might be of use.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4), 1165–1188.
Clarke, S., & Hall, P. (2009). Robustness of multiple testing procedures against dependence. Annals of Statistics. DOI: 10.1214/07-AOS557
Farcomeni, A. (2007). A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Statistical Methods in Medical Research. DOI: 10.1177/0962280206079046
Groppe, D. M., Urbach, T. P., & Kutas, M. (2011a). Mass univariate analysis of event‐related brain potentials/fields I: A critical tutorial review. Psychophysiology. DOI: 10.1111/j.1469-8986.2011.01273.x
Groppe, D. M., Urbach, T. P., & Kutas, M. (2011b). Mass univariate analysis of event‐related brain potentials/fields II: Simulation studies. Psychophysiology. DOI: 10.1111/j.1469-8986.2011.01272.x
Kim, K. I., & van de Wiel, M. A. (2008). Effects of dependence in high-dimensional multiple testing problems. BMC Bioinformatics, 9(1), 114.
Romano, J. P., & Shaikh, A. M. (2006). On stepdown control of the false discovery proportion. Institute of Mathematical Statistics Lecture Notes – Monograph Series. DOI: 10.1214/074921706000000383
Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479–498.