We are all familiar with basic statistical tests to prove that our results are statistically significant: Student's t-test, ANOVA, Chi-square, and Fisher's exact test, to name a few of the most commonly used ones. A p-value less than 0.05 is (often) considered statistically significant (or, more accurately, grounds to reject the null hypothesis). But do we, as scientists, use the proper statistical tests to determine that our results are statistically significant?
A recent perspective in Nature Neuroscience by Nieuwenhuis, Forstmann, and Wagenmakers argues that many published results in behavioral, systems, and cognitive neuroscience are incorrectly reported as statistically significant. This happens specifically when comparing effect sizes between an experimental and a control condition, between pre- and post-testing, or between different brain regions with respect to a particular effect.
… when making a comparison between two effects, researchers should report the statistical significance of their difference rather than the difference between their significance levels.
What does this mean?
Let's use an example comparable to the one posted in the Guardian article by Ben Goldacre. We have a dividing cell. If we apply a chemical X, it seems that fewer cells in a cell culture divide. You also have a mutant cell line. How does that cell line respond to chemical X? You measure 15% less cell division in the treated mutant culture compared to untreated. Nice finding. In the normal cell line, treatment with chemical X reduces cell division by 30%.
So we have just measured a statistically significant effect of chemical X on the normal cell line, and this was not observed in the mutant cell line. Can we therefore conclude that the normal cell line and the mutant cell line respond differently? No! To draw that conclusion, you have to do an additional test: compare the "difference in differences". If a 15% drop in cell division is not significant on its own, then the difference between the mutant and normal responses, which is also 15%, may likewise fail to reach significance. Of course, this is a simplified hypothetical case, but Nieuwenhuis and colleagues found this mistake in 79 of 513 reviewed publications, while only 78 of the publications making such comparisons used the correct statistical test.
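To make the "difference in differences" point concrete, here is a minimal sketch in Python. The cell counts are entirely hypothetical (assumed for illustration: 50 cells per culture, with division rates chosen to mirror the 30% and 15% drops above), and it uses a simple two-proportion z-test rather than whatever test a real study would choose. It shows how each effect can be tested separately, and how the interaction (the difference between the two drops) must be tested directly:

```python
import math

def norm_sf(z):
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_prop_z(p1, n1, p2, n2):
    # Two-sided z-test for the difference between two independent proportions.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 2 * norm_sf(abs((p1 - p2) / se))

# Hypothetical data: 50 cells per culture, 50% divide when untreated.
# Normal line treated with X: 20% divide (a 30-point drop).
# Mutant line treated with X: 35% divide (a 15-point drop).
n = 50
p_normal = two_prop_z(0.50, n, 0.20, n)  # effect of X in the normal line
p_mutant = two_prop_z(0.50, n, 0.35, n)  # effect of X in the mutant line

# Wrong: conclude the lines differ because one p-value is < 0.05
# and the other is not.
# Right: test the difference in differences (the interaction) directly,
# here via a z-test on the two drops with unpooled standard errors.
d_normal, d_mutant = 0.50 - 0.20, 0.50 - 0.35
se_d = math.sqrt(sum(p * (1 - p) / n for p in (0.50, 0.20, 0.50, 0.35)))
p_interaction = 2 * norm_sf(abs((d_normal - d_mutant) / se_d))

print(f"normal effect:  p = {p_normal:.3f}")      # significant
print(f"mutant effect:  p = {p_mutant:.3f}")      # not significant
print(f"interaction:    p = {p_interaction:.3f}") # not significant
```

With these made-up numbers, the normal line's effect is significant and the mutant's is not, yet the interaction test does not let us claim the two lines respond differently, which is exactly the fallacy the perspective describes.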
This perspective tells us that we need to know why we use a particular statistical test before we lean on it to support our findings and claim a statistically significant result.
Link to Guardian article.
Link to original perspective in Nature Neuroscience.