Problem 1: Underpowered studies
Collecting data is expensive, not only in terms of money but also in terms of time, effort, technician hours, materials, lab space and so on. However, operating with sample sizes too small for the effect in question is dangerous. Most people think that low power should only affect your chance of missing a true effect (i.e. beta, your type II error rate, or more precisely: the probability of incorrectly retaining the null hypothesis when it is false). And that is indeed the case. Power (1 – beta) gives you the probability of recognizing an effect when it is indeed there (i.e. the probability of correctly rejecting the null hypothesis). Low power does not affect your type I error rate, which is given by your alpha value, set by you in advance.
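In symbols (notation added here purely for illustration):

\[
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}), \qquad
\beta = P(\text{retain } H_0 \mid H_0 \text{ false}), \qquad
\text{power} = 1 - \beta .
\]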
So you could argue that if you plan an underpowered study it is your own risk. You will be the one missing an amazing effect because you chose to test 10 mice instead of 20. However, in addition to potentially wasting taxpayers' money, which could otherwise be used to build schools and hospitals, you are also distorting the publication landscape.

Why? There is more to consider than the type I error rate or the p-value of your study. The p-value tells you: given there is no effect, how likely are the data you observed (i.e. if the null hypothesis is true, what are the chances of obtaining the observed effect or a larger one)? You will reject the null hypothesis only if your data were very unlikely under it. But as someone who reads the scientific literature in a particular area, you want to know: if this study tells me there is an effect, how likely is it that there actually is an effect? This is called the positive predictive value (PPV) of a study**. Please note the difference: the p-value tells you how likely your data are given that there is no effect, whereas the PPV tells you how likely the effect is to be real, given the study (which is closer to what we actually want to know).

Let us further explore the difference between the p-value and the positive predictive value. Imagine you work on homeopathy and operate with an alpha value of .05. For a given study, you, as the experimenter, set the type I error rate to 5 % by rejecting the null hypothesis only if your observed p-value is below your alpha value of .05. However, if you look at the whole field of homeopathy research, what is the rate of false positive studies in
that field? Since ideal homeopathic remedies are identical to placebo, you will end up with 100 % false positives among the significant studies, i.e. the positive predictive value of a positive study is 0 (the homeopathy example is from here). That means that if you read a positive study from that field showing a significant effect, this does not increase the chance of the effect being true (admittedly this is an extreme case, but you get the point). When it comes to your own studies, you want the positive predictive value to be high.
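To put the two quantities side by side (the formula is the standard Bayes'-rule expression; the notation is added here for illustration): writing π for the prior probability that a tested effect is real, α for the significance threshold and 1 − β for the power,

\[
p\text{-value} = P(\text{data at least this extreme} \mid H_0 \text{ true}), \qquad
\text{PPV} = P(\text{effect is real} \mid \text{significant result}) = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)} .
\]

For ideal homeopathic remedies π = 0, so the numerator vanishes and the PPV is 0, no matter how small the p-values of individual studies are.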
The positive predictive value of a study is higher
1. the higher the prior probability that your effect is real, i.e. the higher the ratio of true to false hypotheses being investigated (which is very hard to estimate),
2. the lower your alpha value, and
3. the higher your power (and that is crucial); the short numerical sketch below illustrates all three.
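As a rough stand-in for the web app mentioned next, here is a small Python sketch of the same calculation (the parameter combinations are illustrative assumptions, not estimates taken from any study):

```python
# Sketch: how the positive predictive value (PPV) of a significant finding
# depends on the prior probability of a true effect, alpha, and power.
# All parameter values below are illustrative assumptions.

def ppv(prior, alpha, power):
    """P(effect is real | significant result), via Bayes' rule."""
    true_positives = power * prior          # real effects that reach significance
    false_positives = alpha * (1 - prior)   # null effects that reach significance
    return true_positives / (true_positives + false_positives)

scenarios = [
    ("well powered, plausible effect", 0.50, 0.05, 0.80),
    ("underpowered, plausible effect", 0.50, 0.05, 0.20),
    ("underpowered, long-shot effect", 0.10, 0.05, 0.20),
    ("homeopathy (no true effects)",   0.00, 0.05, 0.20),
]

for label, prior, alpha, power in scenarios:
    print(f"{label:32s} PPV = {ppv(prior, alpha, power):.2f}")
```

With these made-up numbers, dropping power from 0.8 to 0.2 lowers the PPV from about 0.94 to 0.80 for a plausible effect and to about 0.31 for a long-shot hypothesis; with a prior of zero, as in the homeopathy example, the PPV is zero regardless of alpha and power.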
In this web app you can play with the parameters a bit and see how the PPV changes. Learn more about the PPV and the problem of underpowered studies here and here. The average PPV in neuroscience is probably somewhere around 50 %, given well-intentioned estimates of average power and of the likelihood of an effect. That means that even if the researcher conducts a perfectly unbiased and honest study, the chance that a demonstrated effect is actually real is about the same as that of winning a coin toss.
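To see how a figure around 50 % can arise (the numbers here are illustrative assumptions, not the estimates behind the statement above): with power 1 − β = 0.2, α = .05 and a prior of π = 0.2 that a tested effect is real,

\[
\text{PPV} = \frac{0.2 \times 0.2}{0.2 \times 0.2 + 0.05 \times 0.8} = \frac{0.04}{0.08} = 0.5 .
\]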
Problem 2: Scientific misconduct
[Figure: Outcomes of the replication initiative hardly resembled the original studies. Almost all original studies had been reported as ‘significant’ (97 out of 100), with p-values < .05, the traditional threshold for significance. P-values in the replications, however, were spread across the whole range between 0 and 1 and were mostly ‘not significant’ (p > .05). Even studies with very small p-values (< .01) did not have a good chance of being replicated. Adapted from Open Science Collaboration (2015).]
Another obvious explanation is plain fraud. Despite several recent prominent cases of fraud (for a list of notable cases see reference 19), people seem to believe that scientific misconduct is very rare, or a problem confined to countries outside Europe and North America. However, I think fraud is more common than we like to think, and the limited data we have on this topic seem to confirm this. I do not find this surprising given the high incentives for ‘clean’ positive results, given how difficult it is to publish negative results, given how easy it is to make some ‘minor adjustments’ to the numbers in your Excel table, and given the extreme competition scientists are confronted with. But even these ‘minor adjustments’, impossible