to detect and maybe just enough to lift the effect over the significance barrier, deceive your colleagues, distort our view of reality, waste taxpayers' money, and may ultimately have real-world effects, like patients receiving useless or even dangerous treatments 12, 13.
I am not suggesting that the non-reproduced studies in the above publication operated with fabricated data or that most false positive studies do. I am merely trying to describe some potential problems in science, problems that we need to take seriously and fix.
Problem 3: The researcher degrees of freedom
Every study comes with a large number of decisions, or “degrees of freedom”. You decide how much data to collect, how to identify outliers, which subgroups to introduce, which variables and interactions to analyze, which statistical tests to use, and how much of all this to report. The problem with our commonly used statistical framework is that it assumes all of these decisions are made before you look at the first data point. In the best case, your analysis should be entirely independent from your data, should be written down in advance, and should give you a definite result once you cast the data at it. Sure, this is not always realistic. Science can be messy and sometimes we can only develop the right analysis once we examine the actual data. But the further we diverge from the right path, the more we will bias the analysis: often unconsciously, often with good intentions, but often with detrimental effects to our scientific field. And this can happen faster than you may think. Simulations have shown that by exploiting only four degrees of freedom you will be able to get a positive finding from randomly drawn data up to 60% of the time.
A particular and often encountered problem is the collection of more data if the effect is “not yet significant”. This can strongly inflate your false positive rate, particularly when you collect only a few more observations per condition or when you repeatedly collect more data if your result is ‘still not significant’. If you exploit your researcher degrees of freedom you may end up deceiving yourself and everyone else. This does not mean that a study cannot be exploratory or have exploratory components. But this should be stated clearly in the manuscript, since it weakens the conclusions that can be drawn from its findings.
Figure: Likelihood of getting a false-positive result when exploiting researcher degrees of freedom. Simulations have shown that by combining four degrees of freedom you will be able to obtain a positive result from randomly drawn data 60% of the time. Adapted from Simmons 2011.
What can we do about it?
We can always argue that the error is in the system. If the journals more readily accepted “negative” findings, if our job situation was better and the pressure to publish lower, if replication studies were appreciated by higher-ranked journals and would not be almost useless for our career … Yes, all these things would substantially improve the situation. But they are measures that require long-term collaborative effort. What can we do in our own workgroup, right now, while planning, conducting, or publishing the next study? Needless to say, I do not have the final solution. But I have collected some suggestions from papers, comments, and discussions with colleagues and fellow students.
Here are some of them:
1. Introduce strict blinding during data collection and analysis. This is already quite common for data collection, at least in some scientific disciplines, but I guess it is still very rare during data analysis.
2. Collect more data. This can be hard if data collection is very time- and money-consuming. But often it is worth adding another month of work to the project to be a little bit more confident about the data in the end.
3. Have strict rules on when to stop data collection and make sure everyone in the project agrees on them beforehand.
4. Keep lab notebooks that document and enforce study design, investigated variables, sample size, inclusion/exclusion criteria, and data analysis scheme.
5. Register your studies before beginning data collection. This has very successfully decreased the rate of positive studies (and probably increased the positive predictive value of studies) for clinical trials in the US.
6. Analyze the important dependent variables at the very end and decide on the inclusion or exclusion of data points beforehand.
7. Try out preprint publication, maybe even together with publishing the raw data. This has been very successful in physics and is now slowly coming to the life sciences as well. One example is bioRxiv.
Which of these suggestions can help, which are naïve, which are missing? Please start a discussion with us in the comments section.
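If you want to check for yourself how collecting more data while a result is “not yet significant” inflates the false positive rate, a small simulation can help. The sketch below is only an illustration under stated assumptions, not the analysis from Simmons 2011: both groups are drawn from the same standard normal distribution (so every “significant” result is a false positive), the test is a simple two-sample z-test with known variance, and the function names and parameters (`z_test_p`, `false_positive_rate`, `n_start`, `n_step`, `n_max`) are made up for this example.

```python
import math
import random


def z_test_p(a, b):
    """Two-sided p-value of a two-sample z-test, assuming unit variance."""
    n_a, n_b = len(a), len(b)
    diff = sum(a) / n_a - sum(b) / n_b
    se = math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = diff / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_start, n_step, n_max, n_sims=2000, alpha=0.05, seed=1):
    """Share of simulated null studies that end with p < alpha.

    Each study starts with n_start observations per group. While the
    result is not significant, n_step more observations per group are
    added (up to n_max) and the test is repeated. n_step=0 means a
    single test at a fixed sample size.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: there is no true effect.
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        while True:
            if z_test_p(a, b) < alpha:
                hits += 1
                break
            if n_step <= 0 or len(a) + n_step > n_max:
                break
            a.extend(rng.gauss(0, 1) for _ in range(n_step))
            b.extend(rng.gauss(0, 1) for _ in range(n_step))
    return hits / n_sims


# Hypothetical settings: 20 per group, either tested once or re-tested
# after every 5 additional observations per group, up to 60 per group.
fixed = false_positive_rate(n_start=20, n_step=0, n_max=20)
peeking = false_positive_rate(n_start=20, n_step=5, n_max=60)
print(f"fixed sample size: {fixed:.3f}")
print(f"optional stopping: {peeking:.3f}")
```

With a fixed sample size the rate should stay close to the nominal 5%, while repeated testing pushes it clearly above that. This is the reasoning behind suggestion 3: agree on the stopping rule before data collection starts.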
The situation in science is bad but not hopeless. One of the much-acclaimed features of science is that it is self-correcting. However, in this case it will
18 | NEUROMAG | July 2016