to detect and maybe just enough to lift the effect over the significance barrier. This can deceive your colleagues, distort our view of reality, waste taxpayers’ money, and may ultimately have real-world effects, like patients receiving useless or even dangerous treatments 12, 13.
I am not suggesting that the non-reproduced studies in the above publication operated with fabricated data or that most false positive studies do. I am merely trying to describe some potential problems in science, problems that we need to take seriously and fix.
Problem 3: The researcher degrees of freedom
In the best case, your analysis is written down in advance and gives you a definite result once you apply it to the data. Sure, this is not always realistic. Science can be messy, and sometimes we can only develop the right analysis once we have examined the actual data. But the further we diverge from this ideal, the more we will bias the analysis – often unconsciously, often with good intentions, but often with detrimental effects on our scientific field. And this can happen faster than you may think. Simulations have shown that by exploiting only four degrees of freedom you can obtain a positive finding from randomly drawn data up to 60 % of the time.
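To make this concrete, here is a minimal sketch of the general idea – not the actual simulation behind the 60 % figure, which combined four degrees of freedom. This toy version exploits only one liberty: measuring two correlated dependent variables and reporting whichever analysis (first variable, second variable, or their average) happens to come out significant. All names (`p_two_sample`, `one_experiment`) and numbers here are illustrative, and the p-value uses a z-approximation rather than a proper t-test.

```python
# Toy illustration: false positives inflate when the analyst may pick among
# several dependent variables. Both groups are drawn from the SAME
# distribution, so every "significant" result is spurious by construction.
import math
import random

random.seed(1)

def p_two_sample(x, y):
    """Two-sided p-value via a z-approximation to Welch's t-test
    (adequate for the group sizes used here)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def one_experiment(n=40):
    # Each subject yields two correlated dependent variables (r = 0.5);
    # the two groups are statistically identical.
    def subject():
        dv1 = random.gauss(0, 1)
        dv2 = 0.5 * dv1 + math.sqrt(0.75) * random.gauss(0, 1)
        return dv1, dv2
    a = [subject() for _ in range(n)]
    b = [subject() for _ in range(n)]
    # Strict analyst: one pre-specified outcome variable.
    p_strict = p_two_sample([s[0] for s in a], [s[0] for s in b])
    # Flexible analyst: reports whichever of dv1, dv2, or their mean "works".
    p_flex = min(
        p_strict,
        p_two_sample([s[1] for s in a], [s[1] for s in b]),
        p_two_sample([(s[0] + s[1]) / 2 for s in a],
                     [(s[0] + s[1]) / 2 for s in b]),
    )
    return p_strict, p_flex

sims = 4000
results = [one_experiment() for _ in range(sims)]
strict_rate = sum(ps < 0.05 for ps, _ in results) / sims
flex_rate = sum(pf < 0.05 for _, pf in results) / sims
print(f"strict: {strict_rate:.3f}   flexible: {flex_rate:.3f}")
```

Even this single liberty pushes the false-positive rate well above the nominal 5 %; stacking several such liberties, as in the published simulations, drives it far higher.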
A particular and often encountered problem is the collection of more data if the effect is “not yet significant”. This can strongly inflate your false-positive rate, particularly when you collect only a few more observations per condition, or when you repeatedly collect more data while your result is ‘still not significant’. If you exploit your researcher degrees of freedom, you may end up deceiving yourself and everyone else. This does not mean that a study cannot be exploratory or have exploratory components. But this should be stated clearly in the manuscript, since it weakens the conclusions that can be drawn from its findings.
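The effect of this “peeking” is easy to demonstrate. The sketch below (illustrative names and numbers; p-values again via a z-approximation, not a proper t-test) compares an honest fixed-sample design against an analyst who tests after every batch of new observations and stops as soon as the result crosses p < .05 – even though both groups always come from the same distribution.

```python
# Toy illustration: "collect more data until significant" inflates the
# false-positive rate. Both groups come from the SAME distribution.
import math
import random

random.seed(2)

def p_two_sample(x, y):
    """Two-sided p-value, z-approximation to Welch's t-test (ok for n >= 20)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def fixed_n(n=100):
    # Honest design: sample size decided in advance, one single test.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return p_two_sample(a, b) < 0.05

def peeking(start=20, step=10, n_max=100):
    # Peeking design: test, and if "not yet significant", add 10 more
    # observations per group and test again, up to n_max per group.
    a = [random.gauss(0, 1) for _ in range(start)]
    b = [random.gauss(0, 1) for _ in range(start)]
    while True:
        if p_two_sample(a, b) < 0.05:
            return True          # stop as soon as it looks "significant"
        if len(a) >= n_max:
            return False
        a += [random.gauss(0, 1) for _ in range(step)]
        b += [random.gauss(0, 1) for _ in range(step)]

sims = 3000
fixed_rate = sum(fixed_n() for _ in range(sims)) / sims
peek_rate = sum(peeking() for _ in range(sims)) / sims
print(f"fixed n: {fixed_rate:.3f}   peeking: {peek_rate:.3f}")
```

With up to nine looks at the data, the nominal 5 % error rate roughly doubles or triples – exactly the inflation described above.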
What can we do about it?
We can always argue that the error is in the system. If the journals more readily accepted “negative” findings, if our job situation were better and the pressure to publish lower, if replication studies were appreciated by higher-ranked journals and were not almost useless for our careers … Yes, all these things would substantially improve the situation. But they are measures that require long-term collaborative effort. What can we do in our own workgroup, right now, while planning, conducting, or publishing the next study? Needless to say, I do not have the final solution. But I have collected some suggestions from papers, comments, and discussions with colleagues and fellow students.
Here are some of them: 1. Introduce strict blinding during data collection & analysis. This is already quite common for data collection, at least in some scientific disciplines, but I guess it is still very rare during data analysis.
2. Collect more data. This can be hard if data collection is very time- and money-consuming. But often it is worth adding another month of work to the project to be a little bit more confident about the data in the end.
3. Have strict rules on when to stop data collection and make sure everyone in the project agrees on them beforehand.
4. Keep lab notebooks that document and enforce study design, investigated variables, sample size, in-/exclusion criteria, and data analysis scheme.
5. Register your studies before beginning data collection. This has very successfully decreased the rate of positive studies (and probably increased the positive predictive value of studies) for clinical trials in the US.
6. Analyze the important dependent variables at the very end, and decide on the inclusion or exclusion of data points beforehand.
[Figure: Likelihood of getting a false-positive result when exploiting researcher degrees of freedom. Simulations have shown that by combining four degrees of freedom you can obtain a positive result from randomly drawn data 60 % of the time. Adapted from Simmons 2011.]
7. Try out preprint publication, maybe even with publishing the raw data. This has been very successful in physics and is now slowly coming to the life sciences as well. One example is bioRxiv.
Every study comes with a large number of decisions, or “degrees of freedom”. You decide how much data to collect, how to identify outliers, which subgroups to introduce, which variables and interactions to analyze, which statistical tests to use – and how much of all this to report. The problem with our commonly used statistical framework is that it assumes all of these decisions are made before you look at the first data point. In the best case, your analysis is entirely independent of your data.
Which of these suggestions can help, which are naïve, which are missing? Please start a discussion with us in the comments section.
The situation in science is bad, but not hopeless. One of the much-acclaimed features of science is that it is self-correcting. However, in this case it will
18 | NEUROMAG | July 2016