general public is focused so obsessively on a very specific epidemiological phenomenon, and that the
related incidence data is widely accessible to naive interpretation.
2.4 Naive extrapolation
Naive extrapolation is a forecast made without attempting to understand the causal factors or
adjusting for boundary conditions. One of the most common errors is to assume a linear dependence
between two variables when there is no basis for doing so.
Specific claim
As of April 15th, Italy has 165,000 confirmed positive cases and 22,000 related deaths. If all
people in Italy (60 million) got infected, we might get 8 million fatalities
(60 million * 22,000 / 165,000).
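To make the flawed step explicit, here is a minimal Python sketch of the naive claim using only the figures quoted above. The function name `naive_fatalities` is hypothetical; the sketch simply shows that the claim assumes the deaths/confirmed ratio applies unchanged to the entire population.

```python
# Naive linear extrapolation (Italy, April 15th figures from the text).
# Hidden assumptions: confirmed cases represent everyone infected, and
# deaths grow linearly with the number of infected people.

confirmed_cases = 165_000
deaths = 22_000
population = 60_000_000

def naive_fatalities(population, deaths, confirmed_cases):
    """Scale the deaths/confirmed ratio up to the whole population."""
    naive_fatality_ratio = deaths / confirmed_cases  # about 13.3%
    return population * naive_fatality_ratio

estimate = naive_fatalities(population, deaths, confirmed_cases)
print(f"naive estimate: {estimate / 1e6:.1f} million fatalities")
# prints: naive estimate: 8.0 million fatalities
```

The arithmetic is correct; the error lies in the inputs and the linearity assumption, not in the calculation.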
In this example, naive extrapolation is combined with selection bias. As explained earlier,
22,000 is a reliable number here, but 165,000 is not: confirmed cases include only people who
were tested.
There is an unknown number of people in Italy who have already been exposed to the virus and
are either immune or have developed no symptoms. We can only make rough estimates of this number.
For example, we can use the fatal/exposed ratio of 0.19% from the Diamond Princess cruise ship
(with a large margin of error).
Reversing the equation and applying the reported number of fatalities, we get:
exposed in Italy on April 1st = fatalities / 0.19% = 22,000 / 0.0019 ≈ 11.6 million
Knowing that the upper bound is the total population of Italy (60 million), we can extrapolate
only up to about 5-fold (60 million / 11.6 million ≈ 5.2), which gives a maximum number of
fatalities of around 110,000 (5 * 22,000).
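The bounded calculation can be sketched the same way. This illustrative Python version assumes the rough 0.19% fatal/exposed ratio quoted above; `bounded_fatalities` is a hypothetical helper name, and the result inherits the large margin of error of that ratio.

```python
# Bounded extrapolation following the steps in the text.

deaths = 22_000
population = 60_000_000
fatal_per_exposed = 0.0019  # 0.19%, rough Diamond Princess figure

def bounded_fatalities(deaths, population, fatal_per_exposed):
    """Return (estimated exposed, scale factor, maximum fatalities)."""
    exposed = deaths / fatal_per_exposed  # reverse the fatal/exposed ratio
    scale = population / exposed          # nobody can be infected twice
    return exposed, scale, deaths * scale

exposed, scale, max_deaths = bounded_fatalities(deaths, population, fatal_per_exposed)
print(f"exposed ~ {exposed / 1e6:.1f} million, scale ~ {scale:.1f}x, "
      f"max fatalities ~ {max_deaths:,.0f}")
# prints: exposed ~ 11.6 million, scale ~ 5.2x, max fatalities ~ 114,000
```

With the unrounded factor the bound is about 114,000, consistent with "around 110,000". The two steps cancel algebraically: maximum fatalities = population * fatal/exposed ratio, so the bound depends only on the population and on the assumed 0.19%.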
One cannot simply scale numbers; it is necessary to understand what they mean and what their
bounds are.
2.5 Correlation is not causation
We cannot deduce a cause-and-effect relationship solely on the basis of observational data, even
if the correlation in the observational data equals one.
Let’s consider the following example. We are trying to find a cause of death. To eliminate
selection bias, we collect observational data for the entire population. For each person, we
measure the number of hours spent in bed during one week and check whether the person died on
the last day of the week. We will find a strong correlation: the more hours a person spends in bed,
the more likely that person is to be dead at the end of the week. Obviously, it would be an error
to conclude that lying in bed causes death. Life experience tells us that there is a third factor,
a serious illness, that is the common cause of both lying in bed and death. In this case, the third
factor creates a spurious correlation. We cannot use the fact that two events occur together to
infer a cause-and-effect relationship.
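The bed-and-death example can be reproduced with a small simulation. This is a toy model with assumed parameters (5% of people seriously ill, weekly hours in bed and death probabilities chosen so the effect is easy to see); none of these values come from real data, and the names `simulate_population` and `pearson` are hypothetical. In the model, hours in bed have no causal effect on death, yet the two correlate across the whole population, while the correlation essentially vanishes among people who are not ill.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_population(n, seed=0):
    """Toy model (assumed parameters): serious illness raises both the hours
    spent in bed and the chance of death; hours in bed have no effect on death."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        ill = rng.random() < 0.05                      # assumed share of seriously ill
        mean_hours = 120 if ill else 56                # assumed weekly hours in bed
        hours = min(168.0, max(0.0, rng.gauss(mean_hours, 10)))
        died = rng.random() < (0.5 if ill else 0.001)  # depends on illness only
        people.append((ill, hours, 1.0 if died else 0.0))
    return people

people = simulate_population(100_000)

# Whole population: hours in bed and death are clearly correlated.
r_all = pearson([h for _, h, _ in people], [d for _, _, d in people])

# Excluding the ill (holding the common cause fixed): correlation near zero.
healthy = [(h, d) for ill, h, d in people if not ill]
r_healthy = pearson([h for h, _ in healthy], [d for _, d in healthy])

print(f"whole population: r = {r_all:.2f}")
print(f"excluding the ill: r = {r_healthy:.2f}")
```

The simulation only works because the model knows exactly who is ill; with real observational data the common cause is usually unknown or unmeasured.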
One reliable way to establish cause and effect is to collect experimental data. Such data must
come from an experiment with at least one independent variable. Independent variables are
controlled inputs. In our example, the controlled input is the time spent in bed: we would have to
make people spend various numbers of hours in bed and measure how this affects the probability of
death.
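An experiment breaks the link by letting the experimenter set the hours in bed. The sketch below uses a hypothetical toy model in which illness (5% of people) alone determines the chance of death, and it assigns hours in bed at random. Randomization is one common way to keep a controlled input independent of hidden factors such as illness; the text itself does not prescribe it. The function `run_experiment` and the hour levels are illustrative assumptions.

```python
import random

def run_experiment(n, hour_levels=(40, 70, 100, 130), seed=1):
    """Hypothetical randomized experiment: each person is assigned a number of
    hours in bed at random. In this toy model death depends only on illness."""
    rng = random.Random(seed)
    counts = {h: 0 for h in hour_levels}
    deaths = {h: 0 for h in hour_levels}
    for _ in range(n):
        hours = rng.choice(hour_levels)  # controlled input, set by the experimenter
        ill = rng.random() < 0.05        # illness does not depend on the assignment
        died = rng.random() < (0.5 if ill else 0.001)
        counts[hours] += 1
        deaths[hours] += int(died)
    # Death rate observed at each level of the controlled input.
    return {h: deaths[h] / counts[h] for h in hour_levels}

rates = run_experiment(100_000)
for hours, rate in rates.items():
    print(f"{hours:>3} h in bed: death rate {rate:.2%}")
```

Because the assigned hours are independent of illness, the death rates come out roughly equal at every level, correctly indicating no effect of time in bed. If the model made death depend on the assigned hours, the rates would differ across levels.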
In the absence of experimental data, it is common practice to use observational data combined
with common-sense judgement. In our example, we would have to exclude from our sample the cases
related to serious illness. Since human judgement is fallible and subjective, it becomes a source
of new errors. Another issue is that the choice of illness as the factor was somewhat arbitrary,
based on everyday experience. There may exist a number of other distorting factors that we are