The Perfect Meme

general public is focused so obsessively on a very specific epidemiological phenomenon, and that the related incidence data is widely accessible for naive interpretations.

2.4 Naive extrapolation

Naive extrapolation is a forecast made without attempting to understand causal factors or to adjust for boundary conditions. One of the most common errors is to assume linear dependence between two variables when there is no basis for doing so.

Specific claim

As of April 15th, the number of confirmed positive cases in Italy is 165,000 and the number of related deaths is 22,000. If all people in Italy (60 million) get infected, we may get 8 million (60 million * 22/165) fatalities.

In this example, naive extrapolation is combined with selection bias. It has been explained before that 22,000 is a reliable number here, while 165,000 is not. An unknown number of people in Italy have already been exposed to the virus and are either immune or developed no symptoms. We can make only rough estimates of this number. For example, we can use the ratio fatal/exposed = 0.19% from the Diamond Princess cruise ship (with a big margin of error). Reversing the equation and applying the reported number of fatalities, we get:

The exposed on April 1st in Italy = fatal/0.19% = 22,000/0.19% = 11.6 million

Knowing that the upper bound is the total population of Italy (60 million), we can extrapolate only up to about 5-fold (60 million / 11.6 million), which gives a maximum number of fatalities of around 110,000 (5 * 22,000).

One cannot just scale numbers; it is necessary to understand what they mean and what their bounds are.

2.5 Correlation is not causation

We cannot deduce a cause-and-effect relationship solely on the basis of observational data, even if the correlation in the observational data is equal to one.

Let's consider the following example. We are trying to find a cause of death. In order to eliminate selection bias, we collected observational data on the entire population.
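As a brief aside on the arithmetic of section 2.4, the two estimates can be reproduced in a few lines of Python. This is only a sketch: the input figures are the ones quoted in the text, the variable names are our own, and the 0.19% ratio carries the large margin of error already noted there.

```python
# Figures quoted in section 2.4 (Italy, as of April 15th).
population = 60_000_000
confirmed_cases = 165_000
deaths = 22_000

# Fatal/exposed ratio from the Diamond Princess, as quoted in the text.
# Treat it as a rough assumption, not a measured constant.
fatal_per_exposed = 0.0019

# Naive extrapolation: scale the deaths/confirmed ratio to everyone.
naive_fatalities = population * deaths / confirmed_cases

# Bounded estimate: first infer how many people were exposed,
# then see how far that number can still grow before it hits
# the total population.
estimated_exposed = deaths / fatal_per_exposed
max_growth_factor = population / estimated_exposed
bounded_fatalities = deaths * max_growth_factor

print(f"naive extrapolation: {naive_fatalities:,.0f} fatalities")
print(f"estimated exposed:   {estimated_exposed:,.0f} people")
print(f"max growth factor:   {max_growth_factor:.2f}")
print(f"bounded estimate:    {bounded_fatalities:,.0f} fatalities")
```

Without rounding, the growth factor comes out slightly above 5 and the bounded estimate at roughly 114,000. The text rounds the factor to 5 and arrives at about 110,000. Either way, the bounded figure is some seventy times smaller than the naive one.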
We measured the number of hours each person spent in bed during one week, and we checked whether the person died on the last day of the week. We will find a strong correlation: the more hours a person spends in bed, the more likely that person is to be dead at the end of the week. Obviously, it is an error to conclude that lying in bed causes death. Life experience tells us that there is a third factor, a serious illness, that is the common cause of both lying in bed and death. In this case the third factor creates a spurious correlation.

We cannot use two events occurring together to imply a cause-and-effect relationship. One reliable way to deduce cause and effect is to collect experimental data. Such data must come from an experiment with at least one independent variable. Independent variables are controlled inputs. In our example the controlled input is time spent in bed. We would have to make people spend various numbers of hours in bed and measure how that affects the probability of death.

In the absence of experimental data, it is common practice to use observational data combined with common-sense judgement. In our example, we would have to exclude cases related to serious illness from our sample. Since human judgement is fallible and subjective, it becomes a source of new errors. Another issue is that the choice of illness as a factor was somewhat arbitrary, based on everyday experience. There may exist a number of other distorting factors that we are