Another day, another exciting-looking correlation in the world of Covid-19! Researchers at Yale have published a preprint looking at the correlation between the amount of novel coronavirus in sewage and the number of hospital admissions three days later, and they found an almost perfect match.
Imagine you’re measuring two numbers and you want to see how much one affects the other. Let’s say height and weight. So you go and ask random people in the street how tall they are and how heavy they are. You’ll notice that, on average, tall people are heavier. But sometimes, you get tall skinny people or short fat people, so the correlation isn’t perfect.
In statistics, correlations are measured with a number called R. (Not that one.) If the correlation is perfect, so that height and weight rise in exact lockstep and every person sits on a single straight line when you plot them, then R=1; if there is no correlation, and height and weight vary with no link to each other, then R=0. (It can also be negative, down to R=-1: if one goes up, the other goes down.) You can get a sense of what different R values look like by playing this game.
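To make R a bit more concrete, here is a minimal sketch in Python of how a Pearson correlation can be computed by hand. The heights and weights are simulated, not real survey data, and the slope, intercept and noise levels are assumptions chosen purely for illustration. The point it tries to show is that R measures how tightly the points hug a straight line, whatever the slope of that line happens to be.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the run is repeatable

# Simulated heights in cm (made-up distribution, for illustration only).
heights = [random.gauss(170, 10) for _ in range(1000)]

# Weight is an exact straight-line function of height: R is 1.
weights_exact = [0.9 * h - 85 for h in heights]

# Same line plus person-to-person noise: R lands well below 1.
weights_noisy = [0.9 * h - 85 + random.gauss(0, 9) for h in heights]

# Weight drawn with no reference to height: R is close to 0.
weights_random = [random.gauss(70, 12) for _ in heights]

# Weight falls as height rises, exactly on a line: R is -1.
weights_inverse = [200 - 0.5 * h for h in heights]

for label, ws in [
    ("exact line", weights_exact),
    ("noisy line", weights_noisy),
    ("unrelated", weights_random),
    ("inverse line", weights_inverse),
]:
    print(f"{label:>12}: R = {pearson_r(heights, ws):.2f}")
```

With these assumed settings the noisy case comes out at roughly 0.7, because the noise is about as large as the height-driven variation in weight. Getting up to 0.99 would need the noise to be almost absent.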
The Yale researchers found that coronavirus in poo on day 0 correlated with hospital admissions on day 3 with an R=0.99. That is ridiculous. It is more or less saying “if coronavirus levels in sewage triple on Sunday, you should see almost exactly three times as many people admitted to hospital on Wednesday.”
It got a lot of attention, but in any human-behaviour-related study, a correlation of 0.99 is frankly unbelievable. As Alex Danvers says in a good blog post which annoyingly scooped me as I was thinking about writing this, in psychology, an R of .1 is pretty good. Even height and weight probably only correlate at about 0.7.
Here’s what went wrong, as pointed out (and explained to me) by the indefatigable bad-science-debunker Nick Brown. They weren’t checking the correlation between the raw numbers — they were checking the correlation between correlations.
One thing Covid-19 has done is expose just how politicised medicine is as a profession. To be on the safe side, it’s probably a good idea to bash Trump and Brexit and genuflect about China and the WHO if you want good quality care in 2020.
Drug companies the world over are licking their lips while considering the astronomical amounts of money available for any drug which would treat CV19. Hydroxychloroquine, if effective, would be a disaster for these companies.
A load of poo, then.
I think the R you refer to is Pearson’s correlation coefficient. There is a further calculation (R squared) that measures the proportion of the variation in one variable that is accounted for by the other (as I remember it!).
Science does not only need to be done correctly, it also needs to be useful. Even if there is some sort of correlation here I am mystified as to what use it is. The big problem with science in this situation is if you follow it you are guaranteed to be well behind in your actions based on it.
The other problem with correlations is when they are then used as proof of cause and effect. CO2 and global temperature spring to mind, except there is not really all that good a correlation across the reasonably reliable trends. Then there is Al Gore’s “inconvenient truth”!
It seems to me that asymptomatic sufferers are at home pooing in their own toilets and then three days later they discover that they have the virus. Now that would be a proper thing to look at.
The Lancet study on social distancing is another one that got the stats all wrong. They had a significant result in terms of distance vs no distance but no significant result for 1m vs 2m. They used their significant result as the basis of a correlation and projected the findings to 2m and indeed 3m: the former was not significant, and for the latter there was not even any data. Really shoddy.