
Another day, another misleading Corona study!

May 29, 2020 - 7:00am

Another day, another exciting-looking correlation in the world of Covid-19! Researchers at Yale have published a preprint looking at the correlation between the amount of novel coronavirus in sewage and the number of hospital admissions three days later, and they found an almost perfect match.

Credit: Yale University

Imagine you’re measuring two numbers and you want to see how much one affects the other. Let’s say height and weight. So you go and ask random people in the street how tall they are and how heavy they are. You’ll notice that, on average, tall people are heavier. But sometimes, you get tall skinny people or short fat people, so the correlation isn’t perfect.

In statistics, correlations are measured in a number called R. (Not that one.) If the correlation is perfect, meaning every point sits exactly on a straight line so that each extra centimetre of height always comes with the same extra amount of weight, then R=1; if there is no correlation, and height and weight vary totally randomly with no link to each other, then R=0. (It can also be negative, down to R=-1: if one goes up, the other goes down.) You can get a sense of what different R values look like by playing this game.
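Pearson’s R, the usual correlation coefficient, can be computed in a few lines. The Python sketch below uses invented heights and weights (the 0.9 kg-per-centimetre slope, the noise levels and the sample size are all assumptions for illustration) to show that R=1 whenever the points sit exactly on a rising straight line, R=-1 for a falling one, and that real-world scatter pulls it down.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient R for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance (up to a factor of n): do the two move together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable on its own (same factor of n, which cancels).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("R is undefined if either variable never changes")
    return cov / math.sqrt(var_x * var_y)


# Made-up heights in cm; a fixed seed makes the draw repeatable.
rng = random.Random(42)
heights = [rng.gauss(170, 10) for _ in range(1000)]

# Every extra cm adds exactly 0.9 kg: a perfect straight line.
weights_exact = [0.9 * h - 85 for h in heights]
# Same line plus individual variation (tall skinny people, short fat people).
weights_noisy = [w + rng.gauss(0, 12) for w in weights_exact]
# Weight drawn with no link to height at all.
weights_unrelated = [rng.gauss(70, 12) for _ in heights]
# Taller means lighter: a perfect falling line.
weights_inverse = [200 - 0.9 * h for h in heights]

print(round(pearson_r(heights, weights_exact), 3))      # 1.0
print(round(pearson_r(heights, weights_noisy), 2))      # roughly 0.6
print(round(pearson_r(heights, weights_unrelated), 2))  # close to 0
print(round(pearson_r(heights, weights_inverse), 3))    # -1.0
```

Changing the slope (say, 0.5 kg per centimetre instead of 0.9) leaves the perfect-line R at 1: R measures how tightly the points hug a line, not how steep the line is.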

The Yale researchers found that coronavirus in poo on day 0 correlated with hospital admissions on day 3 with an R of 0.99. That is ridiculous. It is effectively saying “if coronavirus levels in sewage triple on Sunday, you should see almost exactly three times as many people admitted to hospital on Wednesday.”

It got a lot of attention, but in any study involving human behaviour, a correlation of 0.99 is frankly unbelievable. As Alex Danvers says in a good blog post (which annoyingly scooped me as I was thinking about writing this), in psychology an R of 0.1 is pretty good. Even height and weight probably only correlate at about 0.7.

Here’s what went wrong, as pointed out (and explained to me) by the indefatigable bad-science-debunker Nick Brown. They weren’t checking the correlation between the raw numbers — they were checking the correlation between correlations.

Brown uses an analogy: imagine instead of measuring height vs weight, we’d measured height vs the last digit on your National Insurance number. We’d find no correlation: R=0.

Then imagine we measured weight vs the fourth digit on your NI number. We’d find no correlation: R=0.

But then imagine we checked the correlation between those two correlations. We’d find two flat lines! They correlate perfectly! R=1!

This isn’t exactly what happened, but it illustrates the problem. Rather than comparing virus-in-poo with hospital admissions directly, they compared the correlation of virus-in-poo with time against the correlation of hospital admissions with time. That smooths out the curves and makes the two look far more closely correlated than the raw numbers are.
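A small simulation shows how smoothing alone can inflate R. This is a hypothetical sketch, not the preprint’s actual method: it invents a single epidemic wave, adds independent random noise to make a “sewage” series and an “admissions” series, ignores the three-day lag, and uses a 7-day centred moving average as a stand-in for whatever curve-fitting was really done.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient R for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def moving_average(values, window=7):
    """Centred moving average; drops the ends where the window doesn't fit."""
    if window % 2 == 0:
        raise ValueError("use an odd window so it can be centred")
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


rng = random.Random(2020)
days = range(70)
# Hypothetical epidemic wave: rises, peaks around day 35, then falls.
wave = [100 * math.exp(-(((d - 35) / 15) ** 2)) for d in days]

# Two measurements of the same wave. Their day-to-day noise is drawn
# independently, so the only thing they share is the underlying wave.
sewage = [w + rng.gauss(0, 25) for w in wave]
admissions = [w + rng.gauss(0, 25) for w in wave]

raw_r = pearson_r(sewage, admissions)
smoothed_r = pearson_r(moving_average(sewage), moving_average(admissions))
print(f"raw R = {raw_r:.2f}, smoothed R = {smoothed_r:.2f}")
```

Because the noise in the two series is unrelated, the raw R reflects only the shared wave; averaging away most of the noise leaves little except that wave, so the smoothed series correlate much more strongly than the raw ones.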

(There are other problems but this post is too long already.)

Brown pointed all this out to the authors, and they’ve taken down one hugely viral tweet and are looking to correct the preprint. In his own look at the data he finds a correlation of between 0.14 and 0.4. That’s still important and useful, if it’s real! But you can’t say “if virus in poo goes up 4x today, you’ll see a fourfold increase in hospital admissions in three days’ time.”
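One way to see how far those figures are from 0.99 is to square them: for a straight-line fit, R² is the share of the variation in one series that the other accounts for.

```latex
\begin{aligned}
R = 0.14 &\;\Rightarrow\; R^2 = 0.0196 \approx 2\% \\
R = 0.4  &\;\Rightarrow\; R^2 = 0.16 = 16\% \\
R = 0.99 &\;\Rightarrow\; R^2 = 0.9801 \approx 98\%
\end{aligned}
```

So, if Brown’s figures hold, sewage levels would account for somewhere between about 2% and 16% of the variation in admissions, against roughly 98% for the original claim.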

Addendum: The Guardian raises some very serious concerns about the Lancet study into hydroxychloroquine that I mentioned in my last piece. It’s worth noting because, while it undermines my specific point about hydroxychloroquine, it very much supports my case for being wary of fast science in the pandemic.


Tom Chivers is a science writer. His second book, How to Read Numbers, is out now.


7 Comments
madeuop names
3 years ago

One thing Covid-19 has done is expose just how politicised medicine is as a profession. To be on the safe side it’s probably a good idea to bash Trump and Brexit and genuflect about China and the WHO if you want good quality care in 2020

Michael McVeigh
3 years ago

Drug companies the world over are licking their lips while considering the astronomical amounts of money available for any drug which would treat CV19. Hydroxychloroquine, if effective, would be a disaster for these companies.

Stephen Follows
3 years ago

A load of poo, then.

david bewick
3 years ago

I think the R you refer to is Pearson’s correlation coefficient. There is a further calculation (R squared) that measures the amount of effect that is attributable to one variable (as I remember it!)

Adrian Smith
3 years ago

Science does not only need to be done correctly, it also needs to be useful. Even if there is some sort of correlation here I am mystified as to what use it is. The big problem with science in this situation is if you follow it you are guaranteed to be well behind in your actions based on it.

The other problem with correlations is when they are then used as proof of a cause and effect. CO2 and global temperature springs to mind, except there is not really all that good a correlation across the reasonably reliable trends, then there is Al Gore’s “inconvenient truth”!

Esmon Dinucci
3 years ago

It seems to me that asymptomatic sufferers are at home pooing in their own toilets and then three days later they discover that they have the virus – now that would be a proper thing to look at.

Jos Vernon
3 years ago

The Lancet study on social distancing is another one that got the stats all wrong. They had a significant result in terms of distance vs no distance but no significant result for 1m vs 2m. They used their significant result as the basis of a correlation and projected the findings to 2m and indeed 3m – the former of which was not significant and the latter for which there was not even any data. Really shoddy.