September 3, 2021 - 3:47pm

There’s a row going on about a study into masks. The researchers took 300,000 people in Bangladeshi villages. To hugely oversimplify: in half of the villages, they promoted mask-wearing. In the other half, they didn’t. They then looked at whether the people in the mask-promotion villages were less likely to get Covid. They were: 8.6% of people in the control villages reported Covid symptoms, compared to 7.6% in the treatment villages.

But some people said that this result may not be statistically significant. It’s a bit more complicated than that, but let’s just take the claim at face value. The row tells us something interesting about science and evidence.

Here’s what “statistically significant” means. In science, there’s a thing called the “p-value”. That is: how likely you are to see a given result by fluke. Say you’re trying to find out if your dice are loaded. You roll two sixes. That could mean that the dice are loaded, or you might have just rolled two sixes. Your chance of seeing two sixes on fair dice is one in 36. P-values are written as a score out of 1, so your p-value for that result is 1/36, or about 0.028.
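The dice arithmetic can be checked by brute force. This is a minimal sketch, assuming two fair six-sided dice and treating “two sixes” as the most extreme result possible; the variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count the outcomes that are both sixes. Nothing is more extreme than
# a double six, so this count alone gives the p-value.
double_sixes = [roll for roll in outcomes if roll == (6, 6)]

p_value = Fraction(len(double_sixes), len(outcomes))

print(p_value)                   # the exact fraction
print(round(float(p_value), 3))  # as a score out of 1
```

The result is below 0.05, so by the usual convention a single double six would count as “statistically significant” evidence of loaded dice, which is a hint at how low that bar can be.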

A result is “statistically significant” if your p-value is less than 0.05: if you would expect to see that result, or a more extreme one, by fluke less than one time in 20. (What it doesn’t mean is that there’s only a one in 20 chance that it’s wrong.)

If that sounds complicated: it is. Most psychology lecturers get it wrong, as do most psychology textbooks. This may explain quite a lot about psychology.

In science, statistical significance is often used as a cutoff: you can’t get your study published if your p-value is greater than 0.05. This system has led to people juking the stats to get their p-values below 0.05, because we say “if it lands on this side of the line it’s real”. 

But there’s nothing magic about p=0.05. It’s arbitrary. In theory, a finding with p=0.051 is almost exactly as good evidence as one with p=0.049. 
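One rough way to see how close those two results are is to convert each p-value back into the test statistic behind it. The sketch below assumes a two-sided test with a normally distributed statistic, which is an assumption for illustration and not something taken from the mask study; `two_sided_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_sided_z(p_value):
    """Return the z-score that gives this two-sided p-value.

    Assumes a standard normal test statistic (an illustrative
    assumption, not a detail from any particular study).
    """
    return NormalDist().inv_cdf(1 - p_value / 2)


z_just_under = two_sided_z(0.049)
z_just_over = two_sided_z(0.051)
gap = z_just_under - z_just_over

print(f"p=0.049 -> z={z_just_under:.3f}")
print(f"p=0.051 -> z={z_just_over:.3f}")
print(f"difference in z: {gap:.3f}")
```

Both results sit at just under two standard errors from zero, and the gap between them is a small fraction of a standard error. Under these assumptions the data behind the two findings are nearly indistinguishable, even though only one of them clears the line.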

This is a profound point about scientific epistemology. Under one school of thinking, things are either shown or not. If you get a statistically significant result, you have Scientific Evidence, and if you don’t, you don’t. Masks don’t work, or your drug has no effect.

But the right way to look at it is: I have some prior assessment of how likely it is that masks help prevent spread. I’d say quite likely, because the virus travels in water droplets, and presumably a mask traps some of them; and also because of earlier evidence. Let’s say I think it’s 80% likely. Then I get some new evidence, and I use it to update my beliefs. A p=0.05 result might make me update to something like 95% sure, depending on how much I trusted the study.

If you thought for some reason it was really unlikely that masks worked, then you’d update from a lower base: say if you thought it was only 1% likely, then you might end up saying it’s now 15% likely. This is Bayesian thinking again. But it’s also just reasoning under uncertainty. You can never be certain: you just make the best guess you can with the evidence available. 
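The updating described above can be sketched with Bayes’ rule in odds form: turn the prior into odds, multiply by how strongly the evidence favours “masks work”, and turn the result back into a probability. The `update` function and the `LIKELIHOOD_RATIO` value are hypothetical. The ratio of 4.75 is chosen only because it takes 80% to 95%; it is not derived from the study.

```python
def update(prior, likelihood_ratio):
    """Bayes' rule in odds form.

    prior: probability (between 0 and 1) held before the evidence.
    likelihood_ratio: how many times more likely the evidence is if
        the claim is true than if it is false (a hypothetical input).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical strength of evidence: picked so that 80% becomes 95%.
LIKELIHOOD_RATIO = 4.75

confident = update(0.80, LIKELIHOOD_RATIO)
sceptic = update(0.01, LIKELIHOOD_RATIO)

print(f"started at 80%, now {confident:.0%}")
print(f"started at 1%, now {sceptic:.1%}")
```

With this particular ratio, a 1% prior lands at roughly 5%; reaching 15% would take stronger evidence, a ratio of about 17. The figures in the text are round illustrations, and the qualitative point holds either way: the same evidence moves everyone in the same direction, but where you end up depends on where you started.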

Whether this study is good evidence is up for debate: a stats-savvy friend warns that when you see p-values around, and especially just under, 0.05, it’s a red flag that some dodgy manipulations have gone on. 

But if it was well-carried-out and careful, then whether or not its result falls on one side or other of the p=0.05 boundary doesn’t matter a great deal. Either way, it would be evidence in favour of something we already think probably works. 

Tom Chivers is a science writer. His second book, How to Read Numbers, is out now.