Explainer
Friday, 3 September 2021, 15:47

The right way to look at the Bangladesh mask study

The evidence isn't conclusive, but then again it rarely is
by Tom Chivers
Garment workers in Dhaka, Bangladesh

There’s a row going on about a study into masks. The researchers took 300,000 people in Bangladeshi villages. To hugely oversimplify: in half of the villages, they promoted mask-wearing. In the other half, they didn’t. They then looked at whether the people in the mask-promotion villages were less likely to get Covid. They were: 8.6% of people in the control villages reported Covid symptoms, compared to 7.6% in the treatment villages.

But some people said that this result may not be statistically significant. It’s a bit more complicated than that, but let’s just take the claim at face value. The row tells us something interesting about science and evidence.

Here’s what “statistically significant” means. In science, there’s a thing called the “p-value”. That is: how likely you are to see a given result by fluke. Say you’re trying to find out if your dice are loaded. You roll two sixes. That could mean that the dice are loaded, or you might have just rolled two sixes. Your chance of seeing two sixes on fair dice is one in 36. P-values are written as a score out of 1, so your p-value for that result is 1/36, or about 0.028.
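If you want to see that arithmetic done explicitly, here’s a minimal sketch in Python (the Fraction class just keeps the answer exact):

```python
from fractions import Fraction

# Each fair die shows a six with probability 1/6, independently,
# so the chance of two sixes is (1/6) * (1/6).
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)

print(p_two_sixes)                    # 1/36
print(round(float(p_two_sixes), 3))   # 0.028 -- the p-value for that roll
```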

A result is “statistically significant” if your p-value is less than 0.05: if you would expect to see that result, or a more extreme one, by fluke less than one time in 20. (What it doesn’t mean is that there’s only a one in 20 chance that it’s wrong. Read this for more.)
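To make the mechanics concrete, here’s a deliberately naive sketch using the study’s headline figures. The assumptions are mine, not the paper’s: an even 150,000-per-arm split, and individuals treated as independent. The real trial randomised whole villages (a cluster design), so this calculation badly overstates the evidence; it’s only here to show how two proportions get turned into a p-value.

```python
import math

# Naive two-proportion z-test on the headline figures.
# Assumed (not from the paper): 150,000 people per arm, independence.
n_control = n_treat = 150_000
x_control = round(0.086 * n_control)   # 8.6% reported symptoms
x_treat = round(0.076 * n_treat)       # 7.6% reported symptoms

p1, p2 = x_control / n_control, x_treat / n_treat
p_pool = (x_control + x_treat) / (n_control + n_treat)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treat))
z = (p1 - p2) / se

# Two-sided p-value from the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.1f}, p = {p_value:.1e}")
print("below the 0.05 cutoff?", p_value < 0.05)
```

The p-value this prints is absurdly small precisely because it ignores the clustering; the real argument is over the cluster-adjusted analysis.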

If that sounds complicated: it is. Most psychology lecturers get it wrong, as do most psychology textbooks. This may explain quite a lot about psychology.

In science, statistical significance is often used as a cutoff: you can’t get your study published if your p-value is greater than 0.05. This system has led to people juking the stats to get their p-values below 0.05, because we say “if it lands on this side of the line it’s real”. 
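One way to see why the hard cutoff invites trouble: if you test enough things, something will cross the line by luck. A toy simulation, nothing to do with the Bangladesh study:

```python
import numpy as np
from scipy import stats

# Twenty "studies" where the null is true by construction: both groups
# are drawn from the same distribution, so any "significant" result is
# a pure fluke. On average about one in twenty will cross p < 0.05.
rng = np.random.default_rng(42)
flukes = 0
for _ in range(20):
    group_a = rng.normal(size=100)
    group_b = rng.normal(size=100)
    _, p = stats.ttest_ind(group_a, group_b)
    if p < 0.05:
        flukes += 1
print(f"{flukes} of 20 null comparisons came out 'significant'")
```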

But there’s nothing magic about p=0.05. It’s arbitrary. In theory, a finding with p=0.051 is almost exactly as good evidence as one with p=0.049. 

This is a profound point about scientific epistemology. Under one school of thinking, things are either shown or not. If you get a statistically significant result, you have Scientific Evidence, and if you don’t, you don’t. Masks don’t work, or your drug has no effect.

But the right way to look at it is: I have some prior assessment of how likely it is that masks help prevent spread. I’d say quite likely, because the virus travels in water droplets, and presumably a mask traps some of them; and also because of earlier evidence. Let’s say I think it’s 80% likely. Then I get some new evidence, and I use it to update my beliefs. A p=0.05 result might make me update to something like 95% sure, depending on how much I trusted the study.

If you thought for some reason it was really unlikely that masks worked, then you’d update from a lower base: say if you thought it was only 1% likely, then you might end up saying it’s now 15% likely. This is Bayesian thinking again. But it’s also just reasoning under uncertainty. You can never be certain: you just make the best guess you can with the evidence available. 
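Here’s a minimal sketch of that update in odds form. The Bayes factor of 5 below is made up for illustration — it plays the role of “how much I trusted the study”, and is not a number from the paper. Sorry, to be precise: it is purely a stand-in.

```python
def bayes_update(prior, bayes_factor):
    """Turn a prior probability into a posterior, given a Bayes factor:
    how much more likely the evidence is if masks work than if they don't."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)

# Illustrative Bayes factor -- an assumption, not derived from the study.
bf = 5

print(round(bayes_update(0.80, bf), 2))  # 0.95: the 80% prior ends up near 95%
print(round(bayes_update(0.01, bf), 2))  # 0.05: the 1% sceptic barely moves
```

Note that the same evidence multiplies everyone’s odds by the same factor; the sceptic who ends up at 15% would be someone who found the study considerably more convincing than this made-up number implies.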

Whether this study is good evidence is up for debate: a stats-savvy friend warns that when you see p-values around, and especially just under, 0.05, it’s a red flag that some dodgy manipulations have gone on. 

But if it was well carried out and careful, then whether its result falls on one side or the other of the p=0.05 boundary doesn’t matter a great deal. Either way, it would be evidence in favour of something we already think probably works.

Join the discussion


  • Impressive summary of p values and even an intro to Bayesian statistics.
    So, to ask the obvious question, what was the p value for the treatment arm of the Bangladesh study? I’m guessing it was slightly less than 0.05, hence the controversy.

  • The analogy isn’t really correct. Sure, a given antibiotic doesn’t work 100% of the time, but then one switches to an antibiotic that the infecting bacterium is sensitive to. Indeed, a smart doctor would take a blood culture and throat swab before prescribing the first antibiotic, to see what the bacterium was actually sensitive to, and then change antibiotic if appropriate. The correct antibiotic will work 100% of the time, or very close to it, except under very exceptional circumstances. If antibiotics only had a 10% effect, no matter how statistically significant, they would never be prescribed. Similarly, if the contraceptive pill or even the lowly condom only worked 10% of the time, nobody would bother with them. So the issue with regular surgical/cloth masks is not whether they prevent transmission of a virus-containing droplet or two but whether they have a significant impact on transmission. With hindsight it would appear, unfortunately, that the impact of masks is actually minimal and represents theater and a safety blanket.
    The original idea, when it was believed that droplet transmission was the main mechanism of spread, was that masks would act as an almost 100% (or certainly greater than 80%) effective method of source control. Had that been the case, mandating masks in all public places would have brought the pandemic under control in the space of around six weeks (i.e. two weeks from the time of infection to the development of symptoms, two weeks for the actual disease to take its course or for the patient to die, and another two weeks to fully recover and no longer be a transmitter).

  • I think you miss the point. (a) Influenza and coronavirus are not all that different: similar size, similar composition, similar physical properties, similar location. Hence the mode of transmission will be the same, as indeed it has been found to be. (b) Irrespective of your criticism of the Danish study, saying that it was set up to show no difference between masks and no masks, the fact of the matter is that if the effect had been large it would have been observed. It wasn’t. Ergo the effect is small in terms of protection, which is what they were looking at. From the perspective of physics this makes sense, as it is far harder to protect the wearer than it is to block forward emission of viral particles carried in droplets. (c) Irrespective of the flaws in the design of the Bangladesh study, the fact is that even if statistically significant, the effect of masking on COVID spread through the community was found to be minimal (about 9%), i.e. something of a nothing burger.
