We all rely on the news to give us information about the world. That information lets us make decisions: whether it’s safe to fly to Spain, whether red wine causes cancer, whether we’re likely to lose our job for tweeting something. We use the media to help us understand the risks that surround us. The trouble is that the media is uniquely bad at telling us about those risks.
Let’s imagine that you want to know whether water causes cancer. You do a simple study: you go to your local hospital, look at all the people who’ve got cancer, and check whether they had recently drunk some water. You find that almost all of them had, so you conclude that water is a carcinogen.
Very obviously, this is not a good way of doing the study. Since almost everyone drinks water, you’d find that almost everyone who had cancer had drunk water, even if drinking water in no way increased your chances of getting cancer. What you need to do, instead, is compare people who did get cancer with people who didn’t, and see whether one group is more likely to have drunk water lately.
(As it happens, the scientific consensus at the moment leans towards the idea that drinking water is a good idea.)
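To make the comparison concrete, here is a minimal sketch in Python. The counts are invented purely for illustration (they are not from any real study), and the function name `share_who_drank_water` is hypothetical. It shows why looking only at cancer patients is uninformative: the share of water-drinkers looks alarming until you see the same share in the group without cancer.

```python
# Hypothetical counts, invented purely for illustration (not real data).
counts = {
    ("cancer", "drank_water"): 990,
    ("cancer", "no_water"): 10,
    ("no_cancer", "drank_water"): 98_010,
    ("no_cancer", "no_water"): 990,
}


def share_who_drank_water(group):
    """Fraction of people in `group` who had recently drunk water."""
    drank = counts[(group, "drank_water")]
    total = drank + counts[(group, "no_water")]
    return drank / total


# The flawed study looks only at this number...
patients = share_who_drank_water("cancer")
# ...but it only means something next to this one.
everyone_else = share_who_drank_water("no_cancer")

print(f"Cancer patients who drank water: {patients:.1%}")
print(f"People without cancer who drank water: {everyone_else:.1%}")
```

With these made-up numbers the two shares are identical, so drinking water tells you nothing about who has cancer, even though nearly every patient had drunk some.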
This is an example of a mistake called “selecting on the dependent variable”. In science, the dependent variable is the thing you’re trying to measure; the independent variable is the thing you change. So in a drug trial, the independent variable might be the dose of the drug; the dependent variable might be whether or not the patient survives. If you only looked at the patients who survived, the results of your drug trial would not be very useful.
In a subtle way, though, this is what the media does all the time. Like the water-causes-cancer study, it selects on the dependent variable: and for the media, the dependent variable is whether it’s interesting.
For instance: you read about murders, and plane crashes, and mass shootings. Your daily intake of information about things that happen in the world is selected by the things that have actually happened. You won’t read about the people who walk through your local park and aren’t stabbed, or the planes that landed safely. So your sense of what is risky in the world is skewed, because all of the things that you read about have been selected on the dependent variable – everything you read has been chosen, not because it gives you a real sense of how likely it is to happen, but because it is interesting to read about.
It’s incredibly pervasive. You want to know whether, say, cancel culture is real. So you look out for examples: you read about, say, David Shor, the data analyst at a consulting firm who was fired after he tweeted some research by a black academic suggesting that violent protests are less politically effective than non-violent ones, which activists on Twitter told him “reeked of anti-blackness”.
Or Emmanuel Cafferty, the part-Latino electrical worker who was fired for making an “OK” symbol out of the window to a driver who filmed him and claimed it was a neo-Nazi sign. Or Majdi Wadi, whose catering business collapsed — almost all his contracts cancelled — after it was discovered that his daughter had, 10 years earlier at the age of 14, tweeted a lot of anti-semitic bile. You can very quickly get a lot of examples of extremely bad things happening.
You can equally quickly get a long list of black people who’ve been killed by the police in the US, or black children who’ve been arrested at gunpoint in the UK. You can do the same with trans women being murdered, or – to go on the other side of that particular argument – with sexual predators claiming to be trans to get into women’s refuges.
To be clear: these are dreadful things that shouldn’t happen at all. But in a world with billions of people, dreadful things that shouldn’t happen at all will happen quite often. The question of interest is how likely they are to happen to a given person. And even if the media is full of examples all day, every day, for years, that tells you nothing about how common they actually are, because the incidents have been selected on the dependent variable: how interesting they are.
Selecting on the dependent variable is just one way that — usually innocently — the media distorts our understanding of risk and numbers. In How to Read Numbers, an upcoming book I’m writing with my cousin David, an economist at the University of Durham, we try to look at a few of those ways.
For instance, numbers are often given without context: if 163 people have died in police custody over the last 10 years, is that a lot or a little? Well, you don’t know, unless you know how many people were in police custody in that time: if it was 1,000, that’s a very different story to if it was 10 million. In mathematical terms, it’s not enough to know the numerator, the number on top of the line in a fraction — you need to know the denominator, as well.
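As a rough sketch of the arithmetic, the snippet below divides the same numerator by the two denominators mentioned above. The figure of 163 and the two custody totals are the illustrative values from the text, not real statistics, and `rate_per_100k` is a hypothetical helper name.

```python
def rate_per_100k(events, population):
    """Number of events per 100,000 people in the population."""
    return events / population * 100_000


deaths = 163  # the numerator from the example above

# The two illustrative denominators from the text.
for people_in_custody in (1_000, 10_000_000):
    rate = rate_per_100k(deaths, people_in_custody)
    print(f"{deaths} deaths among {people_in_custody:,} people "
          f"is about {rate:,.2f} per 100,000")
```

The same headline number works out to roughly 16,300 per 100,000 in the first case and under 2 per 100,000 in the second, which is why the numerator alone cannot tell you whether 163 is a lot.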
Or if a story says that children born to men in their 50s are 18% more likely to suffer from seizures, that sounds pretty bad. But presenting risks in this relative form can make small changes sound worse than they are — if, in reality, your risk goes up from 0.024% to 0.028%, as is in fact the case, then you may not care all that much. Without this sense of the absolute risk, rather than the relative increase, people can’t easily use it to make decisions.
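The conversion between the two framings is simple enough to sketch. The snippet below takes the baseline risk (0.024%) and the relative increase (18%) quoted above and works out the absolute change; the function name `risk_after_increase` is a hypothetical one chosen for this illustration.

```python
def risk_after_increase(baseline_risk, relative_increase):
    """Absolute risk after applying a relative increase to a baseline risk."""
    return baseline_risk * (1 + relative_increase)


baseline = 0.00024          # 0.024%, expressed as a proportion
relative_increase = 0.18    # "18% more likely"

new_risk = risk_after_increase(baseline, relative_increase)
extra_cases_per_100k = (new_risk - baseline) * 100_000

print(f"Baseline risk: {baseline:.3%}")
print(f"Risk after the increase: {new_risk:.3%}")
print(f"Extra cases per 100,000 children: {extra_cases_per_100k:.1f}")
```

Framed this way, "18% more likely" becomes a rise from about 24 to about 28 cases per 100,000 children, or roughly four extra cases, which is a much easier number to base a decision on.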
Other times, it’s easy to misrepresent numbers by cherry-picking them, or to suggest a causal link where there isn’t one.
And by doing that, the media can make it look, probably falsely, like we’re suffering an epidemic of teen suicide, or an epidemic of loneliness; or that the red-tops caused Brexit, or that a Boris Johnson column caused a 375% jump in hate crimes.
This is all especially relevant now, as the Covid-19 epidemic has forced everyone to undertake a rapid and high-stakes crash course in statistics. Suddenly everyone, including journalists, has to be conversant in things like Bayes’ theorem, statistical modelling and Goodhart’s Law.
But misunderstanding them in even subtle ways can cause quite serious misrepresentations of the data: for instance, at one stage it led people to think that the virus was spreading faster when it was in fact on the decline. Conversely, if journalists had been more comfortable with ideas like exponential spread, fewer of them might have downplayed the risks early on, or laughed off suggestions that it was more dangerous than the flu.
None of this, we want to stress, is deliberate, or even especially blameworthy; journalists make mistakes, and like the rest of us, most are not especially numerate. But it is important, because we need good information to navigate the world.
We wanted to look at some of these ways that numbers can go wrong, and give some simple tips on working out which numbers you can trust. Vitally, we also wanted to suggest a few guidelines for the media: a sort of style guide for presenting numbers. These are simple steps, like giving absolute risk as well as relative, presenting numbers in context, and being careful about saying that A causes B. Similarly, letting readers know some basic facts, like the sample size if you’re quoting a scientific study, and being aware of issues in science such as publication bias, can help avoid misleading people.
Most of the time, we think, the mistakes are innocent — journalists aren’t statisticians or scientists, and will make the same mistakes the rest of us do. But when members of the media make them, it makes the rest of us understand the world less well. It would be amazingly easy, if you’re not careful, to end up writing a story about how water causes cancer.