We like the idea that our politicians are doing things for evidence-based reasons. But it turns out we’re less comfortable with people trying to collect that evidence.
That’s according to a new study by Chris Chabris and colleagues, entitled “Objecting to experiments that compare two unobjectionable policies or treatments”. It’s a clever study – and, importantly, a large, preregistered one, which asked the same question 16 times in three different populations and found compatible answers, so it’s probably not garbage. What it did was look at how people feel about randomised experiments between unobjectionable policies or treatments.
That sounds a bit dry; here’s what it means. Imagine, for instance, a doctor’s surgery is trying to reduce deadly infections by providing doctors with a checklist of standard precautions to take before each procedure. The researchers asked participants how they would feel if the checklist were displayed on the doctor’s badge, or on a poster on the wall. They also asked how participants would feel if, instead, one room had the poster and the other had the badge, and patients were randomly assigned to one or the other to test which worked better.
They asked the question four times of different groups, and on the whole, people were comfortable with either of the two options. People preferred the poster, but the percentage objecting to either poster or badge was never more than 25% and usually more like 15-20%.
But when they were asked about randomly assigning patients to one or the other – “A/B tests” – the percentage objecting leapt up, to 30% in one test and 40-50% in the other three. The researchers found similar results when the question was about a doctor choosing one of two FDA-approved blood pressure drugs versus randomly assigning patients to one or the other, and again on other questions involving autonomous vehicle control systems, genetic testing, online dating and several more.
Even when people don’t mind option A, and they don’t mind option B, they really do mind you randomly assigning people to one or other of the two. The effect was just as strong among people with science degrees and professionals in the relevant field.
This is, obviously, a problem. The A/B test described is essentially a randomised controlled trial, and an RCT is the best way of establishing whether some intervention actually causes the good (or bad) things you want to happen. It is the gold standard for medical research, and there is no good reason not to use it in other policy areas – you can randomly assign patients to have either a drug or a placebo, or you can randomly assign schoolchildren to have either phonics or traditional reading lessons. While the interventions are entirely different, the structure of the test is exactly the same.
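For the curious, that shared structure can be sketched in a few lines of Python. This is only an illustration, not anything taken from the study: the function names, the arm labels and the `outcome` callback are all made up here, and a real trial would add a proper statistical test rather than just comparing averages.

```python
import random
from statistics import mean


def assign_randomly(units, arms=("A", "B"), seed=None):
    """Shuffle the units, then deal them out to the arms in turn.

    `units` can be patients, schoolchildren or anything else; the code
    does not care what the intervention is. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {arm: [] for arm in arms}
    for i, unit in enumerate(shuffled):
        groups[arms[i % len(arms)]].append(unit)
    return groups


def compare_outcomes(groups, outcome):
    """Return the mean outcome for each arm.

    `outcome` is an assumed callback that maps one unit to a number,
    for example 1 if a patient picked up an infection and 0 otherwise.
    """
    return {arm: mean(outcome(u) for u in members)
            for arm, members in groups.items()}
```

Calling `assign_randomly(range(100), arms=("poster", "badge"))` would split 100 hypothetical patients into two groups of 50. Swap the labels for "drug" and "placebo", or "phonics" and "traditional", and nothing else needs to change.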
Not all policy can be evidence-based, in the sense that the evidence decides the policy. For instance, should we build another runway at Heathrow? There’s one body of evidence that says that it will significantly increase the UK’s greenhouse gas emissions and will not be compatible with our commitments under the Paris Agreement. There’s another body of evidence that says it will create jobs and grow the economy. But there’s nothing that says you have to choose the climate over the economy; that’s a political decision, or even a moral one.
Sometimes, though – often – there is a clearly defined goal, and two or more possible options for achieving it. Most of us would agree that, say, it would be good if we could stop drug addicts from relapsing or reoffending. Imagine we honestly didn’t know whether locking up drug offenders was more effective at reducing their relapse rate or their drug-related crime rate than putting them into rehab.
The best way to find out which of those options is more effective would be to randomly assign drug offenders to one or the other intervention and see which had better outcomes. But, apparently, that would be far less popular than simply arbitrarily picking one and going with it.
This is, on the face of it, strange, because for at least 20 years we’ve had governments that, at the very least, pay lip service to evidence-based policy. The Blair government loudly repeated that “what matters is what works”: one thing it did was to set up what is now the (excellent) National Institute for Health and Care Excellence (NICE) in 1999 to evaluate new medical treatments for cost-effectiveness.
But sometimes, lip service is all it is. When, recently, the government decided to push through an age verification system for porn sites, it commissioned an impact assessment which purported to look at how the scheme would affect users. But the impact assessment was shonky as heck: it misrepresented the statistics it was quoting, was heavily reliant on a silly poll which made obviously untrue claims, and at one point apparently didn’t understand the word “correlational”.
“The dirty secret of almost all government is that we don’t know whether we’re doing the right thing.” That’s what David Halpern, the psychologist and civil servant who ran the UK government’s Behavioural Insights Team (aka the ‘Nudge Unit’), a programme designed to improve policy, told the website Apolitical in an interview.
“Whether what we’re doing is helping the patient or not helping the patient, or even killing him. People are often running blind, overconfident, and dispensing massive sums of money.” That’s despite 20 years of people making the right noises about evidence-based policy.
I had previously thought that this was largely a cynical thing, or an incompetence thing. Ministers want to look tough on drugs or porn, so they make a decision to crack down on them, and then retrospectively cobble together evidence to support that decision – policy-based evidence-making, as I’ve heard it called. Or perhaps they just lack the understanding of what constitutes good evidence, and the skills to gather it.
But perhaps, also, it’s that they – like, apparently, around half the population – find the very idea of randomised controlled trials, the best tool they have for assessing whether their policies work, deeply uncomfortable.
Chabris and his colleagues speculate that one reason behind this discomfort is “a lack of familiarity”: the RCT has only really existed for two centuries, which isn’t long for human brains, and we aren’t very good at thinking about such trials. Hopefully that’s right, and we’ll get more used to them. Because at the moment, people seem to be happier with barging randomly into a policy than with actually checking if it works.