When I worked, not that many years ago, at A Large Media Organisation, the bosses were rightly concerned about making it easier for non-straight-white-guy staff to join and prosper. They did, I think, a solid job at this. From memory, looking around news conference in the morning, depending who was out on stories, usually about 50% of the journalists in the room were female and something like 30% were non-white (compared to about 15% of the population).

It wasn’t just hiring practices: they also put systems in place to ensure that, once hired, staff were made to feel welcome, such as workshops led by minority journalists to help straight/white/male journalists avoid common pitfalls, or support groups for BME journalists.

Hiring more journalists from various minorities and listening to them seems pretty obviously a good thing to do. Of course it’s not going to immediately solve every one of the various problems that people of colour, or women, or LGBT people face in the workplace, but it was a start, and not every company I’ve worked for has managed even that.

At one point, though, something pulled me up short. During one meeting about diversity there was a lengthy segment about “implicit bias” — specifically, the implicit bias measured by something called the Implicit Associations Test (IAT) — and how we could reduce it in ourselves. That startled me, because — nerd that I am — I knew about the IAT’s troubled history.

That history is in my mind at the moment because a large new study is out about the IAT, and more specifically interventions intended to reduce our bias. And it tells us, I think, that sometimes there is a tension between activism and reporting.

It is hardly controversial to say that unconscious biases affect our judgment. It would be amazing if that weren’t true of biases against minorities; if there weren’t subconscious ways in which we treat people from other groups differently from those in our own group. “People might be racist in subtle ways of which they are unaware” should not be a surprising statement; after all, there is evidence showing that people are less likely to hire job applicants or accept tenants with stereotypical black-sounding names than they are ones with stereotypical white-sounding names.

The IAT, when it was introduced more than two decades ago, promised to offer a way of measuring that unconscious bias. It involves a simple test. In the classic “measuring racial bias” version, you sit down at a computer and, on screen, a series of images and words pop up. If you see a positive word like “happy”, you press “i”; if you see a negative one like “pain”, you press “e”. And if you see a picture of a black person, you press “i”; if you see a white person, you press “e”. Then the categories are swapped around (same button for white person as positive word, etc). And the computer measures your reaction times.

Your implicit bias score is calculated from those reaction times: how long it takes you to pair white faces with good words compared with black faces with good words, and likewise for bad words. You can have a slight, moderate, or strong “preference for white faces over black faces”, or less commonly for black faces over white. Most people, including a sizeable minority of black people, record a bias towards white faces, because they take a tenth of a second or so longer to press the button when black faces and good words are combined.
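For the nerds, the gist of that calculation can be sketched in a few lines of Python. This is a simplified illustration, not the exact procedure the real test uses: I am assuming the score is a standardised difference in average reaction time between the two pairings, and I skip the further clean-up steps a real scoring procedure would involve. The function name, the block labels and the reaction times are all made up for the example.

```python
from statistics import mean, stdev


def simplified_iat_score(white_good_ms, black_good_ms):
    """Hypothetical, simplified IAT-style score.

    white_good_ms: reaction times (ms) when white faces share a key with good words.
    black_good_ms: reaction times (ms) when black faces share a key with good words.

    Returns the difference in mean reaction time divided by the standard
    deviation of all the trials pooled together. A positive value means the
    black-plus-good pairing was slower, i.e. a "preference for white faces".
    """
    pooled_sd = stdev(list(white_good_ms) + list(black_good_ms))
    return (mean(black_good_ms) - mean(white_good_ms)) / pooled_sd


# Invented reaction times: the second block is roughly a tenth of a second slower.
white_good = [650, 700, 720, 680, 710]
black_good = [760, 800, 790, 820, 770]

score = simplified_iat_score(white_good, black_good)
print(f"simplified score: {score:.2f}")
```

Dividing by the spread of the reaction times, rather than reporting the raw gap in milliseconds, is a design choice in this sketch: it stops naturally slow or erratic typists from looking more biased than quick ones.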

The IAT became a phenomenon. Tens of millions of people have taken the test at Harvard’s website. Hillary Clinton namechecked “implicit bias” in one of her debates with Donald Trump in 2016. Malcolm Gladwell mentions it in his bestseller Blink. It was described as a “revolution” in social psychology. Its proponents said that it could detect hidden racism even among people who honestly believed themselves to be egalitarians — that it had been “shown, reliably and repeatedly, to predict discriminatory behavior that was observed in the research”. And, of course, many organisations sprang up to offer diversity training to firms (such as my former employers) who wanted to reduce their employees’ implicit bias.

There was, however, a problem. Or more than one, but they became one big problem: it doesn’t work. For the full story you really (really) ought to go and read Jesse Singal’s long and splendid 2017 piece at New York Magazine, which documents the ways in which so much of it unravelled. But there were really two main issues, I think.

First, if you test someone with the IAT, they get a score. But if they take it again later, their score is likely to be very different. In other words, the test has low “test-retest reliability”. The lower a measure’s test-retest reliability, the more random and uninformative it is. The IAT’s reliability is about halfway between “perfect” and “completely useless”, so it’s not utterly without value, but it’s well below the standards of most measures: where the IAT scores about 0.55 on a scale of 0 to 1, a depression-diagnosing questionnaire, for instance, scores about 0.86, and that’s about the minimum requirement for most psychological evaluation tests.
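To make that concrete: test-retest reliability is typically reported as a correlation between the scores the same people get on two separate occasions. The sketch below assumes a plain Pearson correlation, which is one common way of doing it, and every number in it is invented purely to show the difference between a stable measure and a noisy one. None of it is real IAT data.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    return covariance / (spread_first * spread_second)


# Five hypothetical people, tested twice.
monday = [0.1, 0.3, 0.5, 0.7, 0.9]

# A stable measure: everyone scores roughly what they scored before.
tuesday_stable = [0.15, 0.28, 0.52, 0.69, 0.88]

# A noisy measure: Tuesday's scores bear little relation to Monday's.
tuesday_noisy = [0.6, 0.1, 0.9, 0.2, 0.5]

print(f"stable measure: {pearson_r(monday, tuesday_stable):.2f}")
print(f"noisy measure:  {pearson_r(monday, tuesday_noisy):.2f}")
```

A measure that ranks people the same way on both days lands near 1; one that reshuffles them lands near 0. The figures quoted above for the IAT and for the depression questionnaire sit on that same scale.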

As Singal says in his piece, “If a depression test … has the tendency to tell people they’re severely depressed and at risk of suicidal ideation on Monday, but essentially free of depression on Tuesday, that’s not a useful test.”

But perhaps more damningly, the IAT does not do what it says it does. It is meant to detect secret racism among people who claim — or believe themselves — to be non-racist. But it turns out that people who score higher on the IAT are not, in fact, significantly more likely to discriminate against black people than people who score lower.

Even the proponents of the test now acknowledge that it should not be used to diagnose racism — or a tendency “to engage in discrimination” — in individuals, although they say that there is still a use for it for examining implicit bias when aggregated across large groups. In short, people we would call “racist”, under most reasonable definitions of the word, do not score higher on the IAT than people who we would not.

The trouble is that the news does not seem to have reached employers. Facebook offers in-house implicit bias training. So do Google, and Starbucks, and many other places, including my own former workplace.

That’s where this latest study I mentioned comes in. It was a major meta-analysis, with 87,000 subjects across 492 individual studies. It was carried out by one of the original proponents of the IAT, Brian Nosek, an excellent and careful scientist. It was pre-registered, so it couldn’t p-hack its way to the results it wanted. These are all good reasons to trust it.

And it found that corporate training to reduce your implicit bias has only a weak effect on your actual implicit bias score, and no effect on your actual behaviour. If this is right, then millions of pounds and countless man-hours have been wasted on diversity training programmes that have zero impact. (Some people even suggest — with some empirical support — that they have the opposite effect of worsening implicit bias, because you’re reminding people of negative stereotypes and thus reinforcing them. But the new study didn’t find that.)

Why — in the face of all these problems — is the notion of implicit bias so sticky, then? Why is it something we all know about, and which companies still spend vast resources trying to eradicate?

Partly, I think, it’s just an intuitive idea. It is true that we are biased, implicitly. The specific “implicit bias” measure might be unhelpful, but still, we do have unconscious race- and gender-based biases. So we hear “implicit bias” and think “obviously true thing” without bothering to check the details.

But perhaps more importantly, I think it is because it is hard to argue against without looking like a horrible racist. Look at this piece: see how I spent the first several paragraphs establishing that I am totally in favour of measures to improve workplace diversity. That’s the rhetorical price that needs to be paid in order to then say “but this specific thing here, which purports to help with that, doesn’t work”. That’s because we don’t tend to think in terms of single, specific arguments; we work with a great messy agglomeration of interconnected thought-stuff which signals tribal loyalties.

If I don’t pay that price, if I leap right in and say “diversity training is a waste of time”, you’ll immediately lump me in with a certain kind of Right-wing anti-wokeness writer and (reasonably) ignore the actual argument I’m making, because I’m obviously not trying to persuade: I’m just trying to shock you and get a cheer from My Side.

This makes writing about the science of these topics immensely tricky. If I write something about, say, innate sex differences in interests, and whether they affect women’s participation in STEM careers, the conclusions I come to might be right or wrong, but they should be based purely on my assessment of the science. But whatever those conclusions are will inevitably lead people to make assumptions about my political views, because the tribal lines on that issue are so well-defined. Admitting that the whole concept of implicit bias is probably irretrievably flawed just sounds like you’re saying you don’t think racism is a problem in the workplace. It should be a simple question of trying to say true things, but instead it’s seen as flying a flag of allegiance.

I suspect a lot of people reading this will think that this is getting worse, that we all have to be super-woke now in everything we write and that science journalism is becoming a branch of social justice activism. I don’t think that’s fair. Sometimes, when I read pieces blaming “whiteness” for the growth in opioid deaths and firearm suicides in the US, I do wonder; but I suspect that every generation had its shibboleths and hard-to-declare truths, and scientists and everyone else had to work within the political confines of the time.

Scott Alexander calls this “Kolmogorov complicity”, after the great Soviet scientist who carefully never said anything that would offend the Politburo and concentrated on the scientific truths he could discover without getting thrown into a gulag. (Getting cancelled might be bad, but the Soviets had it worse.)

Still. At least, now, we can probably say without great fear of being burnt at the stake like Giordano Bruno that the IAT is no use as a predictor of racism, and that corporate measures to reduce implicit bias are largely useless. Except – hang on! I just took the test myself, and was found to have “little to no automatic preference between White people and Black people”! So I’m much less racist than you. Scratch everything I just said: all hail the IAT. Just don’t make me take it again.