An AI has been trained to detect the “best” interview candidates from their facial expressions and use of language

There’s a famous anecdote about AI, a sort of cautionary tale. It’s about tanks, and it’s probably not true. But it is relevant to the ongoing debate about the use of AI algorithms in hiring, or in parole, and whether they will entrench racism and sexism with a veneer of objectivity. The latest is an AI trained to detect the “best” interview candidates from their facial expressions and use of language.
Anyway, the story goes that a military AI was trained to detect tanks in photographs. It got shown lots of pictures, some with tanks in, some without, and it was told which was which. It worked out features that were common to the tank-containing pics, and then, when given a new picture from the same source, would use that info to say “yes, tank”, or “no, no tank”, as appropriate.
But apparently, when the AI was given pictures from a new source, it failed utterly. And it turned out that the AI had worked out that the photos with tanks in had been taken on sunny days, and the others on cloudy ones. So it was just classifying well-lit pics as “yes, tank”. When new pictures, taken by other sources which hadn’t been photographing sunbathing tanks, were used, the system broke down.
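The original system, if it ever existed, is long gone, but the failure mode is easy to reproduce. Here is a minimal sketch in Python, using NumPy and scikit-learn on entirely synthetic "photos" – just noisy grids of pixels, brighter for sunny days, with no tank in them at all – showing a classifier acing its confounded training set and then collapsing on pictures from a new source:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_photos(n, sunny):
    # 8x8 greyscale "photos", flattened to 64 pixels. Deliberately, there is
    # no tank in any of them: overall brightness is the only signal present.
    base = 0.7 if sunny else 0.3
    return base + 0.05 * rng.standard_normal((n, 64))

# Confounded training set: every "tank" photo was taken on a sunny day
X_train = np.vstack([make_photos(100, sunny=True),    # labelled "tank"
                     make_photos(100, sunny=False)])  # labelled "no tank"
y_train = np.array([1] * 100 + [0] * 100)

clf = LogisticRegression().fit(X_train, y_train)
print("same-source accuracy:", clf.score(X_train, y_train))  # ~1.0

# A new source that photographed tanks on cloudy days and empty fields in sun
X_new = np.vstack([make_photos(100, sunny=False),   # labelled "tank"
                   make_photos(100, sunny=True)])   # labelled "no tank"
y_new = np.array([1] * 100 + [0] * 100)
print("new-source accuracy:", clf.score(X_new, y_new))       # ~0.0
```

The point of the made-up data is that there is literally nothing tank-shaped to learn: brightness is the only signal, so brightness is what gets learnt.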
The AI blogger Gwern has tried to trace the story back, and it transpires there are multiple iterations of it: sometimes it's telling tank from no tank, sometimes it's distinguishing Soviet from American tanks; sometimes the confounding factor is sunny days, sometimes it's the time of day, or that the film had been developed differently. Versions go back at least to the 1980s and possibly to the 1960s. Sometimes it's Soviet tanks in forests, sometimes it's Desert Storm.
There's another story, about an AI set the task of telling huskies from wolves. All the wolves in its training data were photographed on snow, so the AI learnt to call any animal photographed against snow a wolf. In this real story, the AI was deliberately badly trained, on deliberately badly chosen training data – it was a test. But when it was trained properly, it worked much better.
This story is used to make the point that any AI is only as good as the data you train it on, and it is impossible to know how good the data you’re training it on actually is.
While this may be something of an oversimplification, it’s essentially true. The reason why AI is useful – or machine learning software that uses neural networks, which is usually what people mean by AI – is that it can work through absolutely vast amounts of data and find patterns and correlations that humans can’t. No human could go through that much information.
This capacity has profound impacts. In science, the huge amounts of data thrown up by, for instance, biomedical research or astronomy can be analysed to reveal previously unexpected links: genome-wide association studies found that multiple sclerosis is, in fact, an autoimmune disease, like rheumatoid arthritis, even though it presents as a neurodegenerative disease like Parkinson's or Alzheimer's.
But the trouble is that – pretty much by definition – it is impossible for a human to check those datasets. If a human could check them, you wouldn’t need the AI. And if a dataset is, for some reason, imperfect, then the AI will learn things from it that you won’t want it to. It won’t learn to say there’s a tank in every picture of a sunny day – modern image-recognition software is cleverer than that, and trained on much wider datasets, and there are ways around that sort of problem anyway – but it may have analogous, but more subtle and perhaps more insidious, problems.
That’s fundamentally the worry about AI being racist, or sexist. You might train your AI on some dataset of people who previously performed well at a job, or people who have or haven’t reoffended after release from prison, and those datasets contain large amounts of information about each person; years of previous experience, say, or number of previous offences – but also age, sex, ethnic makeup.
And if the training data tended to favour people of a certain sex or race, then the AI may learn to preferentially pick those people as well. It doesn’t even matter if the data isn’t the product of people being racist or sexist themselves. The training data may be full of people who really did do well at their job, or not reoffend – analogously to the training data correctly labelling whether or not a picture is of a tank.
Then it may turn out that, for societal reasons, women or black people do, on average, less well on those criteria. Just as the apocryphal AI was able to categorise the tank pictures by whether the pic was taken in sunshine, the hiring-algorithm AI might categorise potential hires by whether they are white or male.
Even if you don’t tell the AI people’s sex or race, it may not help, because it may be able to work it out to a high degree of accuracy from proxies – postcodes, for instance, or first names.
The job-interview AI apparently looks at facial expressions, listens to the words the interviewee uses, and compares them to 25,000 "pieces of facial and linguistic information" taken from people who have gone on to be good at a job. (Incidentally, the description that the company's CTO gives of the analysis of verbal skills – "do you use passive or active words? Do you talk about 'I' or 'we'?" – is deeply suspect from a linguistic point of view. The idea that people who say "I" a lot are narcissistic is a myth, and people talk an awful lot of crap about "the passive".)
But, again, the AI can only be as good as the training data. It may be that “people who turned out to be good at their jobs” were more likely to have certain facial expressions or turns of phrase, and it may well be that those facial expressions or turns of phrase are more common among certain ethnic groups. And “people who turned out to be good at their jobs” are, of course, people who got hired in the first place. It is almost impossible to remove hidden biases. (I would be intrigued to know how, for instance, autistic people, or disabled people, would do on this facial-expressions stuff.)
That is not a reason to throw out the whole idea of AI in hiring, or other areas. Algorithms – even simple ones – have been shown to do a better job of predicting outcomes than humans in a wide variety of areas. The AIs might be biased, but they are only biased because the humans they are replacing were biased too.
For instance, Amazon’s famously sexist hiring algorithm that got scrapped last year only undervalued female applicants because the human hiring decisions had been systematically undervaluing female applicants themselves. It’s not that getting rid of the AI will get rid of the bias. In that case, it made it explicit. Amazon now knows that its hiring practices were biased. But it is a reason to be extremely wary of throwing AI into the mix and saying, well, now we’re unbiased, look, an algorithm did it, so it must be OK. And it’s doubly the case where the AI algorithm is proprietary — so you can’t look into its guts and see where it goes wrong.
That’s key. I said at the beginning that it is impossible to know how good the data you’re training an AI on actually is, and that’s true. But there are things you can do to try to find out. There are researchers working on something called “provably correct software systems”, which are somewhat misnamed – you can’t ever prove that software is correct – but you can go in and check parts of the data, or the weighting of the nodes in its network, which can increase your confidence.
If an AI is owned by a private company which won’t let you go and check the data or the algorithms, though (as is the case with the hiring one, and some of the parole ones), it becomes very hard to be confident. So for the time being, it’s worth being very, very wary of anyone who says their fancy AI can tell you who’s going to be good at their job. You can probably trust it to identify tanks, though.