'Ditch Watson and Glaser; hire a few Kant scholars instead.'

One of my most vivid memories from university is of sitting in my college café, looking up from my sputtering essay, and seeing the tables in front of me full of undergraduates clicking through the identical sets of multiple-choice questions on their laptops. The university was an ancient and venerable institution, where students were encouraged to leave the world behind and plunge themselves into humanist curricula that stretched back hundreds of years. Yet no one in front of me was doing anything of the sort. Instead, they were all furiously trying to master a branch of knowledge not one of them had actually applied to study: the professional aptitude test.
In the past few years, such tests have become unavoidable for any aspiring professional. As of 2017, 75 of The Times’s Top 100 UK companies used psychometric assessments as part of their recruitment strategies, and the year before last, the global industry that designs, administers and coaches these tests was valued at over nine billion dollars. No one can escape, if they want to get on in the white-collar world. University students regularly shell out sums greater than the average termly maintenance loan on slick preparatory courses, and adult career-switchers often have to take multiple days off work to participate in whatever “assessment suite” has been dreamed up by their prospective new employer.
The purveyors of these tests certainly seem sure of their efficacy. Pearson, the multi-billion-pound market-leader, informs prospective buyers that such tests are “seen as a successful tool for predicting job success”, and that “research shows that organisations can predict over 70% of performance [sic] by using the right tools”. Those professionals who have done well by them are equally optimistic: “It’s basically all about being logical,” a corporate lawyer tells me. “It suits me, because I have a very logical brain.”
Yet despite such convincing testimonies, it remains remarkably difficult to extract from these tests any kind of philosophically robust procedure. One study of one of the most popular “critical thinking” formats found that candidates with formal training in logic and the philosophy of language might actually be at a disadvantage compared to their more ignorant counterparts. What, we might reasonably ask, is going on?
Though there are, these days, nearly as many test formats as there are HR departments, a few clear market-leaders have emerged. The most widely assessed capacity is “critical thinking” — a largely undefined term that seems to have something to do with reading chunks of text and sorting valid inferences from invalid ones. In the UK, at least, the most popular rubric is the notorious “Watson-Glaser” format, which forms the major criterion for entry into professions like law, but also underpins many of the verbal reasoning tests used in the educational system, such as the TSA employed by Oxford and Cambridge and the UKCAT that guards the gates of medical school. It is also comparatively ancient, originating in America’s Progressive Era, the high point of race realism, eugenics, degeneracy theory, and a host of other now-unfashionable beliefs in man’s capacity to measure and taxonomise his fellow man.
Here is a paradigmatic Watson-Glaser test question, which I have adapted from Pearson’s own website:
Statement: Two hundred school students in their early teens voluntarily attended a recent weekend student conference in Leeds. At this conference, the topics of race relations and means of achieving lasting world peace were discussed, since these were problems that the students selected as being most vital in today’s world.
Based on the statement above, evaluate the following inference: “As a group, the students who attended this conference showed a keener interest in broad social problems than do most other people in their early teens.”
A) True
B) Probably true
C) Insufficient data
D) Probably false
E) False
According to Pearson’s own mark scheme, this statement is “probably true” — “because, as is common knowledge, most people in their early teens do not show so much serious concern with broad social problems.”
But is this common knowledge? Having worked in a school myself, I am struck by the fact that, if anything, young people in this country are more attentive to broad social problems than their jaded elders, often saturated in online political talking-points to the point of downright unteachability. In fact, a plausible case can be made that the best answer is C) Insufficient data, on the basis that the prompt itself tells us nothing about the interests of teenagers in general.
Such issues may seem trivial, but they stem from a deep philosophical incoherence in all the major critical thinking formats. The tests claim to be assessing analytic ability — that is, the ability to draw conclusions from logical premises, according to the ancient form of the syllogism: A is B; All Bs are C; therefore, A is C. But there is no “probably” in a syllogism. What ends up happening, then, is that applicants are invited to bring outside judgements about semantics and the probabilistic relations of natural facts to bear on their answers.
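To make the gap concrete, here is a rough formal sketch of my own; nothing like it appears in Pearson’s materials, and the threshold of one half is simply my gloss on the word “probably”. A syllogism is deductive: if the premises are true, the conclusion cannot be false. The “probably true” verdict, by contrast, is a graded judgement that only goes through once some body of background assumptions has been smuggled in:

\[
\text{Deduction:}\quad
\frac{a \text{ is } B \qquad \text{all } B \text{ are } C}{\therefore\ a \text{ is } C}
\qquad\qquad
\text{Watson-Glaser:}\quad
P(\text{inference} \mid \text{statement},\, K) > 0.5
\]

Here \(K\) stands for the test-setter’s unstated “common knowledge”. Change \(K\), say to a schoolteacher’s experience of politically saturated teenagers, and the “correct” answer changes with it.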
This raises the question of what criteria the test-takers are meant to use to make such judgements. The preparatory course-providers set up complicated rubrics that they claim will help their students sort everything out: “Understanding the difference, between common knowledge (allowed in the inference section)” reads one handy guide on a popular site called JobTestPrep, “and other types of knowledge (not allowed) is what ultimately allows you to find the correct answers to questions”. Thus, from the fact “Dan is standing at the bus stop” we can make the “common knowledge” inference that “he is probably waiting for the bus” (other possibilities, such as “he is sheltering from the rain”, are here deemed to be wild and unreasonable). We cannot infer, however, from the fact that “She is driving a BMW” that “she is probably rich”. This is not “common knowledge”, but “general knowledge” — which, apparently, is a totally different thing. Even if we wrap our heads round these distinctions, not one mark scheme I have consulted acknowledges that the word “probably” could just as easily denote a semantic uncertainty as an epistemological one. Candidates with a background in philosophy of language, for example, would likely perceive that words like “rich” are relative and depend on context; they would thus be more likely to include hedging words like “probably”, and so fall afoul of the mark scheme in a different way.
In other critical thinking formats, this blindness to the way in which logic is underpinned by semantics yields even sillier results. Consider this question, from the “evaluation of arguments” section of the JobTestPrep website:
Evaluate the answer to the following question as STRONG or WEAK:
Q: Should a company grant its employees some free time to spend in any manner they choose?
A: No — employees are likely to use the free time to clean their homes, run errands, and meet with friends, and thus bring no benefit to the company whatsoever.
According to JobTestPrep, this is a strong argument. “It is important, as it refers to the benefit the company might (or might not) get from this policy.” Here, the “common knowledge” required is nothing less than a total and unquestioning identification, on the part of the test-taker, with the mercenary ethos of the shareholder corporation. The idea that the word “should” might denote any other type of obligation — moral, perhaps — is totally impermissible.
The procedures underpinning the test-setters’ categorisations are not just difficult to divine: they are completely arbitrary. The poor test-takers end up desperately trying to channel the spirit of the test, spending vast amounts of time (and, conveniently, money) on test-prep programmes in the hope of acquiring an instinctive sense of what the test-setting mandarins do and don’t think is self-evident. Needless to say, such arbitrariness encodes into the tests an immense cultural bias. If you come from a background similar to that of the test-setters, then you’ll likely do well. If you were raised in a different environment where, say, people are taught to believe themselves to have moral duties that extend into the limited-liability corporation, you will likely fail.
Even more tellingly, the shiniest, newest professional aptitude tests increasingly do away with logical deduction altogether, and instead simply measure the test-taker’s spontaneous identification with official rubrics. This procedure is the core of the “Situational Judgement Test”, or “SJT”, that dominates hiring in healthcare and human resources, and even forms a major part of the UK Civil Service’s Fast Stream exam. In the SJT, it is the candidate’s “character” that is assessed, via an online widget that dreams up hypothetical workplace conflicts and asks candidates how they would respond. Test-takers are exhorted to “research company values” in advance — and, essentially, parrot the company’s own HR policies back at it.
Clearly, what this really amounts to is a perfect recipe for workplace conservatism: the company hires graduates who do exactly the kind of thing the procedures tell them to do, always. This would be one thing if the Civil Service were a paragon of leanness, efficiency and innovation. But does anyone in the UK really think this is the case? Incidentally, several civil servants inform me that almost no one in the Civil Service believes in the Fast Stream exam: most of these hiring practices were dreamed up by outside consultancy firms with an obvious financial interest in the Civil Service’s stultification.
This institutionalised skull-measuring has the potential to wreak catastrophic effects. It is no secret that the UK is facing a slump in productivity, its vast firms with their swollen internal bureaucracies rapidly being overtaken by competition abroad. Thomas Piketty and Katharina Pistor have argued convincingly that most of what counts as growth these days is really sterile manipulation by the administrative class: changing the legal rules surrounding one’s assets to increase their value, lobbying for new schemes to siphon public money into one’s latest quixotic startup. Brett Christophers, meanwhile, has shown how the biggest, most profitable companies these days are more interested in extracting rents than making profits, precisely because no one is doing or making anything new or exciting enough to sell. Few, however, have noted that these are exactly the pathologies we would expect to find downstream of a selection procedure that prizes ploddingly interchangeable candidates and systematically overlooks those who think differently. How many of the UK’s sclerotic institutions could be rescued by taking in people who are able to think critically, rather than people who are good at “critical thinking”?
When I put these objections to defenders of such tests, the response is uniform. Even if their methodologies aren’t fully understood, the Pearson website assures its doubters, the fact remains that such tests predict success; they are highly correlated to something called “outcomes”. Yet, when I look, there don’t seem to be robust studies proving anything of the sort — not least because the supplest and most interesting thinkers probably don’t score well enough to be given jobs in the first place. In any case, this fêted correlation could be attributed to the simple fact that “critical thinking” tests reliably weed out a few illiterate candidates. The results of a spelling test would correlate to professional success, too.
Arguably, given the well-documented racial and social disparities in the UK’s professional services sector, so too would outright prejudice. One of the most astonishing facts about the literature defending the Watson-Glaser test is that it actually acknowledges that this is the case: different cultural and ethnic groups tend to achieve vastly different results. Of course, this is entirely to be expected if most of what is being measured is the test-taker’s cultural proximity to the test-setter; the high priests of psychometry, however, remain determined to find a way to make their discipline seem compatible with the egalitarian ethos of the average HR department. One report by Imperial College London admits that “Pearson Vue’s own literature on the Watson Glaser Critical Thinking Assessment reports… ethnicity ha[s] been previously associated with differential group performance on the test,” only to observe that “there was no difference between groups when predicting [career] progression”. Thus, the report concludes, the playing field can be levelled via various baroque DEI initiatives. In other words: yes, the tests seem to be racially biased, but you needn’t worry because the tests don’t actually work, so you can impose whatever quotas you like in the name of compliance without feeling too guilty. Conveniently, most of the companies that sell professional assessment suites seem to be hawking “equality and diversity” services too.
Ultimately, professional assessors are guilty of making not one, but two of the great philosophical errors of our time. First, they mistake the fact that tests give quantifiable results for the idea that such results are somehow objective. Second, they assume that facts about the world can be derived from pure logic, without any theory of semantics intervening. These assumptions run rampant in the culture these days, animating everyone from online IQ fetishists to drab technocrats who think that everything can be solved with a nice, healing dose of economics. Anyone with a shred of philosophical training knows enough to question them — indeed, if one traces back the genealogy of “critical” thought far enough, one arrives at the work of Immanuel Kant, whose entire philosophical project aimed to show how “pure” or “logical” reasoning is always infected by a priori structures that it cannot itself justify.
Sadly, however, academic humanities like philosophy seem to be the one type of discipline that employers are determined not to take into consideration in their hiring procedures. This is a shame: to an unparalleled extent among branches of human knowledge, the humanities involve a questioning of abstract categories and measurement systems by which we parse and understand the world, and a consideration of whether the questions we are asking in our more mundane, technical pursuits are even the right questions. They are thus the perfect antidote to the inane proceduralism, the waterlogged bureaucracy, the creeping sense of stagnation, that haunt the contemporary workplace. If people really want to galvanise an organisation, they should start trying to rediscover some of this critical, humanistic spirit, and do away with whatever crude new solution the latest smarmy consultant has sold them. Ditch Watson and Glaser; hire a few Kant scholars instead. Maybe then, one day, the students in college cafés like mine will be able to close their test-prep software and get back in the libraries where they belong.
Join the discussion
You could easily argue that Pearson are wrong to say Probably True. Since the students in question voluntarily chose to spend a weekend on this stuff, then whether or not they had a keener interest, they showed a keener interest (i.e. displayed it by the actual fact of participation). Since the question asked whether they showed it or not, surely the correct answer should be True.
Which all proves the author’s point.
Unless the list they were offered to choose from was ‘world peace’, ‘race relations’, ‘needlework’ and ‘fly fishing’. In which case the answer is false.
The JobTestPrep question contains the same bias. In creative industries the job is to solve problems and come up with new ideas; enforced work often doesn’t help.
Both questions therefore identify candidates with a propensity to assume things and a tendency towards groupthink. Which actually may make them perfect job fodder for businesses where inventive thought is not required, but plodding adherence to company policies is. So the flaws in the test may be exactly what is wanted.
That’s essentially it.
Someone who told truth to power by saying ‘insufficient data’ would surely be outing herself immediately as ‘difficult’.
Someone who said “insufficient data” would be outing themselves as unable to draw a conclusion unless it was staring them in the face! But yes, such people are difficult to work with – if arrogant they are blockers and a huge waste of everybody else’s time.
Someone who said insufficient data would be correct.
Thick people find it hard to work with intelligent people, that is true.
There could be insufficient data and yet a decision still has to be made. Many, it seems, prefer to make that decision using fake data; it helps to avoid the crushing sense of responsibility.
Sure, but in the example given, the information is there, but not everyone is able to see it. That’s the whole point of the test, and most tests: to separate those who are capable of something from those who are not.
That some commenters can’t see it, while others can, shows the test is actually working.
We’ve got a right one here! Is it National Idiot Day?
I never realised Unherd readers were such a bunch of duffers. I’m going to have to explain it to you aren’t I. Wonder if you’ll get it if I do?
In every productive industry you need people who can solve problems and come up with new ideas. These nonsensical tests with illogical “correct” answers seem to select for the opposite.
Since there is no shortage of tests for verbal reasoning, non-verbal reasoning, problem solving etc I am baffled as to why any private sector business would pay for tests that select for stupidity.
Taking responses on Unherd as an indicator, they do actually seem to do a reasonable job.
I would have said True, though Probably True would be a more conservative answer. I’m assuming points are ascribed to answers so True would still be a positive choice.
Hi Seb – you seem to be the only other person on here who has grasped the reasoning.
Hi David. Not sure what’s getting the downvotes on here. I can do Probably True too – as Saul D says, without complete information you can never say True with complete confidence. But that would presumably rule out True as an answer ever, with the possible exception of pure Mathematical questions.
Point is, the premise of the entire test is false, since you can legitimately argue two or three of the possible answers, and the test claims there is only one possible argument.
Then again, if your HR department requires you to obediently drag yourself off to DEI courses and quote chapter and verse on White Privilege, then the test, in weeding out people who question things, is I suppose doing its job.
It’s also enabling the HR department to work from home, since in automating selection processes it’s saving them from having to come into the office and actually interview candidates.
A “somewhat” interesting analysis, but what I really wanted to know was: does having a quixotic middle moniker confer an advantage when seeking publication of one’s writings?
A) Definitely
B) Probably
C) If you haven’t got one, make one up
Seriously though, the wider point about corporate conformity begs an entirely different question: what constitutes “success” in western culture these days? This is what the tests seek to determine (the likelihood of career success) but who defines what that actually means, and on what grounds?
70% success doesn’t sound particularly high though.
My firm would have at least a 70% success rate with lads we’ve employed, and the “selection process” involves little more than an hour’s chat in the pub.
Is that because men are easier to diagnose?
In my view the author has actually misread the first example question that he has used. The point of the question is not a comparison with older people, it is that the attendees have self selected by being at the conference.
That doesn’t alter the fact that the only logically correct answer is “insufficient data”.
The question is nonsense and the “correct” answer is not logical.
These tests are actively selecting for stupid people. If this is how the civil service is selecting people to hire it goes a long way to explaining why we are living in Idiocracy.
I’m afraid you’ve misunderstood it too. And made it far worse by being arrogant about it.
But you are clever enough to see the Emperor’s beautiful clothes, right?
No, the question is stupid and the given answer is illogical:
The statement given does not logically justify the inference, and the explanation given referring to “common knowledge” is flawed.
There are loads of perfectly good tests available for verbal reasoning and English comprehension and this isn’t one of them.
I’m sorry but you are simply wrong. Seb seems to be the only other person who gets it. In a sense the tests clearly do work. Some people get the answer right, and some people get it wrong. Unsurprisingly those who get it wrong blame the test.
In most cases in life and work we are not faced with certainty, but only with probability. But we still have to make choices. The skill that needs to be evaluated is the ability to make good choices and decisions in conditions of uncertainty. This is quite a rare skill.
I’m not quite sure what is meant here by strong and weak arguments, but for me a strong argument is one based on (supposed) facts and evidence which actually follows. It may still be wrong if these facts are wrong, but it is not weak qua argument. In weak arguments the conclusion simply doesn’t follow from the supposed facts.
Excellent article. This is the non-conventional content that Unherd should regularly produce.
To help my daughter, who misguidedly thought she would like to be a civil servant, I created a profile and did the online initial test, taking screenshots of each question. I passed the test because I know how to play the game. She passed too, using my answers, but fortunately came to her senses (and failed the personality test, which I knew she would).
Civil servants have a personality test?
Brazil is here.
I’ll remain agnostic on whether these tests actually do what they claim, but all job selection takes character into account.
At the very least you want to select out people who are lazy. You might also want to select out people whose arrogance far outstrips their ability – they don’t just underperform, but they block everyone else. Some people are just jerks.
Interviews are often very poor at weeding out such people.
You can’t test for character. A person can lie on a test about their character.
You can test for intelligence, problem solving, verbal and non-verbal reasoning, numeracy and subject matter knowledge. When an organisation chooses not to test for these things, but instead runs the nonsensical tests described in the article, you have to question the motive for this.
I tend not to do well on these tests. Safe to say, I am ‘probably’ not a dummy.
Then you’ve missed an opportunity to write a piece for Unherd damning the tests on that basis. 🙂
Insufficient data to support that thought? Probably false!
Of course they don’t, this is why the civil service is not:
The author is making his opponents’ point for them. These tests are a threat to a complacent status quo.
These tests weed out intelligent people who have good reading comprehension, verbal reasoning and ability to think logically and they award marks for adherence to groupthink and false “common knowledge”.
The opposite of a threat to a complacent status quo.
Presumably because all organisations are intent on committing suicide.
The question isn’t “are these tests perfect” but are they better than (or do they supplement) more traditional approaches: old boy/girl network, wealthy parents, which school/university you went to, job interview etc.
By all means, better tests, but if they do a better job than the alternatives at getting real talent where it can do most good, that’s a good thing.
The question is: are these tests worthless, and do they award marks for logically incorrect answers?
Yes they do.
Tests that actually work at identifying intelligence, reasoning, comprehension, logic, problem solving, numeracy, understanding of statistical information already exist. But the Civil Service choose not to use them, and use these tests instead.
And this is why we are living in Idiocracy.
The author’s view of companies looking for organisation men to fit into their conformist bureaucratic organisations is decades out of date.
Like it or not, many private sector organisations (but it depends on type) have swung the other way, desperately looking for competitive advantage anywhere it can be found. For these organisations a single great idea can make or save millions. They actively want people who can think “out of the box”.
This is part of the reason they jumped on the diversity bandwagon – they thought a diverse workplace would be a more creative workplace.
Quite a few years ago now, but Lucy Kellaway wrote an article in the FT exploring this, and the answer was resoundingly that organisations say they want people to ‘think out of the box’, but in practice they certainly do not.
What they actually want is people who do what they have been asked to do, and in the way they have been asked to do it.
Leaving aside (the very few) businesses which need inventors or innovators, managements regard those who think for themselves as nothing but trouble.
Some truth in this obviously. Organisations frequently scupper their own attempts to become more effective. But that’s not the same as suggesting they deliberately use tests, ostensibly to help select for the best, but in reality to fill their ranks with unimaginative duffers.
When I wanted to transfer to another part of the Army, I was required to do an aptitude test specific to that role which I failed; I assumed I wouldn’t be accepted but I was told that it was just a box ticking exercise and that going through the training course was far more important, which I completed and passed easily.
If they had any idea what it takes, AI could do it.
“Aptitude tests are making us stupid. They encourage workplace conservatism.”
That’s why bureaucracies like the civil service, trade unions and giant conglomerates love them. Staff who actually think for themselves, try to improve processes by questioning the status quo and care more about fulfilling their responsibilities than toeing the party line are dangerous beasts!
I was in an investment bank about 20 years ago as a contractor. The contractors were brought in to get the job done & took risks. I looked around at the permanent staff who had been selected via a psychometric test. They were all intelligent “yes men/women” who didn’t rock the boat, never criticised someone higher than them and most were doing menial, repetitive work.
What I found particularly dumbfounding about the Situational Judgement Tests I had to take for a shelf-stacking job at Asda and a postie job at the Royal Mail was the extent to which the various situations were decontextualised, so that there was no variety in the contexts in which interpersonal relations took place or dilemmas occurred.
For example: would I leave a parcel outside a flat door in a high-rise block of flats in an inner city? No. Would I leave a parcel outside the door of a cottage surrounded by gardens? Yes.
So I presumed the test was expecting me to conform to established operating procedures, which I wasn’t familiar with. I failed the Royal Mail test and passed the Asda test, but did not proceed to the next stage for either job. In other words, I had no clue what they were looking for, whether it was initiative or subservience!
My overall impression was that these tests weren’t developed by people who live in the real world of common sense but an imaginary world of conveyor belt automatons.
My story with this: about five years ago everyone at my company had to complete one of these. 140 minutes to complete 19 questions. Everyone thought this was a bit of a joke. They weren’t laughing by the end. It worked out to about seven minutes per question, and the questions were each several paragraphs’ worth of twaddle before the final multiple-choice answer had to be given. I was one of the first to go, so I worked out the amount of time for each question and had a timer next to me set to six minutes to tell me I had just over a minute left. I finished all the questions, but there was no rereading of the text to look for answers or really thinking the answers through. Most people got about halfway before time ran out. It was such a farce that in the end they had to scrap the “score” aspect and just said people would be judged on the answers to the questions they had completed. I don’t imagine that would happen nowadays.
I’m perpetually confused by the success of runners with African heritage. As no human beings have any genetic advantages whatsoever I can only assume it’s related to attitude.
I was talking to a young chap (10ish) the other day who is at a prep school and is aiming at a very serious boarding school.
One of the questions he was asked at an interview was ‘is everything art?’. He’s 10. But if you can extrapolate on that, what better way of identifying a sharp mind?
My primary school headmaster observed presciently to me, after I was made up at coming top in the 11-plus, that “All tests only test your ability to pass the test”.
It would seem to this layman that tests like these are designed principally to recruit conformists and weed out anyone who might rock the boat.
One of the issues with these tests is that they are multiple choice, i.e. the only answer is to be found among those presented. It therefore narrows thought, as it forces you to select only from the menu; there are no “off menu” answers to the problem. Talk about siloed thinking! I have found myself disagreeing with the presented answers as inadequate, because some subtlety or valid interpretation of the question is not permitted.
I think the description should be that such tests make us biased.
And there is nothing conservative about being stupid.
I think the concern is the questions are not woke enough.
Very smartly written – much kudos