'Ditch Watson and Glaser; hire a few Kant scholars instead.'

One of my most vivid memories from university is of sitting in my college café, looking up from my sputtering essay, and seeing the tables in front of me full of undergraduates clicking through the identical sets of multiple-choice questions on their laptops. The university was an ancient and venerable institution, where students were encouraged to leave the world behind and plunge themselves into humanist curricula that stretched back hundreds of years. Yet no one in front of me was doing anything of the sort. Instead, they were all furiously trying to master a branch of knowledge not one of them had actually applied to study: the professional aptitude test.
In the past few years, such tests have become unavoidable for any aspiring professional. As of 2017, 75 of The Times’s Top 100 UK companies used psychometric assessments as part of their recruitment strategies, and the year before last, the global industry that designs, administers and coaches these tests was valued at over nine billion dollars. No one can escape, if they want to get on in the white-collar world. University students regularly shell out sums greater than the average termly maintenance loan on slick preparatory courses, and adult career-switchers often have to take multiple days off work to participate in whatever “assessment suite” has been dreamed up by their prospective new employer.
The purveyors of these tests certainly seem sure of their efficacy. Pearson, the multi-billion-pound market-leader, informs prospective buyers that such tests are “seen as a successful tool for predicting job success”, and that “research shows that organisations can predict over 70% of performance [sic] by using the right tools”. Those professionals who have done well by them are equally optimistic: “It’s basically all about being logical,” a corporate lawyer tells me. “It suits me, because I have a very logical brain.”
Yet despite such convincing testimonies, it remains remarkably difficult to extract from these tests any kind of philosophically robust procedure. One study of one of the most popular “critical thinking” formats found that candidates with formal training in logic and the philosophy of language might actually be at a disadvantage compared to their more ignorant counterparts. What, we might reasonably ask, is going on?
Though there are, these days, nearly as many test formats as there are HR departments, a few clear market-leaders have emerged. The most widely assessed capacity is “critical thinking” — a largely undefined term that seems to have something to do with reading chunks of text and sorting valid inferences from invalid ones. In the UK, at least, the most popular rubric is the notorious “Watson-Glaser” format, which forms the major criterion for entry into professions like law, but also underpins many of the verbal reasoning tests used in the educational system, such as the TSA employed by Oxford and Cambridge and the UKCAT that guards the gates of medical school. It is also comparatively ancient, originating in America’s Progressive Era, the high point of race realism, eugenics, degeneracy theory, and a host of other now-unfashionable beliefs in man’s capacity to measure and taxonomise his fellow man.
Here is a paradigmatic Watson-Glaser test question, which I have adapted from Pearson’s own website:
Statement: Two hundred school students in their early teens voluntarily attended a recent weekend student conference in Leeds. At this conference, the topics of race relations and means of achieving lasting world peace were discussed, since these were problems that the students selected as being most vital in today’s world.
Based on the statement above, evaluate the following inference: “As a group, the students who attended this conference showed a keener interest in broad social problems than do most other people in their early teens.”
A) True
B) Probably true
C) Insufficient data
D) Probably false
E) False
According to Pearson’s own mark scheme, this statement is “probably true” — “because, as is common knowledge, most people in their early teens do not show so much serious concern with broad social problems.”
But is this common knowledge? Having worked in a school myself, I am struck by the fact that, if anything, young people in this country are more attentive to broad social problems than their jaded elders, often saturated in online political talking-points to the point of downright unteachability. In fact, a plausible case can be made that the best answer is C) Insufficient data, on the basis that the prompt itself tells us nothing about the interests of teenagers in general.
Such issues may seem trivial, but they stem from a deep philosophical incoherence in all the major critical thinking formats. The tests claim to be assessing analytic ability — that is, the ability to draw conclusions from logical premises, according to the ancient form of the syllogism: A is B; All Bs are C; therefore, A is C. But there is no “probably” in a syllogism. What ends up happening, then, is that applicants are invited to bring outside judgements about semantics and the probabilistic relations of natural facts to bear on their answers.
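To make the gap concrete, here is a rough formal sketch of my own; nothing like it appears in Pearson’s materials, and the threshold of one half is simply my gloss on the word “probably”. A syllogism is deductive: if the premises are true, the conclusion cannot be false. The “probably true” verdict, by contrast, is a graded judgement that only goes through once some body of background assumptions has been smuggled in:

\[
\text{Deduction:}\quad
\frac{a \text{ is } B \qquad \text{all } B \text{ are } C}{\therefore\ a \text{ is } C}
\qquad\qquad
\text{Watson-Glaser:}\quad
P(\text{inference} \mid \text{statement},\, K) > 0.5
\]

Here \(K\) stands for the test-setter’s unstated “common knowledge”. Change \(K\), say to a schoolteacher’s experience of politically saturated teenagers, and the “correct” answer changes with it.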
This raises the question of what criteria the test-takers are meant to use to make such judgements. The preparatory course-providers set up complicated rubrics that they claim will help their students sort everything out: “Understanding the difference, between common knowledge (allowed in the inference section)” reads one handy guide on a popular site called JobTestPrep, “and other types of knowledge (not allowed) is what ultimately allows you to find the correct answers to questions”. Thus, from the fact “Dan is standing at the bus stop” we can make the “common knowledge” inference that “he is probably waiting for the bus” (other possibilities, such as “he is sheltering from the rain”, are here deemed to be wild and unreasonable). We cannot infer, however, from the fact that “She is driving a BMW” that “she is probably rich”. This is not “common knowledge”, but “general knowledge” — which, apparently, is a totally different thing. Even if we wrap our heads round these distinctions, not one mark scheme I have consulted acknowledges that the word “probably” could just as easily denote a semantic uncertainty as an epistemological one. Candidates with a background in philosophy of language, for example, would likely perceive that words like “rich” are relative and depend on context; they would thus be more likely to include hedging words like “probably”, and so fall afoul of the mark scheme in a different way.
In other critical thinking formats, this blindness to the way in which logic is underpinned by semantics yields even sillier results. Consider this question, from the “evaluation of arguments” section of the JobTestPrep website:
Evaluate the answer to the following question as STRONG or WEAK:
Q: Should a company grant its employees some free time to spend in any manner they choose?
A: No — employees are likely to use the free time to clean their homes, run errands, and meet with friends, and thus bring no benefit to the company whatsoever.
According to JobTestPrep, this is a strong argument. “It is important, as it refers to the benefit the company might (or might not) get from this policy.” Here, the “common knowledge” required is nothing less than a total and unquestioning identification, on the part of the test-taker, with the mercenary ethos of the shareholder corporation. The idea that the word “should” might denote any other type of obligation — moral, perhaps — is totally impermissible.
The procedures underpinning the test-setters’ categorisations are not just difficult to divine: they are completely arbitrary. The poor test-takers end up desperately trying to channel the spirit of the test, spending vast amounts of time (and, conveniently, money) on test-prep programmes in the hope of acquiring an instinctive sense of what the test-setting mandarins do and don’t think is self-evident. Needless to say, such arbitrariness encodes into the tests an immense cultural bias. If you come from a background similar to that of the test-setters, then you’ll likely do well. If you were raised in a different environment where, say, people are taught to believe themselves to have moral duties that extend into the limited-liability corporation, you will likely fail.
Even more tellingly, the shiniest, newest professional aptitude tests increasingly do away with logical deduction altogether, and instead simply measure the test-taker’s spontaneous identification with official rubrics. This procedure is the core of the “Situational Judgement Test”, or “SJT”, that dominates hiring in healthcare and human resources, and even forms a major part of the UK Civil Service’s Fast Stream exam. In the SJT, it is the candidate’s “character” that is assessed, via an online widget that dreams up hypothetical workplace conflicts and asks candidates how they would respond. Test-takers are exhorted to “research company values” in advance — and, essentially, parrot the company’s own HR policies back at it.
Clearly, what this really amounts to is a perfect recipe for workplace conservatism: the company hires graduates who do exactly the kind of thing the procedures tell them to do, always. This would be one thing if the Civil Service were a paragon of leanness, efficiency and innovation. But does anyone in the UK really think this is the case? Incidentally, several civil servants inform me that almost no one in the Civil Service believes in the Fast Stream exam: most of these hiring practices were dreamed up by outside consultancy firms with an obvious financial interest in the Civil Service’s stultification.
This institutionalised skull-measuring has the potential to wreak catastrophic effects. It is no secret that the UK is facing a slump in productivity, its vast firms with their swollen internal bureaucracies rapidly being overtaken by competition abroad. Thomas Piketty and Katharina Pistor have argued convincingly that most of what counts as growth these days is really sterile manipulation by the administrative class: changing the legal rules surrounding one’s assets to increase their value, lobbying for new schemes to siphon public money into one’s latest quixotic startup. Brett Christophers, meanwhile, has shown how the biggest, most profitable companies these days are more interested in extracting rents than making profits, precisely because no one is doing or making anything new or exciting enough to sell. Few, however, have noted that these are exactly the pathologies we would expect to find downstream of a selection procedure that prizes ploddingly interchangeable candidates and systematically overlooks those who think differently. How many of the UK’s sclerotic institutions could be rescued by taking in people who are able to think critically, rather than people who are good at “critical thinking”?
When I put these objections to defenders of such tests, the response is uniform. Even if their methodologies aren’t fully understood, the Pearson website assures its doubters, the fact remains that such tests predict success; they are highly correlated to something called “outcomes”. Yet, when I look, there don’t seem to be robust studies proving anything of the sort — not least because the supplest and most interesting thinkers probably don’t score well enough to be given jobs in the first place. In any case, this fêted correlation could be attributed to the simple fact that “critical thinking” tests reliably weed out a few illiterate candidates. The results of a spelling test would correlate to professional success, too.
Arguably, given the well-documented racial and social disparities in the UK’s professional services sector, so too would outright prejudice. One of the most astonishing facts about the literature defending the Watson-Glaser test is that it actually acknowledges that this is the case: different cultural and ethnic groups tend to achieve vastly different results. Of course, this is entirely to be expected if most of what is being measured is the test-taker’s cultural proximity to the test-setter; the high priests of psychometry, however, remain determined to find a way to make their discipline seem compatible with the egalitarian ethos of the average HR department. One report by Imperial College London admits that “Pearson Vue’s own literature on the Watson Glaser Critical Thinking Assessment reports… ethnicity ha[s] been previously associated with differential group performance on the test,” only to observe that “there was no difference between groups when predicting [career] progression”. Thus, the report concludes, the playing field can be levelled via various baroque DEI initiatives. In other words: yes, the tests seem to be racially biased, but you needn’t worry because the tests don’t actually work, so you can impose whatever quotas you like in the name of compliance without feeling too guilty. Conveniently, most of the companies that sell professional assessment suites seem to be hawking “equality and diversity” services too.
Ultimately, professional assessors are guilty of making not one, but two of the great philosophical errors of our time. First, they mistake the fact that tests give quantifiable results for the idea that such results are somehow objective. Second, they assume that facts about the world can be derived from pure logic, without any theory of semantics intervening. These assumptions run rampant in the culture these days, animating everyone from online IQ fetishists to drab technocrats who think that everything can be solved with a nice, healing dose of economics. Anyone with a shred of philosophical training knows enough to question them — indeed, if one traces back the genealogy of “critical” thought far enough, one arrives at the work of Immanuel Kant, whose entire philosophical project aimed to show how “pure” or “logical” reasoning is always infected by a priori structures that it cannot itself justify.
Sadly, however, academic humanities like philosophy seem to be the one type of discipline that employers are determined not to take into consideration in their hiring procedures. This is a shame: to an unparalleled extent among branches of human knowledge, the humanities involve a questioning of abstract categories and measurement systems by which we parse and understand the world, and a consideration of whether the questions we are asking in our more mundane, technical pursuits are even the right questions. They are thus the perfect antidote to the inane proceduralism, the waterlogged bureaucracy, the creeping sense of stagnation, that haunt the contemporary workplace. If people really want to galvanise an organisation, they should start trying to rediscover some of this critical, humanistic spirit, and do away with whatever crude new solution the latest smarmy consultant has sold them. Ditch Watson and Glaser; hire a few Kant scholars instead. Maybe then, one day, the students in college cafés like mine will be able to close their test-prep software and get back in the libraries where they belong.
Join the discussion
You could easily argue that Pearson are wrong to say Probably True. Since the students in question voluntarily chose to spend a weekend on this stuff, then whether or not they had a keener interest, they showed a keener interest (i.e. displayed it by the actual fact of participation). Since the question asked whether they showed it or not, surely the correct answer should be True.
Which all proves the author’s point.
Unless the list they were offered to choose from was ‘world peace’, ‘race relations’, ‘needlework’ and ‘fly fishing’. In which case the answer is false.
The JobTestPrep question contains the same bias. In creative industries the job is to solve problems and come up with new ideas; enforced work often doesn’t help.
Both questions therefore identify candidates with a propensity to assume things and a tendency towards groupthink. Which actually may make them perfect job fodder for businesses where inventive thought is not required, but plodding adherence to company policies is. So the flaws in the test may be exactly what is wanted.
That’s essentially it.
Someone who told truth to power by saying ‘insufficient data’ would surely be outing herself immediately as ‘difficult’.
Someone who said “insufficient data” would be outing themselves as unable to draw a conclusion unless it was staring them in the face! But yes, such people are difficult to work with – if arrogant they are blockers and a huge waste of everybody else’s time.
Someone who said insufficient data would be correct.
Thick people find it hard to work with intelligent people, that is true.
There could be insufficient data and yet a decision still has to be made. Many, it seems, prefer to make that decision using fake data; it helps to avoid the crushing sense of responsibility.
Sure, but in the example given, the information is there, but not everyone is able to see it. That’s the whole point of the test, and most tests: to separate those who are capable of something from those who are not.
That some commenters can’t see it, while others can, shows the test is actually working.
We’ve got a right one here! Is it National Idiot Day?
I never realised Unherd readers were such a bunch of duffers. I’m going to have to explain it to you aren’t I. Wonder if you’ll get it if I do?
In every productive industry you need people who can solve problems and come up with new ideas. These nonsensical tests with illogical “correct” answers seem to select for the opposite.
Since there is no shortage of tests for verbal reasoning, non-verbal reasoning, problem solving etc I am baffled as to why any private sector business would pay for tests that select for stupidity.
Taking responses on Unherd as an indicator, they do actually seem to do a reasonable job.
I would have said True, though Probably True would be a more conservative answer. I’m assuming points are ascribed to answers so True would still be a positive choice.
Hi Seb – you seem to be the only other person on here who has grasped the reasoning.
Hi David. Not sure what’s getting the downvotes on here. I can do Probably True too – as Saul D says, without complete information you can never say True with complete confidence. But that would presumably rule out True as an answer ever, with the possible exception of pure Mathematical questions.
Point is, the premise of the entire test is false, since you can legitimately argue two or three of the possible answers, and the test claims there is only one possible argument.
Then again, if your HR department requires you to obediently drag yourself off to DEI courses and quote chapter and verse on White Privilege, then the test, in weeding out people who question things, is I suppose doing its job.
It’s also enabling the HR department to work from home, since in automating selection processes it’s saving them from having to come into the office and actually interview candidates.
A “somewhat” interesting analysis, but what I really wanted to know was: does having a quixotic middle moniker confer an advantage when seeking publication of one’s writings?
A) Definitely
B) Probably
C) If you haven’t got one, make one up
Seriously though, the wider point about corporate conformity begs an entirely different question: what constitutes “success” in western culture these days? This is what the tests seek to determine (the likelihood of career success) but who defines what that actually means, and on what grounds?
70% success doesn’t sound particularly high though.
My firm would have at least a 70% success rate with lads we’ve employed, and the “selection process” involves little more than an hour’s chat in the pub.
Is that because men are easier to diagnose?
In my view the author has actually misread the first example question that he has used. The point of the question is not a comparison with older people, it is that the attendees have self selected by being at the conference.
That doesn’t alter the fact that the only logically correct answer is “insufficient data”.
The question is nonsense and the “correct” answer is not logical.
These tests are actively selecting for stupid people. If this is how the civil service is selecting people to hire it goes a long way to explaining why we are living in Idiocracy.
I’m afraid you’ve misunderstood it too. And made it far worse by being arrogant about it.
But you are clever enough to see the Emperor’s beautiful clothes, right?
No, the question is stupid and the given answer is illogical:
The statement given does not logically justify the inference, and the explanation given referring to “common knowledge” is flawed.
There are loads of perfectly good tests available for verbal reasoning and English comprehension and this isn’t one of them.
I’m sorry but you are simply wrong. Seb seems to be the only other person who gets it. In a sense the tests clearly do work. Some people get the answer right, and some people get it wrong. Unsurprisingly those who get it wrong blame the test.
In most cases in life and work we are not faced with certainty, but only with probability. But we still have to make choices. The skill that needs to be evaluated is the ability to make good choices and decisions in conditions of uncertainty. This is quite a rare skill.
I’m not quite sure what is meant here by strong and weak arguments, but for me a strong argument is one based on (supposed) facts and evidence which actually follows. It may still be wrong if these facts are wrong, but it is not weak qua argument. In weak arguments the conclusion simply doesn’t follow from the supposed facts.
Excellent article. This is the non-conventional content that Unherd should regularly produce.
To help my daughter, who misguidedly thought she would like to be a civil servant, I created a profile and did the online initial test, taking screenshots of each question. I passed the test because I know how to play the game. She passed too, using my answers, but fortunately came to her senses (and failed the personality test, which I knew she would).
Civil servants have a personality test?
Brazil is here.
I’ll remain agnostic on whether these tests actually do what they claim, but all job selection takes character into account.
At the very least you want to select out people who are lazy. You might also want to select out people whose arrogance far outstrips their ability – they don’t just underperform, but they block everyone else. Some people are just jerks.
Interviews are often very poor at weeding out such people.
You can’t test for character. A person can lie on a test about their character.
You can test for intelligence, problem solving, verbal and non-verbal reasoning, numeracy and subject matter knowledge. When an organisation chooses not to test for these things, but instead runs the nonsensical tests described in the article, you have to question the motive for this.
I tend not to do well on these tests. Safe to say, I am ‘probably’ not a dummy.
Then you’ve missed an opportunity to write a piece for Unherd damning the tests on that basis. 🙂
Insufficient data to support that thought? Probably false!
Of course they don’t, this is why the civil service is not:
The author is making his opponents’ point for them. These tests are a threat to a complacent status quo.
These tests weed out intelligent people who have good reading comprehension, verbal reasoning and ability to think logically and they award marks for adherence to groupthink and false “common knowledge”.
The opposite of a threat to a complacent status quo.
Presumably because all organisations are intent on committing suicide.
The question isn’t “are these tests perfect” but are they better than (or do they supplement) more traditional approaches: old boy/girl network, wealthy parents, which school/university you went to, job interview etc.
By all means, better tests, but if they do a better job than the alternatives at getting real talent where it can do most good, that’s a good thing.
The question is: are these tests worthless, and do they award marks for logically incorrect answers?
Yes they do.
Tests that actually work at identifying intelligence, reasoning, comprehension, logic, problem solving, numeracy, understanding of statistical information already exist. But the Civil Service choose not to use them, and use these tests instead.
And this is why we are living in Idiocracy.
The author’s view of companies looking for organisation men to fit into their conformist bureaucratic organisations is decades out of date.
Like it or not, many private sector organisations (but it depends on type) have swung the other way, desperately looking for competitive advantage anywhere it can be found. For these organisations a single great idea can make or save millions. They actively want people who can think “out of the box”.
This is part of the reason they jumped on the diversity bandwagon – they thought a diverse workplace would be a more creative workplace.
Quite a few years ago now, but Lucy Kellaway wrote an article in the FT exploring this, and the answer was resoundingly that organisations say they want people to ‘think out of the box’, but in practice they certainly do not.
What they actually want is people who do what they have been asked to do, and in the way they have been asked to do it.
Leaving aside (the very few) businesses which need inventors or innovators, managements regard those who think for themselves as nothing but trouble.
Some truth in this obviously. Organisations frequently scupper their own attempts to become more effective. But that’s not the same as suggesting they deliberately use tests, ostensibly to help select for the best, but in reality to fill their ranks with unimaginative duffers.
When I wanted to transfer to another part of the Army, I was required to do an aptitude test specific to that role which I failed; I assumed I wouldn’t be accepted but I was told that it was just a box ticking exercise and that going through the training course was far more important, which I completed and passed easily.
If they had any idea what it takes, AI could do it.
“Aptitude tests are making us stupid. They encourage workplace conservatism.”
That’s why bureaucracies like the civil service, trade unions and giant conglomerates love them. Staff who actually think for themselves, try to improve processes by questioning the status quo and care more about fulfilling their responsibilities than toeing the party line are dangerous beasts!
I was in an investment bank about 20 years ago as a contractor. The contractors were brought in to get the job done & took risks. I looked around at the permanent staff who had been selected via a psychometric test. They were all intelligent “yes men/women” who didn’t rock the boat, never criticised someone higher than them and most were doing menial, repetitive work.
What I found particularly dumbfounding about the Situational Judgement Tests I had to take for a shelf-stacking job at Asda and a postie job at the Royal Mail was the extent to which the various situations were decontextualised, so that there was no variety in the contexts in which interpersonal relations took place or dilemmas occurred.
For example: would I leave a parcel outside a flat door in a high-rise block of flats in an inner city? No. Would I leave a parcel outside the door of a cottage surrounded by gardens? Yes.
So I presumed the test was expecting me to conform to established operating procedures, which I wasn’t familiar with. I failed the Royal Mail test and passed the Asda test, but did not proceed to the next stage for either job. In other words, I had no clue what they were looking for, whether it was initiative or subservience!
My overall impression was that these tests weren’t developed by people who live in the real world of common sense but an imaginary world of conveyor belt automatons.
My story with this: about five years ago everyone at my company had to complete one of these. 140 minutes to complete 19 questions. Everyone thought this was a bit of a joke. They weren’t laughing by the end. It worked out to about seven minutes per question, and the questions were each several paragraphs’ worth of twaddle before the final multiple-choice answer had to be given. I was one of the first to go, so I worked out the amount of time for each question and had a timer next to me set to six minutes to tell me I had just over a minute left. I finished all the questions, but there was no rereading of the text to look for answers or really thinking the answers through. Most people got about halfway before time ran out. It was such a farce that in the end they had to scrap the “score” aspect and just said people would be judged on the answers to the questions they had completed. I don’t imagine that would happen nowadays.
I’m perpetually confused by the success of runners with African heritage. As no human beings have any genetic advantages whatsoever I can only assume it’s related to attitude.
I was talking to a young chap (10ish) the other day who is at a prep school and is aiming at a very serious boarding school.
One of the questions he was asked at an interview was ‘is everything art?’. He’s 10. But if you can extrapolate on that, what better way of identifying a sharp mind?
My primary school headmaster observed presciently to me, after I was made up at coming top in the 11-plus, that “All tests only test your ability to pass the test”.
It would seem to this layman that tests like these are designed principally to recruit conformists and weed out anyone who might rock the boat.
One of the issues with these tests is that they are multiple choice, i.e. the only answer is to be found among those presented. It therefore narrows thought, as it forces you to select only from the menu; there are no “off menu” answers to the problem. Talk about siloed thinking! I have found myself disagreeing with the presented answers as inadequate, because some subtlety or valid interpretation of the question is not permitted.
I think the description should be that such tests make us biased.
And there is nothing conservative about being stupid.
I think the concern is the questions are not woke enough.
Very smartly written – much kudos