Exam grades are predicted to plummet this year, according to reports, after the Government indicated a wish to return to pre-pandemic levels. UK schools have seen rampant grade inflation since 2020, when formal exams were replaced by teacher assessment — a measure that is both more subjective and carries obvious incentives to err on the side of generosity.
Now Ofqual, the Government’s exam standards regulator, has told examiners to aim for approximately the proportion of top grades achieved in 2019. The result is expected to be 100,000 fewer A and A* grades.
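A back-of-envelope check on that figure, with assumed numbers rather than anything from the reports (a sketch in Python; the entry count and grade shares below are approximations, not sourced data):

entries = 800_000      # approximate annual A-level entries (assumption)
share_peak = 0.36      # approximate A/A* share after the inflation (assumption)
share_2019 = 0.25      # approximate pre-pandemic A/A* share (assumption)
print(f"{entries * (share_peak - share_2019):,.0f} fewer top grades")
# ~88,000 on these assumed figures, in the same region as the reported 100,000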
This is hard luck for any young person sitting exams this year, who will face a higher bar for the same grades than those a mere 12 months older. But I also wonder what the tutelary effect on this academic cohort will be of seeing the lie given so starkly to the supposed objectivity of the main measure of their academic success. How would it feel to see a purportedly detached evaluation mechanism, with significant implications for their future prospects, casually revealed as a political football — or even to experience the downgrade of their own life chances as a consequence?
Of course, this isn’t to say that exam results are wholly political and arbitrary, or that there’s no relation at all between study and grade. But, in principle at least, if what they measured was genuinely a “standard” of attainment, one would expect results to fluctuate more over the years. Moreover, the proportion of top grades would sometimes go down as well as up, without any need for prompting from the standards quango.
But, as is well-documented, this doesn’t happen. Instead, nigh on everyone, from exam boards to teachers to students to education ministers themselves, has an interest in seeing the line go up — and so it does, except when the quango says it shouldn’t.
This has given exam statistics a post-truth edge at least since I was at school, which is some time ago now. But I can’t imagine how dispiriting it would be as a school leaver now, to be confronted with the fact that an apparently objective metric with wide-ranging consequences for one’s individual life is significantly governed by the consequences of its contribution to an abstract measure called “exam data”.
It’s hilariously naive to say so now, but when I was at school I believed that there was a fixed threshold for each grading. The first time I became aware that overall results were improving year on year, it was a rather discombobulating experience.
This used to be the standard. Top 10% got the top grade, next 10% got the next etc.
But how is that fair? Some years would require you to score 90% to be in the top 10% and acquire the top grade, whereas fluctuations the following year may mean you only need to score 80% for a top grade. You could end up with students a year apart where one scored higher than the other but received a lower grade.
Yes. And objectively it would be fairer than what we have now with top marks given out to average students who are over-prepped for exams.
…it was fair because it didn’t assume that everybody was getting smarter every year, because there is no empirical evidence that we are…but it knew that the quality of exam setting WOULD definitely vary…because exams are set by people, with all kinds of quirks and foibles…
…so the aim was to place people in the right percentile within their age cohort, confident that the top ten percent each year would be about as smart as those who went before, and those who followed after.
Your example actually shows that it worked exactly as it should have done…in year 1, the top 10% got 90% because the exam was comparatively easy…the next year the top 10% only got 80%, because the exam was harder…but you still identified the top 10%, and accurately placed all the candidates according to comparative ability within their cohort.
The marks don’t matter, and in fact we were not even told what they were beyond a pretty wide range…but we did know where we stood against our peers…and how fitted we were for academic life…as did Universities, Colleges and Employers…
How on earth is that to be achieved with everybody getting 3 A*s?
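To make the norm-referencing described in this thread concrete, here is a minimal sketch (Python; the cohorts and the neat decile cut-offs are invented for illustration, not how any exam board actually set boundaries):

def grade(score, cohort):
    # Rank within the cohort decides the grade, not the raw mark itself.
    pct = 100 * sum(s <= score for s in cohort) / len(cohort)
    return "A" if pct >= 90 else "B" if pct >= 80 else "C" if pct >= 70 else "D"

easy_year = [55, 62, 70, 78, 85, 88, 90, 91, 93, 97]  # easy paper: an A needs 97
hard_year = [30, 41, 48, 52, 60, 63, 66, 70, 74, 80]  # hard paper: an A needs 80
print(grade(97, easy_year), grade(80, hard_year))     # A A: same rank, same grade
print(grade(88, easy_year), grade(88, hard_year))     # D A: same mark, different grade

The same raw mark lands in different grades depending on the cohort, while the top of each cohort earns the top grade whatever raw score the paper happened to demand, which is exactly the 90%/80% behaviour described above.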
But that doesn’t answer my question. You could have somebody who outperforms another who is in the year below, but end up with a lower grade simply because of the achievement of their peers in the same age group. Exams should tell you how good you are against everybody else, not just compare you to others who happen to be the same age
No they shouldn’t. That’s nonsense. Common sense and experience tell us that achievement varies hardly at all between years anyway. And there really is little need to “measure” that in any case – and if you do, you’ll likely just be measuring noise.
RSF’s comment explains perfectly the main purpose of the exam grades. And it’s not “fairness”. Whatever that means. It’s objective ranking of students by competence in order to allocate places at the next level of education in a “fair” and objective way.
Statistically you will not have such large fluctuations between years if the exams are competently designed (as they used to be). I think this is an imaginary problem. It certainly was one in the past – and should be again.
The goal of the grading should be to sort the students into % achievement buckets and not to attempt to measure some absolute achievement level. The grades are principally to help decide who qualifies for which place at the next level of education. Inflate all the grades and you’re no longer able to do that quite so “fairly”.
In any case, the goal is to be objective and not “fair” (which can mean whatever people want it to mean and is never defined).
At many UK universities, more than 40% of undergraduates leave with a first class degree. It’s absurd.
And 50 years ago some of that 40% would have struggled to get good enough O levels to go on to 6th form education
As someone who has taught in secondary education and been a teacher trainer I can honestly say that exams and testing (as they exist in their current form) say nothing worthwhile about a pupil beyond whether they can memorize packets of information or formulate an answer in appropriate ‘exam’ language. In short, the process has become mindless. The education system needs a serious rethink and parents need to be more involved in educating their own children i.e. don’t leave it wholly up to school teachers.
Yes to parents being more involved: so give us an option to receive the funding that goes to the system for each child.
All teacher assessment and coursework-based assessment now lies face down in the dust – the LLMs are here. I should also mention that the chances of detecting what was generated by software versus created by the candidate are pretty much zero; output of that sort cannot be watermarked. So all existing forms of assessment are now moot – and different skillsets now arise and will be successful – for a while. And a while after that, nothing any human does will hold any value, because the algorithms will outdo anything any humans (individually or as a group) are capable of. If you don’t believe me, look at AlphaFold. And my pessimistic timeline for all this is: all of it, this decade, probably in under five years.
Welcome, you have entered the eye of the whirlwind.
How we adapt as individuals (and therefore as a species) to the threats we’re creating will depend on wisdom that can’t be taught, only acquired over a long period of time. The current education/exam systems are woefully inadequate in preparing young people with a baseline from which to start, especially where, as MH points out, the system is manipulated and can lead to early disillusionment. But therein may lie the beginnings of wisdom.
There’s an interesting juxtaposition between the “flexible” employee required by corporations in today’s Plastic Woman article and the kind of self-realisation required to flourish in “the eye of the whirlwind”, or at least avoid being blown away.
‘….So all existing forms of assessment are now moot – and different skillsets now arise and will be successful – for a while.’ Yes, Prashant. Universities – if my local one is typical – now allow ‘take home’ exams done over a day with no restriction on Internet use or other recourse to outside sources. The argument given is that this replicates the real-world skill of using sources such as the Internet in producing a paper or report. However, as you say, this is a different skillset, and deprives assessors of the chance to find out how much a candidate actually knows, and whether the structure of an answer is one they have developed themselves as against one provided by the Internet or an AI bot. As you also say, this will only be successful ‘for a while’. Nations whose young people are assessed more rigorously, and whose memories and analytical skills have not been dulled by reliance on computer-provided texts, will overtake us, and the limitations of our school-leavers and graduates will be even more painfully exposed than at present.
Instructors give them a whole day to fill out a take-home exam? Surely this is a waste of time when AI will take care of the exam in seconds. Perhaps the faculty are factoring in the time it takes students to travel to and from the university.
Perceptive as usual PK. What do you think the effect will be on universities when it becomes impossible in most cases to distinguish student work from AI generated text and answers? Will there be any way to assess whether students have learned anything? I can imagine examinations in some STEM subjects might tell the tale if they are done by hand without computers nearby. As for the humanities, how can proficiency be tested without essays and persuasive papers? The classics are too good for the Academy; it doesn’t deserve them, so it doesn’t concern me if Lit Schools at universities suffer or close, but I do wonder what to be on the look-out for as things play out. Any ideas?
As things stand, there is a rabbit-in-the-headlights reaction across many, many sectors, Education not least, and I expect chaos to ensue in short order.
The established model of how Education is delivered to young people was already under assault from the ease of getting away with internet-based plagiarism (except at the edges), and this was combatted by software like Turnitin etc. for coursework, but the LLMs make it pretty much impossible to detect copying. Even in STEM, for example in software courses, it is routine to ask candidates to create a large program as a project across many weeks, but the LLMs can now of course do most of the heavy lifting. I’m guessing that the providers of the most capable LLMs will now come under pressure to put watermarks in their output words, such that checksums across sections of the output might allow black-box software to know what is software-generated. But even assuming this is technically possible, in truth it is trivially easy to solicit completely unique output from the LLMs to any question you care to ask them, and then manually tweak the output so that the checksums are invalid. Also, the proliferating explosion of very capable open-source LLMs means the genie is out of the bottle.
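On the checksum point specifically, a toy sketch (Python; the five-word window and truncated hash are invented for illustration, not any real watermarking scheme) of why a single manual tweak defeats checksums keyed on local context:

import hashlib

def window_hashes(text, window=5):
    # Hash every run of `window` consecutive words.
    words = text.split()
    return [hashlib.sha256(" ".join(words[i:i + window]).encode()).hexdigest()[:8]
            for i in range(len(words) - window + 1)]

original = "the quick brown fox jumps over the lazy dog tonight"
tweaked = "the quick brown fox leaps over the lazy dog tonight"
changed = sum(a != b for a, b in zip(window_hashes(original), window_hashes(tweaked)))
print(changed, "of", len(window_hashes(original)), "window hashes changed")  # 5 of 6

Every window that contains the edited word now mismatches, so a detector relying on checksums of that sort stops firing after a one-word edit.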
It should be very obvious that the entire sector is now broken and should be rebuilt differently from the ground up, with an eye not on what is there now but on what’s coming. What I instead expect is that they will tinker at every edge they can find, with the left and right coming up with solutions to match their political tastes. For example, I expect the right to push for a lot less coursework and a lot more closed-book and verbal examinations. I expect the left to be dead against this and instead push for regulation of the tech around LLMs, no doubt with calls to nationalise Microsoft and Google.
Regardless, I have no doubt that the top end of governance for Education across the world, in both government and the sector itself, is frantically debating how to react to the LLMs. (And if they aren’t, then of course the whole lot of them should be taken round the back of the bicycle sheds and shot.) What no one (outside of techies in AI research) seems to be projecting is the trajectory of capabilities gain. If the current round of LLMs can do what they do now, and literally tens of billions is being poured into AI research right now, what does everyone think the next LLM but three will be capable of?
Thanks for the detailed reply.
There are deep implications to grading, how it should be used, and the responsibilities it entails.
As a principle, teachers should teach and should not be in charge of credentialling students or awarding diplomas, due to an obvious conflict of interest.
Then how examiners should grade students is another topic for debate, but the success thresholds should be somewhat similar per year, location, school …
A complex but important topic indeed.
Yes, but what is the solution? The latest problem started with COVID; are we supposed to carry on like everything is fine?
Since when was it the job of the Government to direct how many people should be awarded which exam grades?
In the blue corner, there is an alliance of those who abominate above all else the very thought of the schools attended by 94 per cent of the population, and those who will never be satisfied with anything short of confirmation that they had been cheated of the glittering lives to which their obvious genius ought to have entitled them.
And over in the shocking (not pale) pink corner, there are those by whom working-class pupils are twice as likely to be predicted an E grade, and by whom black pupils’ grades are staggeringly under-predicted, with only 39 per cent of predictions turning out to have been correct, while boys are also endemically ill-served.
OMG! So huge bureaucracies continue to exist because the people controlling the money get what they want?
I am just astounded!
Wait, sorry, it reminds me of my schooling, and that was a very long time ago.
It’s long past time to give the money to the parents and have schools compete for their funding.