September 24, 2024 - 10:20am
In both the 2016 and 2020 presidential elections, pollsters underestimated Donald Trump’s level of support. Using 538’s election day polling averages, the Democratic candidate’s national lead over Trump was 3.9 points lower than the polls predicted in 2020 and 1.8 points lower in 2016. This pattern looks set to continue with Kamala Harris in 2024, and part of the issue stems from how pollsters estimate turnout.
Social desirability bias is a well-studied phenomenon in survey research. In simple terms, it is the tendency of people to give survey responses that others will view more favourably, for example when asked whether they voted in an election or whether they give money to charity.
Given the socially desirable impact of electoral participation, voters are — generally speaking — not particularly good at assessing their own likelihood of voting. Surveys which simply rely on self-reported turnout may therefore be subject to an added degree of bias in their results.
To combat this problem, Focaldata devised a turnout model to estimate the likelihood of a respondent in our US election surveys actually voting, rather than simply relying on their own estimation. To create it, we used the 2020 Cooperative Election Study (CCES) panel of 60,000 respondents, who were matched to the voter file to determine whether each respondent actually voted in the election.
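As a rough illustration of what a voter-file match makes possible (a minimal sketch, not Focaldata's actual model), the validated turnout rate for each self-reported category is simply the share of matched respondents in that category who are recorded as having voted. The function name, the category labels and the handful of toy records below are all hypothetical and are not CCES data.

```python
from collections import defaultdict

def validated_turnout_rates(records):
    """Share of respondents in each self-reported category who actually voted.

    `records` is an iterable of (self_report, did_vote) pairs, where
    `did_vote` comes from a voter-file match rather than from the respondent.
    """
    totals = defaultdict(int)
    voted = defaultdict(int)
    for self_report, did_vote in records:
        totals[self_report] += 1
        voted[self_report] += int(did_vote)
    return {category: voted[category] / totals[category] for category in totals}

# Hypothetical toy records, invented purely for illustration (not CCES data).
toy_records = [
    ("certain", True), ("certain", True), ("certain", True), ("certain", False),
    ("probable", True), ("probable", False), ("probable", False), ("probable", False),
    ("would_not_vote", False), ("would_not_vote", False),
]

toy_rates = validated_turnout_rates(toy_records)
print(toy_rates)
```

With real matched data, the same calculation could be repeated within age or education groups to see whether a given answer means the same thing for everyone.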
The CCES allows us to determine whether a person’s self-declared likelihood to vote reflects their subsequent turnout. On the surface, a reasonable estimate for a pollster might be to assign “certain” voters a 100% likelihood of voting, “probable” voters somewhere around 75%, undecided voters a 50-50 chance, and “would not vote” 0%. In reality, these figures do not correspond with actual voting behaviour.
[Chart: Respondents overestimate their likelihood of voting in US elections. Self-reported likelihood vs validated turnout in 2020.]
In 2020, over a quarter — 27% — of people who said they were certain to vote, going to vote early, or had already voted, did not actually vote in the presidential election. Even more strikingly, those who said they would “probably” vote only turned out 23% of the time. In addition, a respondent saying they will not vote does not entirely preclude them from voting — 5% of those who said they wouldn’t vote actually did.
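The size of the mismatch can be worked through directly. The sketch below compares the intuitive probabilities described earlier (100% for "certain", 75% for "probable", 0% for "would not vote") with the validated 2020 rates quoted above (73%, 23% and 5%). The undecided category is left out because no validated figure is given for it here; the dictionary keys and function name are illustrative choices.

```python
# Probabilities a pollster might intuitively assign to each answer.
NAIVE = {"certain": 1.00, "probable": 0.75, "would_not_vote": 0.00}

# Validated 2020 turnout quoted in the article: 27% of "certain" voters
# did not vote (so 73% did), 23% of "probable" voters did, and 5% of
# those who said they would not vote did.
VALIDATED = {"certain": 0.73, "probable": 0.23, "would_not_vote": 0.05}

def overstatement(naive, validated):
    """Naive minus validated turnout, in whole percentage points."""
    return {k: round((naive[k] - validated[k]) * 100) for k in naive}

gaps = overstatement(NAIVE, VALIDATED)
print(gaps)  # {'certain': 27, 'probable': 52, 'would_not_vote': -5}
```

The largest gap is among "probable" voters, where the intuitive 75% overshoots validated turnout by 52 points.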
A respondent’s self-declared likelihood is important, but it should not be the sole factor in a turnout prediction in an opinion poll. Some pollsters do not even assess likelihood of voting, instead relying solely on registered voters to generate their headline results. Implicitly, a registered voter poll assumes every voter has the same probability of voting — provided they are registered — which we know empirically is not the case.
If rates of overstating turnout were similar across different demographic groups, the turnout weighting problem for pollsters would be quite small and its effects would mostly cancel each other out. However, there are significant differences in reported versus actual behaviour by age group and education level, making the problem significantly larger.
Consider voters under 35. In 2020, young voters who said they were “definitely” going to vote only voted around half the time. Among those aged over 65, the figure shoots up to 85%. Similarly, those with high levels of education are much more likely to correctly assess their probability of voting. 80% of “definite” voters with postgraduate degrees turned out, and just 1% who said they wouldn’t vote ended up voting. In contrast, only 63% of self-declared “definites” who didn’t graduate high school voted, and 5% who said they wouldn’t vote did.
If we were to simply assume “definitely” means the same thing across different groups, we would end up with poll results too heavily skewed towards the views of younger, non-white and lower-education voters. Two of these three groups lean heavily towards the Democrats, partially explaining why the party's candidate has been overestimated in the polls at the last two presidential elections.
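To see why this matters for the headline number, here is a minimal sketch that weights each self-declared "definite" voter by a turnout probability. The two turnout rates (about 50% for under-35s and 85% for over-65s) come from the figures above. The sample itself, including its vote splits, is entirely hypothetical and chosen only to make the arithmetic easy to follow, and the class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Respondent:
    age_group: str  # "under_35" or "over_65"
    choice: str     # "D" or "R"

# Validated turnout among self-declared "definite" voters, as quoted above.
TURNOUT_BY_AGE = {"under_35": 0.50, "over_65": 0.85}

def margin(respondents, weight_fn):
    """Democratic lead in percentage points under a given turnout weighting."""
    dem = sum(weight_fn(r) for r in respondents if r.choice == "D")
    rep = sum(weight_fn(r) for r in respondents if r.choice == "R")
    return 100 * (dem - rep) / (dem + rep)

# HYPOTHETICAL sample: 100 under-35s splitting 60-40 for the Democrat,
# 100 over-65s splitting 45-55. These splits are invented for illustration.
sample = (
    [Respondent("under_35", "D")] * 60 + [Respondent("under_35", "R")] * 40
    + [Respondent("over_65", "D")] * 45 + [Respondent("over_65", "R")] * 55
)

# Treat every "definite" voter as equally likely to vote.
flat_margin = margin(sample, lambda r: 1.0)
# Weight each respondent by the validated turnout rate for their age group.
modelled_margin = margin(sample, lambda r: TURNOUT_BY_AGE[r.age_group])

print(f"flat: D+{flat_margin:.1f}, turnout-weighted: D+{modelled_margin:.1f}")
```

In this made-up sample the Democratic lead falls from 5.0 points to roughly 1.1 once the less reliable younger "definites" are down-weighted. The direction of the shift is the point; the size depends entirely on the invented splits.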
Using a sophisticated turnout model which takes into account the effects of self-reported likelihood to vote — alongside other demographic factors such as age, race, education and political interest — reduces Harris’s lead over Trump by an average of 2.4 percentage points in our latest wave of swing state polls. In an election which could be decided by just 60,000 voters in November, this margin could easily be the difference between a right and wrong call on the election winner. Pollsters who simply rely on self-reporting may be subject to another polling miss in Trump’s favour.
This is an edited version of an article originally published by Focaldata.
Join the discussion
Nice article, bringing up some important issues.
In more general terms, the corrections employed by (honest) pollsters to correct biases in the data all suffer from a known defect which haunts data analysts in general: non-linearity.
Data relationships are often – maybe even always – non-linear, but when we try to analyse, we must impose linearity, ‘bending’ the underlying relationship to fit our models. This creates error.
Next comes the error ‘correction’. The problem here is that this correction introduces another kind of bias. The analyst ‘knows’ the data is wrong, but doesn’t know in what way, and so must make assumptions about the direction of correction. That, in turn, is based on his a priori assumptions.
Good article and great comment. I have read comments like yours a few times and I always have yet another question: do the polls affect the way people vote?
In theory, in a two-horse race the polls should have no effect. But consider the following scenario: if somebody aged 21 doesn’t know whether to vote or not and, anyway, wouldn’t know who to vote for, and then reads many articles saying that 20-year-olds are supporting Harris – would that possibly gain a vote for Harris?
Polls assume that most people know who they will vote for, but I suspect that younger people are not so certain.
I would have thought that with tools like machine learning we’re well beyond the point where we have to limit ourselves to assumptions about linearity in modelling. Indeed, it is no longer necessary to analyse and understand relationships in order to make predictions from patterns of data.
The effect discussed in the article – different rates of voting for people who claim they will definitely vote – can hardly be a new one (interesting though it certainly is).
So I’m sceptical that we should expect a significant polling error (let’s say above 1%) for this reason. That’s not to say there may not be other errors and biases in the polling and reporting of polling.
The world is full of non-linear functions that have been used to fit all kinds of data. The polls were generally right in 2016, or within the margin of error. They were right in 2020 also: Biden was predicted to win all summer long by almost every poll.
But Biden was predicted to win by a far larger margin, that’s one of the points here.
I’m not sure what you are saying here. What is nonlinear? How does error correction involve bias?
There’s another kind of bias in addition to social desirability bias... and it is one that is endemic to all kinds of surveys. It can best be illustrated this way: here’s a survey question... Are you the kind of person who would answer a survey?
That’s my business, thank you.
And a third kind: deliberate bias. I knew several guys who hate polls but will gladly do them so they can give deliberately misleading responses.
He lost the popular vote by 3 million in 2016 and 8 million in 2020. He’s the greatest turnout machine the Democrats have ever had. He carried the states he needed to win in 2016 by less than the number of people that would fill a very large football stadium.
Trump is an intensely unpopular candidate except to his band of followers, of which I am one. We are slowly succumbing to the tides of time. There aren’t enough of us left to elect him.
It’s not over ‘til it’s over. Have faith. And let’s hope Kamala keeps emitting her ‘word clouds’.
What does the popular vote have to do with anything? I would expect that the Democrats would win every presidential election if we had majority voting since two of the five most populous US states, New York and California, are “blue” and together comprise roughly 1/5 of the US population. There’s a reason Democrats are always complaining about the electoral college. They’d like it if sparsely-populated states, which tend to be “red,” had no influence in national elections at all since they traditionally vote against Democrats.
I think the ironic part about this is that Democrat policies have forced sizeable portions of blue state residents into red states, like Texas and Florida, which now have some of the largest populations in the country. I believe California lost a seat or two in the House in the last couple of years. If the popular vote starts swinging towards Republican candidates as a result of this population drain from mismanaged, crime-ridden, high-tax blue states to traditionally red states, I predict the left-wing tune on the electoral college will change.
This is why the electoral college is genius. It moderates the highly populated colonies of the East and West coast who feed upon themselves to preserve a vote Democrat. The coastal cities should not be able to control the heartland.
There is hope: Montana, which has been Democratic for many years, swung Republican in 2016. On the other hand, Arizona swung back to Democratic in 2020 because of the inordinate influence of Hispanics in Maricopa County.
The only poll that matters is the one on Election Day.
I now openly tell charities that I give people money to buy drugs with.
Since that seems a lesser evil than bankrolling the employment of that cadre of English middle-upper class children too thick to get into the Civil Service.
As somebody commented on the Spectator: “US bureaucracy are behaving as tyrants, replete with ‘Who shall rid me of this meddlesome priest’”.
Today Rasmussen came out with interesting polling. AtlasIntel also published some of its polling numbers at the weekend. Both were considered the most accurate in the last two elections, and both predict that Trump leads or is tied. Rasmussen actually said that in some of the Sunbelt states he is up to 5% ahead.
We are at the point of legitimately asking if Trump will be allowed to win. Let that roll around a bit. Allowed. As if that’s how representative govt functions. Left unspoken is allowed by whom. I’m trying to imagine if any former president or presidential candidate could have two attempts on his life and so little institutional interest in getting to the bottom of it.
It shines a light on the CIA and FBI, whose neutrality is brought into question by the incompetence at the Butler event, Hillary’s emails and Hunter’s laptop. Aiming to nick Trump’s ear is a hell of a shot, an unlikely feat by that shooter.
Soon, if they haven’t already, they will stop just saying “who will rid us of this turbulent President”.
Are the polls underestimating Trump — again? I hope so!
Half an hour on YouTube won’t find you much support for KH. YouTube is over judicious and left leaning but still lets the dominant Red States be shown. The actuality lies with the Electoral College votes. Headcounts make as little difference to the result as they do here but for different reasons. 16 million here didn’t want Starmer and yet unfortunately here we are.
More on Social-desirability bias.
https://en.wikipedia.org/wiki/Social-desirability_bias
I suspect not this time.
In my opinion, there is no reason anyone should vote for Kamala and Tim. They are both empty vessels. If they win this election the USA will become a vacuous has been nation. Who votes for this type of political slate?
Interesting insight into those who actually vote. I suspect there are two primary forces at play. First, there is the “herd” mentality by which undecided voters may vote for the candidate leading the polls as they want to vote for the winner. Second, when a candidate has an insurmountable lead, a voter may not vote for their candidate as they feel their vote has no impact. It makes you wonder if polls should exist and should be regulated for accuracy. Only by polling the same individuals pre and post will there be some accurate data. Of course another bias then enters. If a voter knows they will be polled post election, they may feel pressure to vote.