
Nate Silver: The model exactly predicted the most likely election map

natesilver.net

Key excerpt (but it's worth reading the full thing):

But the real value-add of the model is not just in calculating who’s ahead in the polling average. Rather, it’s in understanding the uncertainties in the data: how accurate polls are in practice, and how these errors are correlated between the states. The final margins on Tuesday were actually quite close to the polling averages in the swing states, though less so in blue states, as I’ll discuss in a moment. But this was more or less a textbook illustration of the normal-sized polling error that we frequently wrote about [paid only; basically says that the polling errors could be correlated between states]. When polls miss low on Trump in one key state, they probably also will in most or all of the others.

In fact, because polling errors are highly correlated between states — and because Trump was ahead in 5 of the 7 swing states anyway — a Trump sweep of the swing states was actually our most common scenario, occurring in 20 percent of simulations. Following the same logic, the second most common outcome, happening 14 percent of the time, was a Harris swing state sweep.

[Interactive table]

Relatedly, the final Electoral College tally will be 312 electoral votes for Trump and 226 for Harris. And Trump @ 312 was by far the most common outcome in our simulations, occurring 6 percent of the time. In fact, Trump 312/Harris 226 is the huge spike you see in our electoral vote distribution chart:

[Interactive graph]

The difference between 20 percent (the share of times Trump won all 7 swing states) and 6 percent (his getting exactly 312 electoral votes) is because sometimes, Trump winning all the swing states was part of a complete landslide where he penetrated further into blue territory. Conditional on winning all 7 swing states, for instance, Trump had a 22 percent chance of also winning New Mexico, a 21 percent chance at Minnesota, 19 percent in New Hampshire, 16 percent in Maine, 11 percent in Nebraska’s 2nd Congressional District, and 10 percent in Virginia. Trump won more than 312 electoral votes in 16 percent of our simulations.

But on Tuesday, there weren’t any upsets in the other states. So not only did Trump win with exactly 312 electoral votes, he also won with the exact map that occurred most often in our simulations, counting all 50 states, the District of Columbia and the congressional districts in Nebraska and Maine.
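
To make the mechanism in the excerpt concrete, here is a minimal Monte Carlo sketch of correlated polling errors. This is not Silver's actual model; the margins, error sizes, and correlation share below are all illustrative assumptions, chosen only to show why a sweep (in either direction) becomes the modal outcome once errors are shared across states.

```python
import numpy as np

rng = np.random.default_rng(0)

# The seven 2024 swing states; margins are hypothetical final polling
# averages (Trump minus Harris, in points), not the model's real inputs.
states = ["AZ", "GA", "NC", "NV", "PA", "MI", "WI"]
margins = np.array([2.0, 1.0, 1.0, 0.3, -0.2, -1.0, -1.0])

n_sims = 100_000
total_sd = 3.5      # assumed total polling error per state, in points
shared_frac = 0.7   # assumed share of error variance common to all states

shared_sd = total_sd * np.sqrt(shared_frac)
indep_sd = total_sd * np.sqrt(1.0 - shared_frac)

# One error term shared by every state plus a state-specific term:
# this is what "polling errors are highly correlated" means operationally.
shared = rng.normal(0.0, shared_sd, size=(n_sims, 1))
indep = rng.normal(0.0, indep_sd, size=(n_sims, len(states)))
outcomes = margins + shared + indep   # simulated actual margins

trump_states = (outcomes > 0).sum(axis=1)
print("Trump sweeps all 7: ", (trump_states == 7).mean())
print("Harris sweeps all 7:", (trump_states == 0).mean())
print("Mixed results:      ", ((trump_states > 0) & (trump_states < 7)).mean())
```

With independent errors, both sweep scenarios would be rare; the shared term drags every state in the same direction at once, so the two extremes end up dominating the distribution, which is roughly the shape of the 20 and 14 percent figures in the excerpt.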

I don't know of an intuitive test for whether a forecast of a non-repeating event was well-reasoned (see, also, the lively debate over the performance of prediction markets), but this is Silver's initial defense of his 50-50 forecast. I'm unconvinced: if the modal outcome of the model was the actual result of the election, does that vindicate its internal correlations, indict its confidence in its output, both, neither? But the two readings aren't irreconcilable: the model's modal outcome coming true can vindicate its internal correlations AND its certainty can still have been limited by the quality of the available data, so this hasn't lowered my opinion of Silver, either.
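
For what it's worth, the standard (if unsatisfying) tool here is a proper scoring rule such as the Brier score, which rewards calibration over confidence but can't say much from a single event. A minimal sketch, with made-up forecast probabilities rather than anyone's published odds:

```python
def brier(prob: float, outcome: int) -> float:
    """Brier score for a binary forecast: lower is better."""
    return (prob - outcome) ** 2

# outcome=1 means the event happened (say, a Trump win).
for label, p in [("50-50 model", 0.50), ("confident yes", 0.90), ("confident no", 0.10)]:
    print(f"{label}: {brier(p, outcome=1):.2f}")
```

The catch is exactly the objection above: a 50-50 forecast scores 0.25 no matter what happens, so a single one-off event can't distinguish a well-reasoned 50-50 from a lazy one. Only a long track record of scored forecasts can.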


The polls did better this time than in 2016 and 2020. At least in general.

The controversy about polls starts in 2016. I think this is worth emphasizing, because there are still arguments floating around that the polls in 2016 were fine, and thus every subsequent argument about polls is really a proxy war over 2016. Eight years later we're still talking about Trump, still discussing how the polls over- or under-estimate him, still discussing how the polls do or don't measure white rural voters.

In 2016 the polls were entirely wrong. For months they predicted Hillary winning in a blowout, sometimes by 10+ points. (I remember sitting in class listening to a friend excitedly gossip about Texas flipping blue.) Toward election day itself the polls converged, but still comfortably for Hillary. And when Trump won, the argument came around that the results were technically within the margin of error, which missed entirely that whole states were modeled vastly incorrectly. The blue wall states of Pennsylvania, Wisconsin, and Michigan were not supposed to have gone red. Florida was supposed to have been close. States that had once been swing states were not even close. (To me, this was the smoking gun that Trump had a real chance in 2016: Iowa and Ohio were solidly predicted for Trump from the very beginning, and no one offered any introspection on what that implied as a general swing.)

2020 was not much better. Without getting into claims about fraud in specific states: Biden was supposed to win many states by larger margins than the results in fact showed. There were still lots of specific misses (like Florida going hard red). And again came a series of justifications that polling did just fine because, technically, everything was inside some margin of error.

2024 is actually much better. AtlasIntel and Polymarket both broadly predicted exactly what happened. Rasmussen was fairly accurate (after taking a break in 2020, if I remember correctly). There's also a lot of slop: Selzer's reputation is destroyed (though people may forget all about it by 2028), the RCP national average was off by a few points, and Ipsos, NPR, Morning Consult, and the Times were all wrong. Well, maybe that's not much better than 2020, but mixed in with all the bad data were predictors who got everything exactly right.

So Nate Silver's problem is that his method is junk. He takes some averages and models them out, but a lot of the data he relies on is bad. A lot of the polling industry is still wrong. And unless Silver is willing to stake his expertise on highly specific questions about counties and polls, he can't offer all that much insight.

So Nate Silver's problem is that his method is junk. He takes some averages and models them out, but a lot of the data he relies on is bad.

I’m more sympathetic to the pollsters than I am to Nate. The pollster’s job is to poll people using a reasonable methodology and report the data, not to make predictions. They can’t just arbitrarily add Trump +3 because they think they didn’t capture enough Trump voters in their samples.

Nate’s job is explicitly to build a model that predicts things. He can legitimately adjust for things like industry polling bias. He doesn’t, because he’s bad at his job.
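
As a sketch of what such an adjustment could look like (a hypothetical one, not anything Silver publishes): shift the polling average by an estimated industry-wide bias, and widen the uncertainty because the bias itself is a guess. Every number here is made up.

```python
import numpy as np

rng = np.random.default_rng(1)

poll_average = -0.2   # hypothetical raw swing-state average (Trump minus Harris)
bias_mean = 1.5       # assumed average poll miss against Trump in recent cycles
bias_sd = 1.5         # uncertainty about whether that bias persists this cycle

n_sims = 100_000
bias = rng.normal(bias_mean, bias_sd, n_sims)   # uncertain systematic bias
noise = rng.normal(0.0, 3.0, n_sims)            # ordinary polling error
adjusted = poll_average + bias + noise

print("P(Trump actually leads):", (adjusted > 0).mean())
```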

Don't the pollsters have some degree of freedom because they sample based on demographics and not purely at random? Presumably they use this to perform adjustments. I also assume they poll the chance of the person voting as well, and don't just make that number up.

They try, but fundamentally, IMO, it's a good idea to separate data collection from model building.
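
To illustrate the degree of freedom the question is pointing at, here is a toy version of demographic weighting (post-stratification on a single variable). Real pollsters weight on many variables at once, and the population shares below are invented:

```python
# Three hypothetical respondents, oversampling the young group.
sample = [
    {"age": "18-39", "candidate": "A"},
    {"age": "18-39", "candidate": "B"},
    {"age": "40+",   "candidate": "B"},
]

# Suppose the assumed electorate is 40% young / 60% older, but the sample
# is 2/3 young. Weight each respondent by population share / sample share.
population = {"18-39": 0.40, "40+": 0.60}
counts = {g: sum(r["age"] == g for r in sample) for g in population}
weights = {g: population[g] / (counts[g] / len(sample)) for g in population}

support_a = sum(weights[r["age"]] for r in sample if r["candidate"] == "A")
total = sum(weights[r["age"]] for r in sample)
print("Weighted support for A:", support_a / total)   # 0.2 here
```

The arithmetic itself is mechanical; the judgment calls are in choosing which variables to weight on and what the target electorate looks like (including likely-voter screens), which is where the "don't just make that number up" question really bites.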