Monday, June 18, 2012

A closer look at FiveThirtyEight's Presidential Election Simulator

I've always been a fan of the presidential election simulator that FiveThirtyEight has developed. Essentially, they use a fairly complicated model to estimate each candidate's chance of winning each state, and then run simulations to predict which candidate will win the overall election. Most of the work seems to go into re-calibrating poll results and combining some fairly complicated factors that I won't go into here (I'm not familiar enough with their methodology at that level). It's the second part, the simulations, that I'd like to suggest an improvement on. They take their current estimate of each candidate's odds in each state, then run simulations to predict the overall winner. The histogram from their website looks like this:
Based on my experience with simulations, you shouldn't normally see a few peaks much taller than the rest; that is a symptom of an illness I like to call Not-Enough-Simulations. If they just ran more iterations, they'd get a much smoother curve. To demonstrate, I grabbed their current prediction of the odds that President Obama wins each state or district (for ME, NE, and DC), then ran some simulations in which, for each state, I randomly choose the winner, weighted by those odds, and tabulate the electoral vote totals. I want to point out that the model at 538 has some extra features that mine lacks, such as the fact that states don't vote independently, regional and economic influences, poll movements, and, I'm sure, others. So this is purely for demonstration purposes. Anyway, for three hundred simulated elections, this is what it looks like:
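Here is a minimal Python sketch of that kind of simulation. It assumes, as my toy version does (and 538's real model does not), that every state and district is decided independently. The `simulate_elections` function and the `toy_units` list are hypothetical illustrations, and the numbers in `toy_units` are made-up placeholders, not FiveThirtyEight's odds; swap in real (electoral votes, win probability) pairs to reproduce a histogram like the ones in this post.

```python
import random
from collections import Counter


def simulate_elections(units, n_sims, seed=None):
    """Run n_sims simulated elections and tally the candidate's electoral votes.

    units: list of (electoral_votes, p_win) pairs, one per state or district.
    Assumes every unit is decided independently (538's real model does not).
    Returns a Counter mapping electoral-vote total -> number of simulations.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_sims):
        # Weighted coin flip per unit: the candidate wins it with probability p_win.
        ev = sum(votes for votes, p_win in units if rng.random() < p_win)
        totals[ev] += 1
    return totals


# Hypothetical toy inputs, NOT FiveThirtyEight's numbers: (electoral votes, P(win)).
toy_units = [(55, 0.99), (38, 0.02), (29, 0.55), (18, 0.65), (4, 0.5)]

hist = simulate_elections(toy_units, n_sims=300, seed=1)
for ev in sorted(hist):
    # Crude text histogram: one '#' per simulated election with this total.
    print(f"{ev:3d} {'#' * hist[ev]}")
```

Raising `n_sims` is the only change needed; the cost per simulated election stays the same, and the bars settle toward their true relative heights as the count grows.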



Does that look familiar? For reference, the code I knocked together in a few minutes runs those three hundred elections in well under a second. For one billion runs (which admittedly takes a few minutes), this is what I get:


It's a nearly perfect, beautiful bell curve, and it represents the true result of their statistics. Running more simulations also measures the odds of who wins the overall election more accurately, and the number shifts a little. Their "now-cast", which is what I'm actually mimicking, predicts what would happen if the election were held today. It gives Obama a 64.7% chance of winning; with enough simulations, it should be a 64.2% chance. That's a small difference, but here's one more. They also calculate the chance of a 269-269 tie in the Electoral College, which they give as 0.6%. With less-noisy data, it's actually 1.5%, more than twice as likely.
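Because my toy version treats every state as independent, the "true" distribution that a billion runs converges to can also be computed exactly, with no simulation noise at all, by folding in one state's outcome at a time (a dynamic-programming convolution). This is a swapped-in technique, not the method described in this post, and it only works under the independence assumption; a model with correlated states, like 538's, still needs simulation. The sketch also includes the usual Monte Carlo standard error, sqrt(p(1 - p)/N), to show how the noise in an estimated probability shrinks with the number of runs N. All function names and the `toy_units` numbers are hypothetical placeholders, not FiveThirtyEight's data.

```python
import math


def exact_ev_distribution(units):
    """Exact distribution of the candidate's electoral-vote total.

    units: list of (electoral_votes, p_win) pairs, assumed independent.
    Returns dist where dist[k] = P(candidate's total == k).
    """
    total_ev = sum(votes for votes, _ in units)
    dist = [0.0] * (total_ev + 1)
    dist[0] = 1.0  # before any unit is counted, the total is 0 with certainty
    for votes, p_win in units:
        new = [0.0] * (total_ev + 1)
        for k, prob in enumerate(dist):
            if prob == 0.0:
                continue  # this total is not reachable yet
            new[k + votes] += prob * p_win  # candidate wins this unit
            new[k] += prob * (1.0 - p_win)  # candidate loses this unit
        dist = new
    return dist


def win_and_tie(dist):
    """P(outright majority) and P(exact tie) from an electoral-vote distribution."""
    total_ev = len(dist) - 1
    majority = total_ev // 2 + 1  # 270 when total_ev == 538
    p_win = sum(dist[majority:])
    # A tie (269-269 when total_ev == 538) is only possible for an even total.
    p_tie = dist[total_ev // 2] if total_ev % 2 == 0 else 0.0
    return p_win, p_tie


def monte_carlo_se(p, n_sims):
    """Standard error of a probability estimated from n_sims independent runs."""
    return math.sqrt(p * (1.0 - p) / n_sims)


# Hypothetical toy inputs, NOT FiveThirtyEight's numbers: (electoral votes, P(win)).
toy_units = [(55, 0.99), (38, 0.02), (29, 0.55), (18, 0.65), (4, 0.5)]
p_win, p_tie = win_and_tie(exact_ev_distribution(toy_units))
print(f"exact P(win) = {p_win:.4f}, exact P(tie) = {p_tie:.4f}")

for n in (300, 10_000, 1_000_000_000):
    print(f"N = {n:>13,}: SE of a ~65% estimate = {monte_carlo_se(0.65, n):.5f}, "
          f"SE of a ~1.5% estimate = {monte_carlo_se(0.015, n):.5f}")
```

At 300 runs, the standard error on a probability near 1.5% is about 0.7 percentage points, roughly half the value being estimated and comparable in size to the gap between 0.6% and 1.5%. At a billion runs it drops to about 0.0004 percentage points.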

It's a shame to see all the hard work the people at FiveThirtyEight put into their state-level model undersold by such a simple bug in their nationwide model. They need more simulations!