Monday, June 18, 2012

A closer look at FiveThirtyEight's Presidential Election Simulator

I've always been a fan of the presidential election simulator they've developed over at FiveThirtyEight. Essentially, they use a fairly complicated model to predict which candidate will win each state, and then run simulations to predict which candidate will win the overall election. Most of the work seems to go into re-calibrating poll results and combining other complicated factors that I won't go into here (I'm not familiar enough with their methodology at that level). It's the second part, turning state-by-state odds into a nationwide prediction, where I'd like to suggest an improvement. The histogram from their website looks like this:
Based on my experience with simulations, a histogram like this shouldn't normally have a few peaks much taller than the rest; that's a symptom of an illness I like to call Not-Enough-Simulations. If they just ran more iterations, they'd get a much smoother curve. To demonstrate, I grabbed their current estimate of the odds that President Obama wins each state (plus DC and the separately allocated districts of ME and NE) and ran some simulations: in each one, I randomly pick the winner of every state, weighted by those odds, and tabulate the electoral-vote totals. I should point out that the model at 538 has extra features that mine doesn't, such as accounting for the fact that states don't vote independently, regional and economic influences, poll movements, and surely others. So this is purely for demonstration. Anyway, for three hundred simulated elections, this is what it looks like:



Does that look familiar? For reference, the code I knocked together in a few minutes runs that in well under a second. For one billion runs (which admittedly takes a few minutes), this is what I get:


It's a nearly perfect, beautiful bell curve. This represents the true result of their statistics. Another benefit of running more simulations is that the probability of each candidate winning the overall election is estimated more accurately, and the number changes a little. Their "now-cast," which is what I'm actually mimicking, predicts what would happen if the election were held today. It gives Obama a 64.7% chance of winning; with enough simulations, it should be 64.2%. That's a small difference, but here's another. They also calculate the chance of a 269-269 tie in the Electoral College, which they give as 0.6%. With less-noisy data, it's actually 1.5%, more than twice as likely.
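Here is a minimal Python sketch of the kind of simulator described above, assuming (as this post does) that every state votes independently: each state is a weighted coin flip, and the win and tie probabilities are simply the fractions of runs in which they occur, each reported with its binomial standard error, sqrt(p(1 - p)/N). It is not the code behind the histograms in this post; the names `simulate_elections`, `estimate_with_error`, and `TOY_MAP` are hypothetical, and `TOY_MAP` holds made-up probabilities (with safe states lumped into two blocs) rather than FiveThirtyEight's numbers. A real run would list all 50 states, DC, and the ME and NE districts.

```python
import math
import random
from collections import Counter

# Hypothetical toy map, NOT FiveThirtyEight's figures:
# name -> (electoral votes, probability the Democrat carries it).
# Safe states are lumped into two blocs so the votes still sum to 538.
TOY_MAP = {
    "safe D bloc": (217, 1.0),
    "safe R bloc": (191, 0.0),
    "FL": (29, 0.45), "PA": (20, 0.80), "OH": (18, 0.65), "NC": (15, 0.30),
    "VA": (13, 0.55), "WI": (10, 0.70), "CO": (9, 0.60), "IA": (6, 0.60),
    "NV": (6, 0.70), "NH": (4, 0.60),
}


def simulate_elections(states, n_sims, seed=None):
    """Return a Counter mapping the Democratic EV total -> number of runs.

    Every contest is an independent weighted coin flip, which is the
    simplifying assumption made in this post.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_sims):
        ev = 0
        for votes, p in states.values():
            if rng.random() < p:  # Democrat carries this state with probability p
                ev += votes
        totals[ev] += 1
    return totals


def estimate_with_error(hits, n_sims):
    """Fraction of runs in which an event happened, plus its binomial standard error."""
    p = hits / n_sims
    return p, math.sqrt(p * (1.0 - p) / n_sims)


if __name__ == "__main__":
    for n in (300, 100_000):
        hist = simulate_elections(TOY_MAP, n, seed=42)
        wins = sum(count for ev, count in hist.items() if ev >= 270)
        p_win, se_win = estimate_with_error(wins, n)
        p_tie, se_tie = estimate_with_error(hist[269], n)  # Counter gives 0 if absent
        print(f"n={n:>7,d}  P(win)={p_win:.3f} +/- {se_win:.3f}  "
              f"P(tie)={p_tie:.4f} +/- {se_tie:.4f}")
```

The standard error shrinks like 1/sqrt(N): with 300 runs, a probability near 64% carries a standard error of about 2.8 percentage points (about 0.7 points for one near 1.5%), while a billion runs brings both below 0.002 points. That only measures simulation noise, not differences between models.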

It's a shame to see all the hard work the people at FiveThirtyEight put into their state-level model undersold by such a simple bug in their nationwide model. They need more simulations!
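Under the same independence assumption, the curve that a billion simulations approximates can also be computed exactly, with no random sampling at all, by building up the distribution of electoral-vote totals one state at a time (a discrete convolution, i.e. dynamic programming). This is a swapped-in technique for illustration, not how FiveThirtyEight or this post computed anything; `exact_ev_distribution`, `win_and_tie_probability`, and the probabilities in `TOY_MAP` are hypothetical.

```python
# Hypothetical toy map, NOT FiveThirtyEight's figures:
# name -> (electoral votes, probability the Democrat carries it).
TOY_MAP = {
    "safe D bloc": (217, 1.0),
    "safe R bloc": (191, 0.0),
    "FL": (29, 0.45), "PA": (20, 0.80), "OH": (18, 0.65), "NC": (15, 0.30),
    "VA": (13, 0.55), "WI": (10, 0.70), "CO": (9, 0.60), "IA": (6, 0.60),
    "NV": (6, 0.70), "NH": (4, 0.60),
}


def exact_ev_distribution(states):
    """Exact distribution of the Democratic EV total, assuming independent states.

    dist[k] is the probability of holding exactly k electoral votes after the
    states processed so far; each state splits every reachable total into a
    "win" branch (k + votes) and a "lose" branch (k).
    """
    total_ev = sum(votes for votes, _ in states.values())
    dist = [0.0] * (total_ev + 1)
    dist[0] = 1.0
    for votes, p in states.values():
        new = [0.0] * (total_ev + 1)
        for k, prob in enumerate(dist):
            if prob == 0.0:
                continue  # unreachable total
            new[k + votes] += prob * p   # Democrat carries the state
            new[k] += prob * (1.0 - p)   # Democrat loses the state
        dist = new
    return dist


def win_and_tie_probability(dist):
    """With 538 electoral votes, 270 or more wins outright and 269 is a tie."""
    return sum(dist[270:]), dist[269]


if __name__ == "__main__":
    dist = exact_ev_distribution(TOY_MAP)
    p_win, p_tie = win_and_tie_probability(dist)
    print(f"exact P(win) = {p_win:.4f}, exact P(tie) = {p_tie:.4f}")
```

Each contest costs one pass over at most 539 possible totals, so even a full map of 56 contests needs only a few tens of thousands of updates. The catch is that this shortcut relies entirely on independence; once states are correlated, simulation (or a more elaborate calculation) is needed again.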

2 comments:

  1. You have to be doing it wrong. The reason you shouldn't expect a bell curve here is that there are really only a finite number of realistic maps. You should expect a big spike at exactly 303, where Romney wins FL and Obama wins VA, OH, WI, IA, NV, NH, PA, and CO; then another spike at 290, where VA flips to Romney; then another at 285, where Romney wins OH but loses VA; then another at 272, where Romney wins FL, OH, and VA, etc. You are probably assuming independence from state to state, which is obviously a horrible assumption.

    1. True; as I said, I assumed that states vote independently. It hadn't occurred to me that 538 would account for that, I guess because it wasn't immediately obvious to me how one could.

      In my model, Romney-leaning AZ might go blue while Obama-leaning NM goes red in the same simulated election. I will rethink these simulations.

      Incidentally, Obama's 303 EV can also be reached if he wins VA, OH, WI, IA, NV, NH, and FL but loses PA and CO, since PA + CO = FL (20 + 9 = 29 EV), so there are two very plausible ways to get exactly the same number. Maybe that's why that one spike is more than twice the size of any other.
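As the commenter points out, independence is the weak link. One simple way to make states move together, sketched below in Python, is to give each simulated election a single national swing that shifts every state's margin by the same amount, plus a smaller independent state-level error. This is not a description of FiveThirtyEight's actual method; `simulate_correlated`, the margins in `TOY_MARGINS`, and the default standard deviations are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy map, NOT real polling data:
# name -> (electoral votes, expected Obama-minus-Romney margin in points).
TOY_MARGINS = {
    "safe D bloc": (217, 50.0),
    "safe R bloc": (191, -50.0),
    "FL": (29, -1.0), "PA": (20, 5.0), "OH": (18, 2.0), "NC": (15, -3.0),
    "VA": (13, 1.0), "WI": (10, 4.0), "CO": (9, 2.0), "IA": (6, 2.0),
    "NV": (6, 3.0), "NH": (4, 2.0),
}


def simulate_correlated(states, n_sims, national_sd=3.0, state_sd=3.0, seed=None):
    """Return a Counter mapping Obama's EV total -> number of runs.

    Each run draws ONE national swing shared by every state, plus an
    independent state-level error; a state goes to Obama if
    margin + swing + error > 0.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_sims):
        swing = rng.gauss(0.0, national_sd)  # common to all states in this run
        ev = 0
        for votes, margin in states.values():
            if margin + swing + rng.gauss(0.0, state_sd) > 0:
                ev += votes
        totals[ev] += 1
    return totals


if __name__ == "__main__":
    hist = simulate_correlated(TOY_MARGINS, 100_000, seed=1)
    # The five most frequent EV totals, listed in EV order.
    for ev, count in sorted(hist.most_common(5)):
        print(ev, count)
```

Raising `national_sd` relative to `state_sd` makes whole groups of swing states tend to flip together, concentrating the histogram on a handful of plausible maps, which is the spiky shape the commenter describes; setting `national_sd=0.0` recovers independent states.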
