Saturday, September 28, 2013

Population Density compared to Partisan Lean

So a website called The Atlantic Cities has a very interesting article about how the closer you live to your neighbors, the more left-leaning you're likely to be, and vice versa. What really raises my ire about their article, however, is the absolutely terrible plot they published, which is a scatter plot of congressional districts. Someone just spat the data out into Excel and went with the first image they could make. They did change the y-axis to a logarithmic scale, but they didn't change the fit to match it! Here's my (much better) plot with the same data (or possibly more accurate data; their sources are slightly unclear).

 As you can see (because I literally spelled it out at the bottom), there's a significant correlation between population density and partisan lean. Every time you double the population density, the district is about 8.6 points more Democratic. I made a similar histogram-type plot of the districts, sorting them into bins by density and reporting the average partisan lean of all congressional districts with similar densities (circle size is related to how many districts that circle represents).
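A "points per doubling" number only makes sense if the fit is done against the logarithm of density, which is the whole complaint about the original plot. Here is a minimal sketch of that kind of fit. The district values are hypothetical, noise-free numbers built to follow the 8.6-point slope, and `fit_per_doubling` is a made-up helper name, not the code behind the plot.

```python
import math

def fit_per_doubling(densities, leans):
    """Least-squares fit of lean = a + b * log2(density); returns (a, b)."""
    xs = [math.log2(d) for d in densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(leans) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, leans))
    b = sxy / sxx            # change in lean per doubling of density
    a = mean_y - b * mean_x  # lean at a density of 1 person per square mile
    return a, b

# Hypothetical districts (people per square mile), constructed so that each
# doubling of density adds exactly 8.6 points of Democratic lean.
densities = [50, 100, 400, 1600, 6400, 25600]
leans = [-20 + 8.6 * math.log2(d / 50) for d in densities]

intercept, slope = fit_per_doubling(densities, leans)
print(f"{slope:.1f} points per doubling of density")
```

Fitting a straight line to the raw densities instead would be dominated by the handful of extremely dense districts, which is why the fit has to match the log axis.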

As you can see, most districts are clustered between 100 and 10,000 people per square mile. If you examine the two trendlines I've drawn, the blue one represents how we should expect to find the districts, and the red one shows how the data actually bends significantly away from it. This demonstrates the level of gerrymandering that Republicans have accomplished, maximizing the number of districts that are Republican, even if just slightly, while shoving all Democrats in their states into a few districts that are heavily Democratic. This is how Republicans currently control the House of Representatives, despite their House candidates receiving more than a million fewer votes than Democrats received.

(For more on the current Gerrymandered state of many districts, try this fun jigsaw-style quiz! It shows very well how ridiculously these districts' shapes have been contorted to skew the makeup of the House so much)

Tuesday, September 10, 2013

Fantasy Football week 1 analysis

Well, week 1 of Fantasy Football is over, and my team did surprisingly poorly. Instead of a projected 124 points, they earned me a paltry 64 points. Thanks to Kyle Rudolph (2 points) and Lamar Miller (0 points), but I'm afraid the game ball goes to David Wilson, who earned me a stupendous -3 points. So ESPN's projections of how many points each player would get me were hilariously bad. But how bad? A quick calculation of the correlation between predicted and actual points (including my bench) gives a coefficient of -0.66, which means that if ESPN says a player will do better than average, they will probably do worse, and vice versa. At least for my team during week 1.
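For anyone who wants to repeat the check on their own team, here is a small sketch of the correlation calculation. The two lists are hypothetical placeholder numbers, not my actual roster, so the printed value will not be the -0.66 quoted above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical projected vs. actual points, one entry per rostered player.
projected = [18, 14, 12, 10, 9, 7]
actual = [9, 2, 0, -3, 12, 11]

r = pearson(projected, actual)
print(f"correlation between projected and actual points: {r:.2f}")
```

A value near +1 would mean the projections rank players well, a value near 0 means they carry no information, and a negative value means the players projected highest tended to score lowest.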

Another way to quantify how bad ESPN's projections were would be ... simulations! I ran 100 million simulations, where I randomly chose 1 of my 2 QBs, 2 of my 5 RBs, 2 of my 5 WRs, 1 of my 2 TEs, and one flex from the remaining RB+WR+TE pool. (I only have one kicker and one defense on my team.)
First of all, apologies for the MS Excel plot. The average score was 59.1, and the most common value was 59, so pretty symmetric. So benching the players ESPN told me to bench gave me a score better than 70% of random trials, but because of the fairly narrow peak, that was only worth about 5 points over the average (the y-axis is log-scale). This is not very good at all, when I could have gotten 84 points with this roster if I had chosen correctly (this still would have lost my match-up).
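With a roster this size, the random draw described above has only 2 x 10 x 10 x 2 x 7 = 2,800 equally likely outcomes, so the same distribution can be enumerated exactly instead of sampled. The sketch below swaps in that enumeration for the random simulation. Only the Rudolph, Miller, and Wilson scores come from this post; every other name and number (including `FIXED`, a stand-in for kicker plus defense) is a hypothetical placeholder.

```python
from itertools import combinations

# Week-1 points by position. Only Kyle Rudolph (2), Lamar Miller (0) and
# David Wilson (-3) are real; all other entries are hypothetical placeholders.
roster = {
    "QB": {"QB1": 20, "QB2": 14},
    "RB": {"David Wilson": -3, "Lamar Miller": 0, "RB3": 9, "RB4": 6, "RB5": 12},
    "WR": {"WR1": 11, "WR2": 4, "WR3": 8, "WR4": 15, "WR5": 2},
    "TE": {"Kyle Rudolph": 2, "TE2": 7},
}
FIXED = 10  # hypothetical combined score of the one kicker and one defense

def all_lineup_scores(roster, fixed=0):
    """Score of every lineup: 1 QB, 2 RB, 2 WR, 1 TE, plus 1 flex (RB/WR/TE)."""
    points = {name: pts for group in roster.values() for name, pts in group.items()}
    flex_eligible = [n for pos in ("RB", "WR", "TE") for n in roster[pos]]
    scores = []
    for qb in roster["QB"]:
        for rbs in combinations(roster["RB"], 2):
            for wrs in combinations(roster["WR"], 2):
                for te in roster["TE"]:
                    starters = {qb, *rbs, *wrs, te}
                    base = fixed + sum(points[n] for n in starters)
                    # The flex is any remaining RB, WR or TE.
                    for flex in flex_eligible:
                        if flex not in starters:
                            scores.append(base + points[flex])
    return scores

def fraction_below(scores, value):
    """Fraction of lineups that score strictly less than `value`."""
    return sum(s < value for s in scores) / len(scores)

scores = all_lineup_scores(roster, FIXED)
mean_score = sum(scores) / len(scores)
print(len(scores), "lineups, mean", round(mean_score, 1), "best", max(scores))
```

Calling `fraction_below(scores, my_score)` then gives the "better than X% of random lineups" figure directly, with no simulation noise.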

Look for more delicious simulations in weeks to come!

Saturday, January 5, 2013

Matt's probability question

A quick post since I don't have access right now to my beautiful plotting software.

My friend Matt asks a question: if I have 12 cards labelled 1-12, and each day I draw 3 without replacement, record which ones I've drawn, and then shuffle them back in for the next day, how many days should I expect to draw cards before I have drawn all 12?

Instead of doing actual math, I just used MATLAB and ran one million simulations (a nice round number). First I'll plot the probability of having drawn all 12 cards by day X

You can see that we cross 50% somewhere between day 11 and 12. Specifically, 46.4% of simulations were done by day 11, and 57.1% were done by day 12; interpolating gives about 11.3. Even though I'm too lazy to do stat math for you, we can see the form of the equation if we plot the probability to not be done yet versus number of days (below). In a semilog (y) plot, it forms a straight line after about day 10.
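For readers without MATLAB, here is a rough Python sketch of the same experiment, with an exact inclusion-exclusion formula alongside it as a cross-check. The function names are my own. One caveat on indexing: if the first draw counts as day 1, the exact formula gives about 46.4% after 10 draws and 57.2% after 11, so the day labels quoted above may be shifted by one relative to this count. That is my reading of the numbers, not something stated in the post. The sketch also prints the mean number of days, which is what the question literally asks for and is not the same thing as the 50% crossing point.

```python
import random
from fractions import Fraction
from math import comb

CARDS, DRAW = 12, 3

def days_to_see_all(rng):
    """Simulate one run; return the number of daily draws until all cards are seen."""
    seen = set()
    days = 0
    while len(seen) < CARDS:
        seen.update(rng.sample(range(CARDS), DRAW))
        days += 1
    return days

def miss_chance(k):
    """Chance that k particular cards are all missed in a single day's draw."""
    return Fraction(comb(CARDS - k, DRAW), comb(CARDS, DRAW))

def prob_done_by(day):
    """Exact P(all cards seen after `day` draws), by inclusion-exclusion."""
    return sum((-1) ** k * comb(CARDS, k) * miss_chance(k) ** day
               for k in range(CARDS + 1))

def expected_days():
    """Exact mean number of draws, from summing P(not done after n) over n."""
    return sum((-1) ** (k + 1) * comb(CARDS, k) / (1 - miss_chance(k))
               for k in range(1, CARDS + 1))

rng = random.Random(0)
trials = 20_000
results = [days_to_see_all(rng) for _ in range(trials)]
frac_done_10 = sum(d <= 10 for d in results) / trials

print("simulated P(done after 10 draws):", round(frac_done_10, 3))
print("exact     P(done after 10 draws):", round(float(prob_done_by(10)), 3))
print("exact     P(done after 11 draws):", round(float(prob_done_by(11)), 3))
print("exact mean number of draws:", round(float(expected_days()), 2))
```

The straight line on the semilog plot also falls out of the formula: for large day counts the first term dominates, so the chance of not being done decays like 12 x (3/4)^days.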

Monday, December 10, 2012

United States of Moochers: Red vs Blue states

It's been a long campaign season, so I'm sure the first thing everyone wants to see is some extensive, in-depth political research! Some of you might remember an interesting figure that went around the internet a few years back. It sorts all US states into two columns, net contributors to the federal government vs. net takers; and two colors, red for Republican and blue for Democratic states. The conclusion is stark: Republican states take more than they give to the federal budget, and Democratic states give more than they take. But I thought the binary decision for each state (red or blue, giver or taker) was a bit simplistic, and it seems like it used just one snapshot of America (2004), so I did my own research. I gathered as much data as I could on the subject (sources were Wikipedia and the [...]). First, the normalized vote margins in the last 5 presidential elections (separated into colors at margin values of +/-4% and +/-15%). Then the amount of money the federal government spends on each state, divided by the amount that state contributes, for the years 1981-2005, to get our "Mooching Factor".
Let red states secede if they want - that would solve our budget deficit instantly!
These results are also shown on this US map, where "Giver" states are given their normal red, blue, or purple, while the "Moocher" states are assigned the less-dignified colors of pink, cyan, and yellow.
You can clearly see that only 3/25 red states are givers (12%), while 11/16 blue states are givers (69%).  In fact, seven red states are bigger moochers than the worst blue state. But they say correlation (in this case 0.2, which is pretty weak) does not indicate causation. My first thought is that relative poverty rates in each state will be a determining factor. A state with richer people contributes more in taxes but takes less for social programs, right?
This explains part of the overall trend: red states tend to have higher poverty rates than blue states, so naturally they would be taking more money for social benefits while contributing less from taxes. But we see that all 10/10 (a shameful 100%) of "rich" red states still take more than they give, while only 4/13 (31%) of "rich" blue states do. Depressingly, poverty is less an indicator of whether a state is a giver or taker (0.12 correlation) than political lean (0.20). In the background you can see an aggregated "Redland" and "Blueland" (I didn't worry about "Purpleland"). We see that red states are significantly more impoverished, even though they have been receiving a "stimulus package" from blue states for at least 30 years running. But also interesting are the trends within Redland, where poorer red states don't necessarily take more than richer red states (the same is true for Blueland). It really looks like red states, not poor states, are inherently takers.

Another hypothesis is that each representative for a state is like a pig at the Federal Trough, grabbing as much money for their constituents as every other pig. That means that less-populous states, which have the same number of senators as big states, will have more congressional influence per capita, and therefore more federal money. I define "congressional influence" as the fraction of the House of Representatives that a state controls plus the fraction of the Senate that each state controls (this assumes both chambers of Congress are equal in budgetary power). In the plot below you can compare a state's congressional influence to its population by comparing the areas of the outer and inner circle; we see that for example, citizens of Wyoming have more than 10 times the congressional influence per capita as citizens of California.
It's evident that congressional influence is a large factor. Notably, all five of the most underrepresented states, regardless of political lean, give more than they take. Over-represented red states are more likely to take more (all 18/18), while over-represented blue states are split evenly between givers and takers (5/10). This plot is perhaps the most damning of all for Republicans: it suggests that the only reason that any red states contribute more than they take is that they don't have the congressional influence to grab more money from the Federal Trough, while blue states exercise fiscal restraint, even when they have the congressional influence to grab more money. Again, the implications are clear: Republican politicians greedily rake in as much money as they can for their states, while Democratic politicians govern toward some other goal, perhaps "the best interest of the country"? In the background of the figure we again see "Redland" and "Blueland", where Blueland has more people but less congressional influence, and therefore pays tribute every year to Redland. In fact, each citizen of Redland has 26.4% more congressional influence than a citizen of Blueland, which corresponds quite closely to their 26.4% higher Mooch Factor.
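The Wyoming-versus-California comparison follows directly from the definition of congressional influence given above (share of House seats plus share of Senate seats). Here is a small sketch of that arithmetic. The seat counts (2000 apportionment) and 2010 census populations are inputs I am assuming for illustration; the post does not say which year's figures went into the plot.

```python
HOUSE_SEATS = 435
SENATE_SEATS = 100

def influence(house_seats, senators=2):
    """Fraction of the House plus fraction of the Senate held by a state."""
    return house_seats / HOUSE_SEATS + senators / SENATE_SEATS

# Assumed inputs, not taken from the post:
# House seats under the 2000 apportionment, population from the 2010 census.
states = {
    "Wyoming": {"house": 1, "population": 563_626},
    "California": {"house": 53, "population": 37_253_956},
}

per_capita = {
    name: influence(s["house"]) / s["population"] for name, s in states.items()
}
ratio = per_capita["Wyoming"] / per_capita["California"]
print(f"Wyoming has {ratio:.1f}x California's congressional influence per capita")
```

With these inputs the ratio lands a little above 10, in line with the "more than 10 times" figure in the text.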

                                                          Red States     Blue States
total moochers                                            88% (22/25)    31% (5/16)
fraction of poor states that are moochers                 86% (12/14)    0% (0/2)
fraction of rich states that are moochers                 100% (10/10)   31% (4/13)
fraction of under-represented states that are moochers    57% (4/7)      0% (0/6)
fraction of over-represented states that are moochers     100% (18/18)   50% (5/10)
Federal money spent/contributed ("Mooch Factor")          1.16           0.91
poverty rate                                              14.3%          11.7%
US population fraction                                    39%            41%
fraction of congress ("congressional influence")          44%            37%

Aren't Republicans supposed to be fiscally-responsible small-government advocates? If blue states are taking less but still have lower poverty for 30 years now, perhaps their governing model is more successful: social services to people in need, rather than trickle-down Reaganomics for the wealthy.

Thursday, August 16, 2012

Too lazy to "Occupy"? Hit the ATM.

When the "Occupy" movement first started, I felt like there were some legitimate claims buried in a somewhat incoherent message. To me, the most compelling complaint is related to the increasing separation of wealth, how "the rich get richer". For example, since Reagan took office, after-tax income has leapt for the richest Americans (much of which can be explained by Reagan slashing taxes for the richest Americans), while rising only modestly for the bottom 80%.

Increase in After-Tax Income by Income Group 1979-2007
Source: Congressional Budget Office

What is causing this increasing separation of wealth? Why are the rich getting way, way richer, while everyone else is making only modest gains? Well that growth in the top 1% starting from 2002, which as you can see is not reflected among the poorer 99%, corresponds roughly with the Bush Tax Cuts for the wealthy. It just seems like a shameful state of affairs when companies consider the "Return On Investment" for lobbyists and campaign contributions. The wealthy spend some of their money on influencing politicians, who devise laws that benefit the wealthy at the expense of everyone else. Everybody wins!

But I didn't really want to talk politics too much today. I guess it's just the little things that bother me. The banks offer you and me 1% cash back for using their credit cards, but they charge the vendor 3%, which the vendor turns around and charges us, through increased prices, even for those of us who use cash. In fact it's against the law to charge a higher price for consumers who use credit cards; guess who wrote that law? So we're stuck in a cycle where the banks make 3% on every transaction, for doing almost nothing.

Now, when the Occupiers started Occupying, I figured "I have a job, I don't have time to stand around complaining all day." But now I can see one small way we can all support income equality, without quitting our day jobs: visit the ATM. The bank earns nothing on cash transactions. When you use your credit card for $100, you are basically hiring the bank to walk over to the ATM, withdraw $100, and give it to the cashier, and you are paying $3 for this service. If instead we all visit the ATM once a week and pay most of our transactions in cash, we save that money, resulting in lowered prices for consumers and higher revenues for businesses that actually produce economic value. A person making the median personal income in the USA, $32,000, who spends 30% of their income through their credit card is paying almost $300 per year to the banks.
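The "almost $300" figure is just the three numbers above multiplied together. A tiny sketch, using only values from this post (the variable names are mine), so you can plug in your own income and card usage:

```python
median_income = 32_000   # median personal income quoted above, in dollars
share_on_card = 0.30     # fraction of income spent through the credit card
merchant_fee = 0.03      # fee the bank charges the vendor on each purchase

annual_fee = median_income * share_on_card * merchant_fee
print(f"about ${annual_fee:.0f} per year goes to the bank")
```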

If you want to combat the growing wealth disparity in the USA, and help ensure that less money is paid to companies that don't actually produce any economic value, hit the ATM once a week. 

Monday, June 18, 2012

A closer look at FiveThirtyEight's Presidential Election Simulator

I've always been a fan of the presidential election simulator they've developed over at FiveThirtyEight. Essentially they use a pretty complicated model to give a prediction of which candidate will win in which state, and then run some simulations to predict which candidate will win the overall election. It seems like most of the work goes into re-calibrating the results of polls and combining some pretty complicated factors that I won't go into here (since I'm not too familiar with their methodology on that level), but what they've done with that second part is what I'd like to suggest an improvement on. They use their current estimate about which candidate will win each state, then run simulations to predict the overall winner. The histogram from their website looks like this:
Based on my experience with simulations, you shouldn't normally have a few peaks much taller than the rest; this is a symptom of an illness I like to call Not-Enough-Simulations. If they just run more iterations, they'll get a much nicer and smoother curve. To demonstrate, I went through and grabbed their current prediction for the odds that Pres. Obama wins each state or district (for ME, NE, and DC), and just ran some simulations where for each state I randomly choose who wins, weighted by those odds, and tabulate the totals. I want to point out here that the model at 538 has some extra features which mine doesn't have, like how states don't vote independently, regional and economic influences, poll movements, and others, I'm sure. So this is purely for demonstrative purposes. Anyways, for three hundred simulated elections, this is what it looks like:

Does that look familiar? For reference, in my code that I just knocked together in a few minutes, that takes way less than one second. For one billion runs (which admittedly takes a few minutes), this is what I get:

It's a nearly perfect, beautiful bell curve. This represents the true result of their statistics. Another result of running more simulations is that the odds of each candidate winning the overall election are measured more accurately, and the number changes a little bit. Their "now-cast" function, which is what I'm actually mimicking, predicts what would happen if the election were held today. It gives Obama a 64.7% chance of winning. Instead, it should be a 64.2% chance, if you collect enough data. That's a small difference, but here's one more. They also calculate the chance of a 269-269 tie in the EC, which they give as 0.6%. But with less-noisy data, it's actually 1.5%, more than twice as likely.
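Here is a rough sketch of the kind of simulation I described, on a toy map. The `STATES` list is entirely hypothetical (these are not FiveThirtyEight's odds, and the map has only 14 "states"), and it keeps my simplifying assumption that states vote independently. Under that assumption the electoral-vote distribution can also be computed exactly by convolution, which gives a yardstick for how noisy a given number of simulated elections is.

```python
import random

# Hypothetical toy map: (electoral votes, probability that candidate A wins).
# NOT FiveThirtyEight's numbers; chosen only to illustrate the method.
STATES = [(55, 0.95), (34, 0.05), (31, 0.90), (27, 0.50), (21, 0.70),
          (20, 0.50), (17, 0.80), (15, 0.40), (13, 0.50), (11, 0.30),
          (10, 0.60), (9, 0.20), (6, 0.50), (3, 0.10)]

def exact_distribution(states):
    """P(candidate A gets exactly k EVs), assuming independent states."""
    dist = [1.0]
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # A loses this state
            new[k + ev] += prob * p    # A wins this state
        dist = new
    return dist

def simulate(states, runs, rng):
    """Estimate the same distribution from `runs` random elections."""
    counts = [0] * (sum(ev for ev, _ in states) + 1)
    for _ in range(runs):
        total = sum(ev for ev, p in states if rng.random() < p)
        counts[total] += 1
    return [c / runs for c in counts]

def win_probability(dist):
    """Probability of a strict majority of the electoral votes."""
    total_ev = len(dist) - 1
    return sum(p for k, p in enumerate(dist) if 2 * k > total_ev)

rng = random.Random(0)
exact = exact_distribution(STATES)
few = simulate(STATES, 300, rng)
many = simulate(STATES, 30_000, rng)

for label, dist in (("300 runs", few), ("30,000 runs", many), ("exact", exact)):
    print(f"{label:>12}: P(A wins) = {win_probability(dist):.3f}")
```

The error on a simulated probability shrinks roughly as one over the square root of the number of runs, which is why a few hundred elections give a spiky histogram and a much larger number gives a smooth one.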

It's a shame to see all the hard work the people at FiveThirtyEight put into their model at the state-level, just to have it under-sold with such a simple bug in their nation-wide model. They need more simulations!

Friday, January 20, 2012

Our broken Electoral System

When Americans elect a president every 4 years, the method we use is actually pretty strange when you stop to think about it:

1) Every state gets a number of votes equal to their number of representatives plus two. These are called "electoral votes".
2) 48 of the 50 states use a winner-takes-all system, where whichever presidential candidate gets the most votes in that state gets ALL the electoral votes of that state. The other two states use an adaptation of that method, where each candidate gets an electoral vote for each congressional district they win, plus two more for winning the overall state popular vote.

Electoral College for the year 2000

A notable side-effect of this policy is that someone can become President of the United States while losing the popular vote. This has happened 4 times out of 55 US presidential elections, or 7% of the time. Maybe that seems like an acceptably small fraction to you, but consider that there are also cases where it was very close to happening, like in 2004: Bush II had about 3,000,000 more nationwide votes than Kerry, but if 60,000 Bush voters had changed their minds and voted for Kerry in just one state (Ohio), Kerry would have become president. In the last 60 years, a "close" election like this, where fewer than 60,000 voters could've made the wrong man President, has happened 6 times, meaning that 6/15 or 40% of recent elections were problematic.

For fun, I've taken the liberty of running some simulations. Each state is given its share of electoral votes as of the 2000 census, I specify the national popular vote totals and give each state its own vote total, normally distributed about the national mean, with a standard deviation taken from the last three presidential elections (about 11% each time). Then I check to see if the national popular vote winner is also the electoral college winner.
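Here is a minimal sketch of a simulation along these lines, not my original code. Several details are assumptions made for the sketch: the electoral-vote list is typed from memory (the code checks that it sums to 538, but verify it before relying on it), each state's share of the national vote is taken to be proportional to its House seats, every state is shifted by the same amount so the weighted national share hits the specified value, and a 269-269 tie is counted as the popular-vote winner failing to win.

```python
import random

# Electoral votes after the 2000 census, alphabetical by state with DC after
# Delaware. Typed from memory for this sketch; verify before relying on it.
EV = [9, 3, 10, 6, 55, 9, 7, 3, 3, 27, 15, 4, 4, 21, 11, 7, 6, 8, 9, 4, 10,
      12, 17, 10, 6, 11, 3, 5, 5, 4, 15, 5, 31, 15, 3, 20, 7, 7, 21, 4, 8, 3,
      11, 34, 5, 3, 13, 11, 5, 10, 3]
assert len(EV) == 51 and sum(EV) == 538

# Assumption: a state's share of the national vote is proportional to EV - 2.
WEIGHTS = [(ev - 2) / (sum(EV) - 2 * len(EV)) for ev in EV]

def wrong_president_rate(national_share, sd=0.11, runs=5_000, seed=0):
    """Fraction of simulated elections in which the popular-vote winner
    (holding `national_share` > 0.5 of the two-party vote) gets under 270 EVs."""
    if national_share <= 0.5:
        raise ValueError("national_share must belong to the popular-vote winner")
    rng = random.Random(seed)
    wrong = 0
    for _ in range(runs):
        shares = [rng.gauss(national_share, sd) for _ in EV]
        # Shift all states equally so the weighted national share is exact.
        drift = national_share - sum(w * s for w, s in zip(WEIGHTS, shares))
        winner_ev = sum(ev for ev, s in zip(EV, shares) if s + drift > 0.5)
        if winner_ev < 270:  # ties count as a miss in this sketch
            wrong += 1
    return wrong / runs

for share in (0.505, 0.52, 0.55):
    rate = wrong_president_rate(share)
    print(f"national share {share:.3f}: wrong president in {rate:.1%} of runs")
```

Because of those assumptions, the percentages this prints should only be expected to resemble the ones I report, not reproduce them exactly.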

For an example election that's 48/52 (i.e. a 4% margin for one candidate), I ran this simulation 1,000,000 times, and here are the EV results:

We see that in more than 10% of the runs, the national popular vote winner does not become the president. Repeating this process for a collection of margins, I find the probability of the "wrong president" vs. national popular vote margin:

I also show the last eight elections as vertical lines on the bottom, highlighting in red the one that gave us the "wrong" person. Statistically speaking, we should have seen on average 1.3 "wrong" presidents in the past 8 elections. Reality, however, is constrained to integers in this case, so it's really no huge anomaly that we got 1 error out of 8. What's surprising to me is how astonishingly poor this system is at electing the popular vote winner to the presidency. With a national popular vote margin of 4% we get an error of 10%. With a margin of 1% we get an error of 37%. For margins smaller than 1% we may as well flip a coin, even though 1% represents more than 3,000,000 Americans.

Raw data is tabulated below. For reference, the margins of the last 8 elections ranged between 0.5% and 10%. The real miracle here is that we have had only four wrongly-elected presidents out of 55!

Popular Vote Margin
Probability of Wrong President