Tuesday, May 10, 2016

Baseball, final exams, and omitted variables

Here are some fun graphics I put on the final exam this year. One of the fundamental problems in social science inference is that we observe inputs and outcomes without necessarily seeing all the relevant variables, so it's difficult to say much about which X's are actually causing Y.
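To make the omitted-variable problem concrete, here is a minimal sketch on purely simulated data (nothing here comes from the baseball numbers below). An unobserved variable `z` drives both the observed input `x` and the outcome `y`. Regressing `y` on `x` alone absorbs part of `z`'s effect into the slope, in line with the textbook omitted-variable-bias formula. The function names and parameter values are illustrative assumptions.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    x_c = x - x.mean()
    return float(np.dot(x_c, y - y.mean()) / np.dot(x_c, x_c))


def omitted_variable_demo(n=100_000, beta_x=1.0, beta_z=2.0, seed=0):
    """Simulate y = beta_x*x + beta_z*z + noise, where z is 'unobserved'."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # the variable we never get to see
    x = 0.5 * z + rng.normal(size=n)       # observed input, correlated with z
    y = beta_x * x + beta_z * z + rng.normal(size=n)

    # Naive regression of y on x only, omitting z.
    naive = ols_slope(x, y)
    # Omitted-variable-bias formula: beta_x + beta_z * cov(x, z) / var(x).
    predicted = beta_x + beta_z * np.cov(x, z)[0, 1] / np.var(x, ddof=1)
    # If z were observed, regressing on both x and z recovers beta_x.
    X = np.column_stack([np.ones(n), x, z])
    full = float(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return naive, predicted, full


if __name__ == "__main__":
    naive, predicted, full = omitted_variable_demo()
    # With these settings, naive and predicted come out near 1.8, full near 1.0.
    print(f"naive={naive:.3f} predicted={predicted:.3f} full={full:.3f}")
```

The naive slope overstates the true effect of `x` (1.0) because `x` is partly a stand-in for `z`. In the real data we don't get to run the "full" regression, which is exactly the problem.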

Here's a look at annual payroll and regular-season wins in Major League Baseball between 2007 and 2015, courtesy of MLB.com and the USA Today payroll database.
It's definitely a cloud, but OLS finds an upward-sloping line. Now take a look at two randomly selected teams, the Oakland Athletics and the San Francisco Giants, just for kicks:
There's no clear pattern here at all, in either case. If there really were an upward-sloping causal relationship running from payroll to wins, you'd also expect to see it here, within each team's history over time. But check out the LA Dodgers, whose owner went through a messy, costly divorce, filed for bankruptcy, and finally sold the team before the 2012 season:
This is seriously fun stuff, because the natural experiment here (the essentially forced sale of the Dodgers to cover the enormous costs of a shattered marriage) revealed what looks like a pretty clear upward-sloping relationship between payroll and wins. Even 2015, which saw a reduction in both, fits the pattern.
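The contrast between the pooled cloud and the within-team view can be sketched as a comparison between pooled OLS and a "within" (team fixed effects) estimator, which demeans each variable by team so that every team is compared only with itself over time. The sketch below uses synthetic data, not the MLB figures: a hypothetical, time-invariant team "quality" raises both payroll and wins, and the true payroll effect is set to 0.5 by assumption. All names and numbers are illustrative.

```python
import numpy as np


def simulate_panel(n_teams=30, n_seasons=9, beta=0.5, seed=1):
    """Synthetic team-season panel; 'quality' is an unobserved team trait."""
    rng = np.random.default_rng(seed)
    team = np.repeat(np.arange(n_teams), n_seasons)
    quality = rng.normal(size=n_teams)[team]          # same for a team every season
    payroll = 2.0 * quality + rng.normal(size=team.size)
    wins = beta * payroll + 3.0 * quality + rng.normal(size=team.size)
    return team, payroll, wins


def demean_by_group(v, group):
    """Subtract each group's mean from its observations (the 'within' transform)."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]


def slope(x, y):
    """Simple OLS slope of y on x, with an intercept."""
    x_c = x - x.mean()
    return float(np.dot(x_c, y - y.mean()) / np.dot(x_c, x_c))


def pooled_vs_within(team, payroll, wins):
    """Return (pooled OLS slope, within-team slope)."""
    pooled = slope(payroll, wins)
    within = slope(demean_by_group(payroll, team), demean_by_group(wins, team))
    return pooled, within


if __name__ == "__main__":
    # A large synthetic panel keeps sampling noise small.
    team, payroll, wins = simulate_panel(n_teams=5000)
    pooled, within = pooled_vs_within(team, payroll, wins)
    print(f"pooled slope={pooled:.2f}, within-team slope={within:.2f}")
```

In this setup the pooled slope lands well above 0.5 (around 1.7 in expectation) because high-quality teams both spend more and win more, while the within-team slope comes out close to 0.5. Demeaning only removes confounders that are constant for a team; it does nothing about time-varying ones, which is part of why an outside shock like the Dodgers sale is so informative.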