Separation anxiety? - Alex Trautwig
Dogs, babies, needy spouses: they all get it, and to tell you the truth, I get it too. Separation anxiety. Overwhelming fear and cold sweats cover my body when I think about separating how much of 2012 was luck vs. skill. Why do I feel this way?
2011- we sucked
2012- we should have sucked to the 10th power, but we were demigods (coco reference).
Hence, I kind of don't want to know how lucky or skillful last year's team was. To preface, I work in finance. The data are extremely volatile and noisy; it takes 20+ years of data just to prove that stocks have higher expected returns than one-month T-bills or bank CDs. In short, it's a lot like baseball. Every event is really a series of odds, probabilities, coin flips, and you need to flip the coin to the point that your right thumb is the size of Yoenis' deltoids to have any sort of clear picture of what is luck vs. skill.
So how do we look at the data? I ran Monte Carlo simulations (binomial wins and losses, and normally distributed run differentials) with the same characteristics as a .500 team, a 94-68 team, and the 2012 A's. I pretend these teams play 1,000 seasons with identical odds in each game of each season and check the variation in wins and losses by chance alone. These numbers represent constant odds: they don't change. If a simulated season starts with 5 wins, you have no better or worse odds of winning the next game than the odds assigned at the beginning of the season. There's no mean reversion, an idea I meet with healthy skepticism even though every post I read has some B.S. about players x, y, & z "regressing to the mean".
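For anyone who wants to try the constant-odds setup at home, here is a minimal sketch of it in Python. It is not the code behind the numbers in this post; the function names, the seed, and the choice of the standard `random` module are all my own assumptions.

```python
import random
from statistics import mean


def simulate_seasons(win_prob, n_seasons=1000, n_games=162, seed=None):
    """Return one win total per simulated season.

    Every game is an independent coin flip with the same win_prob,
    so there is no momentum and no mean reversion.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < win_prob for _ in range(n_games))
        for _ in range(n_seasons)
    ]


def share_at_least(wins, threshold):
    """Fraction of simulated seasons with at least `threshold` wins."""
    return sum(w >= threshold for w in wins) / len(wins)


if __name__ == "__main__":
    seasons = simulate_seasons(0.5, seed=2012)  # hypothetical seed
    print("average wins:", mean(seasons))
    print("share of seasons with 94+ wins:", share_at_least(seasons, 94))
```

Swap 0.5 for 0.5802 to get the 94-68 team. The exact percentages will wobble from run to run, which is kind of the point.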
Simulation 1: A team with 50% odds of winning each game plays 1,000 seasons:
Obviously, the mean, median, and mode are all about 81 wins.
Only 2% of the time did this team win 94 or more games. That's how often a true .500 team would get lucky enough to match our record last year. Yep, there's that much noise in wins/losses under these assumptions.
In 68% of the seasons the team won between 74 and 87 games, and in 95% of the seasons it won between 68 and 94.
You might think it's crazy that a .500 team can win 68 or 94 games. I think the data are just that noisy. If mean reversion exists it'd tighten these confidence intervals, but that's a whole other topic.
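The 94-win tail doesn't strictly need a simulation. Under the same constant-odds assumption it can be added up exactly from the binomial distribution. Here is a sketch, with `binomial_tail` being a helper name I made up for illustration:

```python
from math import comb


def binomial_tail(n_games, win_prob, min_wins):
    """Exact P(wins >= min_wins) for independent games with a fixed win_prob."""
    return sum(
        comb(n_games, k) * win_prob**k * (1 - win_prob) ** (n_games - k)
        for k in range(min_wins, n_games + 1)
    )


if __name__ == "__main__":
    # Chance that a true .500 team wins 94 or more of 162 games.
    print("P(94+ wins):", binomial_tail(162, 0.5, 94))
```

The exact figure should land in the same neighborhood as the simulated 2%. Any gap between the two is just the sampling noise that comes with only 1,000 seasons.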
Simulation 2: A team with 58.02% odds of winning each game plays 1,000 seasons:
This is a 94-68 team: because the expected win percentage is 58.02%, 94 wins is the center of the distribution.
3.7% of the time this 94-68 team wins 81 or fewer games.
In 68% of the seasons the team won between 88 and 101 games, and in 95% of the seasons it won between 81 and 107.
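Those ranges line up with the textbook standard deviation of a binomial win total. As a rough check, assuming the 68% and 95% bands are about one and two standard deviations wide (the usual normal approximation):

```latex
\sigma = \sqrt{n\,p\,(1-p)} = \sqrt{162 \times 0.5802 \times 0.4198} \approx 6.28 \text{ wins}

94 \pm \sigma \approx 87.7 \text{ to } 100.3
\qquad
94 \pm 2\sigma \approx 81.4 \text{ to } 106.6
```

So the simulated bands are about what the formula says they should be, give or take rounding to whole games.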
Considering the Mariners won 116 games in 2001, I don't think the 107-win seasons are any cause for concern. I mean, if maybe 8 teams per year have expected winning percentages as high as 58.02%, and we have 50 years of data with 162-game seasons and no regime change in the level of competition, we only have 400 such seasons in history, whereas we have 1,000 in these simulations.
OK, now we have some feel for how much wins can vary by chance alone.
Now let's test seasons by simulating run differentials rather than the outcomes of the games themselves. Each game with a positive run differential is a win, and each game with a negative one is a loss. I know you can't score a partial run in baseball, but if we rounded the simulated numbers our conclusion would be the same; we just can't round them beforehand because it'd bias the simulation greatly.
The 2012 A's had a run differential of +0.5569 runs per game with a standard deviation of 4.27 runs per game. (The standard deviation measures how much the run differential varies from game to game; we use it in the simulation so that the behavior of the simulated games matches that of real life.)
Simulation 3: A team with a +0.5569 run differential and a standard deviation of 4.27 runs plays 1,000 seasons:
If we simulate games from the SAME distribution as the 2012 A's run differential, the expected outcome is an 89-win season, and there's a 26.5% chance that this team wins 94 or more games. I'd probably wager that this is a better expected value than the 94 wins we saw; the question is, is run differential a good predictor of wins? And 26% isn't exactly winning the lotto. There's also a 12.9% chance a team with this run differential distribution wins 81 or fewer games.
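The run-differential version can be sketched the same way: draw a normally distributed margin for each game and count the positive ones as wins. Again, this is an illustration under my own assumptions (function name, seed, standard `random` module), not the original code.

```python
import random
from statistics import mean


def simulate_run_diff_seasons(mean_diff, sd_diff, n_seasons=1000,
                              n_games=162, seed=None):
    """Return one win total per simulated season.

    Each game's run differential is an independent normal draw;
    a positive draw counts as a win, anything else as a loss.
    The draws are deliberately not rounded to whole runs.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(mean_diff, sd_diff) > 0 for _ in range(n_games))
        for _ in range(n_seasons)
    ]


if __name__ == "__main__":
    seasons = simulate_run_diff_seasons(0.5569, 4.27, seed=2012)  # hypothetical seed
    print("average wins:", mean(seasons))
    print("share with 94+ wins:", sum(w >= 94 for w in seasons) / len(seasons))
    print("share with 81 or fewer:", sum(w <= 81 for w in seasons) / len(seasons))
```

Setting `mean_diff` to 0.0 gives the zero-run-differential team with the same game-to-game variation.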
What if we simulate a .500 team (zero expected run differential) with the same standard deviation as the 2012 A's?
Simulation 4: A team with a 0.00 run differential and standard deviation of 4.27 runs plays 1,000 seasons:
Obviously the center of the distribution is 81 wins, half the time you do better, half the time you do worse. Only 2.10% of these teams won 94 or more games, so it's pretty safe to say that we're probably better than a .500 team.
What's interesting is that simulations 1 & 4 are the same team (.500 record): over 162 games they both have expected wins of 81 and about a 2% chance of winning 94 or more games.
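That match isn't a coincidence. If the per-game run differential is normal with mean μ and standard deviation σ, as assumed in these simulations, the chance of winning any one game is:

```latex
P(\text{win}) = P(D > 0) = \Phi\!\left(\frac{\mu}{\sigma}\right)

\mu = 0: \quad \Phi(0) = 0.5 \ \text{for any } \sigma

\mu = 0.5569,\ \sigma = 4.27: \quad \Phi(0.130) \approx 0.552, \qquad 162 \times 0.552 \approx 89.4 \ \text{wins}
```

Here Φ is the standard normal CDF. With a zero mean the standard deviation drops out entirely, so the run-differential simulation collapses into the plain 50/50 coin flip.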
The A's run differential doesn't predict their record, however; it only predicts 89 wins. If (and that's a big "if") run differential is the best estimator of how good a team is, the A's should have won 89 games and they were slightly lucky (about 1 in 4) to win 94 games.
What to take from this? Baseball is a series of coin flips, some of which are in your favor and some against. Even after 162 games we're left with a lot of noise, yet many decisions are based on point estimates (or expected values) rather than the variation of those values. You also have to understand your assumptions and their impact. For example, the A's were basically 2 different teams before and after the All-Star break; maybe we should weight those periods 70/30 in favor of the post-break period, rather than 50/50, if we want to predict next year? It's most important to use statistics to understand the data, not to try to predict everything under the sun. Hopefully this adds a little color for some of you here. Just understand that plus/minus 6 games can happen very easily based on bloop singles and bad calls. The data for the other teams in the division are probably just as noisy, and nobody knows what each team's true expected winning percentage is. That's why we play the games; otherwise we'd have no need for them. The team with the best record is probably the best team, you just can't prove it.