Melting Down: An Analysis of Bullpen Volatility

The A's bullpen in the span of an off-season went from one of the best to one of the worst. Did the A's fail in their analysis, or are relievers ticking time bombs?

The bullpen has gone from Abad to Aworse.
Troy Taormina-USA TODAY Sports

The 2015 A's bullpen has been an unmitigated disaster. This has been especially shocking to A's fans because coming into the season, we expected the bullpen to be a team strength... and from what we've heard from management, it seems that this has come as just as much of a shock to them as to us. So what gives? How could the bullpen go from great to so terrible seemingly instantly?

Much has been made, on this site and others, of bullpen volatility: the idea that, from year to year, it's difficult to predict which relievers are going to be great and which are going to explode into a million pieces. On one level, this doesn't make a whole lot of sense: if a reliever is good, he's good, and that should persist year to year. On the other hand, relievers typically pitch fewer innings and in higher-leverage situations, so their numbers are prone to lots of random fluctuation: one bad inning (or one bad at-bat!) can destroy their stats. But just how volatile is reliever performance?

I set out to analyze this, and spent a bit of time crunching numbers. I decided to measure ERA (I thought about measuring FIP, but we're talking about pitching results here, not pitcher skill set) from one year to the next, and see how closely relief pitcher ERAs in Season 1 were correlated with their ERAs in Season 2. For example: how closely correlated was 2012 Grant Balfour's ERA to 2013 Grant Balfour's ERA?

I did this for every pair of consecutive qualified seasons for relievers from 2008 to 2014 (seven seasons of complete data, which gives six year-to-year comparisons). So, for example, 2011-2012 Ryan Cook and 2012-2013 Ryan Cook are different data points. If a reliever qualified in one season but not the next, that pair was excluded from the data set.
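As a rough illustration of that pairing rule, here is a minimal Python sketch. The row layout and the function name `consecutive_pairs` are my own assumptions, not the original workflow; the ERAs are the handful quoted elsewhere in this article, and every row is assumed to be a qualified season already.

```python
from collections import defaultdict

# (pitcher, season, ERA) rows. These are only the ERAs quoted in this
# article; the real data set had 471 season pairs.
seasons = [
    ("Grant Balfour", 2008, 1.54),
    ("Grant Balfour", 2009, 4.81),
    ("Matt Capps", 2009, 5.80),
    ("Matt Capps", 2010, 2.42),
    ("Chad Qualls", 2009, 3.63),
    ("Chad Qualls", 2010, 7.32),
    ("Chad Qualls", 2011, 3.51),
    ("Jim Johnson", 2013, 2.94),
    ("Jim Johnson", 2014, 7.09),
]


def consecutive_pairs(rows):
    """Return (pitcher, season_1, era_1, era_2) for back-to-back seasons.

    Assumes every row is already a qualified relief season, so a season
    with no row for the following year simply produces no pair.
    """
    by_pitcher = defaultdict(dict)
    for name, year, era in rows:
        by_pitcher[name][year] = era

    pairs = []
    for name, eras in by_pitcher.items():
        for year in sorted(eras):
            if year + 1 in eras:
                pairs.append((name, year, eras[year], eras[year + 1]))
    return pairs


pairs = consecutive_pairs(seasons)
for name, year, era_1, era_2 in pairs:
    print(f"{name} {year}-{year + 1}: {era_1:.2f} -> {era_2:.2f}")
```

Note that Chad Qualls contributes two separate data points here (2009-2010 and 2010-2011), just as Ryan Cook does in the example above.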

The results? Well, the results were shocking even to me. They're provided in the following graph (Don't worry if you're not super statistically inclined; I'm going to explain it in easy terms!):

The x-axis is the ERA in Season 1, and the y-axis is the ERA in Season 2. If relievers were predictable, you'd expect a good ERA in Season 1 to mean a good ERA in Season 2, and the dots to line up along a nice upward slope. But as you can see by the random scattering of dots, there's not much of a relationship between ERA from one year to the next.

R^2 (the number in the lower right corner) shows how strong the relationship is between the two variables. R^2 falls between 0 and 1, and the closer it is to 1, the more closely the two are connected. So for example, strikeout percentage from one season to the next is correlated with an R^2 of 0.7, because generally a pitcher who's good at striking batters out keeps that skill, especially because it's not affected by the defense or the randomness of batted balls. For ERA in general, the year to year correlation is about 0.38.
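For anyone who wants to see what goes into that number: for a simple two-variable scatter plot, one common way to get R^2 is to square the Pearson correlation coefficient. The sketch below assumes that is how the graph's R^2 was produced (it is what a typical spreadsheet trendline reports). The function names are mine, and the inputs are toy numbers chosen to show the two extremes, not baseball data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """R^2 of a straight-line fit of ys on xs (the square of Pearson's r)."""
    return pearson_r(xs, ys) ** 2


# Toy inputs: one perfectly linear relationship, one with no linear trend.
perfect = r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
unrelated = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
print(f"perfectly linear: {perfect:.2f}, no linear trend: {unrelated:.2f}")
```

In the analysis here, `xs` would be the Season 1 ERAs and `ys` the Season 2 ERAs for each pair of consecutive seasons.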

For relievers from 2008-2014, though, the R^2 was a microscopic 0.02, which means that the two seasons are pretty much not correlated at all. This is counterintuitive because we know that there are some relievers who are good year in and year out. I'll admit: I didn't expect this when I ran the numbers. I expected to see at least some correlation. The sample size is 471 sets of consecutive reliever seasons, which I think is large enough that you'd see something if it existed. Perhaps if you ran a larger sample something would come out; unfortunately I didn't have the time to do 10 seasons, or 20.

Once I found this, I also did sub-group analyses, where I split the sample into pitchers with Year 1 ERAs of 0.00 to 1.99, 2.00 to 2.99, 3.00 to 3.99, and 4.00+. My hypothesis was that the lowest group would have some correlation year-to-year (i.e., the best relievers would at least be good the next year, while the merely average ones would fluctuate more). This was not the case: even the 0.00-1.99 group had an R^2 value of 0.03, which isn't much better. Of course, splitting the sample further meant that each group was small: that 0.00-1.99 group had only 41 data points, which probably isn't big enough to measure that sort of thing. It definitely deserves further research.
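The sub-group split can be sketched the same way. This is only an illustration under a few assumptions: the helper names `bucket` and `r_squared` are mine, R^2 is again taken as the squared Pearson correlation, and the five (Season 1, Season 2) pairs are just the ones quoted in this article, which is far too few to say anything. The point is the mechanics of bucketing by Season 1 ERA.

```python
from collections import defaultdict


def bucket(era_1):
    """Label a pair by its Season 1 ERA, using the four ranges above."""
    if era_1 < 2.0:
        return "0.00-1.99"
    if era_1 < 3.0:
        return "2.00-2.99"
    if era_1 < 4.0:
        return "3.00-3.99"
    return "4.00+"


def r_squared(xs, ys):
    """Squared Pearson correlation, or None when it can't be computed."""
    n = len(xs)
    if n < 3:
        return None  # two points always fit a line perfectly
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


# (Season 1 ERA, Season 2 ERA) pairs quoted in this article.
pairs = [(1.54, 4.81), (5.80, 2.42), (3.63, 7.32), (7.32, 3.51), (2.94, 7.09)]

groups = defaultdict(list)
for era_1, era_2 in pairs:
    groups[bucket(era_1)].append((era_1, era_2))

for label in sorted(groups):
    members = groups[label]
    xs = [a for a, _ in members]
    ys = [b for _, b in members]
    print(label, "n =", len(members), "R^2 =", r_squared(xs, ys))
```

With the full 471-pair sample, each bucket would have enough points for `r_squared` to return a number; here every bucket is too small, which is the same small-sample problem in miniature.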

Regardless, it seems as though relief pitchers are volatile in the extreme. The thing is, though, that the fluctuations happen in both directions: some relievers will perform better year-to-year, and some will perform worse. Our A's seem to have had the misfortune of having all of their relievers implode at once, which I don't think anybody could have possibly predicted, even accounting for bullpen volatility.

Random Notes

* The average difference from season to season (ERA in Season 1 minus ERA in Season 2) was -0.11, which I think is telling: despite the fact that the numbers are all over the place, they balance out. 2008-2009 Grant Balfour went from an ERA of 1.54 to 4.81, while Matt Capps had a 5.80 ERA in 2009 but 2.42 in 2010. The numbers fluctuate like crazy, but they fluctuate equally in both directions. The standard deviation, by the way, was 1.3.

* Some fun with outliers:

Yup, one of those crazy outliers is one of our most horrible memories. Jim Johnson had a 2.94 ERA in 2013 and a 7.09 ERA in 2014.

Chad "The Oakland A's are My Daddy" Qualls had 2009, 2010, 2011 ERAs of 3.63, 7.32, and 3.51, respectively.
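For the curious, the season-to-season difference from the first note (ERA in Season 1 minus ERA in Season 2) and its standard deviation can be computed along these lines. This is a sketch, not the original spreadsheet: it uses only the five pairs quoted in this article, all of them hand-picked outliers, so its numbers will not match the -0.11 and 1.3 from the full sample. It also assumes the sample standard deviation; the population version would be `statistics.pstdev`.

```python
from statistics import mean, stdev

# (Season 1 ERA, Season 2 ERA) pairs quoted in this article. These are
# outliers, so they are not representative of the full 471-pair sample.
pairs = [
    (1.54, 4.81),  # Grant Balfour, 2008-2009
    (5.80, 2.42),  # Matt Capps, 2009-2010
    (3.63, 7.32),  # Chad Qualls, 2009-2010
    (7.32, 3.51),  # Chad Qualls, 2010-2011
    (2.94, 7.09),  # Jim Johnson, 2013-2014
]

# Positive difference: the ERA dropped in Season 2 (the pitcher improved).
# Negative difference: the ERA rose in Season 2 (the pitcher got worse).
diffs = [era_1 - era_2 for era_1, era_2 in pairs]

avg_diff = mean(diffs)
spread = stdev(diffs)  # sample standard deviation (an assumption)
improved = sum(1 for d in diffs if d > 0)
declined = sum(1 for d in diffs if d < 0)

print(f"average difference: {avg_diff:.2f}")
print(f"standard deviation: {spread:.2f}")
print(f"improved: {improved}, declined: {declined}")
```

An average near zero alongside a large spread is the pattern described in that note: big swings for individual relievers that roughly cancel out across the group.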