Projecting Runs and Beating Expectations

Sparked by the discussions in bringbackymiggy's recent diary, I decided to look into projecting runs and then try to determine what separates the teams that beat the formula from the ones that fail to live up to their talent.

I ran a regression on every team from 2000-2007, comparing their Hits, Total Bases, Strike Outs, Walks, Stolen Bases, Sac Hits and Sac Flies to their runs scored.
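For anyone who wants to try something similar, here is a rough sketch of an ordinary least squares fit in plain Python. It is not the exact tool used for the numbers in this post; the function names `solve_linear_system` and `fit_ols` are made up for illustration, and whether to include an intercept is an assumption left as a parameter. In practice a statistics package would be the usual choice, since it also reports significance levels.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y, fit_intercept=True):
    """Least squares fit via the normal equations (X'X) beta = X'y.

    rows: one sequence of predictor values per team-season
          (e.g. H, TB, SO, BB, SB, SH, SF, in a fixed order).
    y:    runs scored for each team-season.
    Returns one coefficient per predictor; if fit_intercept is True,
    the intercept is appended as the last element.
    """
    design = [list(r) + ([1.0] if fit_intercept else []) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

Each row passed to `fit_ols` would be one team-season's stat line, with that season's runs scored as the matching entry in `y`.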

Each stat returned the following coefficients:
H    -0.10374541
TB    0.372303479
SO    -0.135615182
BB    0.298866411
SB    0.078589393
SH    -0.317950476
SF    0.850655403

All of the values, except stolen bases, are significant at the 95% level.

This gives us a formula of:
ProjR = H * -.104 + TB * .37 + SO * -.14 + BB * .30 + SB * .08 + SH * -.32 + SF * .85
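As a quick sketch, the formula can be written as a small Python function. The name `project_runs` and the dictionary layout are just for illustration, and since no intercept is listed above, the sketch assumes there is none.

```python
# Rounded coefficients copied from the ProjR formula above.
COEFFICIENTS = {
    "H": -0.104,
    "TB": 0.37,
    "SO": -0.14,
    "BB": 0.30,
    "SB": 0.08,
    "SH": -0.32,
    "SF": 0.85,
}


def project_runs(stats, coefficients=COEFFICIENTS):
    """Return projected runs for one team-season.

    stats: dict mapping each stat abbreviation to the team's season total.
    Assumes no intercept term, since none is given in the formula.
    """
    return sum(coef * stats[name] for name, coef in coefficients.items())
```

Calling `project_runs` with a team's season totals for all seven stats returns that team's ProjR.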

You'll notice that hits return a negative value. The formula does not actually think that a hit hurts the team; it just says that a hit's value is almost entirely explained by total bases. Every hit also adds at least one total base, so a single correlates with .27 runs (the TB coefficient plus the H coefficient, .37 - .10), while a double correlates with .64 runs (2 * .37 - .10), and so on. I would guess that a portion of the single's value comes from stolen bases, as most of those come when the runner started on first. What's curious is that this formula shows a walk (.30) correlating with more runs than a single does.
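To make the arithmetic concrete, here is a small sketch that combines the H and TB coefficients from the table above into a per-hit value. The function name `hit_value` is made up for illustration.

```python
# Unrounded coefficients from the first regression.
H_COEF = -0.10374541
TB_COEF = 0.372303479


def hit_value(bases):
    """Runs associated with one hit worth `bases` total bases.

    Every hit counts once in H and `bases` times in TB.
    """
    return bases * TB_COEF + H_COEF


for name, bases in [("single", 1), ("double", 2), ("triple", 3), ("home run", 4)]:
    print(f"{name}: {hit_value(bases):.2f}")
# single: 0.27
# double: 0.64
# triple: 1.01
# home run: 1.39
```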

You'll notice that sac bunts also return a negative value, more than canceling out the value of the hit or walk that put the runner there.

Sac flies seem to be the most valuable. Keep in mind, though, that every sac fly scores a run, so it goes without saying that they would correlate pretty significantly with runs scored.

As you can see in the graph below, with a few exceptions, the formula fits reality pretty darn well.

Of course, what we're all really interested in is how did the A's fare?

The answer, not so well:

In each of the last six years the team has failed to meet its projections. This year they've been particularly bad.

So what happened? What lessons can we learn? Why do some teams exceed expectations and others come up short?

To answer that question, I ran another regression, using the same X variables as the previous one, but replacing runs with projDif -- the difference between the actual runs scored and projected runs.
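As a sketch of the sign convention, projDif here is taken to be actual runs minus projected runs, so a positive number means a team beat the formula. The function name `projection_differences` is made up for illustration.

```python
def projection_differences(actual_runs, projected_runs):
    """Return projDif (actual minus projected) for each team-season.

    Positive values mean the team scored more than the formula projected;
    negative values mean it fell short.
    """
    if len(actual_runs) != len(projected_runs):
        raise ValueError("need one projection per team-season")
    return [a - p for a, p in zip(actual_runs, projected_runs)]
```

The resulting list would then take the place of runs scored as the dependent variable in the second regression.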

This returned the following coefficients:
H    -0.032243112
TB    0.008497944
SO    -0.146256025
BB    0.297670492
SB    0.079517232
SH    -0.331711418
SF    0.855222478

Hits, total bases and stolen bases were not statistically significant.

It seems to me that the lesson here is that two basic concepts correlate with beating expectations: putting the ball into play (as measured by sac flies and not striking out) and drawing walks. The A's are considered particularly good at the latter, but particularly bad at the former.

Maybe the naysayers are right. Maybe, somehow, Jason Kendall and Mark Kotsay are actually the keys to beating the projections (though, perhaps more through lowering them). Seriously, on both counts -- there's always more room for improvement through just improving talent, even if it's not the most efficient kind of talent. All but a very few teams were within 30 runs of their projections. It seems clear that we're not as efficient as we could be -- but I wouldn't be surprised at all if that were unavoidable on our budget. Players who walk a lot and make consistent contact tend to be expensive.

Finally, here are the top and bottom 10 in terms of the difference between actual and projected runs:

===UPDATED===
Based on Andeux's comments, I replaced H and TB with 1B, 2B, 3B and HR, and added Other Outs (OO).

Each stat returned the following coefficients:
1B    0.56324714
2B    0.675460767
3B    1.174972722
HR    1.473213429
SO    -0.113783201
BB    0.31091322
SB    0.028410154
SH    -0.314513133
SF    0.499949561
OO    -0.111557603

All except SB were significant at the 95% level.

You'll notice that a strikeout is almost exactly the same as other outs. You'll also notice that sac bunts hurt a team almost three times as much as other outs.
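A quick sketch of the ratios behind those comparisons, using the updated coefficients listed above. The variable names are made up for illustration.

```python
# Coefficients copied from the updated regression.
UPDATED = {
    "1B": 0.56324714,
    "2B": 0.675460767,
    "3B": 1.174972722,
    "HR": 1.473213429,
    "SO": -0.113783201,
    "BB": 0.31091322,
    "SB": 0.028410154,
    "SH": -0.314513133,
    "SF": 0.499949561,
    "OO": -0.111557603,
}

# How costly each event is relative to an ordinary out.
strikeout_vs_other_out = UPDATED["SO"] / UPDATED["OO"]
sac_bunt_vs_other_out = UPDATED["SH"] / UPDATED["OO"]
# How much more a double is worth than a single.
double_vs_single = UPDATED["2B"] / UPDATED["1B"]

print(f"SO / OO: {strikeout_vs_other_out:.2f}")  # SO / OO: 1.02
print(f"SH / OO: {sac_bunt_vs_other_out:.2f}")   # SH / OO: 2.82
print(f"2B / 1B: {double_vs_single:.2f}")        # 2B / 1B: 1.20
```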

I do find the relative lack of value of a double curious.

You'll also notice that stolen bases are worth virtually nothing.

The overall fit of the graph has improved. The new formula accounts for 93% of the variation, while the old one only accounted for 89%:
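Assuming "accounts for X% of the variation" refers to the usual R-squared statistic, here is a minimal sketch of how that number is computed from actual and projected runs. The function name `r_squared` is made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total.

    actual:    observed runs for each team-season.
    predicted: projected runs for the same team-seasons, in the same order.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total
```

A value of 1.0 means the projections match reality exactly, while 0.0 means they do no better than guessing the league-average run total for every team.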

It also does a better job of projecting the A's:

Unfortunately, the second regression did not return very meaningful results. None of the coefficients were statistically significant.

Interestingly, doubles had the greatest correlation with beating projections, while sac flies, home runs and strike outs had the greatest correlation with falling short.
