Staturday: Small sample size
Grab a snack and pull up a chair. This will be long.
I think I've been called, indirectly at least, a small sample size Nazi. I don't have the patience to argue with anybody who insists on using small sample sizes as the basis for their argument. It's dangerous, it's wrong, and you should never, ever do it. Let's talk about why that is.
In baseball, the danger of using small sample sizes boils down to a phenomenon called regression to the mean. Most of you have at least heard of this concept, and practically everybody utilizes it intuitively in their everyday lives. Regression to the mean, simply put, is the tendency for any extreme observation to be less extreme on subsequent observations.
For example, let's look at the top of the batting average leaderboard for 2001 (min 500 PA), and see how they did in 2002:
| Player | 2001 AVG | 2002 AVG |
| Ichiro Suzuki | .350 | .321 |
| Larry Walker | .350 | .338 |
| Jason Giambi | .342 | .314 |
| Robbie Alomar | .336 | .266 |
| Todd Helton | .336 | .329 |
| Moises Alou | .331 | .275 |
| Bret Boone | .331 | .278 |
| Lance Berkman | .331 | .292 |
| Chipper Jones | .330 | .327 |
| Albert Pujols | .329 | .314 |
| AVERAGE | .337 | .305 |
Finishing near the top of the batting average leaderboard is an extreme observation. And in aggregate, these players performed worse the following year. That is, our second observation of their batting averages was less extreme.
You can do the same thing for the worst hitters and you'd find the same pattern: the next year's numbers are less extreme. You can use batting average, OPS, Runs Created - it really doesn't matter.
In this case, we had the extreme observation that each of these players had very high batting averages in 2001. And the next year, by and large, their batting averages decreased - in other words, the observation in 2002 was less extreme than the observation in 2001.
Why does this happen? The reason is that for any player, you have "true skill" and "actual performance." We too often think that "actual performance" is "true skill." That's incorrect. A player has a true skill, and he utilizes that true skill to accumulate actual performance data. But those data are only random samples from his true skill.
Any observation we make will have the tendency to regress to the mean, and the amount that it regresses depends on two things:
1. How much performance data we have. When you have almost no data, you regress all the way. So if the A's call up Matt Sulentic for one day and he goes 1-1, we observe his batting average to be 1.000. But we know that is not his true skill, and the fact that we have only one data point means that, if we want to estimate his true skill, we have to regress very heavily to the mean (like, all the way).
If he were to continue to accumulate plate appearances, we would regress smaller and smaller amounts. We need a lot of performance data before the actual performance matches the true skill. This is what people mean when they talk about "small sample size."
2. What the spread in skills is among the general MLB population. We regress to the mean more if the spread in skills among the general population is small, and we regress less if the spread in skills among the general population is large. If there were a skill for which there were zero variation among the general MLB population, we would regress all the way to the mean, no matter what the performance data indicated.
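To see regression to the mean fall out of nothing more than "true skill plus random sampling," here is a minimal simulation sketch. The league mean of .270, the .025 spread in true batting-average skill, the 500 at-bats per season, and every function name are illustrative assumptions, not figures taken from the leaderboard above.

```python
import random

def simulate_season(true_avg, at_bats, rng):
    """Observed average when each at-bat is a hit with probability true_avg."""
    hits = sum(rng.random() < true_avg for _ in range(at_bats))
    return hits / at_bats

def top_group_two_seasons(n_players=500, at_bats=500, top_n=10, seed=1):
    """Return (year-1 avg, year-2 avg, true skill avg) for the year-1 leaders."""
    rng = random.Random(seed)
    # Assumed true-skill distribution: mean .270, standard deviation .025.
    skills = [rng.gauss(0.270, 0.025) for _ in range(n_players)]
    # Each player's skill is held fixed; only the random sampling changes.
    year1 = [simulate_season(s, at_bats, rng) for s in skills]
    year2 = [simulate_season(s, at_bats, rng) for s in skills]
    # Pick the leaders using year-1 results only, as a leaderboard would.
    leaders = sorted(range(n_players), key=lambda i: year1[i], reverse=True)[:top_n]
    avg1 = sum(year1[i] for i in leaders) / top_n
    avg2 = sum(year2[i] for i in leaders) / top_n
    true_avg = sum(skills[i] for i in leaders) / top_n
    return avg1, avg2, true_avg

if __name__ == "__main__":
    avg1, avg2, true_avg = top_group_two_seasons()
    print(f"leaders, year 1: {avg1:.3f}")
    print(f"leaders, year 2: {avg2:.3f}")
    print(f"leaders, true skill: {true_avg:.3f}")
```

Nobody's skill changes between the two simulated seasons, yet the year-1 leaders should still drop in year 2, because being picked as a leader selects for good luck as well as good skill.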
Sitting on a dock on a Bayes...
Let's back up a minute and talk philosophy.
Pick a hitter - say, Travis Buck. You watch Travis Buck play in 2007 and observe that he has a .377 OBP. What do we know about Travis Buck's ability to get on base? Here's a list:
1. Travis Buck had a .377 OBP in 334 plate appearances.
Now, statisticians will tell you that there's a margin of error there. I'll skip the derivations, but you can get the standard deviation using the equation:
sqrt(OBP*(1-OBP)/PA)
So, we observed that Travis Buck had a .377 OBP +/- .026. We think that Travis Buck's "true OBP" is somewhere between .351 and .403. If you're not familiar with standard deviation, then think of it as a "margin of error" of sorts. The true talent is 68% likely to fall between plus or minus one standard deviation.
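For anyone who wants to check the margin of error, here is a small sketch of the formula above. The function name is my own, and it assumes every plate appearance is an independent trial with the same chance of reaching base, which is the assumption baked into the equation.

```python
import math

def obp_standard_error(obp, pa):
    """Standard deviation of an observed OBP over `pa` plate appearances."""
    return math.sqrt(obp * (1 - obp) / pa)

if __name__ == "__main__":
    se = obp_standard_error(0.377, 334)
    print(f"standard deviation: {se:.4f}")
    print(f"one-sigma range: {0.377 - se:.3f} to {0.377 + se:.3f}")
```

For Buck's line this comes out a shade above .026, so the range quoted above holds to within about a point of OBP.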
There's just one problem: our conclusion that Buck's true OBP skill is .377 +/- .026 is hopelessly and utterly wrong. Why? Because our list sucked.
We actually know two things about Travis Buck:
1. Travis Buck had a .377 OBP in 334 plate appearances.
2. Travis Buck is a major league baseball player, and the average major league baseball player has a .330 OBP.
The second point is absolutely critical. Buck isn't some random dude picked off the street; he is part of an overall population. We know two things about him: what we observed about him, and what we observed about people who are like him.
Taking the second item into account is the same as regression to the mean.
The league-average OBP is .330. Through some complicated statistics, we know that the variation in true skill (the standard deviation) across all major league hitters is 0.025. That is, 68% of all major leaguers have a true OBP skill between .305 and .355.
Now we have two measurements corresponding to our two observations regarding Travis Buck:
1. We observed his OBP to be .377 +/- .026
2. Travis Buck is a major leaguer, whose collective OBP is .330 +/- .025.
We must take into account both of these measurements. Must! We combine these measurements in such a way that the measurement with the least uncertainty is given more consideration, and the measurement with the most uncertainty is given less consideration.
Mathematically, we do this by weighting by the inverse of the square of the standard deviation.
The equation for combining two measurements with two different standard deviations is:
true skill = (m1/s1^2 + m2/s2^2)/(1/s1^2 + 1/s2^2)
where m1 is measurement 1, m2 is measurement 2, s1 is the standard deviation in measurement 1, and s2 is the standard deviation in measurement 2.
This equation is how we regress to the mean, and we're about to do it for Travis Buck.
Plugging in:
true OBP skill = [.377/(.026^2) + .330/(.025^2)]/(1/.026^2 + 1/.025^2) = .353
Notice that our estimate of his true OBP skill is basically halfway between our first measurement (.377) and our second measurement (.330). This is because the uncertainty in our first measurement is almost equal to the uncertainty in the second measurement.
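The weighting can be written out in a few lines. This is only a sketch of the equation as given, with function names of my choosing; it also reports how much of the final estimate comes from the player's own numbers, which is why the result lands near the midpoint.

```python
def combine(m1, s1, m2, s2):
    """Inverse-variance weighted average of two measurements."""
    w1 = 1 / s1 ** 2  # weight on measurement 1
    w2 = 1 / s2 ** 2  # weight on measurement 2
    return (m1 * w1 + m2 * w2) / (w1 + w2)

def weight_on_first(s1, s2):
    """Share of the estimate that comes from measurement 1."""
    w1 = 1 / s1 ** 2
    w2 = 1 / s2 ** 2
    return w1 / (w1 + w2)

if __name__ == "__main__":
    estimate = combine(0.377, 0.026, 0.330, 0.025)
    share = weight_on_first(0.026, 0.025)
    print(f"true OBP skill estimate: {estimate:.3f}")
    print(f"weight on Buck's own OBP: {share:.0%}")
```

With these inputs the player's own performance gets a bit under half the weight, and the league average gets the rest.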
One can play lots of games with regression to the mean, and I won't go into them here. But I do want to point out that our second measurement, that Buck is a major leaguer, is somewhat arbitrary. We could just as easily have chosen any population to which Buck belongs: all American men, major leaguers with really nice butts, people who are 6'2"/200 lbs, etc. Choosing the right population - statisticians call this "choosing the correct prior" - is something that forecasters have spent many hours pondering. But in the absence of any other information, a good population to use is "all major leaguers." Forecasting performance based on regressing to the major league mean is actually remarkably accurate. Projecting a player's performance by simple regression to the league mean is almost as good as systems like PECOTA, Chone, or ZiPS.
Also, we must regress any measurement we make, including, for example, splits. Let's say you look at Eric Chavez's awful splits against portsiders. You must - must! - regress those splits to the mean performance of all lefty-on-lefty matchups. The same thing goes for home/road splits, splits by lineup order, splits by position, splits by month, etc. What you find is that the many splits that are touted as important ("he hits .312 after the All-Star Break!") are actually meaningless.
The take home message
"Small sample size" depends on which skill is being discussed. Maybe later, I - or maybe in the comments, you - will discuss how aggressively we regress different skills. Until then, let's look at some examples.
As Buck accumulates more and more plate appearances, our uncertainty in his actual performance will decrease. The more certain we are about his actual performance, the less we have to regress to the mean.
The following chart shows how our estimate of his true OBP skill changes with the number of plate appearances over which we observe him to have a .377 OBP.

If Buck had a .377 OBP one month and we wanted to estimate his true talent based only upon the 100 PA he got in that month, then regression to the mean would be very strong and we would estimate his true OBP skill to be .338. If Buck accumulated 5000 PA over eight years and had a .377 OBP, we would regress only a small amount and estimate his true OBP skill to be .373 (assuming no change in skill over eight years, which is a bad assumption). Here, we see that the "small sample size" argument is inextricably linked to regression to the mean.
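Here is a rough sketch of the calculation behind a chart like this one, reusing the standard deviation and weighting equations from earlier with the .330 league OBP and .025 skill spread. The function and constant names are mine, and the exact inputs behind the original chart aren't given, so expect the output to land within a few points of the figures quoted above rather than match them digit for digit.

```python
LEAGUE_OBP = 0.330  # league-average OBP used in the post
SKILL_SD = 0.025    # spread in true OBP skill used in the post

def regressed_obp(observed, pa, league=LEAGUE_OBP, skill_sd=SKILL_SD):
    """Estimate true OBP skill from an observed OBP over `pa` plate appearances."""
    sampling_var = observed * (1 - observed) / pa  # s1 squared
    skill_var = skill_sd ** 2                      # s2 squared
    w1 = 1 / sampling_var
    w2 = 1 / skill_var
    return (observed * w1 + league * w2) / (w1 + w2)

if __name__ == "__main__":
    for pa in (100, 334, 1000, 5000):
        print(f"{pa:>5} PA -> {regressed_obp(0.377, pa):.3f}")
```

The estimate starts near the league average and climbs toward .377 as the plate appearances pile up, which is the shape the chart describes.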
Some skills have very wide variation among the MLB population and other skills have a very narrow variation. If a skill has a very wide variation, then the standard deviation will be very large. This results in regression to the mean being fairly weak. If a skill has a very narrow variation, then the standard deviation will be quite small and the regression will be very aggressive.
One skill that has a wide variation, for example, is hitting home runs. You will know a lot about a hitter's ability to hit home runs after only a few hundred plate appearances. So "small sample size" when discussing home run hitting ability is anything less than a few hundred plate appearances.
A skill that has a narrow distribution, for example, is batting average on balls in play (for pitchers). After nearly 3700 balls in play (about 5 years of full-time pitching) we still have to regress 50% of the way in order to estimate a pitcher's true BABIP skill. So if anybody mentions that a pitcher has had a consistently low BABIP, don't believe them unless they come armed with several years' worth of data.
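One way to put a number on "small sample size" for a given skill: we regress exactly 50% when the sampling uncertainty equals the spread in true skill. The sketch below solves for that sample size, and also runs the logic backwards for BABIP. It assumes simple binomial noise around the league rate, the .300 league BABIP is my assumption rather than a figure from this post, and the function names are hypothetical.

```python
import math

def halfway_sample_size(league_rate, skill_sd):
    """Trials needed before we regress exactly 50% toward the mean."""
    # Solve league_rate * (1 - league_rate) / n == skill_sd ** 2 for n.
    return league_rate * (1 - league_rate) / skill_sd ** 2

def implied_skill_sd(league_rate, halfway_n):
    """Spread in true skill implied by a known 50%-regression sample size."""
    return math.sqrt(league_rate * (1 - league_rate) / halfway_n)

if __name__ == "__main__":
    # OBP, with the .330 mean and .025 spread used for Travis Buck.
    print(f"OBP halfway point: {halfway_sample_size(0.330, 0.025):.0f} PA")
    # BABIP, working backwards from 3700 balls in play (league .300 assumed).
    print(f"implied BABIP skill spread: {implied_skill_sd(0.300, 3700):.4f}")
```

Under these assumptions the OBP halfway point sits in the mid-300s of plate appearances, right around Buck's 334, while the implied spread in pitcher BABIP skill is well under a third of the OBP spread, which is why BABIP needs years of data.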
If you remember nothing else, remember this: we must always regress to the mean when figuring true player skills, and the amount we regress is based on how much performance data we have and what the spread in skills is among the general MLB population.
If you'd like to learn more about regression to the mean as it applies to baseball, I highly recommend purchasing and reading The Book by Tom Tango, Mitchel Lichtman, and Andy Dolphin. You should also read their blog.
I've also recently done some work looking at career trajectories using regression to the mean at The Hardball Times. You can check them out here and here, with more to come in the coming weeks.
Thanks for reading.
Comments
I may be in the minority here...
But not only did I not find this post too long, but I was disappointed when it ended! This is great stuff. It is statistical candy from the baseball gods. Awesome job, Salb.
More than just ANtics: http://www.louisgray.com/live/
by louismg on Mar 7, 2008 10:46 PM PST reply actions 0 recs
I concur
I find the Staturday articles very fun to read from a statistical point of view whether they are written by salb or devo. The articles are never too long and I would enjoy reading more.
After browsing through the papers and websites for A's news it's refreshing to see someone put in something with a different approach even if it's just once a week.
As I was reading the article, my impression was that Salb was probably hinting at all the people who were getting excited with the hitting results of the very small sample of spring training games, myself included. The team has been hitting great excluding today and a few other times.
Great Job again Salb.
by Coffee13eans on Mar 7, 2008 11:23 PM PST up reply actions 0 recs
Just thought I'd share what I wrote to Sal after reading a draft of this
a couple of days ago:
I am now smarter than I was 20 minutes ago ...
~Devin
I'm serious ... I hope y'all find my writing interesting -- but I'm not half as qualified as this guy.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 7, 2008 11:04 PM PST reply actions 0 recs
Good stuff
As I sometimes say----
My liberal arts education taught me the importance of math, though for many it seems the opposite is true...
Far and wide are my travels around the internet, and everywhere there is Bayes and priors; this article is another small notch in my understanding that prevents me from having to pick up a textbook...
I am confident that further returns will cement Buck's awesomeness....
The A's colors are green and gold.
by mikeA on Mar 7, 2008 11:30 PM PST reply actions 0 recs
Great post...
...I would have liked to see an attack on the theory of "clutch" hitting though :)
Juan Pierre: 44 Million Dollars, Juan Pierre's 3.2 WARP3: Priceless
by Travis Buck Nuckin on Mar 8, 2008 12:16 AM PST reply actions 0 recs
Well, this post certainly applies
since regression to the mean in the case of clutch hitting is almost 100%. The sample sizes bandied about by announcers are so small as to be worthless. It takes basically a career's worth of data to show a guy hit 2% better in the clutch, which is virtually meaningless.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 8, 2008 9:53 AM PST up reply actions 0 recs
if you can show, in a statistically significant way, that a player hit 2% better in the clutch ...
that's pretty damn significant (pun totally intended). 2% is 20 points of batting average; that's pretty dang good.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 10:12 AM PST up reply actions 0 recs
Exactly,
Virtually "meaningless" depends on the definition of "meaningless".
Just because something is not "statistically significant" does not mean it does not matter. Not when in most sports, the margin between winning and not winning is often not statistically significant.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on Mar 8, 2008 10:35 AM PST up reply actions 0 recs
Huh ... that was not what I was saying at all ...
not even remotely ...
But as to what you were saying -- just because something is difficult or impossible to prove statistically does not mean it does not exist or that it is not important. This is absolutely true.
Personally, I'm skeptical that clutch hitting exists as a real, repeatable skill (though not that the concept of "clutch-ness" exists at all). The reason I believe this is because hitting is almost purely about skill. No player in any sport can become more skilled on command. A running back could get a shot of adrenaline and run faster or a basketball player could jump higher, a tennis player could ignore the fact that he just threw up in the corner of the arena, etc but none of them can all of a sudden become more skilled. Swinging a baseball bat is not like other sports (except golf) because it is so purely based on skill and the proper balance of strength and fluidity that all of a sudden becoming stronger would be more likely to help than to hurt.
Basically, stepping it up in the clutch wouldn't mean that a player is over performing when it matters most, it would mean that a player was under performing the rest of the time. Hitting is not like pitching or running up and down a basketball court in which case there is often a good reason to hold a little something back for key situations later in the game. Hitting (assuming you have the requisite skills) is not difficult. A DH could play a 50 inning game and, assuming he was in good enough shape to do it to begin with, run a marathon afterwards (okay, maybe not quite). Really focusing isn't such a burden that a player cannot do it 4-5 times a game without difficulty.
Which gets me to what I do believe in. While I'm skeptical that players are really clutch, I do believe that choke hitting exists as a real, repeatable anti-skill. Players that lose focus, that try to swing too hard or whatever are going to be less likely to come through in the clutch.
But, back to the statistical angle of this post. Just because we cannot measure something in a statistically significant way does not mean it does not exist. That is true.
Unless you have additional reason to believe something to be the case, though (say, scouting, for instance), teams should not be spending money (or other currency -- players, draft picks, etc) or making other decisions based on non statistically significant data. They'll have about a 50-50 chance of being disappointed. It's not that the potential for that skill didn't exist -- it's just that they had no reason to believe that the player they acquired (inserted in the lineup, etc) actually had that skill.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 11:05 AM PST up reply actions 0 recs
hold on there!
Sticking to Sal's (and your) argument about overall skill level and performance, if in "clutch" situations there are those who perform at their mean and those who underperform their mean, wouldn't there be a probabilistic likelihood of a commensurate group overperforming? Given a large enough sample size, of course?
I'm just not sure how you can justify an argument for "not clutch" on probabilistic grounds.
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 11:21 AM PST up reply actions 0 recs
of course ...
I'm not saying that performances that can be described as clutch don't exist -- just that I don't believe they are a real, repeatable skill.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 11:41 AM PST up reply actions 0 recs
but unclutch is a repeatable unskill?
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 12:40 PM PST up reply actions 0 recs
it hasn't been statistically shown to be (that I know of) ...
I just theorize that it could exist ...
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 2:35 PM PST up reply actions 0 recs
Much like my sense of compassion
I mean, theoretically I could have some but trying to quantify an intangible skill or trait is extremely difficult.
The monster at the end of this blog.
by grover on Mar 8, 2008 2:52 PM PST up reply actions 0 recs
you're going out on a limb there ...
but I suppose crazier things must have happened, right?
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 3:00 PM PST up reply actions 0 recs
ah, but what devo's proposing is quite tangible
Here's another way of looking at it: devo postulates that clutch doesn't exist, but unclutch does (I'm formulating a stronger position than devo's explicitly claimed, just for the sake of argument). Which would mean that for most players, over a large enough sample size of clutch situations, their performance should regress (positively and negatively) to their mean nonclutch-situation performance; and a small subset of players, again over a large enough sample of critical situations, would significantly underperform their mean nonclutch-situation performance. Which would leave us with an overall picture where the overall composite population of all players (i.e., the large subset of "normal" players + the smaller subset of unclutch players) would show a distinct trend (though a trend smaller than the trend of the unclutch subgroup) toward underperformance in clutch situations.
Now, that could be the case -- if the stats back it up. But it seems unlikely to me.
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 4:45 PM PST up reply actions 0 recs
Players do perform worse in clutch situations
by most definitions, but until you account for the fact that those players are a. overwhelmingly facing relievers, who outperform starters in general, and b. facing same-handed relievers with unusual frequency, and thus working against the platoon differential more often in the clutch than in non-clutch PAs, the raw numbers don't prove anything.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 8, 2008 5:21 PM PST up reply actions 0 recs
Clutch or Non-Clutch that is the question.
I would say that most people believe in players that are unclutch but few if any believe in clutch players. I agree there are unclutch players, as it seems obvious when you consistently see certain players fail in so-called important or crucial moments. If a player has a trend or a 2% increase in balls hit into play then I would say they should be considered clutch. Simply making a 2% improvement in contact in any given situation shows that you have some innate ability to improve your skills in such a situation. Certain players could then be considered clutch when no one is on base due to having an increased ability to make contact when the bases are empty. Also players with an increased ability to make contact with men on base would be clutch in those situations. The problem is that because of the variables and lack of a large sample size it is oftentimes hard to tell if a trend is just that or if it is really the makeup of the player. Given that players have increasing abilities early in their career followed by decreasing abilities later, it is often hard to determine if the player's skills or makeup were responsible for his unclutch or clutch performances. I think clutch exists but it is often determined by the players themselves, and just as you can have a bad day at the office so can players. This uneven attitude or mental capacity changes the player's ability to be clutch or unclutch. So, yes I believe clutch exists but you cannot measure it or even predict it because that ability lies with the player and their mental perspective and makeup. The idea of formulas is to predict or reasonably predict outcomes. Could a formula be made to predict clutch performances? Yes, but first it would have to take into account a player's mental capacity and life situations and physical abilities at that given moment in time.
All Truth Goes Through Three Stages 1.It is ridiculed 2.It is violently opposed 3.Finally, it is accepted as self-evident LGT kinesiologist! Straw,Drink
by E5 on Mar 9, 2008 4:38 PM PDT up reply actions 0 recs
Well, I'm sceptical of clutchiness too
but using your argument that "Swinging a baseball bat is not like other sports (except golf) because it is so purely based on skill and the proper balance of strength and fluidity that all of a sudden becoming stronger would be more likely to help than to hurt."
I would argue that BECAUSE it is more based on skills, on fine mechanics, instead of brute force strength, that "clutchiness" is more likely.
With a sport involving a brute force approach, the athlete merely need push himself to 100% intensity or slightly more of physical capacity, ie maximum force production, here I define 100% intensity as the best he has ever performed.
With sports that are influenced by fine mechanics, just hitting maximum force production is not enough. Thus, it is more difficult to turn on, and off, max production. Thus, those who can, are "clutch".
As for the non-stat significant stuff, I agree in general, but disagree on specifics. To me, the key is "Unless you have additional reason to believe something to be the case,".
Which too many people who chant the "it's meaningless, because it's not statistically significant" mantra seem to ignore.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on Mar 9, 2008 7:43 AM PST up reply actions 0 recs
On the clutchiness argument ... we're just going to have to agree to disagree ...
I'm not sure which specifics you're disagreeing with ... you seem to just be quoting me and then saying you agree ...
One thing that you should keep in mind re your last sentence -- with statistics there are specific, unassailable standards for significance. Far too often (though not always) non-stats based reasons for believing something are also based on insufficient observations and are about as valuable as batting average after two games. We should be both open minded and apply rigorous standards to statistical analysis -- but we must also apply the same level of rigorous standards to non-stats based analysis as well.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 9, 2008 2:33 PM PDT up reply actions 0 recs
I'm aware of your second point
I'll use an example. I have often seen doctors / sports science researchers who are stuck in their ivory towers do studies, say on different methods of training, different methods of rehabilitation, and then blithely conclude that because there is no statistical significance, there is no difference, the difference is meaningless.
When, in the real world, the difference is often the difference between winning a gold medal, or a bronze, or worse, finishing 4th and ending up with absolutely zilch.
Clutchiness: I'm arguing that clutch is more likely in baseball, because in sports that require less skill and fine mechanics, it's easier for any athlete to push himself / herself to perform at maximum, all that is necessary is desire. If anyone can do it, it is no longer "clutch".
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on Mar 9, 2008 3:20 PM PDT up reply actions 0 recs
I understand your POV on clutchiness -- I just disagree ...
Anyone making those conclusions is misusing stats. Not being able to show a difference does not mean there is no difference. It basically means nothing.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 9, 2008 3:34 PM PDT up reply actions 0 recs
I think I used the wrong term
Perhaps "useless" is better than "meaningless." Hitting 20 points better in the clutch isn't actually all that helpful to a team, given a player's limited number of plate appearances in such situations. If a guy has a 2% clutch edge, it takes 50 clutch plate appearances to generate one additional hit. If a hit is half a run, and a player gets 25 clutch PA a year, you're talking a quarter of a run per season.
On top of which, the number of players who are available on the market who are old enough to have proven clutch skill AND who have actually demonstrated it AND who aren't otherwise overvalued by the marketplace has got to be virtually, if not literally, zero. If you're signing John Olerud at the end of his career, you can add .25 runs to your offense for his clutch skill. That's about as far as it goes.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 8, 2008 5:29 PM PST up reply actions 0 recs
a "clutch" hit is most likely worth more than half a run ...
since many clutch situations are defined as having runners on, especially in scoring position. By definition that run is going to be worth quite a bit more than a typical run in terms of the probability that it leads to a win. Rather than thinking of that hit as adding half a run, it would be more appropriate to think of it adding, say, a third of a win.
The second paragraph I agree with -- but I assume that we go through these exercises not just as practice for our future careers as GMs, but also to expand our understanding and appreciation of the game and its players.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 6:26 PM PST up reply actions 0 recs
Fair enough, the WPA is probably higher
than your typical base hit by some amount. I doubt it's as much as a third of a win, though. Consulting my handy-dandy win expectancy table, even a 2-out 2-run single to tie a game in the top of the ninth inning (sounds pretty clutch to me) is only worth about .27 wins.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 9, 2008 12:21 AM PST up reply actions 0 recs
Firstly, I'm not suggesting
that signing decisions be based solely, or even primarily on clutchiness. No one is suggesting that you sign John Olerud at the end of his career because he was clutch, but is already not a good player. What if whether David Ortiz is actually clutch or not, is the small difference, say a couple million total, between the RS being willing to pay him the amount necessary to get him signed or him walking? Useless?
Also, typically, a hit is half a run, if it is a single. Doubles, triples, HRs are worth more.
Even using WPA, you are not fully valuing "clutch" hits. A "clutch" hit in the playoffs is certainly worth far more than just 0.27 wins. Moreso, in the WS.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on Mar 9, 2008 8:37 AM PST up reply actions 0 recs
OK... and those are even more unlikely to occur...
If you actually feel like tallying up the probabilities of a guy finding himself in various WPA situations and finding the average WPA of a 2% clutch advantage, knock yourself out. I'm not going to waste my time on it.
I would venture to say that any GM who signs a player because of perceived clutchitude is either statistically incompetent or lying to himself about why he's signing a player. The odds of a player's demands and his true worth being so close that a massively regressed "clutch advantage" actually tips the scales are so close to zero that it's not worth anyone's time to figure it out. The information is more likely to mislead than to inform.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 9, 2008 12:03 PM PDT up reply actions 0 recs
I'm not asking you to do it
I'm not going to do it either. Neither of us is getting paid.
It won't hurt, the people who are actually being paid to figure it out. Whether the information is used well or not, depends on the people using it. All information / knowledge is worthwhile.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on Mar 9, 2008 3:12 PM PDT up reply actions 0 recs
OK, I think I've isolated the issue here
I disagree with that last sentence. A lot of information is junk. It's a waste of time to look at it, and it can affect your decisionmaking in ways that actually damage your ability to make rational decisions. There are documented situations in which people given a small amount of highly useful information significantly outperform people given all the information that can be dredged up on a topic.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 9, 2008 11:29 PM PDT up reply actions 0 recs
That is the fault of the people using the information.
If you know that David Ortiz is worth 5 additional runs a year, but then choose to value those 5 additional runs at $20M, that is your fault. Not the fault of the information.
Stats like batting average, are not junk. Even if some idiots choose to misuse them.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on Mar 10, 2008 10:53 AM PDT up reply actions 0 recs
A lot of information is junk.
The more junk you have the harder it is to use the good information correctly. Batting average is not complete junk -- but there are other, better ways of measuring the value derived from a base hit, relative to a walk and by considering both stats, instead of just the superior one, you are making it difficult not to overvalue those singles.
A single is roughly 50% more valuable than a walk. (and if you want to adjust the weighting because you believe that walks are more difficult to draw in higher leverage situations [or whatever], you can ... it would just require a bit of math to first verify that the theory is true and second to ascertain the proper weighting) If you have one stat (call it, maybe, EQA) that weights the values of different events (singles, doubles, walks, stolen bases, etc), divides them by opportunities (and, perhaps, makes a routine adjustment to end up with a clean looking statistic). If you take that stat and then you look at both it and batting average -- the only way to properly use the information is by completely ignoring batting average. But, seriously, what are the odds that someone who bothered to look at batting average in that situation is going to completely ignore it?
In addition, splits stats are almost uniformly worthless. He slugged .732 on Mondays? Wow ... except that there is no reason to believe there is any correlation ... and the only reason his slugging percentage is so high on Mondays is because he teed off on a AAAA loser and the long reliever three weeks ago (the result of small sample size), knocking two home runs and a double in the blowout ...
That is information that is completely useless and the only proper way to deal with it is to completely ignore it. Wouldn't it be easier to just not have that information in the first place?
(hint: yes it would)
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 10, 2008 12:57 PM PDT
Good example of regression to the mean.
Let's say Buck slugged .732 on Mondays every year for 10 years. Guess how far we regress to the mean?
Answer: 100%. Remember, the amount we regress is based on the spread in skill in the overall population. And (I'm guessing) the overall population has no tendency to slug better or worse on Monday. If that variance among the general population is zero, we must regress all the way to the mean no matter what the performance data indicated.
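A rough sketch of why zero spread in skill means 100% regression, using the standard shrinkage weight (skill variance divided by total variance). The function name and all of the numbers except the .732 are assumptions for illustration.

```python
def regress_to_mean(observed, population_mean, skill_var, noise_var):
    """Shrink an observed rate toward the population mean.

    The share of the observation we keep is skill_var / (skill_var + noise_var).
    noise_var (the luck in the sample) must be positive.
    """
    keep = skill_var / (skill_var + noise_var)
    return population_mean + keep * (observed - population_mean)


MONDAY_SLG = 0.732   # the observed split from the comment above
LEAGUE_SLG = 0.420   # hypothetical population mean
NOISE_VAR = 0.001    # hypothetical sampling variance of the observation

# No spread in "Monday skill" across the population: keep nothing.
print(regress_to_mean(MONDAY_SLG, LEAGUE_SLG, skill_var=0.0, noise_var=NOISE_VAR))

# Some real spread in skill: the estimate lands between the mean and the observation.
print(regress_to_mean(MONDAY_SLG, LEAGUE_SLG, skill_var=0.001, noise_var=NOISE_VAR))
```

With `skill_var=0.0` the weight on the observation is exactly zero, so ten years of Monday data still return the population mean.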
stat-addled alien overlord
by salb918 on Mar 10, 2008 1:21 PM PDT
I won't judge the meaningless part...
...but virtually all studies of clutch hitting show that the regression is extremely heavy because of the small sample sizes. This makes clutch hitters very difficult to identify (if they exist).
stat-addled alien overlord
by salb918 on Mar 8, 2008 12:40 PM PST
You had me at "Sitting on a dock on a Bayes"
Another interesting case of Thing 1 (represented by Matt Sulentic) is cosmic variance. When trying to determine the mean properties of the Universe, we have to allow for the fact that only one Hubble volume (sample) is observable to us, so the variance is infinite.
by green star oakland on Mar 8, 2008 12:33 AM PST
cosmic variance
This isn't as big of a problem as you suggest, because cosmological theories predict a lot more than the mean properties of the universe. Specifically, they make predictions for fluctuations in various quantities (cosmic microwave background temperature, baryon density, etc) at various length scales. When looking at the cosmic microwave background, there is only one dipole term, so the uncertainty due to cosmic variance is 100%. However, if you look at fluctuations with angular extent of a degree (ell of about 200, which is near the peak of the CMB temperature spectrum), then there are something like 40,000 independent modes on the sky (assuming CMB gaussianity). It is true that, with the new 5-year WMAP results, the measurements of the temperature power spectrum are cosmic variance limited up to ell of 550 or so.
Sorry for the rant. I just got a little too excited to see cosmology show up on AN!
by colin on Mar 8, 2008 2:06 PM PST
Not to digress too far here, but
(i) I didn't say it was a problem, just an interesting example (and one that is most easily explained by reference to the mean)
(ii) It is an issue for cosmological signals only encoded in the lowest multipoles, like the "anomalous" TT quadrupole, the TE reionization bump or the BB gravity wave signal,
(iii) Don't even get me started on WMAP.
by green star oakland on Mar 9, 2008 1:39 PM PDT
small sample size
So are you suggesting that we get salb started on the quadrupole-octopole alignment?
by colin on Mar 10, 2008 9:02 AM PDT
This is excellent
Sal and exactly the kind of thing I was hoping to bring with the Staturday columns. I will say that I'm very guilty of looking at small sample sizes on occasion. The emotional fan side of me wins out over the rational stats side. See, I'm hoping we see a .400 OBP from Mr. Buck this year. But I know reality says otherwise.
by Tyler Bleszinski on Mar 8, 2008 12:52 AM PST
Thank you
I, too, am hoping that Buck is awesome. For me, being a sports fan is rooting against the mathematical observations!
stat-addled alien overlord
by salb918 on Mar 8, 2008 8:57 PM PST
Ah, so you like to gamble too?
"You may glory in a team triumphant, but you fall in love with a team in defeat."--The Boys of Summer
by alox on Mar 8, 2008 9:33 PM PST
do you do that in chemistry too?
"It always breaks my heart when the formaldehyde molecule bonds with the ammonium chloride molecule rather than simply oxidizing. What could have been!"
Signatures? We don't need no stinking signatures.
by jubjub on Mar 9, 2008 8:59 AM PST
I try not to take it personally when molecules don't
do what I want them to do.
stat-addled alien overlord
by salb918 on Mar 9, 2008 9:17 AM PST
The reason this post is great
You use plain language.
I understood the concept of regression to the mean before this. I actually have The Book, too. But you do something most of the math guys can't do well: explain the math for non-math experts.
I'm actually very good at math, but I went the English route in college so the hardest math class I ever actually took was Algebra II Honors my senior year of high school. While I can do the math I know, I never learned a lot of the stuff guys like Tango and MGL discuss. You do a great job of "dumbing down" the technical parts without talking down to us, and you retain the message.
I don't know if I really gained a lot of knowledge concerning regression to the mean and baseball, but I'm now better at math. Tango, MGL, and the others have never helped me very much in that regard.
by thejd44 on Mar 8, 2008 12:53 AM PST
I disagree entirely about Buck
Swisher and Blanton -- those guys are part of an "overall population."
Travis? More like part of a Jams population.
(Sal, I think [mistakenly, I know] you're part of a JAMS population.)
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 12:55 AM PST
It took me about ten minutes to get that.
stat-addled alien overlord
by salb918 on Mar 8, 2008 8:34 AM PST
B'Gosh, really?
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 11:24 AM PST
Jeez, these journals need to figure out different acronyms
I thought you were referring to the journal of the American Medical Society... and I was thinking, salb isn't a doctor (medical variety, at least), is he?
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 8, 2008 9:57 AM PST
reminds me of my one and only original country song
"I Needed a Doctor (But All She Had Was a PhD)"
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 11:24 AM PST
I'm not a doctor of any variety.
And I'm definitely not a member of any AMS, as monkeyball intimated.
stat-addled alien overlord
by salb918 on Mar 8, 2008 12:41 PM PST
Irony?
Though I agree with your points, I found it rather amusing that you chose a single sample that showed the results of only two seasons to discriminate against small samples.
by williadc on Mar 8, 2008 2:41 AM PST
There was a different version initially...
He wrote a first draft with several thousand additional examples, but he overloaded the Internets when attempting to upload it to AN...
by LoveDemAs on Mar 8, 2008 5:50 AM PST
This is great, sal
Your next assignment is to analyze the reaction to your post -- how much of it is due to the intrinsic awesomeness of what you've written, and how much is due to random reader variance?
One little substantive note, from a person with a really rudimentary understanding of statistics. One of the tricky things here is that what you're calling "true skill" is very much a moving target. That is, the statistical analysis of Buck's OBP performance is designed to tell us how much of his past performance is due to "true skill" and how much is due to luck. But his "true skill" is not stable -- presumably, he's becoming a better hitter as he gains more experience. So although the regression gives us a better idea of where his skill level truly was in 2007, there's still lots of room for speculation about where it will be in 2008.
"And Julio Franco is batting right-handed!" -- Wayne Hagin, A's radio play-by-play, mid-80s
by Nick on Mar 8, 2008 7:31 AM PST
and, of course, by about the time a hitter is experienced enough to get a lot better ...
... he's so old that he starts his age-28-onward decline.
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 7:45 AM PST
Indeed, Nick.
Regression is only one part of projecting a player's performance. There is typically also an explicit or implicit age adjustment. I talk a little bit about that in the THT articles I linked toward the bottom.
Great, great observation.
stat-addled alien overlord
by salb918 on Mar 8, 2008 8:33 AM PST
There's also the problem
that players' "true skill" hops around even during the middle of a season as a result of injuries and possibly a few other things (sal may know this, but has anyone done a study to demonstrate [or disprove] that inconsistent playing time hurts players' offensive numbers?). So you might think you're sampling from "all major league players," but actually be sampling from "all major league players with a sore hamstring and bone chips in the elbow," but maybe only for two weeks, so that data should actually be regressed to a different mean... and so on ad infinitum.
While approximations like the above are highly useful and much less labor-intensive, I'd be interested to see whether something like PECOTA outperformed or underperformed a well-informed, statistically savvy, relatively unbiased subjective observer.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 8, 2008 10:06 AM PST
Ask mikeA ...
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 10:13 AM PST
About what?
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 8, 2008 5:14 PM PST
He did a fan projection ... of the A's last year ...
comparing that to the computer projections would be an interesting start ...
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 6:27 PM PST
BPro's Nate Silver believes that
inconsistent playing time hurts offensive numbers (or maybe he just thinks the two are correlated). I'll defer to him in the absence of any other studies.
stat-addled alien overlord
by salb918 on Mar 8, 2008 12:43 PM PST
or his next assignment can be
to measure jennifer's true skill as a poster, and whether or not she's "clutch"
I'm here to talk about the past.
by 67MARQUEZ on Mar 8, 2008 5:57 PM PST
Why We Regress
Here's how I like to think about regression. Take the top 10 batting averages from any season. Why do those names appear there? One, those are good hitters. Two, those are the good hitters who had luck/statistical variation on their side. The next season, they are all likely still good hitters (Sal's example shows a .305 group average), but the luck is removed.
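The "good hitters who also got lucky" picture is easy to check with a quick simulation. This is only a sketch: the league size, at-bat count, and talent distribution are assumptions, and each player's true talent is held fixed across both seasons so that luck is the only thing that changes.

```python
import random
from statistics import mean

rng = random.Random(2008)     # fixed seed so the run is repeatable
N_PLAYERS, AT_BATS = 300, 550  # hypothetical league size and workload


def simulate_season(true_avg):
    """Batting average over AT_BATS independent at-bats at a fixed true talent."""
    hits = sum(rng.random() < true_avg for _ in range(AT_BATS))
    return hits / AT_BATS


# Hypothetical talent distribution: mean .270, standard deviation .020.
talents = [rng.gauss(0.270, 0.020) for _ in range(N_PLAYERS)]
year1 = [simulate_season(t) for t in talents]
year2 = [simulate_season(t) for t in talents]  # same talent, fresh luck

# The year-1 leaderboard: indices of the ten highest averages.
top10 = sorted(range(N_PLAYERS), key=lambda i: year1[i], reverse=True)[:10]

top_y1 = mean(year1[i] for i in top10)
top_talent = mean(talents[i] for i in top10)
top_y2 = mean(year2[i] for i in top10)
league = mean(year1)

print(f"top 10, year 1:      {top_y1:.3f}")
print(f"top 10, true talent: {top_talent:.3f}")
print(f"top 10, year 2:      {top_y2:.3f}")
print(f"league, year 1:      {league:.3f}")
```

The leaderboard group should come out above the league in true talent, but below its own year-1 average, and its year-2 average should sit near that true talent: still good hitters, minus the luck.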
The same goes for team wins. When a team wins 110 games, is it more likely that they are a 100-win team that was lucky or a 120-win team that was unlucky? The answer is obviously the first, because there are many more 100-win teams than 120-win teams. But since there's a *chance* that the team was actually a real 110-win team, we don't regress all the way towards 100 wins (or 81 wins or whatever).
Regression makes an attempt to find the best guess of true talent that would result in what we observed.
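The 110-win question can be worked as a small Bayes calculation: weigh how common each true-talent level is against how likely that level is to produce exactly 110 wins. A sketch under stated assumptions: true talent is taken to be roughly normal around 81 wins with a spread of 9 wins (a made-up figure), and games are treated as independent coin flips.

```python
from math import comb, exp

GAMES = 162
OBSERVED_WINS = 110
PRIOR_MEAN, PRIOR_SD = 81.0, 9.0  # hypothetical spread of true team talent, in wins


def prior(true_wins):
    """Unnormalized normal prior: average teams are far more common than great ones."""
    z = (true_wins - PRIOR_MEAN) / PRIOR_SD
    return exp(-0.5 * z * z)


def likelihood(true_wins):
    """Chance that a team of this true talent wins exactly OBSERVED_WINS games."""
    p = true_wins / GAMES
    losses = GAMES - OBSERVED_WINS
    return comb(GAMES, OBSERVED_WINS) * p**OBSERVED_WINS * (1 - p) ** losses


levels = range(50, 131)  # candidate true-talent levels, in wins per 162 games
weights = {w: prior(w) * likelihood(w) for w in levels}
total = sum(weights.values())
posterior = {w: v / total for w, v in weights.items()}

best_guess = sum(w * p for w, p in posterior.items())
print(f"P(true 100-win team) = {posterior[100]:.4f}")
print(f"P(true 120-win team) = {posterior[120]:.6f}")
print(f"best guess of true talent = {best_guess:.1f} wins")
```

Under these assumptions the 100-win explanation is far more probable than the 120-win one, and the best guess lands between 81 and 110 wins. A wider prior spread would pull it closer to 110; a narrower one, closer to 81.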
by Sky Kalkman on Mar 8, 2008 9:16 AM PST
Ooh, Professor B, Professor B!
You rock!

There is an A in Whimsy.
by FreeSeatUpgrade on Mar 8, 2008 9:24 AM PST
splendid
"You know, as that was coming out of my mouth, I knew that it was wrong."
by JI on Mar 8, 2008 10:21 AM PST
Sal, any thoughts on confidence level?
You're using one standard deviation, which, as you note, allows us to be 68% confident in whatever it says is significant.
(e.g., if a player had a .355 OBP, exactly equal to one standard deviation better than average, we'd be 68% confident that he's actually better than average)
By contrast, most major public opinion polls we see offer a confidence interval based on a 95% level of confidence and, I'd suspect that most of Sal's work is based on 99% or higher confidence levels ... mine, sadly, is only based on 90% confidence intervals -- which I can tell you with no reservations is woefully unideal -- but I lack sufficient data to produce results at a higher level.
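For reference, the link between "number of standard deviations" and "confidence level" for a normal distribution can be sketched with Python's standard library. This assumes normally distributed errors and two-sided intervals; the helper names are made up.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(z):
    """Share of a normal distribution within +/- z standard deviations."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


def z_for(confidence):
    """Two-sided multiplier: how many SDs wide a given confidence level is."""
    return std_normal.inv_cdf(0.5 + confidence / 2)


print(f"within +/- 1 SD: {coverage(1.0):.1%}")
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} interval: +/- {z_for(level):.2f} SD")
```

Moving from one standard deviation to a 95% or 99% interval only means using a bigger multiplier on the same standard deviation, so the interval gets wider rather than the data getting better.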
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 10:26 AM PST
Devo, the use of one std deviation in the regression formulas
is rigorously correct, although I'll spare you the mathematical proof.
However, I didn't slap any confidence intervals on the regressed estimate of OBP. It is possible to do so, and the confidence intervals depend on the standard deviations of the measurements.
stat-addled alien overlord
by salb918 on Mar 8, 2008 12:45 PM PST
Sal... Home Run
You could run for Mayor of AN!
The monster at the end of this blog.
by grover on Mar 8, 2008 10:45 AM PST
Nah
Comptroller, maybe. Or city manager.
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 12:41 PM PST
That went right over your head
Couldn't find the clip, but after one of Cust's game-winning home runs the announcer yelled that Cust could run for Mayor of Oakland.
Am I the only guy who thinks a reference like that is easily understood?
The monster at the end of this blog.
by grover on Mar 8, 2008 12:53 PM PST
aw, nertz
I wasn't thinking about anything. I'll have to try to do that more often. @('.')@
by monkeyball on Mar 8, 2008 12:58 PM PST
I'm with you.
stat-addled alien overlord
by salb918 on Mar 8, 2008 1:04 PM PST
That's the important thing
We all knew the simian wasn't as smart as you.
The monster at the end of this blog.
by grover on Mar 8, 2008 1:39 PM PST
I thought it was a Deadwood reference
which would have made it a sort of subtle insult.
"And Julio Franco is batting right-handed!" -- Wayne Hagin, A's radio play-by-play, mid-80s
by Nick on Mar 8, 2008 2:29 PM PST
Sorry
Thought this was an Oakland A's related website.
;-)
The monster at the end of this blog.
by grover on Mar 8, 2008 2:53 PM PST
Still, Deadwood references are always welcome
Now, who can we call a loopy c.... never mind.
by thejd44 on Mar 8, 2008 5:35 PM PST
This stuff is fantastic!
As has already been mentioned, Sal is able to effectively communicate abstract math to the masses. You will surely spread the "stat gospel" far and wide!
Question....does there exist the occasional "phenom" player who skews the league average stats? What I'm getting at rather clumsily...is there a way to reliably identify such players early in their development via stats?
"You may glory in a team triumphant, but you fall in love with a team in defeat."--The Boys of Summer
by alox on Mar 8, 2008 11:06 AM PST
Major league expectancies
for a player's minor league stats can often tell you whether a guy is a fluke or the real deal. For instance, I tend to think that Travis Buck is actually better than a .345 OBP player, because he has a bunch of minor league and college data points telling me that against inferior competition he's a much better than .345 OBP player.
There are certainly more extreme examples than this-- Justin Upton is not going to be a .220 hitter in the major leagues.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 8, 2008 12:09 PM PST
Answer: not that I know of.
You can use all the available performance data, regress to the mean, and toss in an age adjustment. You'll always have a few young players who are absolute monsters (once they've accumulated the necessary amount of performance data). That's the best way I can think of to identify them.
One player is rarely good enough to skew the league-average stats, although if I'm not mistaken you do want to regress to the mean of all players EXCEPT the player in question (not sure about that, but I think that's right). In practice, it makes little difference.
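A sketch of what "the mean of all players EXCEPT the player in question" looks like in code. The four-player league and its numbers are made up, and deliberately tiny so the effect is visible; the function name is hypothetical.

```python
def league_rate_excluding(players, excluded):
    """Population rate (e.g. OBP) built from everyone except one player.

    `players` maps name -> (times_on_base, plate_appearances).
    """
    others = [line for name, line in players.items() if name != excluded]
    on_base = sum(ob for ob, _ in others)
    plate_appearances = sum(pa for _, pa in others)
    return on_base / plate_appearances


# Hypothetical four-player "league".
players = {
    "Star": (300, 600),
    "A": (200, 600),
    "B": (190, 600),
    "C": (210, 600),
}

everyone = sum(ob for ob, _ in players.values()) / sum(pa for _, pa in players.values())
print(f"mean with Star:    {everyone:.3f}")
print(f"mean without Star: {league_rate_excluding(players, 'Star'):.3f}")
```

With only four players, dropping the star moves the mean a lot. With several hundred players sharing the plate appearances, one player's share is tiny, which is why it makes little difference in practice.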
stat-addled alien overlord
by salb918 on Mar 8, 2008 12:38 PM PST
What about Babe Ruth?
I realize that nobody was projecting Ruth when he played, but his offensive numbers were so far beyond anybody else's, AND the player population was much smaller than today, that I wonder what projecting him would've been like if sabermetricians were doing their thing 85 years ago.
During some of the seasons when Ruth outhomered, like, every other team or whatever it was, I would think you'd have to eliminate his numbers when regressing to the mean. Maybe he only raised that mean by a couple points, but it still doesn't seem right to have him factored into the calculations. In some ways, he really wasn't among that population. He was in a population of his own.
by thejd44 on Mar 8, 2008 5:38 PM PST
great work
Been reading for a year but never posted; just wanted to unveil myself and say thanks for the insightful writing. I work in economic consulting for a lot of econometricians, and this is all dead on and much appreciated...only on AN.
Though in terms of a population, I'm curious as to why the chosen population is all major leaguers, and not all major league outfielders. Is this because when isolating an offensive skill all players are seen as homogeneous hitters? If so, does the population include NL pitchers' batting statistics? Just taking a guess here but I feel like that would skew the data and bias the results.
by Capitol A on Mar 8, 2008 12:19 PM PST
The choice of population is really up to you.
"All major leaguers" is a good starting point. But you can regress to the mean of "all outfielders," or "all comparable players throughout history." In general, when I say "all major leaguers" for offense, I remove pitchers' batting stats. Good point.
stat-addled alien overlord
by salb918 on Mar 8, 2008 12:48 PM PST
on a related note...
how is a major leaguer defined? e.g., minimum number of PAs? Or all PAs in the majors last year?
by sec119 on Mar 8, 2008 1:56 PM PST
Generally, I use the aggregate
MLB stats for the year in question, pitchers removed. YMMV.
stat-addled alien overlord
by salb918 on Mar 8, 2008 2:10 PM PST
Another choice article on regression
stat-addled alien overlord
by salb918 on Mar 8, 2008 12:50 PM PST
From the above links...
The following chart shows how our estimate of his true OBP skill changes with the number of plate appearances over which we make observe him to have a .377 OBP."
Should read:
The following chart shows how our estimate of his true OBP skill changes with the number of plate appearances over which we observe him to have a .377 OBP."
Also, mgl adds:
Also, this sentence should be qualified/explained because it is so important: What you find is that the many splits that are touted as important ("he hits .312 after the All-Star Break!") are actually meaningless.
You (Sal) should add:
...even for large samples of performance, let alone small ones. That is because there may be little or no spread in skill in the population of major league players. Remember I told you that if there is no variance in skill in the population, then we regress all the way to the mean, no matter how large our sample size is. In fact, that is true for most splits, even the ones that commentators and the entire "conventional wisdom crowd" think (or speak as if they are) are meaningful (e.g. day/night, home/road, first half/ second half). (Keep in mind that for many of these splits, there may be some very small spread in skill, which for all practical purposes is the same as no spread, since we would need an enormous sample size in order to have any meaningful regression less than 100%.)
Or something like that!
I would also love to see you add something to the effect of:
Since how much to regress a sample stat or "split" (or any sample measurement or series of measurements) varies with (is a function of) both sample size and the spread of skill in the population, there is no "magic point" at which we believe or don’t believe a sample number (like BA, OBP, HR rate, etc.). There is no point at which a "small unreliable sample" all of a sudden turns into a "large reliable sample." It is all relative. And it is sort of a smooth, continuous (but not linear, as you can see from the "Buck" graph) function. The larger the sample size, the more "believable" the sample stat is (assuming there is SOME variance in skill in the population), and the larger the spread of skill, the quicker the "believability" increases as the sample size increases, where at zero sample size all sample stats are 0% believable (of course) and at an infinite sample size, they are 100% believable (again, assuming SOME spread of skill). As I said, how quickly the "believability" rises between a sample size of zero and infinity depends on the spread of skill. So, whether a certain sample stat is "believable or not" (whether you consider the sample size adequate or not) on its face (without actually doing the regression) depends on your definition of "believable" (and "adequate").
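The "believability" curve described here can be sketched for a simple on-base-or-not rate stat. Assumptions, all for illustration: each plate appearance is an independent trial, the league rate is .335, and the skill spreads are made-up values. Under those assumptions believability works out to n / (n + k), where k is the luck variance of one trial divided by the variance of skill.

```python
def believability(n, league_rate, skill_sd):
    """Fraction of an observed rate to keep after n independent trials."""
    if skill_sd == 0:
        return 0.0  # no spread in skill: regress 100%, whatever the sample size
    noise_var_per_trial = league_rate * (1 - league_rate)
    k = noise_var_per_trial / skill_sd**2  # sample size at which believability is 50%
    return n / (n + k)


LEAGUE_OBP = 0.335  # hypothetical league rate
print("     PA   narrow spread   wide spread   no spread")
for pa in (0, 50, 200, 600, 2000, 10000):
    narrow = believability(pa, LEAGUE_OBP, skill_sd=0.010)
    wide = believability(pa, LEAGUE_OBP, skill_sd=0.030)
    none = believability(pa, LEAGUE_OBP, skill_sd=0.0)
    print(f"{pa:7d}   {narrow:13.1%}   {wide:11.1%}   {none:9.1%}")
```

The table has no threshold where a sample flips from unreliable to reliable. Each column climbs smoothly from 0% toward 100%, the wider spread climbs faster, and the no-spread column never leaves 0%.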
stat-addled alien overlord
by salb918 on Mar 8, 2008 2:00 PM PST
This article was linked on FJM
But I wanted to post the link here because it's awesome. Even though he's probably just a mediocre pitcher, I really want Brian Bannister on the A's.
by thejd44 on Mar 8, 2008 5:40 PM PST
I've said it before...
great job, sal. i wish i had the time and energy to delve into this stuff the way i attacked the backs of baseball cards back in the day. thanks for teaching this old dog new tricks in a way without making me feel like i've been chasing my tail without success for seventeen hours.
I'm here to talk about the past.
by 67MARQUEZ on Mar 8, 2008 5:54 PM PST
Validity of this stuff?
This is the first time I've really read about any of this regression to mean information in regards to athletic performance, so I'm sure there's more to it than what I've just looked at today. I'm currently in my senior year of an Industrial and Systems Engineering program, at Auburn. INSY is predominately based on statistics, and a lot has to do with forecasting sales, production and inventory, so seeing baseball stuff in relation to the topics I'm studying is pretty cool.
I'm really finding it difficult to see the benefit of this statistical analysis in regards to human performance though. All regression analysis is, in regards to forecasting, is plotting a line through observed values. The more accurate the forecast, the less bias the line would have, meaning the deviations above and below the plotted line should sum to zero. So for example, if in one year a player had 75 hits and the following year 125, and the forecasted line had the forecast at 100, it would be an accurate line. So as I understand it, forecasting to the mean is the prediction that these observations are getting closer to the slope of the forecasted line. The only problem with this is that as every new observation is made, the line's slope is recalculated and it changes.
So while in theory this could be a good way of forecasting performance, it can be very very far off in many circumstances. There are so many variables like injuries, switching leagues, steroids, etc. that could cause the actual performance to stray a good deal from the mean. And, that one outlier will again affect that mean, making the next forecast even less accurate. One thing I am curious about, is what these analysts feel is a good Mean Absolute Deviation (MAD) for their forecasts. Any of you guys know? I was looking at the batting predictions for some of our players from the Marcel Player Forecasts from 2006 for the 2007 season, and they aren't exactly incredible results...
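Bias and MAD answer different questions, which the 75-and-125 example from this comment shows nicely. A small sketch (the function names are made up):

```python
def bias(forecasts, actuals):
    """Mean signed error; zero means the over- and under-shoots cancel out."""
    return sum(a - f for f, a in zip(forecasts, actuals)) / len(actuals)


def mean_absolute_deviation(forecasts, actuals):
    """Mean size of the miss, ignoring direction."""
    return sum(abs(a - f) for f, a in zip(forecasts, actuals)) / len(actuals)


# The example above: a forecast of 100 hits in both years, actuals of 75 and 125.
forecasts = [100, 100]
actuals = [75, 125]

print(bias(forecasts, actuals))                     # 0.0
print(mean_absolute_deviation(forecasts, actuals))  # 25.0
```

The forecast is unbiased yet misses by 25 hits on average, so an unbiased line is not necessarily an accurate one. MAD is the number to compare when judging projection systems against each other.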
There are so many ways of forecasting data: exponential smoothing, moving averages, Holt's and Winters' methods, seasonal factors, etc. Each has its own benefits, but it seems darn near impossible to get consistently accurate results with human performance. It's a whole lot different than producing sales of, say, MacBooks, which would spike around August and December of every year. Unless an Apple store has all its employees call in sick and is forced to close 4 or 5 days out of the month, that trend will never change.
by KMoAsFan on Mar 8, 2008 5:58 PM PST
Simple, non-mathematical answer to the question
Everything salb mentioned is by no means a perfect tool for projection, but it's a helluva lot better than "Joe Blow hit .333 in 12 at bats, so I project him to be an awesome, HOF player." And believe it or not, a lot of people - not just average fans, but commentators and other "experts" - do this on a regular basis.
by thejd44 on Mar 8, 2008 6:02 PM PST
You are thinking like I am thinking, KMoAsFan,
It's one thing to "regress to the mean" when bowling, or putting a golf ball from various distances, because each attempt is very similar in so many ways. This is untrue for baseball players hitting a baseball.
My best counter example: Rich Aurilia.
In 2001, he batted .324, way beyond anything he previously did (with 1800+ prior ABs), including 37 home runs, an incredible increase over his previous high of 22. After signing with Seattle in 2004, he devolved into a .241 hitter (.303 OBP).
How could a large sample size prior to the 2001 season predict 37 home runs, and .324 BA?
Did Barry Bonds have anything to do with Rich Aurilia's performance in 2001? I think so. And, it is a non-statistical factor.
Seattle perhaps signed Aurilia based on "regression to mean", but for them, he was way below. Statistics failed them.
Major League Baseball is full of non-statistical factors. I think a discussion of "sample size" is not workable because it's never the same "sample" in too many ways.
"I never predict anything, and I never will." Paul Gascoigne, English footballer
by One won lost won on Mar 8, 2008 11:33 PM PST
Nonsense
The fact that it's not literally the same sample doesn't mean you can't gain anything by making approximations.
Rich Aurilia is actually a fantastic example of regression to the mean, as he collapsed back to career levels after a couple of unsustainable outlier seasons-- a predictable outcome. Seattle signed him because they were doofuses who thought his 2001 was a real performance level, not a fluke. The Giants did likewise the second time around, substituting "2006" for "2001."
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 9, 2008 12:30 AM PST
HUH?
Aurilia's mean batting average was .296 through 2003. And even if you get rid of his first season, with his .474 average in 19 ABs, his mean is still .273, so how is hitting .241 regressing to anything? Also, it's somewhat contradictory to statistical analysis to just throw out a bad year. I don't really feel like plugging every one of his stats into Excel, but just by looking at them you can tell that year in Seattle is worse than his previous career averages. His big years from 1999-2001 really illustrate how difficult it can be to forecast. If you were using some sort of mean regression there is no possible way you could predict this guy's power numbers were going to explode. I'd be willing to bet, along the lines of what One won lost won said, if you could've seen him mashing baseballs in his offseason training, or hanging out with Bonds and Giambi, you would've had a better idea about what he was going to do in the future than you did by looking at his numbers through 1998.
Fundamentally I just don't see how this works. Either way you look at it, just using a mean is going to miss forecasting any huge years a player is about to have, or any terrible year a guy is about to have. Isn't that the whole idea of using this, to get an edge up on the "old timers"? Anyone can look at a guy who has been in the league 10 years and get an idea how he is going to perform, but finding these players before they blow up, is what makes GM's look like geniuses, and using a mean is never, ever, going to give you that.
by KMoAsFan on Mar 9, 2008 1:08 AM PST
Exactly ...
because, see ... as you've hit upon, regressing to the mean is the be all, end all of statistical projections.
Baseball observers with a statistical bent certainly are not smart enough to identify and deal with the fact that there is more going on than simply regressing to the mean. They would never simply realize that before you take into account other factors (like, say, a change in teammates) you have to first regress to the mean and then understand that these other factors may ultimately lead them to believe that the player will ultimately be either better or worse than the mean. No, because, you see, as this article clearly states, all players ONLY regress to the mean and so all players that are better than the mean will get worse and all players that are worse than the mean will get better.
Either that or maybe Sal explained to you one of the simplest concepts of statistical analysis and you've come up with (to your great credit) a way in which this very basic concept is not sufficient to make accurate forecasts in and of itself.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 9, 2008 1:40 AM PST
Regarding regressing to the mean
missing huge years or terrible years, yes; BUT, most teams are not merely interested in how a player is going to perform THIS year. They typically care about THIS year, NEXT year, the year after that, and so on.
For example, if I am the Mets, signing Carlos Beltran to a huge deal from 2005 onwards, I care about more than 2005. Even if the Mets could have forecasted that Beltran would struggle in 2005, due to injuries, signing him would still have not been a bad idea. It is only a year.
Also, "Anyone can look at a guy who has been in the league 10 years and get an idea how he is going to perform, but finding these players before they blow up, is what makes GM's look like geniuses, and using a mean is never, ever, going to give you that."
The problem is that it seems many people CANNOT get an idea of how a guy is going to perform. It seems many people have issues regarding both sample and context, when it relates to baseball stats.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on Mar 9, 2008 7:58 AM PST
It means -exactly- that " you cannot gain anything"
There is a famous "correlation" between the rate of sales of black bras in Miami and amount of rainfall in a certain town in India. Dead-lock correlation that went beyond "chance". But what could it do for anyone?
The magazine Economist also pointed out that there was a significant correlation in the Middle Ages in Europe, between storks roosting on your house, and giving birth. Hence the idea that "the stork brought the baby", not unfounded statistically, or anecdotally.
A case for "regression to mean" may be made, or disproved, but overall, I find that it is only a "statistical separator": "I use statistics, you don't." It is as useful as the color of your car. Certain cars of a certain color get in more accidents, but whether a person's car is a certain color does not affect their insurance rates, and therefore is not "statistically relevant" to the insurance industry (a big, big user of stats, if there ever was one).
Just for more "baseball stats fun", consider HoFer Charlie Gehringer and his 1940 season. Outstanding, and the Tigers won the pennant. What happened to Charlie in 1941? Predictable? Is it because the Tigers sank to below .500, or is it the other way around?
In the realm of Major League Baseball, IMHO the statistical emphasis should be on 'contagion'. Is hitting contagious? Why? What players are more likely to participate when "everyone is hitting" or "no one can get a hit"? Milton Bradley vs. Detroit in the 2006 ALCS was an example of "non-contagion" with his performance IMO.
Is pitching contagious? When the A's had the "Big Three" they often spoke of the effect of the other two. Is there an effect? What about Robin Roberts and his success on a losing club? Not contagious?
The field of "contagion" should be statistically derived and studied.
"I never predict anything, and I never will." Paul Gascoigne, English footballer
by One won lost won on Mar 9, 2008 2:33 PM PDT
I'm not entirely sure if this is a serious post
But hitting is not contagious. Pitching is not contagious.
You know why hitters seem to hit or not hit all at once? They're probably facing a really good or really bad pitcher. Some other things factor in: If player A gets on, player B is more likely to get on because the pitcher is now in the stretch. Things like that add to it. But there's no magical force making player B hit because A did.
There's really no such thing as momentum on a game-to-game basis. Not in any sport. It just doesn't happen. That's what you bring up with the example of starting pitchers "feeding" off each other. The fact that you can cite so many examples for or against this phenomenon is proof that it doesn't exist. If it did, it wouldn't only exist sometimes (when bad announcers and players spouting cliches want it to be so).
by thejd44 on Mar 9, 2008 10:52 PM PDT
Do you have the statistics to prove that
hitting, or pitching, is not contagious?
Currently, I am consumed with about five major, and about twenty minor projects. So I do not have the time or inclination of say, Mr. Salb, to create an in-depth statistical construct of "contagion". However, I believe it has much more credibility than say "clutch hitting".
Baseball is a very mental game. One of the most difficult aspects is "compartmentalization". That is, the most successful players are able to put one game in a compartment, mentally, away from another. If you're a closer, it is very important that you not let failure in one game affect another, or allow a string of successes to leave you lax and unprepared. The ability to isolate is so difficult to master that its opposite, transferring the success or failure of something just observed to oneself, appears very easy to do. I am not alone in thinking that hitting is contagious. The idea has been around a long time.
Salb918, I just want to say I appreciate the work you put into this thread, and in no way do I feel dismissive or unappreciative of the work, even if I disagree with the ultimate conclusion. I am not your peer in statistics, by any measure, so I appreciate the time you put in to write this piece.
"I never predict anything, and I never will." Paul Gascoigne, English footballer
by One won lost won on Mar 10, 2008 4:47 PM PDT
You're saying that if any statistical model makes any approximations whatsoever
with regard to reality, that it is completely worthless?
OK, you need to spend your time talking to (or more likely getting hung up on by) the Federal Reserve chairman, or something, not us. Apparently they can't actually model the economy. Their predictions are hopeless! Repeal the Bank of the United States!
Saying regression to the mean does not exist is like saying that one plus one equals three. Not only is it theoretically preposterous, it's disproven over and over every day in the real world.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 9, 2008 11:35 PM PDT
My comment is not negating statistics, or "any statistical model"
Paul, you are not reading what I wrote. And I don't need (nor does anyone here need) a judgment and sentence ("you need to spend your time...") from you. I am requesting that you NOT recast what I write into a laughable "straw man". "You're saying" is redundant. I say what I say, and no "between the lines" stuff. And, I never used the phrase "completely worthless". Yet, you attributed that to me. Again, a judgment and condemnation with no facts to support it.
And second, don't use the presumptive "us". Whoever is sharing your keyboard, they can write their own passage if they wish to chime in.
To make it clear, I was stating that, IMO, a case can be made, or not made, about individual players in baseball and 'regression to mean', but IMO 'regression to mean' knowledge, as a statistic about an individual player, will not provide anything gainful. I gave the "car color" as an example. Insurance companies don't use it, so it is not "gainful" to them. But that should not be interpreted or judged therefore as "completely worthless".
"I never predict anything, and I never will." Paul Gascoigne, English footballer
by One won lost won on Mar 10, 2008 4:30 PM PDT
Didn't I go through this earlier on the thread?
If you cannot gain anything from information, it is, ipso facto, completely worthless.
I think it's pretty clear at this point that I have absolutely no idea what you're actually trying to argue here. (Thus the apparent straw man.) Whether that's your fault for not explaining it coherently, or mine for being unable to parse it, is not relevant. You're going to have to restate your position in different terms.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 12, 2008 12:41 AM PDT
This is good, but misleading in itself
IMHO... Basically, what's being said here is that most major league baseball players are a lot like most other major league baseball players.
I totally understand that Sal's using this to prove a point about small sample sizes, but looking at his original example of the top 10 batting averages from 2001 to 2002, it seems overly simplistic to me to figure that the decline in all of the top batters' averages was a regression to the mean. If you were to look at the overall batting average in the MLB, it dropped from .264 in 2001 to .261 in 2002, so one might wonder if the drop in the top ten averages was due to some occurrence across all of baseball.
Similarly, it seems like somebody might look at the example of the "extreme" cases "regressing". The example might be taken to suggest that everyone will regress in a similar fashion, which obviously isn't true. In the same 2001-2002 example, the average of the top ten batting averages in 2002 is actually .336, down from .337. You have to consider that only four of the "top ten" hitters in 2001 were in the top ten in 2002. When you look at 2002's actual top ten, it's clear that the bell curve pretty much stays the same. Only the names of the data points have changed. So the suggestion here is that 6 players have regressed, but 4 still manage to stay in the extreme, which might be seen as contradicting what Sal's saying (even though I think it really isn't). One explanation might simply be that opposing pitchers decided to pitch differently to these players, knowing that they're the cream of the crop. The course of one season is maybe too small of a sample size to see if that was truly the effect, but one season is exactly the amount of time that really matters in the grand scheme of things, right?
I'm not really disagreeing with anything Sal's said, I just hope that one of the takeaways from this is that a quantitative analysis should always be accompanied by a qualitative analysis, if only to provide perspective and context. Try as hard as anyone (*cough* bill james *cough*) might, there will never be one number that will, by itself, truly describe a baseball player's skills.
by Dex on Mar 8, 2008 7:35 PM PST
First of all ...
the only people who have ever suggested that scouts and stats cannot co-exist are the anti-stats crowd. Bill James certainly never has.
Secondly, regression to the mean doesn't mean the league as a whole will have any less distribution. It won't -- of course not. If that were the case, after 100+ years of playing the game, luck would have been eliminated. Of course that doesn't happen. Luck is a real thing -- it just tends to attach to different people from one year to the next.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 8:04 PM PST
I think it usually makes more sense
to call it "chance" rather than "luck."
The A's colors are green and gold.
by mikeA on Mar 8, 2008 8:22 PM PST
I'm fine with that ...
I'll try to adjust my habits ...
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 8:28 PM PST
It's working already!
The latter involves some "luck" (more like "chance")...
test
Really, the most logical thing would be for all ANers to give me $10.
/test
The A's colors are green and gold.
by mikeA on Mar 8, 2008 9:05 PM PST
I have this sudden urge to send mikeA $10 ...
I only wish I knew where to send it ...
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 8, 2008 11:07 PM PST
First: I love your Uncommon Sportsman blog. Love it.
- Those guys who are at the top of the "true talent" bell curve are the most likely to end up on the leaderboards, but the guys who actually end up on the leaderboards are those who are the end of the "sample performance" bell curve. The latter involves some "luck" (more like "chance") in the sense that the sampling of their performance happened to come out the high end of their talent (although a change in skill is possible as well).
- You can adjust for offensive context by using a stat like OPS+; you'd find the same thing.
- When the sample size is too small (e.g., the regression is very aggressive), visual observation is actually quite important. That's why nobody ever drafted a high school player based on their batting average.
stat-addled alien overlord
by salb918 on Mar 8, 2008 8:36 PM PST
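The "true talent" versus "sample performance" distinction in the comment above can be sketched with a small simulation. This is only an illustration under assumed numbers: the talent distribution (mean .265, standard deviation .020), the 300-player league, the 500 at-bats, and the function names are all made up for the example, and each at-bat is treated as an independent trial.

```python
import random
import statistics


def simulate_season(talents, at_bats, rng):
    """Observed batting average per player: each at-bat is a hit with probability = talent."""
    return [
        sum(rng.random() < p for _ in range(at_bats)) / at_bats
        for p in talents
    ]


def leaderboard_demo(n_players=300, at_bats=500, top_n=10, seed=2008):
    rng = random.Random(seed)
    # Assumed talent distribution (illustrative, not fitted to real data).
    talents = [rng.gauss(0.265, 0.020) for _ in range(n_players)]

    # Two seasons drawn from the SAME unchanged talents.
    year1 = simulate_season(talents, at_bats, rng)
    year2 = simulate_season(talents, at_bats, rng)

    # Leaderboard = top_n players by observed year-1 average.
    top = sorted(range(n_players), key=lambda i: year1[i], reverse=True)[:top_n]

    return {
        "top_year1": statistics.mean(year1[i] for i in top),
        "top_talent": statistics.mean(talents[i] for i in top),
        "top_year2": statistics.mean(year2[i] for i in top),
        "spread_year1": statistics.pstdev(year1),
        "spread_year2": statistics.pstdev(year2),
    }


if __name__ == "__main__":
    for name, value in leaderboard_demo().items():
        print(f"{name}: {value:.3f}")
```

In a run like this, the year-1 leaders should show an observed average above their own true talent and should slip back toward it in year 2, while the spread of the whole league stays about the same from one season to the next. Nobody's skill changed; only the names at the extremes did.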
Well, except the Mets in 1980...
and A's fans everywhere are happier for it.
(OK, I'm sure they didn't actually draft Beane based on his HS batting average. If I recall, though, it did play a role.)
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 9, 2008 12:33 AM PST
I thought it was that he could run faster than the black people ...
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 9, 2008 1:43 AM PST
I totally agree
I just wanted to make it clear to others that looking at the top batters from one year to the next is itself a small sample size. I'm sure we could find a year where most of the top ten batters actually distanced themselves further from the mean from one year to the next. It doesn't mean what you're saying is false. It just ends up that Joe Morgans will look at this sort of thing and say, "Well that doesn't take into account blah blah blah", when you (we) are not really saying anything about those things and are just trying to improve a particular aspect of scouting and measurement.
Thanks for the kind words about Uncommon Sportsman! I will shamelessly plug my other blog and revisit Gaslamp Ball's conversation with DePodesta about combining quantitative and qualitative in baseball. It's interesting if only to make people realize that DePodesta wasn't just the guy who used Excel a lot in Moneyball. :)
by Dex on Mar 9, 2008 6:42 AM PST
re
I just wanted to make it clear to others that looking at the top batters from one year to the next is itself a small sample size.
Yes.
I'm sure we could find a year where most of the top ten batters actually distanced themselves further from the mean from one year to the next.
Not to sound snarky, but...let me know when you find that year.
stat-addled alien overlord
by salb918 on Mar 9, 2008 8:39 AM PST
I'm totally gonna do it... very very soon... I think.
by Dex on Mar 9, 2008 11:58 AM PDT
Before the post falls off the front page...
What I've found so far...
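From 1968 to 1969
| Player | 1968 AVG | 1969 AVG |
| Rose | .335 | .348 |
| Alou | .332 | .331 |
| Alou | .317 | .282 |
| Johnson | .312 | .315 |
| Flood | .301 | .285 |
| Yastrzemski | .301 | .255 |
| Jones | .297 | .340 |
| Beckert | .294 | .291 |
| McCovey | .293 | .320 |
| Staub | .291 | .302 |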
Five guys improved on their numbers for a grand "total" of .307 to .307, though admittedly they didn't really distance themselves from the pack. Another set that's kinda interesting, but not really, are the top ten from 1921 to 1922...
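| Player | 1921 AVG | 1922 AVG |
| Hornsby | .397 | .401 |
| Heilmann | .394 | .356 |
| Cobb | .389 | .401 |
| Ruth | .378 | .315 |
| Sisler | .371 | .420 |
| Speaker | .362 | .378 |
| Jacobson | .352 | .317 |
| Tobin | .352 | .331 |
| Roush | .352 | .352 |
| McHenry | .350 | .303 |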
The overall BA dropped, but you had some guys there really push away from what a regression to the mean might have suggested. I'm still optimistic that there's a year where most of the top ten improve on their previous year, but it takes some doing looking for it by hand. :)
by Dex on Mar 10, 2008 10:20 AM PDT
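Since checking these by hand "takes some doing", here is a minimal sketch that tallies the two top-ten lists above. The numbers are copied from the comment; the names `TOP_1968`, `TOP_1921`, and `summarize` are just labels chosen for this example. It averages the ten batting averages without weighting by at-bats, which appears to be how the "grand total" in the comment was computed.

```python
from statistics import mean

# (player, first-year AVG, second-year AVG), copied from the comment above.
TOP_1968 = [
    ("Rose", .335, .348), ("Alou", .332, .331), ("Alou", .317, .282),
    ("Johnson", .312, .315), ("Flood", .301, .285), ("Yastrzemski", .301, .255),
    ("Jones", .297, .340), ("Beckert", .294, .291), ("McCovey", .293, .320),
    ("Staub", .291, .302),
]
TOP_1921 = [
    ("Hornsby", .397, .401), ("Heilmann", .394, .356), ("Cobb", .389, .401),
    ("Ruth", .378, .315), ("Sisler", .371, .420), ("Speaker", .362, .378),
    ("Jacobson", .352, .317), ("Tobin", .352, .331), ("Roush", .352, .352),
    ("McHenry", .350, .303),
]


def summarize(rows):
    """Return (mean first-year AVG, mean second-year AVG, number who improved)."""
    before = mean(r[1] for r in rows)
    after = mean(r[2] for r in rows)
    improved = sum(1 for r in rows if r[2] > r[1])
    return round(before, 3), round(after, 3), improved


if __name__ == "__main__":
    print("1968 -> 1969:", summarize(TOP_1968))  # (0.307, 0.307, 5)
    print("1921 -> 1922:", summarize(TOP_1921))  # (0.37, 0.357, 4)
```

So the 1968 group holds at .307 with five improvers, and the 1921 group drops from about .370 to .357 with four improvers (Roush is flat). Neither is a year where most of the top ten improved.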
For what it's worth,
I'm morally certain that if the batting average of the top 10 batters stayed the same from 1968 to 1969, those 10 batters moved closer to the "pack" (and thus regressed to the mean somewhat). Why? Because batting averages (and virtually every other hitting indicator) hit historic lows in 1968. It was the worst year for hitters since the Dead Ball Era. Leaguewide batting averages went way up in 1969, so if the top hitters remained constant instead of "rising with the tide," then relative to the rest of the league, they sank somewhat.
Your 2008 Athletics: It's Nothing Personal.
by PaulThomas on Mar 12, 2008 12:46 AM PDT
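The "rising with the tide" point can be sketched by expressing the group's average relative to the league. The .307 figure is the top-ten average from the earlier comment. The league averages below (.237 for 1968, .248 for 1969) are NOT from this thread; they are approximate MLB-wide figures supplied as assumptions for illustration and should be checked before being relied on. The function name `relative_avg` is likewise just a label for this example.

```python
def relative_avg(player_avg, league_avg):
    """Batting average as a ratio to the league average (1.0 = league average)."""
    return player_avg / league_avg


TOP_TEN_AVG = 0.307   # same in both years, per the comment above
LEAGUE_1968 = 0.237   # ASSUMPTION: approximate MLB average, not from the thread
LEAGUE_1969 = 0.248   # ASSUMPTION: approximate MLB average, not from the thread

rel_1968 = relative_avg(TOP_TEN_AVG, LEAGUE_1968)
rel_1969 = relative_avg(TOP_TEN_AVG, LEAGUE_1969)

if __name__ == "__main__":
    print(f"1968: {rel_1968:.2f}x league average")
    print(f"1969: {rel_1969:.2f}x league average")
```

Under those assumed league figures, a group that stays at .307 goes from roughly 30% above league average to roughly 24% above it, which is the sense in which they "sank somewhat" relative to the pack.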
sal,
perhaps in your spare time you could look at the 1910-11 seasons, where there was a big bump in offense (or was it a step backwards for pitching?). i know it's only a two year span but just thought it was interesting. i am sure there are other seasons like it...
I'm here to talk about the past.
by 67MARQUEZ on Mar 9, 2008 9:52 AM PST
I'll keep it in mind...
stat-addled alien overlord
by salb918 on Mar 9, 2008 12:01 PM PDT
Well I do get this concept.
But sometimes there is only a small sample size. Yep...that's my genius observation. ;-)
by IM4Oakgal on Mar 9, 2008 10:18 PM PDT
Well then you just need to augment the small sample size ...
with an investment at your local Good Vibes.
But seriously, if you do not have information on a player, you should try to get more. If you cannot get more, then you should just go ahead and flip a coin because that will be comparable in rigor to making decisions based on SSS statistics.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 10, 2008 12:16 AM PDT
in other words
size does matter?
I'm here to talk about the past.
by 67MARQUEZ on Mar 10, 2008 4:51 AM PDT
And, perhaps, even examine the basis of the "small sample"
Chris Denorfia is a good example. His total of major-league PAs is small, but people still say, "He hit .280-something in a brief call-up with Cincinnati". You look at the game-by-game stats, and you see the last three games of the season are where he did his damage, and raised his average to respectability. A case where a small sample size is affected by an even smaller sample size, and where the sampling stopped!
"I never predict anything, and I never will." Paul Gascoigne, English footballer
by One won lost won on Mar 10, 2008 4:57 PM PDT
Fortunately ...
in Denorfia's case, we know that he batted .329 with a very solid 75:95 bb:k ratio in AAA (over two seasons) and he dominated A+ and AA in his second chance at each.
If you omit 9/26 and 9/27 against Florida, he would have batted .245 instead of .283 ... of course there is no reason to omit those two games and the 6 hits he amassed over the span are no more or less representative of his talent than the 24 he amassed over the other 47 games he played in.
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 10, 2008 11:27 PM PDT
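The arithmetic in the comment above can be reproduced in a few lines. The hit counts (6 in the two games, 24 in the rest) come from the comment. The at-bat counts (106 overall, 8 in the two games) are NOT stated in the thread; they are assumptions chosen because they reproduce the quoted .283 and .245. The standard-error function treats each at-bat as an independent trial, which is a simplification.

```python
import math


def batting_avg(hits, at_bats):
    return hits / at_bats


def standard_error(avg, at_bats):
    """Binomial standard error of a batting average over at_bats independent trials."""
    return math.sqrt(avg * (1 - avg) / at_bats)


HITS_TWO_GAMES = 6    # from the comment
HITS_OTHER = 24       # from the comment
AB_TOTAL = 106        # ASSUMPTION: consistent with the quoted .283
AB_TWO_GAMES = 8      # ASSUMPTION: consistent with the quoted .245

full = batting_avg(HITS_TWO_GAMES + HITS_OTHER, AB_TOTAL)
without = batting_avg(HITS_OTHER, AB_TOTAL - AB_TWO_GAMES)
se = standard_error(full, AB_TOTAL)

if __name__ == "__main__":
    print(f"all games:         {full:.3f}")     # 0.283
    print(f"minus two games:   {without:.3f}")  # 0.245
    print(f"standard error:    {se:.3f}")
```

Under these assumptions the standard error is a bit over 40 points of batting average, so the 38-point gap between .283 and .245 is about what chance alone moves a sample of this size. That supports both halves of the comment: two games can swing the number a lot, and there is still no reason to throw those two games out.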
I agree with your assessment
where you say, "There is no reason to ignore", but it does make one ponder the fact that Dontrelle Willis was pitching and gave up all those hits in that game. It was near the end of the season, so how can you factor in a possible, and probable, "Ahhhhh F**k it!" factor, where the pitcher says, "Here, hit it." after a point, and the manager changes his normal thinking to "Leave him out there."
??
I wasn't at the game, so I have no evidence. But it goes along with my "human factors argument against" the regression to mean philosophy and baseball hitting performance.
"I never predict anything, and I never will." Paul Gascoigne, English footballer
by One won lost won on Mar 20, 2008 3:26 PM PDT
OMG Devo...LMAO.
I had to Google Good Vibes and you gave me quite a surprise.
by IM4Oakgal on Mar 11, 2008 12:40 AM PDT
Glad you enjoyed it ...
the sample was apparently adequate ...
"It's for your own good. Big strong Devo knows whats best for Poppy" -- Mossback
by devo on Mar 11, 2008 1:16 AM PDT