Goldilocks And The Three Sample Sizes

I'll never be a foreign diplomat -- for one thing, somehow I seem to have upset the Dutch -- but I have to think there is common ground to be found among some of the warring factions on AN. And an innocent comment by PT gave me a thought that gave me an idea that gave me this post. The comment was this:

Zito has enough pitching experience to make it clear that he is a pitcher who outperforms his FIP. His true-talent BABIP appears to be about .280, as compared to a league average of about .295. That's a significant difference -- half a win per season or more.

The same analyst (PT) who believes it's too soon to say whether Trevor Cahill can reliably outperform his FIP is comfortable concluding that Barry Zito can outperform his FIP, and that makes sense because the samples are significantly different in size. In fact, PT has been utterly consistent in saying, "Talk to me in 2015" regarding Cahill, because then and only then will the sample be large and the data reliable.
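As an aside, PT's "half a win per season" figure is easy to sanity-check. The numbers below are my own ballpark assumptions, not his: roughly 500 balls in play allowed over a full season, about 0.75 runs saved per hit prevented, and the usual ~10 runs per win.

```python
# Rough sanity check on "half a win per season" from a 15-point BABIP edge.
# Assumptions (mine, not PT's): ~500 balls in play per full season,
# ~0.75 runs per hit prevented, ~10 runs per win.
balls_in_play = 500
babip_edge = 0.295 - 0.280           # league average minus Zito's true talent
hits_prevented = balls_in_play * babip_edge
runs_saved = hits_prevented * 0.75
wins = runs_saved / 10
print(f"{hits_prevented:.1f} hits -> {runs_saved:.1f} runs -> {wins:.2f} wins")
```

Under those assumptions it comes out to roughly half a win, so the quoted claim is at least internally consistent.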

Of course it's also pretty logical that fans in 2010 might not want to limit their analysis to "silence for five years." Had people said, in 2002, that they believed Zito was one of those pitchers who outperformed their FIP, the reply would have been the same as it is with Cahill today (even though eventually they would have been vindicated) -- so does that mean everyone should be silent until the sample is large enough, or does it mean...I have an idea...

I think there is consensus that when a sample is too tiny, you can't really draw any conclusions. You might flip on the TV one day just as 2Bman Brooks Conrad is diving to his left to take a hit away in the best play he'll ever make in his life, but that doesn't tell you anything about his defensive acumen at 2B.

There is also consensus that when a sample is large, as it is with Zito, you can draw conclusions that are statistically reliable. With 10 years of L/R split data and 8 years of BABIP-against data, you can align your analysis with "reliable data" to support your conclusions.

Where I think the polarization occurs is in regards to samples that are "medium" -- that is, neither absurdly tiny nor sufficiently large. And where I think there could be common ground is if we could agree that a medium (but not tiny or large) sample is too small to say, "This is how it is," yet too large to say, "You have no justification for your position."

I would submit that the correct position for medium samples is, "My take is that when the sample is large enough, it will confirm that..." That weds observation ("I've seen enough to form a reasonable hypothesis") with math ("But we haven't seen enough to know").

In other words, sometimes we're caught in between "no clue yet" and "fact." The guy who is 1/10 with 6 Ks against LHP may or may not be able to hit lefties, and the guy whose OBP over 10 years is .400 can clearly hit them just fine. As for the guy whose numbers, after 3 seasons, show a significant disparity? It may be enough to commit to a hypothesis and find out later if your eyes, gut, and brain were right, yet not enough to know what you'll really know when 100% of the precincts have reported.
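For those who like the math behind "what you'll really know," here's a back-of-the-envelope sketch of how much a .300 BABIP estimate can wobble at each sample size. The ball-in-play counts are my rough assumptions (a hot week, one season, a Zito-length decade), and it uses the standard normal approximation for a proportion:

```python
import math

def ci_halfwidth(p, n, z=1.96):
    """95% confidence-interval half-width for a proportion,
    using the normal (Wald) approximation."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.300  # an observed BABIP
for label, n in [("tiny (a week)", 30),
                 ("medium (a season)", 450),
                 ("large (a decade)", 4500)]:
    hw = ci_halfwidth(p, n)
    print(f"{label:18s} n={n:5d}  .300 +/- {hw:.3f}")
```

At a season's worth of balls in play, the error bars (roughly plus-or-minus .042) dwarf the .015 gap between .280 and .295 -- which is exactly why a medium sample supports a hypothesis but not a verdict. Only the decade-sized sample shrinks the wobble down near the size of the gap itself.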

Perhaps if we distinguish between tiny, medium, and large samples, and accept that the middle ground is neither "too tiny" nor "sufficiently large," but rather is just "fair hypothesis" time, we can all enjoy the porridge together and remember that we're all here for the same true purpose: To get that blonde wench arrested for trespassing.