Two of a kind, for your information, we're two of a kind
Two of a kind, it's my observation, we're two of a kind
Like peas in a pod
And birds of a feather
Alone or together, you'll find
That we are two-oo-oo, oo-oo-oo, oo-oo-oo, oo of a kind
6’1" 231 lbs
5’11" 170 lbs
Eerie, isn’t it?
Aside from practically being twins, these two outfielders have another thing in common – absurdly high batting average on balls in play (BABIP). As A’s fans, we should be hoping that they have another thing in common.
Ichiro Suzuki has PECOTA (a forecasting system created by Nate Silver of Baseball Prospectus) confounded. PECOTA is absolutely sure that Ichiro is just getting lucky, and that, like most luck, his will run out and he will be regressing to the mean any minute now.
PECOTA is stubborn – as it should be. Players who break the mold as thoroughly as Ichiro does are few and far between – but that is kind of the problem.
What makes PECOTA so special is that it is based on more than just a player’s past statistics. It uses those statistics as well as information about that player (physical information, position, favorite Hugh Grant movie) to identify similar players and then it makes projections based on how those players performed going forward.
It runs into a problem, though, with players that are too unique. I am sure none of you, my dear readers, would be surprised to learn that Ichiro is, in fact, truly unique. The curious thing is that PECOTA not only knows it, it can identify and measure that uniqueness. Along with the projections, PECOTA gives players a “Similarity Index” score:
Similarity Index is a composite of the similarity scores of all of a player's comparables. Similarity index is a gauge of the player's historical uniqueness; a player with a score of 50 or higher has a very common typology, while a player with a score of 20 or lower is historically unusual. For players with a very low similarity index, PECOTA expands its tolerance for dissimilar comparables until a meaningful sample size is established.
Ichiro has a Similarity Index of 19, making him historically unusual.
This, my friends, is where Jack Cust comes in. Jack Cust has a Similarity Index of 13. Is he borderline Hall of Famer Jim Thome … or is he Mike Epstein, a thoroughly unimpressive first baseman who managed a couple of lucky seasons, thanks mostly to a fluky batting average on balls in play? Or is he really neither of them (ding, ding, ding), which is, umm, kind of the point of this article.
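The thresholds in the quoted Similarity Index definition can be written down as a tiny sketch. Only the two cutoffs (50 and 20) come from the definition above; the function name and the "in between" label for everything from 20 to 50 are my own assumptions, since the definition says nothing about that middle range.

```python
def classify_similarity_index(score: float) -> str:
    """Label a Similarity Index score using the two cutoffs quoted above.

    The 50 and 20 cutoffs come from the quoted definition. The
    "in between" label is an assumption for the unlabeled middle range.
    """
    if score >= 50:
        return "very common"
    if score <= 20:
        return "historically unusual"
    return "in between"


# Scores quoted in this article.
for name, score in [("Ichiro Suzuki", 19), ("Jack Cust", 13)]:
    print(f"{name}: {score} -> {classify_similarity_index(score)}")
```

Both players land on the "historically unusual" side, with Cust further out than Ichiro.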
While PECOTA has not caught on, Ichiro has a pretty well established ability to consistently make more productive contact than the typical hitter. In 2007 he posted a .390 BABIP, which is high even by his standards (he has a career .359 BABIP), but it was not even a career high. Not surprisingly, that came in 2004, the year he knocked 262 hits, thanks to a .401 BABIP. By comparison, Ted Williams only managed a .378 BABIP in 1941. That was one of three seasons in which he struck out fewer times than he went yard, which led to his .406 batting average despite the lower BABIP.
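For anyone who wants to check BABIP figures like these by hand, here is a minimal sketch using the standard formula, (H - HR) / (AB - K - HR + SF). The 1941 Ted Williams inputs are not from this article; they are the commonly cited line (456 at-bats, 185 hits, 37 home runs, 27 strikeouts), with sacrifice flies set to 0 as an assumption, so verify them against your favorite stats site before leaning on them.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int = 0) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play for this stat line")
    return (hits - home_runs) / balls_in_play


# Assumed 1941 Ted Williams line (not taken from the article).
williams_1941 = babip(hits=185, home_runs=37, at_bats=456, strikeouts=27)
print(f"{williams_1941:.3f}")
```

Because strikeouts and home runs both come out of the denominator, a hitter who rarely strikes out and homers a lot can post a batting average well above his BABIP, which is the Williams situation described above.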
PECOTA, having read and re-read Sal’s article from last week, doesn’t believe that Ichiro’s high batting average on balls in play is sustainable, despite the fact that he has done it seven years in a row. Since PECOTA is only looking at the numbers that Ichiro and his comparable players produced, this is absolutely appropriate.
Ichiro is so successful because he combines an incredibly high ground ball percentage (3rd in MLB) with the highest infield hit percentage in the game. What’s more, Ichiro basically does not hit fly balls (2nd lowest in MLB), which, as we’d expect, limits his power potential but it drives his high batting average. On top of that, he was second in the league in bunt hit percentage (among players with at least 6 bunts). That’s pretty frickin’ unique – both in that he is so consistently at the extremes and in that, unlike fellow ground ballers Luis Castillo and Tony Pena, he doesn’t suck.
Jack Cust is pretty unique himself (are we noticing a trend here?). He’s not beating out grounders left and right. He was actually dead last in the league, with an incredible 0 infield hits. Seriously, Frank frickin’ Thomas managed to beat out 4 dribblers. Even Bengie Molina beat out 3, tying Prince Fielder who was too fat to not sell jeans. What is incredible is that, not only was Jack Cust the only player to qualify for the batting title who did not beat out a single infield hit, no one else had fewer than 2.
I’m getting a bit off track, though. Jack Cust’s incredible ability to not beat Bengie Molina in a foot race is not a reason for optimism. Seriously, my 108-year-old great-grandmother could beat Bengie Molina in a foot race … and she’s been dead for 30 years. He’s that slow. The Slowskies of Comcast fame aspire to be Bengie Molina. Yeah, the Slowskies – they’re turtles. Turtles are slow – but not as slow as Bengie Molina (has this joke been sufficiently beaten into the ground yet?).
There are two key reasons for optimism.
1. Jack Cust hits a ton of line drives. He was 8th in the league with a 23.2 line drive percentage in 2007. Line drives are much more likely to result in hits than ground balls or fly balls.
2. Jack Cust does not hit many fly balls (131st out of 162 qualified batters), but the few that he does hit leave the park more frequently than anyone else’s. Fly balls are great when they leave the park, but they’re pretty bad otherwise.
PECOTA did something curious. It projected Jack Cust to be pretty good in 2007, hitting 20 home runs with a .839 OPS and a .312 BABIP. Cust exceeded those numbers by a fair bit with 26 home runs, a .912 OPS and a .366 BABIP. PECOTA, feeling snubbed after Cust blew away its hard work, then lowered his projections for 2008, down to a .821 OPS – and back to a .312 BABIP. PECOTA basically did two things. It assumed he was not going to strike out so ridiculously frequently and that his contact would cease to be more productive than normal. Given the small sample size that Cust’s career offers at this point, that is absolutely the right thing to do. Statistically speaking, there is no reason to believe that Jack Cust had anything except a fluky season.
Fortunately, though, we are not just mindless adding machines. We can process things other than numbers and include them in our understanding of the game. Jack Cust is a unique player. He does play the game differently than your typical ball player. He actually plays it pretty similarly to Ryan Howard, who has shown a consistent ability to strike out, hit line drives and watch his fly balls leave the park significantly more often than anyone else … anyone else except Jack Cust, that is.
Writing this article may be a bit statistically irresponsible on my part. There is a very reasonable chance that everything I have said about Jack Cust is rubbish -- that like virtually every other player in the game, he is actually normal. What’s more, we all have to understand that I am writing this article under the assumption that Jack Cust is extremely abnormal, even though the overwhelming majority of players are more or less normal. Assumptions or projections based on normalcy work and are absolutely appropriate in the great majority of situations.
His minor league numbers provide a mixed bag. In AAA in 2005 and 2006, his GB% was comparable (43.5%) but he traded in a number of his line drives (15%) for fly balls (38.2%) – a return to which would generally portend bad results in the future. 8% fewer line drives will likely result in 5-6% fewer hits. 4% more fly balls will likely result in 4% more home runs. If you make those adjustments to his 2007 numbers he would have had 5 fewer singles and one extra home run, leading to a .895 OPS, losing 11 points of OBP and 6 points of SLG.
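To show the mechanics of that kind of adjustment, here is a rough sketch that swaps a few singles for outs, turns one out into a home run, and recomputes OBP, SLG and OPS. The stat line below is a made-up round-number example, not Cust's actual 2007 line, so it will not reproduce the .895 figure above. The `StatLine` class and its field names are also just illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StatLine:
    """Hypothetical batting line; every value used below is invented."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int = 0
    sac_flies: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.home_runs)

    def obp(self) -> float:
        # On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)
        reached = self.hits + self.walks + self.hit_by_pitch
        chances = (self.at_bats + self.walks
                   + self.hit_by_pitch + self.sac_flies)
        return reached / chances

    def slg(self) -> float:
        # Slugging percentage: total bases per at-bat
        return self.total_bases / self.at_bats

    def ops(self) -> float:
        return self.obp() + self.slg()


# Hypothetical baseline, then 5 fewer singles and 1 extra home run.
# At-bats stay fixed: lost singles become outs, the new homer replaces an out.
baseline = StatLine(at_bats=400, singles=60, doubles=20, triples=0,
                    home_runs=25, walks=100)
adjusted = replace(baseline, singles=baseline.singles - 5,
                   home_runs=baseline.home_runs + 1)

print(f"baseline OPS {baseline.ops():.3f}, adjusted OPS {adjusted.ops():.3f}")
```

The design choice worth noting is that at-bats are held constant, so every lost single costs both OBP and SLG while the extra homer gives a little of each back.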
I’d like to believe Jack Cust is a unique player. But he has to be. If he cannot maintain his abnormally high LD%, BABIP and HR/FB Rate he won’t be much of a player. Thankfully, though, Jack and Ichiro may just be two of a kind, for your information …
Two of a kind, it's my observation, they're two of a kind
Like peas in a pod
And birds of a feather
Alone or together, you'll find
That they are two-oo-oo, oo-oo-oo, oo-oo-oo, oo of a kind
A couple of questions that I’d like folks’ thoughts on:
1. Why does PECOTA believe in Ryan Howard but not Jack Cust (a fairly similar player) or Ichiro Suzuki (who is similar in his uniqueness, but, unlike Jack Cust, has an even more established track record than Howard)?
2. For players that are “historically unique”, why not replace (or average, weighted based on uniqueness) their PECOTA projection with a simpler projection system that does not rely on comparable players?
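To make the second question a little more concrete, here is one way the weighting might look. Everything about the weighting rule is hypothetical: I am assuming the weight on the comparables-based projection grows linearly with Similarity Index and tops out at the "very common" mark of 50. The .880 OPS "simple" projection is an invented placeholder, not the output of any real system. The .821 OPS and the Similarity Index of 13 are the Cust figures quoted above.

```python
def comparables_weight(similarity_index: float,
                       common_threshold: float = 50.0) -> float:
    """Hypothetical weight (0 to 1) to place on the comparables-based projection.

    Assumption: trust scales linearly with Similarity Index, capped at 1.0
    once a player reaches the "very common" threshold.
    """
    return max(0.0, min(1.0, similarity_index / common_threshold))


def blended_projection(comparables_value: float, simple_value: float,
                       similarity_index: float) -> float:
    """Weighted average of a comparables-based and a simpler projection."""
    w = comparables_weight(similarity_index)
    return w * comparables_value + (1.0 - w) * simple_value


# Cust's quoted numbers plus a made-up .880 OPS from a "simple" system.
cust_blend = blended_projection(comparables_value=0.821,
                                simple_value=0.880,
                                similarity_index=13)
print(f"{cust_blend:.3f}")
```

Under this rule a player with a Similarity Index of 50 or more keeps his comparables-based projection untouched, while someone as unusual as Cust leans mostly on the simpler system.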