Not UZR-Friendly: The 3-Year Sample Problem

This post discusses a problem and why it's a problem, without actually making any attempt to solve the problem. I believe this is extremely useful. For example, the problem with poverty is that poor people need a lot more money than they have, but I have no idea how to help them get any more of it. That's really helpful, right? Tune in next week, when I discuss global warming.

UZR, among the best defensive metrics we have, has the built-in liability of requiring about three years of data before you have a sufficient sample. I've mentioned before that, besides the obvious problem of waiting three years to draw your first conclusions, there is the issue of how much a player's skill, or situation, can change over a three-year period. By the time we have three years of data, is the 2008 data still relevant as we look at this season or try to make predictions about 2011?

Let's take a look around the A's defense and see what happens when we try to take three seasons of defensive data...

Daric Barton: No can do, because Barton has not played three full seasons thanks to his extensive minor league stint in 2009.

Mark Ellis: In 2008, Ellis was 30-31 years old, still at an age where you could reasonably expect him to be at or near peak performance. In 2009, Ellis played through shoulder problems for much of the season. In 2010, Ellis is 32-33 years old, an age where defensive decline is common. That's an awful lot of "moving parts" to bunch together and call it a "3-year sample."

Cliff Pennington: No can do, because Pennington is just now completing his first full season.

Kevin Kouzmanoff: Kouz is the exception that proves the rule, in that you can actually get 3 years of data during which he was at roughly the same stage of his career (not a rookie, young, and at or near his prime).

Rajai Davis: In 2008, Davis got very sporadic playing time for the Giants and then the A's. In 2009, Davis played more regularly (125 games) in CF. In 2010, Davis has played regularly but at a variety of OF positions.

Coco Crisp: To get three years' worth of data, you need to go back 4-5 years, because injuries have shelved Crisp for significant chunks of almost every season. That means looking at ages 26-30, a range wide enough to span the pre-prime, prime, and post-prime stages of a defensive player's career. He has also spent much of his CF time in these years playing shortly after returning from injuries of varying severity, often following a substantial layoff.

Ryan Sweeney: In 2008 and 2009, Sweeney bounced between CF and RF, intermittently battling knee problems. In 2010, Sweeney played exclusively in RF on bad knees that would require season-ending surgery by July.

If you go around the league, it's hard to find many players whose defensive status can be considered comparable across three of their own seasons. If they weren't awfully young, raw, and green in the first of those years, they were awfully old, declining, or past their prime in the last one. If they didn't play multiple positions, injuries changed their profile at some point. If they have been in the league three years at all, they didn't get regular playing time in one or two of them.

So mostly, you aren't exactly looking at "three years of data for one player." You're looking more at three years of data from three somewhat similar players, if they're even that similar at all. Assuming you even have three years. It may be the best we have right now, but you see the problem.