
Not UZR-Friendly: The 3-Year Sample Problem

This post discusses a problem and why it's a problem, without actually making any attempt to solve the problem. I believe this is extremely useful. For example, the problem with poverty is that poor people need a lot more money than they have, but I have no idea how to help them get any more of it. That's really helpful, right? Tune in next week, when I discuss global warming.

UZR, among the best defensive metrics we have, has the built-in liability of requiring about three years of data before you have a sufficient sample. I've mentioned before that besides the obvious problem of needing to wait three years to draw your first conclusions, there is also the additional issue of how much a player's skill, or situation, can change over a three-year period. By the time we have three years of data, is the 2008 data still relevant as we look at this season or try to make predictions about 2011?

Let's take a look around the A's defense and see what happens when we try to take three seasons of defensive data...


Daric Barton: No can do, because Barton has not played three full seasons thanks to his extensive minor league stint in 2009.

Mark Ellis: In 2008, Ellis was a 30-31 year old still at an age where you could reasonably expect him to be at or near peak performance. In 2009, Ellis played through shoulder problems much of the season. In 2010, Ellis is a 32-33 year old now at an age where defensive decline is common. That's an awful lot of "moving parts" to bunch together and call it a "3-year sample."

Cliff Pennington: No can do, because Pennington is just now completing his first full season.

Kevin Kouzmanoff: Kouz is the exception that proves the rule, in that you can actually get 3 years of data during which time he was at roughly the same stage in his career (not a rookie, young and at/near his prime).

Rajai Davis: In 2008, Davis got very sporadic playing time for the Giants and then the A's. In 2009 Davis played more regularly (125 games) in CF. In 2010, Davis has played regularly but at a variety of OF positions.

Coco Crisp: To get three years' worth of data, you need to go back 4-5 years because injuries have shelved Crisp for significant chunks of almost every season. That means looking at ages 26-30, a wide range that spans the pre-prime, prime, and post-prime stages of a defensive player. He has also spent much of his CF time in these years playing just after recovering from injuries of varying magnitude, often following a substantial layoff.

Ryan Sweeney: In 2008 and 2009, Sweeney bounced between CF and RF, intermittently battling knee problems. In 2010, Sweeney played exclusively in RF on bad knees that would require season-ending surgery by July.

If you go around the league, it's hard to find many players whose defensive status can be considered comparable across three of their own seasons. If they weren't awfully young, raw, and green in the first of those years, they were awfully old, declining, or past their prime in the last one. If they didn't play multiple positions, injuries changed their profile at some point. If they have been in the league three years at all, they didn't get regular playing time in one or two of them.

So mostly, you aren't exactly looking at "three years of data for one player." You're more looking at three years of data, from three somewhat similar players. If even that similar at all. Assuming you even have three years. It may be the best we have right now, but you see the problem.


Comments


On the idea that "they can't stay healthy enough for us to accurately gauge their UZR",

the “they can’t stay healthy enough” part is the A’s bigger problem. But, yes, it is a problem and hopefully hit f/x or something comes out soon to help us gauge defensive skills better.

A's Fan in Sweden

"Some of us know him as the a-hole who piled into Ray Fosse in an All-Star game (it's why Ray is the way he is folks)" - OptimistPrime

by travdog6 on Sep 25, 2010 7:08 AM PDT

the problem is more "what's the alternative"

There is something in the Minors, no?

"The ego, the super-ego, and the Ed" - dannycakes

by Future Ed on Sep 25, 2010 9:30 AM PDT

No no no no
"UZR, among the best defensive metrics we have, has the built-in liability of requiring about three years of data before you have a sufficient sample."

Well, okay, this isn’t factually wrong. But it’s incomplete. What it should say is “DEFENSE has the built-in liability of requiring about three years of data before you have a sufficient sample.”

In other words: Imagine a world in which we can, with 100% accuracy, measure everything. Batted ball types, positioning, jumps, plays made, plays not made. Everything. There’s no human error in collecting the data. Let’s call it Perfect UZR. There are no flaws in pUZR’s methods, no variables omitted.

WE WOULD STILL NEED ALMOST THREE YEARS OF DATA!

And that’s just to equal one offensive season (which, of course, is hardly enough to completely evaluate a player).

The reason we need multiple years of data isn’t that the data is bad. It’s that there just aren’t enough samples. Even perfectly recorded samples are not enough samples. Sure, with my fictional, impossible pUZR, you would eliminate some noise. Maybe all measurement noise. But even then, there just aren’t enough samples in a single season to know how talented a guy is with any real certainty. We might have a better idea of who positions themselves well, who takes good angles, etc., but I don’t see how we’ll ever have a defensive metric that only requires one year of data unless baseball starts playing 400 games a season.

That doesn’t mean all your concerns are unfounded. I’m just saying that even perfectly accurate data is always going to run into the problem of having enough perfectly accurate data.

www.zekeishungry.com

by thejd44 on Sep 25, 2010 9:44 AM PDT (1 rec)
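A rough sketch of the sample-size point in the comment above, assuming (purely for illustration) that each difficult fielding chance is an independent try with a fixed true conversion rate. The 0.50 rate and the 25 chances per season are made-up numbers, not UZR inputs, and `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(true_rate: float, chances: int) -> float:
    """Standard error of an observed success rate over `chances` independent tries."""
    return math.sqrt(true_rate * (1.0 - true_rate) / chances)

# Hypothetical numbers: roughly one difficult chance a week (25 per season)
# and a fielder who truly converts half of them.
TRUE_RATE = 0.50
CHANCES_PER_SEASON = 25

for seasons in (1, 2, 3):
    n = seasons * CHANCES_PER_SEASON
    se = standard_error(TRUE_RATE, n)
    print(f"{seasons} season(s), {n} chances: +/- {se:.3f}")
```

Even with perfect recording, tripling the sample only shrinks the uncertainty by a factor of about 1.7 (the square root of 3), which is one way to see why a single season stays noisy.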

There's also the desired result -- whether it's a player's true talent or just their result from a given season.

Assuming your ‘Perfect UZR’ with zero flaws, it’s absolutely possible to look at one season of a player’s defensive data and know how that player was THAT YEAR.

Pam liked my old sig better.

by mikev on Sep 25, 2010 9:52 AM PDT

I figured the idea was for true talent/projection purposes

How the player was that year doesn’t have a lot of value outside of picking a Gold Glove winner.

www.zekeishungry.com

by thejd44 on Sep 25, 2010 12:37 PM PDT

What I think you're overlooking is how

getting sufficient data over a one-year time period, and getting sufficient data over a three-year time period, are fundamentally different looks. This is my point. When we look at a full season of Ellis’ hitting, it might be “Ellis as a healthy 30 year old” data, not “Ellis as a healthy 30 year old, Ellis as an injured 31 year old, Ellis as a 32 year old bouncing back from injury.” Very different.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 25, 2010 10:45 AM PDT

No no, I understand. And I don't disagree with that point at all.

I guess my conclusion is “Defense is always going to be difficult to analyze.”

www.zekeishungry.com

by thejd44 on Sep 25, 2010 12:38 PM PDT

I agree.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 25, 2010 1:37 PM PDT

This.

The 3-year thing (it’s more like 2.4, actually) comes from two factors: A, there’s not enough data. The plays that differentiate good defenders from average ones only happen once a week, if that. And B, the data can be flawed, so that while one year of UZR may not be reflective of a guy’s true talent level, it also may not even be reflective of how he actually performed that year. Your perfect UZR would get rid of point B…but that’s a small concern relative to the not enough data thing.

That said, the 2.4 years thing isn’t a magic number, either. It’s not like after 2.4 or 3 years, the UZR number all of a sudden becomes perfect. That figure just comes from the point where UZR reached the same level of reliability as one season of something like OPS. In that case, one year of UZR should be equivalent to two and a half months of OPS. Can two and a half months of OPS tell us something useful? Of course it can.

by danmerqury on Sep 25, 2010 11:04 AM PDT
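The arithmetic behind "one year of UZR should be equivalent to two and a half months of OPS" can be sketched as below. It assumes, as a simplification not stated in the thread, that a regular season lasts about six months and that reliability scales in proportion to playing time. The function name is hypothetical.

```python
# Assumptions (not from UZR itself): a six-month regular season and
# reliability that grows in proportion to playing time.
SEASON_MONTHS = 6.0
UZR_SEASONS_PER_OPS_SEASON = 2.4  # figure quoted in the comment above

def ops_months_equivalent(uzr_seasons: float) -> float:
    """Months of OPS data with roughly the same reliability as `uzr_seasons` of UZR."""
    return uzr_seasons / UZR_SEASONS_PER_OPS_SEASON * SEASON_MONTHS

print(ops_months_equivalent(1.0))  # one season of UZR
print(ops_months_equivalent(2.4))  # the full 2.4-season sample
```

Under those assumptions one season of UZR comes out to about two and a half months of OPS, and 2.4 seasons comes out to about one full season, which matches the figures in the comment.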

Again, in a 2.5 month period, or one-year period,

a given player is relatively static as a player. Over a 3-year period, though, they really aren’t.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 25, 2010 11:14 AM PDT

I mean, of course it's true that a player is relatively static over one day and relatively dynamic over 3 years

but so what? So you have to operationalize the data somehow, and you do it by taking approximations. So what? It’s like measuring the area under a curve in calculus by pretending chunks of the curve are really straight lines. It’s legit.

I also agree with everything dan said. Yes, one season of UZR isn’t the equivalent of one season of OPS, but that doesn’t mean it’s useless. If a guy puts up a .550 OPS for 2 months, he’s either the unluckiest person on the planet, way over his head, or just terrible. The same can be said of very good or bad UZR figures. The key, as always, is proper regression.

"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.

by PaulThomas on Sep 26, 2010 12:14 AM PDT
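One common way to do the kind of "proper regression" mentioned in the comment above is to shrink the observed number toward the league mean, with more shrinkage for smaller samples. This is a minimal sketch of that general idea, not the method used by anyone in this thread or by UZR itself. The function name, the `stabilizer` parameter, and every number are hypothetical.

```python
def regress_to_mean(observed: float, sample_size: float,
                    league_mean: float, stabilizer: float) -> float:
    """Shrink an observed value toward the league mean.

    `stabilizer` is the (hypothetical) sample size at which the observed
    value and the league mean receive equal weight.
    """
    weight = sample_size / (sample_size + stabilizer)
    return weight * observed + (1.0 - weight) * league_mean

# Hypothetical inputs: a +15 run UZR, a league mean of 0, and a stabilizer
# of 1.5 seasons. None of these values come from the thread.
print(regress_to_mean(observed=15.0, sample_size=1.0, league_mean=0.0, stabilizer=1.5))
print(regress_to_mean(observed=15.0, sample_size=3.0, league_mean=0.0, stabilizer=1.5))
```

With these made-up inputs, a +15 over one season is treated as roughly a +6, while the same +15 sustained over three seasons is treated as roughly a +10. An extreme single-season figure still says something, just less than it appears to.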

I don't think 1 season of UZR is "useless"

But I don’t even think one week of hitting data is useless – it just has to be put into the context of being only one week of hitting data (which means that eyeball reports are probably relatively useful, e.g. “he’s lunging at pitches” or “he’s trying to pull everything” and stats less useful because the sample is so small).

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 26, 2010 9:05 AM PDT

I'm not disagreeing with your overall point

But it can’t be perfect if it requires 3 years to collect the sample. As long as it can’t be judged within a single season’s sample, it’ll never truly be “perfect.”

Choosy Feebas choose Leopold Bloom nipples

Daring. Sensual. Invigorating. Squirrel.
BLOOM. For men.

If the eggs actually hatch I made more than a mistake, I made some scientifically impossible crime.

by DMOAS on Sep 25, 2010 12:07 PM PDT

Right. In other words,

defensive metrics are inherently flawed because they require sample data that spans too much chronological time. That’s not their fault; it just is.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 25, 2010 12:16 PM PDT

This.

Also, player health affects defense and varies from player to player and from year to year, month to month even. Player health is obscenely important and should have a metric of its own for use in determining overall value. However, it’s in the game’s culture to not report injuries and to play through them, even though that’s been proven time and time again to hurt the team, not help it.

by PL78 on Sep 25, 2010 12:27 PM PDT

Right

So can we stop doing things (not that you do this, I’m just speaking generally) like looking at one awesome UZR season for Ryan Sweeney and declaring him a superstar?

the oakland athletics: hittin' ain't easy

by walk off bunt on Sep 25, 2010 12:57 PM PDT

Or just put an asterisk next to them

Like this year when it was pointed out to me that an OPS fueled by a high OBP was more important than one fueled by SLG, and that powerless hitters’ low OPSs shouldn’t be looked at with too much malice. Because hitting metrics are so definitive by comparison, Sweeney’s 3.8 WAR last year needs to be taken with a grain of salt, and it’s only guys who have been in the game for a while and who constantly throw up giant UZRs (Crawford, Beltre, Zimmerman) whose UZR you can look at analytically. That’s the problem here: unless the player has been in the game for a while, UZR is too prone to make guys like Sweeney look better than they are by blowing up their WAR thanks to a random high UZR defensive year early in their careers.

by PL78 on Sep 26, 2010 1:59 PM PDT

What is the talismanic appeal of a "single season"?

Why not a single game? Why not three seasons?

I think there’s some serious circular logic at root here. The reason why people think in terms of season-long increments is simply that stats are usually presented to them in season-long increments.

"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.

by PaulThomas on Sep 26, 2010 12:05 AM PDT (1 rec)

There's also a real factor of six months going by between game 162 and game 1

So if 170 games were to be the right sample for something, I’d still look at 162, rather than acting like the first 8 games of the next season were continuous. We’ve seen plenty of cases where an off-season did a lot to suddenly age a player. Yes I’m looking at you, Jason Giambi.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 26, 2010 9:07 AM PDT

I don't agree with any of this.

I’ve stated the problem without making any attempt to solve this problem. This is extremely useful because I say so.

I hate Bob Geren and his peanut brain so much -- lenscrafters

by WaddellCanseco on Sep 25, 2010 9:47 AM PDT (1 rec)

Maybe we just have to look at UZR as having this unsolvable weakness.

But think about it this way: no one has any problem saying Gabe Gross hasn’t been a very good hitter this year, based on about 240 PAs. That’s still a small sample size, mind you, but everyone’s comfortable saying it.

Maybe we can just accept that UZR will never be able to tell us everything we want to know perfectly, but that it is the best defensive metric and that it can tell us, at least somewhere between roughly and exactly, how good our defenders are. It’s no accident that Adam Dunn has a worse UZR than Daric Barton.

"I wasn't able to extend so I had a serious lack of extension."--Dallas Braden

by StJosephBurningTheOakTreesToTheGround on Sep 25, 2010 10:21 AM PDT

I agree with this, and this is exactly how I look at UZR, and defense gauging in general

“it can tell us, at least somewhere between roughly and exactly, how good our defenders are” is, IMO, a healthy view of defensive info.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 25, 2010 10:43 AM PDT

I'm happy for the guy,

I was always baffled by his suckiness at the big league level, and I hope he does well.

A's Fan in Sweden

"Some of us know him as the a-hole who piled into Ray Fosse in an All-Star game (it's why Ray is the way he is folks)" - OptimistPrime

by travdog6 on Sep 25, 2010 6:43 PM PDT

I also don't think UZR is a good enough metric

Every reasonable attempt I’ve made to develop a defensive metric has been thwarted by obvious problems in the different natures of each position. For example, a first baseman gets about one putout per inning, a pitcher about a quarter of a putout per inning, a center fielder about a third, a second baseman or shortstop about half a putout per inning, and so on. The larger the number of putouts per inning, the smaller the variations in the data. Because statistical comparisons are only valid when the variations between data sets are about the same, there is no fair way to compare defensive ability across all positions. Some people attempt to make comparisons using standard error to normalize these variations, but these comparisons fall short when the data sets are too small, and for other reasons too. And even in making comparisons between players at the same position there are serious problems. For example, ballpark considerations make defensive comparisons very difficult. And so, including your own arguments, I am not at all convinced that UZR is a valid tool.

by Ran on Sep 25, 2010 2:50 PM PDT
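The claim that more putouts per inning means smaller variation can be illustrated with a small sketch. It assumes, as a simplification that is not part of the comment, that putout totals behave like Poisson counts, so the relative spread is one over the square root of the expected count. The innings total and the helper name are hypothetical, and the same innings figure is used for every position only to isolate the effect of the rate.

```python
import math

# Putouts per inning as quoted in the comment above.
PUTOUTS_PER_INNING = {
    "first base": 1.0,
    "second base / shortstop": 0.5,
    "center field": 1.0 / 3.0,
    "pitcher": 0.25,
}

def relative_spread(rate_per_inning: float, innings: float) -> float:
    """Coefficient of variation of a Poisson count: 1 / sqrt(expected count)."""
    return 1.0 / math.sqrt(rate_per_inning * innings)

INNINGS = 1400  # hypothetical workload, identical for every position
for position, rate in PUTOUTS_PER_INNING.items():
    print(f"{position}: {relative_spread(rate, INNINGS):.3f}")
```

Under this assumption the positions with fewer putouts per inning do show a larger relative spread, which is the effect the comment describes. Whether that makes cross-position comparison invalid is a separate question the sketch does not settle.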

Uh, people differentiate between positions through a combination of measuring changes when players switch positions,

and differences in offensive production.

No one actually tries to directly compare players playing two separate positions.

"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.

by PaulThomas on Sep 26, 2010 12:17 AM PDT

Yes I know

But it fails to work even when you normalize.

by Ran on Sep 26, 2010 12:23 PM PDT

No it doesn't

Teams with lots of WAR win lots of games.

I mean, you can keep denying reality if you want. I can’t stop you.

"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.

by PaulThomas on Sep 26, 2010 2:47 PM PDT

I'm not addressing WAR

I don’t know what’s eating at you, but this discussion is not about WAR; it’s about UZR. I was trying to present a simpler example to illustrate a point about statistical variation without going into the mathematics. UZR fails because there are more variables in defense than are described by the UZR metric. The result is that the statistical variation in the UZR metric across all players is larger than the difference in the means between good and bad players. As a result the metric doesn’t resolve the differences between good and bad players, and that is not just the result of sample size; it’s a fundamentally flawed concept. The way to correct this is to create a more robust way of monitoring defense. This requires capturing dynamic data (space and time) about the positioning and movement of the players that goes well beyond what’s typically recorded in the box scores. Welcome to my reality.

by Ran on Sep 28, 2010 10:55 PM PDT
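The "variation is larger than the difference in the means" argument can be put in numbers with a small sketch. It assumes season-to-season noise is independent and averages out with more seasons, which is exactly the sample-size view the comment above disputes, so treat it as one side of the disagreement rather than a verdict. The function name and all values are hypothetical.

```python
import math

def separation(mean_good: float, mean_bad: float,
               spread: float, seasons: int = 1) -> float:
    """Gap between two players' true means, in units of the noise on an n-season average."""
    noise = spread / math.sqrt(seasons)
    return abs(mean_good - mean_bad) / noise

# Hypothetical numbers in runs per season: true talent of +5 versus -5,
# with single-season noise of 10 runs.
for seasons in (1, 3):
    print(seasons, round(separation(5.0, -5.0, 10.0, seasons), 2))
```

With these made-up inputs the two players sit only one noise-width apart after a single season and less than two after three. If the noise comes from variables the metric omits rather than from a small sample, adding seasons would help less than this sketch suggests.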

One key question for me

For the players across the league for whom we can collect three years of statistically significant data, does it actually accurately predict the 4th year?

If not, then it’s kind of interesting, but of no practical purpose.

by Shed on Sep 25, 2010 10:43 PM PDT

In a word?

Yes. It does. And it predicts run prevention, too.

UZR is a very useful statistic.

"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.

by PaulThomas on Sep 26, 2010 12:18 AM PDT

I probably missed this,

Is it set in stone that three years is the perfect criterion, or does it just make sense? I see the reasoning behind that; however, as someone stated earlier, a general defensive metric is probably not the easiest to implement.

The dude abides
Be cool to animals but F#CK PETA!
"They didnt quiet a building,they quieted a nation!"--WJC January 2010
"Hah! Crom laughs at your four winds!"

by tealkegkilla on Sep 25, 2010 11:41 PM PDT

It's just that you need about 2.4 years of data or so

to have a sufficient sample. Which to me is inherently problematic because too much changes over that much time, so by the end you’re looking at a very different context than at the beginning.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 26, 2010 8:20 AM PDT

like PT said above

you need 2.4 years of data to have the equivalent of one season’s worth of offensive data, but people go off on small samples with offense ALL the time

Pam liked my old sig better.

by mikev on Sep 26, 2010 8:25 AM PDT

They probably shouldn't, yet those small samples are still better

because the player has been more “the same guy” throughout the sample.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 26, 2010 9:02 AM PDT

What if the rate at which players change is different between offense and defense?

I know some work has been done to estimate the skill curves of players, but I’m not too familiar with the specifics. Could the greater repetition (and thus a more quickly accrued sample) result in players changing at a greater rate on offense?

by rebus on Sep 26, 2010 10:49 AM PDT

How about discussing a problem that maybe can be fixed...

Namely… why do I need to log in every freaking time I come to AN?

The monster at the end of this blog.

by grover on Sep 26, 2010 11:40 AM PDT

Glad you brought this up.

I thought it was only happening to me.

by LowcountryJoe on Sep 26, 2010 1:47 PM PDT

+3

Although it might be the length of my absences that is causing that issue for me…

by Poppy on Sep 27, 2010 1:01 AM PDT

SBN is aware of it

It’s just a matter of how long it takes them to solve it.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Sep 26, 2010 2:40 PM PDT

Thank you!

let’s hope the log on metric is not as problematic as the defensive metric.

The dude abides
Be cool to animals but F#CK PETA!
"They didnt quiet a building,they quieted a nation!"--WJC January 2010
"Hah! Crom laughs at your four winds!"

by tealkegkilla on Sep 26, 2010 9:02 PM PDT

Comments For This Post Are Closed

