Is Median or Mean Better for Predicting UZR?

 

Alright, I had a little free time this morning and, unfortunately for you guys, that means I decided to write about the A's.  And since the debate du jour seems to be how to predict defensive performance, here's my take.

I first took all the UZR data from FanGraphs from 2005 to 2010 for fielders with at least 500 innings in the field.  For 2010, that was 220 players, led by Brett Gardner with a 22.3 UZR and trailed by Matt Kemp at -24.  I then matched each of those players with his scores in previous seasons dating back to 2005.  My goal was to use the five years 2005-2009 to predict a player's 2010 UZR*.

*I did all of this in R of course.  If anyone wishes to view my code, feel free to email me.

Of course, if a player didn't play the previous year or didn't play the same position the previous year, they were eliminated.  Unsurprisingly, by the end, only 38 players remained.  Now obviously that is not a big enough sample size to conclude much.  So I relaxed the five-year requirement to four years and created two batches: 2005-2008 to predict 2009 and 2006-2009 to predict 2010.  The first batch had 53 players and the second 56.

From that data, I took each player's mean and median UZR score and used both to predict the next year's UZR score.  Below are pictures:
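Before the pictures, here is a rough sketch of that step in R.  It is not my actual script: the data frame `uzr`, the player labels, and every number in it are made up for illustration, and only the column names `means` and `meds` match what shows up in the output further down.

```r
# Hypothetical toy data: one row per player, one column per season's UZR.
# The values are invented for illustration, not taken from FanGraphs.
uzr <- data.frame(
  player = c("A", "B", "C", "D"),
  `2005` = c( 4.1, -2.0,  9.5, -6.3),
  `2006` = c( 1.7, -5.2, 12.0, -1.1),
  `2007` = c(-3.0, -4.4,  2.2, -8.0),
  `2008` = c( 6.2,  3.9,  7.8, -7.4),
  `2009` = c( 2.5, -1.8,  5.1, -9.9),
  check.names = FALSE
)

past <- c("2005", "2006", "2007", "2008")  # seasons used as predictors

# Row-wise summaries of the predictor seasons
uzr$means <- rowMeans(uzr[, past])
uzr$meds  <- apply(uzr[, past], 1, median)

# Which summary tracks the following season better?
cor_mean <- cor(uzr$"2009", uzr$means)
cor_med  <- cor(uzr$"2009", uzr$meds)
```

With only four fake players the two correlations mean nothing; the point is just the shape of the calculation.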

 

 

[Figure: UZR vs. median, 2005-2009]

[Figure: UZR vs. mean, 2005-2009]

[Figure: UZR vs. median, 2006-2010]

[Figure: UZR vs. mean, 2006-2010]

[Figure: UZR vs. median, 2005-2010]

[Figure: UZR vs. mean, 2005-2010]

So what do these tell us?  The closer to the y=x line the points are, the more accurate the prediction is.  Which do you think is better in each case?
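If you would rather put a number on "close to the y=x line" than eyeball it, one option is root mean squared error.  This is only a sketch: the `rmse` helper is my own name for it and the vectors below are made-up numbers, not the real data.

```r
# Root mean squared error: the typical vertical distance of points from y = x.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Made-up numbers for illustration only
actual    <- c(2.5, -1.8, 5.1, -9.9)
pred_mean <- c(2.25, -1.925, 7.875, -5.7)
pred_med  <- c(2.9, -3.2, 8.65, -6.85)

rmse(actual, pred_mean)
rmse(actual, pred_med)

# A prediction can correlate perfectly and still sit far from y = x:
stretched <- actual * 3 + 10
cor(actual, stretched)   # essentially 1
rmse(actual, stretched)  # much larger than either rmse above
```

The last two lines are the reason to look at both: correlation rewards any straight-line relationship, while RMSE only rewards predictions that land near the actual values.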

Lastly, I will give you the correlation coefficients.  Each one measures how strong the linear relationship is between the prediction and the actual UZR; squaring it gives the share of the variance explained by our model:

> cor(UZR2005_2009$"2009", UZR2005_2009$meds)

[1] 0.4529775

> cor(UZR2005_2009$"2009", UZR2005_2009$means)

[1] 0.4903792

 

For the 2009 predictions, the mean UZR from 2005-2008 correlated more strongly with the actual score than the median did.

 

> cor(UZR2006_2010$"2010", UZR2006_2010$meds)

[1] 0.5656674

> cor(UZR2006_2010$"2010", UZR2006_2010$means)

[1] 0.5567251

For the 2010 predictions, the median UZR from 2006-2009 was slightly better.  Interestingly, the correlations from 2006-2009 to 2010 (about 0.56) were noticeably higher than those from 2005-2008 to 2009 (0.45 to 0.49).

> cor(UZR2005_2010$"2010", UZR2005_2010$meds)

[1] 0.4186658

> cor(UZR2005_2010$"2010", UZR2005_2010$means)

[1] 0.4534451

For the entirety of our data, 2005-2009 predicting 2010, means won by a fair amount (0.453 to 0.419).
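If you want the share of variance explained rather than the correlation itself, square it.  The numbers below are the six correlations printed above; the matrix `r` and its row labels are just my shorthand for this sketch.

```r
# Correlations reported above (median-based, mean-based) for each batch
r <- rbind(
  "2005-08 -> 2009" = c(meds = 0.4529775, means = 0.4903792),
  "2006-09 -> 2010" = c(meds = 0.5656674, means = 0.5567251),
  "2005-09 -> 2010" = c(meds = 0.4186658, means = 0.4534451)
)

# Share of variance explained by a simple linear fit on each predictor
round(r^2, 3)
```

That works out to somewhere between roughly 18% and 32% of the variance, depending on the batch, so neither predictor explains most of what happens the following year.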

So, can we conclude anything?  Probably not, although if you're deciding between using mean and median, I suppose you lean towards choosing the mean.  It is unlikely that it will make much of a difference.  But this is AN, where even our tiniest differences are overanalyzed. 

With more time, I would add in a time series model that weighted more recent years more heavily and see how that compares to the mean and median models.  I might also see if TZ or Dewan +/- or the Fans' Scouting Report would help our model.  But in 2 hours how much can you expect of me?
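For what it's worth, the simplest version of "weight recent years more heavily" is a one-liner.  This is a sketch, not something I ran on the real data: the UZR values are made up and the 1-2-3-4 weights are an arbitrary assumption, not a tuned scheme.

```r
# Hypothetical past UZR for one player, oldest season first (made-up values)
past_uzr <- c(`2006` = 1.7, `2007` = -3.0, `2008` = 6.2, `2009` = 2.5)

# Arbitrary illustrative weights: each season counts one unit more than the one before
w <- c(1, 2, 3, 4)

pred_weighted <- weighted.mean(past_uzr, w)  # recent seasons pull harder
pred_mean     <- mean(past_uzr)              # plain mean, for comparison
pred_median   <- median(past_uzr)            # plain median, for comparison
```

Picking the weights properly would mean fitting them on earlier batches and checking them on later ones, which is exactly the part I didn't have time for.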

Comments

Yay! Mean wins!

Rec’d

I vibrated with joy that join A's. -- Kim Seong-min

by WaddellCanseco on May 3, 2011 12:33 PM PDT

Wow, great research! Rec'd.

"We were shit, pathetic," Guillen growled early in spring training. "We hit too many home runs."

by lenscrafters on May 3, 2011 12:44 PM PDT

MATH

#SHAKESFIST

I’m dumb so can you tell me if I’m wrong or right about Willingham? :D

Official Athletics Nation Rotating Tagline Editor - Pam liked my old sig better.
My thoughtful watermelon is easily mistook for an early American catapult.

by mikev on May 3, 2011 12:45 PM PDT

Eh, it's impossible to be too wrong given the data, they're pretty close and the study is not scientific

But mean beat median two of the three times. So, you lose re: CompliantPork.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on May 3, 2011 12:54 PM PDT

but in his specific case I'm winning.

Official Athletics Nation Rotating Tagline Editor - Pam liked my old sig better.
My thoughtful watermelon is easily mistook for an early American catapult.

by mikev on May 3, 2011 12:58 PM PDT

so not bi-winning...

Resident Smartass.
and my residency is Blazersedge.com

by Devyn on May 3, 2011 1:01 PM PDT

his UZR sucks balls and it's sucked balls every year except for one.

how am I losing?

Official Athletics Nation Rotating Tagline Editor - Pam liked my old sig better.
My thoughtful watermelon is easily mistook for an early American catapult.

by mikev on May 3, 2011 1:06 PM PDT

:woohoo:

Official Athletics Nation Rotating Tagline Editor - Pam liked my old sig better.
My thoughtful watermelon is easily mistook for an early American catapult.

by mikev on May 3, 2011 1:10 PM PDT

What do you think of him eyetest wise, WC?

"Hey anyone can join in...as long as they talk about me." - Mr. Bed
"So you're saying we should skin the Rangers and wear them as uniforms? I’m down." - Kyli

by cuppingmaster on May 3, 2011 1:37 PM PDT

He seems like a statue to me. As Danny says, he's not awkward or afraid of the ball

but he doesn’t seem to have any mobility either. What do you think?

I vibrated with joy that join A's. -- Kim Seong-min

by WaddellCanseco on May 3, 2011 1:58 PM PDT

Something like that

He seems to know he’s not that great, and won’t take any risks. That said, I think he gets good jumps for someone who probably sucks as a fielder.

"Hey anyone can join in...as long as they talk about me." - Mr. Bed
"So you're saying we should skin the Rangers and wear them as uniforms? I’m down." - Kyli

by cuppingmaster on May 3, 2011 2:12 PM PDT

He's really slow

I mean seriously, seriously slow. His run is like Coco Crisp jogging. Any ball down the line is practically a guaranteed extra-base hit even if it’s cut off before getting to the wall, because it takes him so long to stagger over there.

"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.

by PaulThomas on May 3, 2011 4:14 PM PDT

This I do agree with

"Hey anyone can join in...as long as they talk about me." - Mr. Bed
"So you're saying we should skin the Rangers and wear them as uniforms? I’m down." - Kyli

by cuppingmaster on May 3, 2011 5:48 PM PDT

This doesn't really answer that.

You weren’t arguing for the use of median or mean. Neither was I. It was more about throwing out data, which you say is okay, and I say isn’t.

by danmerqury on May 3, 2011 1:43 PM PDT

Only in his specific case.

and I’m right so far. BOOYAH.

Official Athletics Nation Rotating Tagline Editor - Pam liked my old sig better.
My thoughtful watermelon is easily mistook for an early American catapult.

by mikev on May 3, 2011 1:44 PM PDT

The definition of "median"

is pairing off the extremes and then throwing them out until you get down to a remainder of one or two samples.

To be sure, it’s better than throwing out data just because you don’t like it, but it’s definitely throwing out data.

"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.

by PaulThomas on May 3, 2011 4:16 PM PDT

tm;dr

too mathy; didn’t read.

In all seriousness, I rec’d because I can appreciate all the work that went into this, vin. Thanks for contributing.

I was never graced with the mathematics gene. As soon as numbers and formulas are put in front of me, I glaze over. I have always had a great appreciation for those who are gifted in this area.

I got nothin'

by OptimistPrime on May 3, 2011 2:40 PM PDT

I, on the other hand, am graced with the

think-up-a-mathy-idea-and-get-someone-else-to-do-the-work gene!

Sweet is the lore which Nature brings; / Our meddling intellect
Mis-shapes the beauteous forms of things:— / We murder to dissect.

by iglew on May 3, 2011 5:23 PM PDT

Great stuff.

Here's something that I'm thinking about. Maybe discarding seasons that are more than one standard deviation from the player's mean would create a better projection going forward.

You don't need a religion, you have the A's. - My girlfriend

by designatedforassignment on May 3, 2011 3:08 PM PDT

It would be an interesting idea

Not impossible to implement, but I’m lazy so I can’t answer.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on May 3, 2011 3:25 PM PDT

That's certainly not the case with batting projections

And I can’t imagine it would be the case for fielding. If Willingham is +5, it counts. But if he’s +10, we throw out the whole season?

Outliers should be regressed for projection purposes—just like all data. But there’s no reason to simply throw the data out.

by Danny on May 3, 2011 3:53 PM PDT

Probably better to regress the +10 to a +5 (or whatever a SD was).

That would presume that the “beyond an SD” part was noise but that the “+1 SD part” reflected what actually happened. Not that I’d recommend doing this, but it would be better than throwing the data out entirely.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on May 7, 2011 9:02 AM PDT

I think part of the reason it's close is that

using the median is sort of like regressing by other means. When making projections with limited data, which is always the case with fielding, the observed level should be regressed towards the mean, in this case 0. Good fielding projections use the mean of whatever stat, and then regress it, so the projection will almost always be closer to 0 than the mean. The median in most cases will also be closer to 0, thus making the r competition closer. Even if the median did better in this test, it would be an asinine way to project. A weighted, regressed mean projection would perform better than either using the mean or the median.

With stout hearts, and with enthusiasm for the contest, let us go forward to victory. ----Hero Defector Montgomery

by mikeA on May 3, 2011 4:58 PM PDT | 3 recs

Absolutely

I would love to have added one if I had the time. And yes, I would expect it to perform the best.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on May 3, 2011 6:41 PM PDT

Awesome!

I love AN. And you, vignette.

Sweet is the lore which Nature brings; / Our meddling intellect
Mis-shapes the beauteous forms of things:— / We murder to dissect.

by iglew on May 3, 2011 5:24 PM PDT

Good stuff!

2011 Oakland Athletics: We have Cy Young pitchers and make yours look like it, too

by elcroata on May 4, 2011 12:16 AM PDT

normal distributions

In a completely normal distribution, the median and mean would be the same. It seems pretty illogical that a player’s performance would be anything other than a normal distribution around a “true skill” level. If a minor but robust difference exists, it’s possibly due to changing skill during the sample (i.e. a young player gets better for a few years before his age starts to impact his range). What would be really shocking is if the median and mean were significantly different. Were that to occur, it would make me think UZR was a pretty worthless statistic.

by MrIncognito on May 4, 2011 10:46 AM PDT

I disagree.

It’s not illogical to think that a player’s performance would be anything other than a normal distribution. To expect a normal distribution assumes that a player’s “true talent” level remains constant and every variation is just random noise.

But it’s not just random noise. There are a lot of little unmeasurable factors that affect how well a player plays defense on any given day. What we are trying to determine here is whether it is the nature of those factors to clump themselves by season or not. It’s not far-fetched to think that they might.

It seems we don’t have enough data here to make a determination, but it’s a reasonable question and a worthwhile inquiry. I think you’re wrong to presume a normal distribution a priori, and to be ready to reject UZR as a stat if the data were to show otherwise.

Sweet is the lore which Nature brings; / Our meddling intellect
Mis-shapes the beauteous forms of things:— / We murder to dissect.

by iglew on May 4, 2011 1:06 PM PDT

The lot of little unmeasurable factors would have to add up to a trend in order to cause a difference in median and mean of a population. If there isn’t a true value to measure, the entire concept of a single statistic capturing defensive worth would be flawed.

by MrIncognito on May 7, 2011 7:04 AM PDT
