Baseball Projections. How much do they tell us?
Margin of error is a huge component of statistics. When you see a political poll with Candidate A leading Candidate B by 6 points, with a margin of error (MoE) of 3 points, you can be pretty confident that Candidate A is actually ahead.
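To make the poll example concrete, here is a rough sketch of where a 3-point MoE comes from. It assumes a simple random sample and the usual normal approximation; the sample size of 1,000 and the 50/44 split are hypothetical numbers chosen only to produce a 6-point lead. Note that the uncertainty on the lead itself is larger than the per-candidate figure that polls usually quote.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for one candidate's share p
    in a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def lead_margin_of_error(p_a: float, p_b: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for the lead (p_a - p_b)
    when both shares come from the same poll."""
    variance = (p_a + p_b - (p_a - p_b) ** 2) / n
    return z * math.sqrt(variance)

n = 1000                # hypothetical sample size
p_a, p_b = 0.50, 0.44   # hypothetical shares giving a 6-point lead

print("MoE on one candidate's share (points):", round(100 * margin_of_error(p_a, n), 1))
print("MoE on the lead itself (points):", round(100 * lead_margin_of_error(p_a, p_b, n), 1))
```

The point is only that a poll tells you how far to trust the gap it reports, which is exactly what the projection spreadsheets leave out.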
Why don't we think in similar terms for baseball projections? Witness this exchange:
Anderson — 3.63
Cahill — 4.53
Mazzaro — 4.53
Braden — 4.58
Tomko — 4.65
Outman — 4.75
Edgar — 4.79
Eveland — 5.22
Gio — 5.88
Mortensen — 6.28
So ZiPS thinks that Cahill will be just fine in the near future, no matter how bad he was in the beginning of the year. It predicted that he’d improve in the second half and he did, by K-rate and K/BB.
Clearly he’s one of the 5 most talented starters on the team and talent mostly wins.
This is very typical analysis at AN (and I love it--I learn from this kind of stuff all the time), but I have a question. The conclusion above is that Cahill is "clearly" one of the 5 most talented starters, since his ZiPS_ros is .26 lower than that of our 7th best starter (E-Gon). That seems compelling, but what's ZiPS's margin of error?
Citing a statistic/projection without stating its confidence level strikes me as basically meaningless. For example, if a player is projected for a .760 OPS, but with a confidence interval of [.660, .860], my conclusion would be that the projection is 100% meaningless.
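Here is a rough sketch of why the missing margin of error matters for the Cahill question. It assumes each projection misses the pitcher's true ERA by an independent, normally distributed error with the same standard deviation. The standard deviations tried below are purely hypothetical, since ZiPS does not publish one in the spreadsheet; only the 4.53 and 4.79 figures come from the list above.

```python
import math

def prob_a_better(proj_a: float, proj_b: float, sd: float) -> float:
    """Probability that pitcher A's true ERA is lower than pitcher B's,
    assuming independent normal projection errors with std dev `sd`."""
    diff = proj_b - proj_a
    z = diff / (sd * math.sqrt(2))   # std dev of a difference of two errors
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF

cahill, edgar = 4.53, 4.79   # ZiPS_ros ERA projections quoted above

for sd in (0.25, 0.50, 1.00):   # hypothetical error sizes, in runs of ERA
    p = prob_a_better(cahill, edgar, sd)
    print(f"assumed sd = {sd:.2f}: P(Cahill truly better) = {p:.2f}")
```

The bigger the assumed error, the closer the answer drifts toward a coin flip, so "clearly" depends entirely on a number we are never shown.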
I downloaded the ZiPS_ros spreadsheet, and unfortunately it is literally a black box. This is akin to saying Obama leads McCain by 5 points, but refusing to specify any sort of margin of error. I'm sure ZiPS has some explanation of its margin of error somewhere, but I couldn't find it in 10 minutes of searching...
I've noticed that we talk about the components of the metrics a lot but we never talk about their precision. And, when the projections are wrong, we shrug and say, well, that's luck. And that's fair: no model can project the future perfectly. But confidence intervals and margins of error are tools designed to let us know how reliable a model is. I think we should give more scrutiny to arguments that cite fairly small differences in projections and fail to provide any sort of margin of error analysis. Let's look at what the margin of error is for two projection systems:
Chone provides some interval analysis, via its "10th percentile" through "90th percentile" projections. By the way, for Chone, the daring prediction for Giambi was that he would OPS somewhere between .700 and .960...that's a HUGE interval. It had Rajai between .550 and .770 on OPS. With intervals that large, I'm going to (inwardly) laugh every time I hear Chone cited.
PECOTA (using Phillies because I'm not a subscriber)
Raul Ibanez: 10th percentile: 740 OPS, 90th Percentile: 946 OPS
Jimmy Rollins: 10th: 735 OPS 90th: 915 OPS
Chase Utley: 797-987
Matt Stairs: 809 OPS (50th percentile)
These ranges are so large that you can't draw a meaningful conclusion that any player is better than another. These projections should be read as "Rollins is projected for an 824 OPS, plus or minus 100 points." LOL. Since Stairs's median OPS is within Utley's 80% confidence interval, statistically speaking, PECOTA cannot be used to conclude that Utley is a better hitter than Stairs.
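For anyone who wants to check the arithmetic, here is a small sketch that turns the 10th/90th percentile figures quoted above into "midpoint plus or minus half-width" form and tests whether the ranges overlap. The midpoint of a range is only a stand-in here; it is not necessarily PECOTA's own median projection.

```python
# 10th and 90th percentile OPS (in points), as quoted above
ranges = {
    "Ibanez": (740, 946),
    "Rollins": (735, 915),
    "Utley": (797, 987),
}
stairs_median = 809   # Stairs's 50th percentile, as quoted above

def midpoint_and_half_width(low, high):
    """Rewrite a [low, high] range as (midpoint, plus-or-minus amount)."""
    return (low + high) / 2, (high - low) / 2

def overlaps(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

for name, (low, high) in ranges.items():
    mid, half = midpoint_and_half_width(low, high)
    print(f"{name}: {mid:.0f} plus or minus {half:.0f}")

low, high = ranges["Utley"]
print("Stairs's median inside Utley's range:", low <= stairs_median <= high)
print("Utley and Rollins ranges overlap:", overlaps(ranges["Utley"], ranges["Rollins"]))
```

Every pair of ranges in this list overlaps, which is the whole complaint.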
What am I missing? I like statistical analysis as much as anyone, but these intervals suck. Citing these stats is akin to citing a poll that lists Obama ahead of McCain by 3 points, plus or minus 15 points.
And, if I'm not missing something obvious, why are these metrics more reliable than a 40-year veteran scout? I bet he could tell you Utley is likely to OPS between 800 and 1000. I bet he could have told you Rajai would hit better than a 550 OPS, but that 770 was near the peak of his potential.
And you know what? Lots of the ANers who go to games could have told you that too.
Comments
I'm with you.
The biggest problem to me is that statistical analysis, in any sense, can be used to proffer a wholly subjective argument that seems compelling. This goes double for emergent statistical tools. Too many political and entertainment junkies mistake these arcane tools for some final word in any projection of potential output going forward.
I realize this isn’t specifically on topic, but I’ll take this post as the latest salvo in the ongoing war between those enamored of their favorite analytical tools and those who would rather use a more qualitative approach.
It’s like a heated (but fun and harmless) exchange I had recently with another member here. My assertion was that, based on his on field performance and lack of notable progress from last year to this year, Gio shouldn’t just be handed a position in the rotation. I argued that, qualitatively and quantitatively, Gio still sucks. My fellow AN’er wanted to show that Gio had improved dramatically because a few select statistical contortions showed improvement or a likelihood to improve. Care was taken, however, to avoid those stats which might spell doom for Gio. For the record, I hope Gio turns it around and makes a fool out of me (but only if it’s for the A’s). No, I don’t want to continue the debate.
The point is this: maybe I’m not as smart about teh besbol as the stat-wizards. They clearly have a great love of the game. But I am smart enough to know when someone is using a statistical argument to prove a subjective point while acting like they “have the math on their side”. That’s some Karl Rove level chicanery. It’s all speculation, no matter how you dress it up.
by Rancho Canseco on Sep 21, 2009 6:17 PM PDT
I couldn't agree more.
I’m pretty new here, but I have to say that I was disappointed at how many posted using the type of analysis you cite – clearly subjective opinion pieces (nothing against this if admitted to) posing as the factual results of rigorous quantitative analysis.
I’m embarrassed to admit to it, but it really angered me…I understand statistics, certainly enough to know when they’re crap, but my unwillingness to accept their validity simply because some sabermetrician threw them together somehow made me a simpleton.
As you say, clearly the stat wizards have a great love of the game if they’re willing to go to the trouble to try to quantify everything, but it doesn’t mean that those not buying into everything with a decimal point don’t love and understand the game just as much.
It’s a very emotional game, after all…
by Joneser on Sep 21, 2009 8:15 PM PDT
I'd say that my argument
does support those who value scouting.
But my reasoning is VERY different from most people who don’t like baseball stats.
Agreed. And I'm glad that someone does.
I read your post down below, too, and I’m with you. “False precision” is an excellent way to phrase it…there’s a great deal of speculation and assumption in the weighting and data sets of the more complex metrics, and for some reason, that doesn’t always compute for some as subjectivity.
I hate the idea of a "war" personally
Even Baseball Prospectus admits that there is a place for scouts and the non-statistical approach, and it’s a perfectly relevant place. There’s no reason that a person has to confine themselves to one side of thinking or the other, and think anyone on the other side is ignorant, or pompous, or whatever. I do think though that a lot of fans around this site argue using statistical analysis for a few reasons:
1) They don’t trust their eyes. I know that my personal images are tainted by a few select awesome performances (part of me still wishes Jay Payton was our starting Left Fielder) and that I don’t have the chance to watch nearly enough games to confirm what I think. Scouts who watch every game have the credibility to make such arguments because they aren’t working off a small sample size. I wish I could watch every game, but alas, I work, have roommates that watch TV, and sometimes prefer to listen to the radio while doing other things. I don’t think I’m in the minority here either. I don’t feel comfortable making arguments arbitrarily, because I know that if I go by what I personally have seen, I’m probably wrong.
2) Statistical analysis provides for a good discussion. The fact of the matter is that “Ryan Sweeney should start in CF because he looks better out there” doesn’t really allow for much discussion. Either you’re of the opinion that he looks comfortable or you aren’t, and because of the aforementioned problem with memory and judgment, there isn’t much that can be said in response other than “I agree” or maybe “look at this video/photos/other media that I found/took.” By contrast, if I say that Ryan Sweeney takes better routes to the ball because his fielding metrics tend to show him having better range, then that can be argued and discussed with some degree of progress: I can cite other metrics (not all metrics are foolproof, as has been mentioned) or point out the metrics of other fielders in comparison, etc. I’ve seen a lot of non-statistical arguments go nowhere because there just isn’t much to be said there, and I’ve seen a lot of statistical arguments become very interesting.
3) The quantitative approach to the game only gets better the more people make attempts to help. If someone makes a great post talking about X Metric, and some undergrad math major sees it and gets involved, then we’re making progress towards less subjective statistics. 20 years ago stats were a lot worse off, but people started reading Bill James, and putting more stats on the internet for public consumption, and now people are coming around to the idea that these statistics say a lot about the game.
The long and short of it is that statistics are worth mentioning and discussing. While I certainly don’t agree that posters should discount one side or the other, I do think it’s foolhardy to think that any argument which uses statistics to prove a point is a straw man by necessity.
rebuildingseason.blogspot.com
by Rebuilding Season on Sep 22, 2009 11:20 PM PDT
Your #2 goes both ways.
There are plenty of people out there for whom “he looks better out there” leads to plenty of interesting debate whereas quoting a defensive metric is a conversational dead end. It’s just a matter of what you’re more familiar with or more fluent in.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
How can you argue "he looks better out there"
it is not really possible to have anything more than a “uh huh” “na uh” type conversation about what people see.
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 23, 2009 12:26 PM PDT
I've heard people say, "He looks like crap out there, but he gets the job done."
Is that the same thing in a round-about kind of way?
In 2008 I was watching a team that was rebuilding. In 2009 I feel like I'm watching a team that just sucks.
I guess not really. This is why football is boring compared to baseball
Statistics provide a meta framework for debating players and standardizing data from which arguments can be drawn from a more or less agreed upon framework.
Without them there really is no framework for overall evaluation.
by designatedforassignment on Sep 23, 2009 9:37 PM PDT
Agree that football is boring in comparison, but...
…“agreed upon”??? Oh, “more or less”. Yeah, that works.
Seriously, though, baseball has historically always been more dependent on stats, for both fans and teams, than pretty much any of the other major sports.
I have heard the phrase “He looks like crap out there, but he gets the job done.” applied to many players in my 35+/- years of being a baseball fan, and I always interpreted it to mean that the end stats and/or results are still there; you just never would have guessed it had you watched him play and seen how “ugly” he was doing it.
In 2008 I was watching a team that was rebuilding. In 2009 I feel like I'm watching a team that just sucks.
What are you talking about?
The statistical work being done in football (specifically by footballoutsiders) is brilliant and seemingly just as valid as what is being done in baseball.
by Pucking Insane on Sep 24, 2009 10:03 AM PDT
Brilliant, yes
But also far behind what’s available for baseball.
"There's never enough time to do all the nothing you want" -Bill Watterson
I'm not as much a football fan as I am a baseball fan, so I have to ask...
…is it that football stats are increasing in validity to mainly serve the fantasy league industry? Not so much the actual game on the field?
In 2008 I was watching a team that was rebuilding. In 2009 I feel like I'm watching a team that just sucks.
Yeah, it's getting better, but it's way, way behind.
by designatedforassignment on Sep 24, 2009 3:49 PM PDT
Plus it's got a much tougher task
22 people can affect the outcome of a play in football. It makes it hard to isolate individual performances.
"There's never enough time to do all the nothing you want" -Bill Watterson
and football is much more of a team game than baseball.
by designatedforassignment on Sep 24, 2009 4:01 PM PDT
Definitely.
I mean, this may be oversimplifying things, but part of the beauty of modeling baseball with statistics is that it is such a natural fit: in baseball there really is a set number of outcomes that can happen in a given situation.
Of course there is some randomness, but nothing like in football or hockey.
by Pucking Insane on Sep 24, 2009 4:35 PM PDT
You can discuss
how a guy positions himself, the routes he runs, how good a read he gets on the ball as soon as it’s hit, his judgment on whether to dive on a ball he might not reach, how quickly and accurately he throws a ball back to the infield.
I’m just talking about outfielders. There’s plenty more for infielders.
It’s pretty much all the same factors Tom Tango asks about in his Fan Scouting Report. And then of course you can debate the relative importance of each.
I happen to be relatively ignorant of such things, but I still don’t see how it’s less of a conversation than “his UZR is 6.5”, followed by “[looks it up on FanGraphs] Yeah, you’re right, it is.”
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
by iglew on Sep 24, 2009 9:57 AM PDT
Well, you can discuss it, but most people are thoroughly mediocre at judging those things
The Fan Scouting Report only works at all because of the “wisdom of crowds,” and even then it’s subject to some pretty significant biases.
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
But most people are adept at statistical analysis?
Not discounting what you’re saying, because you’re right, but most people are just thoroughly mediocre. Statistics don’t necessarily lead to a better discussion – they do in a crowd of people that understand and apply metrics correctly, but that situation isn’t necessarily the norm.
From reading your other posts, you seem to be able to handle yourself pretty well when it comes to that type of analysis, and my guess would be that even though a good deal of work is required to put the pieces together, you probably have a natural aptitude for working with metrics. For others, there may be an inherent ability to “see” things that you cannot pick up on as easily (generally speaking – not a me vs. you statement).
Is it possible that people who exalt the interpretive power of statistics in all cases do so because their natural abilities make them lean that way, and that they make the assumption that everyone is as bad at judging with their eyes as they are?
It's possible that there are people who really are good
at intuitively judging how good baseball players are at fielding (or hitting, for that matter) without looking at statistics, because their minds naturally aren’t subject to as much of the same kind of biases that make the average person so bad at it. Let’s call them “heurists.”
The problem is that unlike people who are good at reading statlines, the heurists are very difficult for anyone to identify correctly. Their opinions might be very useful, but I can’t figure out who they are without any objective standard to measure them against.
As for your last question, I see no reason to believe “stats people” are any less observant than anyone else. I don’t think the two skills have anything to do with each other, really.
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
Agreed.
I think it would certainly be easier to fake being a “heurist,” so you make a good point – when someone doesn’t know how to use statistics, it’s pretty clear.
And I don’t think the two skills have anything to do with each other, either, nor do I see any reason that some people couldn’t be equally adept at both (some people are just good at everything). But some people will be good at one or the other, and many people at neither.
As long as we agree that they are both skills…
Stats dissociate the person from their conclusion in helpful ways
that “I see, therefore I am right” cannot.
by designatedforassignment on Sep 24, 2009 3:50 PM PDT
I don't really agree with that.
Anyone who is even slightly self-reflective can understand, identify, and work through their own bias…just the process of doing so opens you up to different perspectives and can be, well, helpful. Knowing that there is a propensity toward the recency effect or a horn/halo effect can lead a person toward other means of discovery, including the incorporation of statistical analysis to supplement (or confirm, or discount) what they see.
There is just as much “I read it on Fangraphs so I am right” floating around out there, and that disassociation referred to only comes into play when there is a genuine desire to get to the bottom of something coupled with a thorough understanding of the statistical tools available (and their shortcomings). When statistics are used only to support a preconceived opinion, then there is no such disassociation…there are many people (not necessarily in this forum) that use statistics in a “because the Bible says so” kind of a way.
Both approaches are equally corruptible.
by Joneser on Sep 24, 2009 7:25 PM PDT
You are caricaturing everyone who has an
eyeball-view perspective on baseball as a stubborn, pig-headed fool. Sure, the attitude of “everything I say is right and that’s that” is a conversation stopper, but that’s true whether the person is a stat-head or a scout-head.
I don’t know where you get this idea that a baseball fan who isn’t into stats is completely arrogant and unreflective. The data do not support such a hypothesis.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
The only way to discuss reeks of small sample size
“He’s not really a bad fielder, he made a good play yesterday”
"There's never enough time to do all the nothing you want" -Bill Watterson
Right, and it doesn't even give you a framework for evaluating how much that play you saw yesterday is worth
by designatedforassignment on Sep 24, 2009 9:51 PM PDT
The whole point of being a skilled "heurist" (as you say)
or scout is that you know how to avoid these biases. As a proud heurist, I’d say part of what helps me is that I know full well that a great play/game or awful play/game is more memorable but I’m careful not to allow it to become that in my more objective mind.
I think your point, that there aren’t very many skilled heurists/scouts and that it’s easy to fake being one, is valid. But that doesn’t mean that there aren’t some folks who have taken great pains to develop their heuristic/scouting abilities at a high level.
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
The problem is this
The whole point of being a skilled “heurist” (as you say) or scout is that you know how to avoid these biases. As a proud heurist, I’d say part of what helps me is
Finally, lay scouting is poor. However, with pride I would say that my own lay-scouting is not poor…. I’m sorry that you don’t have more respect for me and my observations.
Is that you make the assumption that you are good at being a scout when there is no reason to believe you are a good scout (unless you have written down and tracked a wide swath of scouting projections over years and can show us the data.) It is the delusion of self that everyone has but it is more prevalent when the truth of our own reality such as the things we see is questioned.
by designatedforassignment on Sep 26, 2009 7:34 PM PDT
Granted that some people
believe they are good scouts even though they’re not, why do you say categorically that “there is no reason” for Nico to believe he is a good one, and that he must be deluding himself?
That seems awfully dogmatic. Unless you’re claiming there’s no such thing as a good scout at all, what evidence do you have that Nico isn’t one?
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
That's backwards
Assuming it’s an uncommon skill, what evidence is there that he is one?
"There's never enough time to do all the nothing you want" -Bill Watterson
Everyone is deluded, both me and Nico; everybody thinks they are right at a primal level.
I don’t think that it is on me to prove that Nico is not one, even though Nico has clearly been around the game for a long time and has had more access than most fans via his time as an A’s/minor league broadcaster. The reason I used the quotes is that they show the confidence with which Nico, while considering lay scouting to be poor, considers himself exempt from that conclusion. If I were a scouting-centric guy and saw enough games, I’m sure I would feel the same way.
I am not saying that there are only bad scouts. Clearly there are those with the innate ability to process visual data in ways that are conducive to accurate projection, but I am of the belief that it is a rare trait.
by designatedforassignment on Sep 27, 2009 12:22 AM PDT
I don't think it's on you to prove
that Nico is not a good scout either … except that you baldly asserted that he is not, and that furthermore there is no reason for him to believe that he is. I’m just asking you to acknowledge that you don’t know whether he is or not.
You’re the one making absolutist claims here, DFA, not me and not Nico. Then when I call you on them you back off and say “it’s not up to me to provide evidence.” It’s not the first time I’ve seen you play this trick. Is this what they teach you on debate team?
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
Im not saying either way
what I am saying is that Nico (as a test case, and not to pick on him, because this really isn’t about him at all) believes that he is a good scout. What I am suggesting is that there is no logical reason for him to believe that he is a good scout, but he believes he is because, from a psychological perspective, people want to believe that their reality is the reality that everyone else lives in.
And yes, debate did teach me to ask people for their warrants when they don’t back up what they are saying with logic or evidence. You are claiming that I have said plenty of things I have not, namely that Nico isn’t a good scout. Nor am I making any absolutist claims.
by designatedforassignment on Sep 27, 2009 1:07 AM PDT
Apparently I'm misunderstanding you.
The sentence I read was
you [Nico] make the assumption that you are good at being a scout when there is no reason to believe you are a good scout
In this follow-up post, you say
What I am suggesting is that there is no logical reason for him to believe that he is a good scout.
To me that sounds like you’re asserting that Nico has no reason to believe he’s a good scout. How do you know that he has no reason to believe that? You don’t.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
Missing word
If I can butt in here, this is how I think you guys are talking past each other.
I infer that DFA meant, “There is no reason [for me] to believe that you are a good scout,” and that iglew reads it as “There is no reason [for you] to believe that you are a good scout.”
The former means, “You haven’t shown me the evidence,” whereas the latter means, “I know that such evidence doesn’t exist.” The ambiguity in the original phrase looks like the problem to me.
"And Julio Franco is batting right-handed!" -- Wayne Hagin, A's radio play-by-play, mid-80s
I think this is actually an interesting question
Scouts like Billy Owens are hired for their ability to recognize talent and project performance from what they see, and while they can be assessed somewhat on a track record of how their recommended players do, it’s not that simple when injuries play a part, and when sometimes players every scout believes in might flame out, or be accused of murder…
So on what basis are scouts’ abilities assessed? Similar to hitting coaches; you can’t best assess their performance just based on how the hitters are hitting, or on any particular set of data.
What it comes down to, I think, is reputation among people with “street cred.” And while that may not be scientific, going back to this discussion I’d say that the members of AN with the best overall reputation, among their peers, for being sophisticated eyeball scouts, are probably in fact the most sophisticated eyeball scouts on AN.
And the same is likely true for our stat-based analysts, who aren’t poor analysts just because sometimes their stat-based analyses don’t prove to be accurate predictions.
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
I thought Billy Owens was hired
to play point forward, alongside Hardaway and Mullin.
"And Julio Franco is batting right-handed!" -- Wayne Hagin, A's radio play-by-play, mid-80s
Very astute point.
Goldstein made the point that even with regard to ML scouts there is very little in terms of performance review, due to the fact that MLB scouts move around and change teams so often. There is also, as you mention, the problem of what failed during the scouting process: was it a failure of player development, scouting, or injuries that is to blame for the failure of players and prospects? We can even debate whether seeing injuries should be something discussed as well.
There are HUGE HUGE HUGE problems with evaluation by acclamation. Things like popularity, not straying from the herd, and tenure are critical to evaluation by acclamation but do not have any predictive value for the success of the players that you think will succeed.
by designatedforassignment on Sep 27, 2009 10:35 AM PDT
That would explain it.
I was indeed reading it as the latter interpretation. If DFA meant the former then you’re right, we were talking past each other.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
I know because he surely would have told us if he did
He uses his experience with the team as a credential, so if he had any real evidence, like that gained from a systematic look at his projections over a long period of time and their accuracy, why wouldn’t he have shown us?
Without that kind of proof it is impossible to judge the effectiveness of one’s scouting abilities.
by designatedforassignment on Sep 27, 2009 10:06 AM PDT
You're not talking about me, are you?
If so, I thought we weren’t; I thought we were having a more general conversation about the topic.
But in case you are, let me say that I don’t use my experience with the team as a credential — it’s just part of my past that I’ve worked with professional baseball and talked to experts in the field — and that I’ve never tracked my accuracy because I don’t really care.
I enjoy watching, studying, predicting, analyzing, etc., try to be as good at it as I can, and don’t especially worry about my track record because I don’t plan to seek a job in scouting any time soon.
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
Again this isn't about you, you were just a convenient test case
What I am saying is that nobody does long-term analysis of their own success, and few people accurately remember their predictions about players, because we are human. But it confirms that you have no real understanding of whether you are someone with an innate talent for baseball scouting, even if you enjoy it on a personal level.
I am not disparaging you when talking about your time as a broadcaster and your involvement with minor league ball; rather, it is very valuable because of the access that you mention.
by designatedforassignment on Sep 27, 2009 10:53 AM PDT
"Innate talent" and "years of careful study"
are not one and the same — nor are they mutually exclusive.
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
Heck, I retired 15 years ago
It’s not exactly a part of my present set of “creds.”
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
Not really
When I as a stat person am wrong it is because my data is not accurate, rather than me as an observer being wrong. It's subtle but it is a huge difference.
"I saw" is not something that you can argue with. It is a subjective evaluation of visual data. When someone believes that everything they say is right from a stats perspective, there is always data that can show something else, and enough established and accepted error to allow for an individual to be wrong.
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 24, 2009 9:49 PM PDT
Of course you realize this whole thing
is just a semantic difference about what counts as a “discussion”. I was only objecting to your assertion that you can’t have a meaningful discussion based only on visual observation with no stats. In defending your assertion you (and Paul) keep bringing up standards of objective evaluation of players’ relative skill, as if that is a necessary prerequisite for interesting conversation. I reject that premise.
If what you really meant to assert is that you can’t have meaningful objective analysis of players’ relative skills, well, OK then. But to me that sort of conversation gets tedious pretty fast. Kind of like this one did.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
The only thing that matters is the pursuit of truth, which is why Paul and I bring it up.
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 25, 2009 2:58 AM PDT
When old age shall this generation waste
Some day you’ll come to appreciate the other half of the Keatsian antimetabole.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
Interesting
When I as a stat person am wrong it is because my data is not accurate, rather than me as an observer being wrong. It's subtle but it is a huge difference.
That comes off as a deep-seated… almost insecure… need to be always right. Or, at least “not wrong”. You’re right… subtle, but huge.
In 2008 I was watching a team that was rebuilding. In 2009 I feel like I'm watching a team that just sucks.
You don't think people have a deep seated need to be right?
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 25, 2009 6:50 PM PDT
He just has a deep-seated need to identify spurious "insecurities" in posters he disagrees with
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
Everybody does... to varying degrees.
Though the “not wrong” aspect… the ability to push blame off on someone else (i.e.: “it was the faulty data, it wasn’t me”)… is probably the more telling of the two.
In 2008 I was watching a team that was rebuilding. In 2009 I feel like I'm watching a team that just sucks.
You can be wrong with stats for many more reasons than bad data.
Bad methodology behind the stats in the first place is a much bigger culprit.
"PECOTA can pretty much kiss my ass."-Nico
But I didn't come up with the methodology
someone else did. I just apply it.
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 26, 2009 7:34 PM PDT
Applying it IS methodology.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
How so
Methodology to me is the underlying theory. What I do is extend other people’s theories to present examples, which is way different from creating tRA or whatnot.
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 27, 2009 12:23 AM PDT
But you choose how to apply it.
Suppose you (or anyone else) were to come along and say that Ryan Sweeney is a better player this year than Nelson Cruz, offering as evidence his slightly higher WAR. That claim is on you, because you are the one who decided that WAR=better.
You can’t hide behind FanGraphs and say, “They’re the ones that said so,” because they did not say so. When Dave Cameron develops his WAR formula and posts the results, he’s not saying Sweeney is better than Cruz; he’s saying, “Here is my formula, this is what it measures, and here is what the data show.”
What you (or anyone else) then choose to do with the data is your own interpretation. That decision represents your underlying theory and beliefs, so I call that methodology.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
by iglew on Sep 27, 2009 1:08 AM PDT 1 recs
No,
When I as a stat person am wrong it is because my data is not accurate
No. Your interpretation/processing of that data can absolutely be erroneous.
It is different
processing of data can be checked.
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 26, 2009 8:10 PM PDT
You really do seem dogmatic about statistics
But I didn’t come up with the methodology
someone else did. I just apply it.
When I as a stat person am wrong it is because my data is not accurate, rather than me as an observer being wrong.
Do those two statements come from someone who looks at statistics with a critical perspective, one that is aware that statistics can be (and are often) applied poorly? I like a lot of your analysis, but the dogma really hurts your credibility in my eyes. Believe me, I’m not an anti-stats guy (you can look back at every fanpost I’ve written recently) But
Premature post (lol)
But you really don’t allow for any middle ground on the issue.
Obviously stats can be applied poorly
From a psychological perspective, if you are a stats poster you can justify being wrong by pointing to sample-size problems, or to someone else’s faulty stat, etc., because you have dissociated yourself from your conclusions in ways that you can’t if your entire argument is based on what you see.
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 27, 2009 12:40 AM PDT
The war.
I realize this isn’t specifically on topic, but I’ll take this post as the latest salvo in the ongoing war between those enamored of their favorite analytical tools and those who would rather use a more qualitative approach.
Unless I’m missing my guess, this is not one of the salvos in that war, the qualitative/quantitative war. It is a well-deserved shot at quantitative models that attempt to project future skill progressions and declines yet produce extra-wide confidence intervals. It does seem ridiculous how wide these models make their guesses.
I rec’d this post.
Right
And I’m just applying the OP’s argument to a broader gripe. It’s a completely selfish move on my part, but I like this particular critique because it’s easily related to that broader issue: that statistical analysis isn’t infallible or even necessarily very useful.
To me it’s like cultural anthropologists arguing with sociologists.
by Rancho Canseco on Sep 21, 2009 6:35 PM PDT
Obama's ahead of McCain?!
Al: We gotta form a government for the settlement.
Merrick: Who does?
Al: Us! You and me. Come to me in a vision! You stupid bastard.
where is that exactly?
Maybe you can find one made by Go F**k Yourself San Jose... -Poppy
by Leopold Bloom on Sep 29, 2009 3:11 PM PDT
Hell, get your vote in for Pat Paulsen before HE ends...
he’s older than the hills, now.
"Flea Markets aren't just for blind dates anymore!"- The Reverend Billy Lard
by Gaijin_Suketto on Sep 22, 2009 10:05 AM PDT
I just go with whatever stats DFA tells me to
although I do like to still see a player before I make a judgement on him.
I guess this is one of those times where the stats and my lying eyes agree because I think Mr. Anderson is the best of this baseball bunch.
Does anyone else remember the baseball bunch with Johnny Bench or am I the only one old enough?
Krylon, take me away!
"Flea Markets aren't just for blind dates anymore!"- The Reverend Billy Lard
by Gaijin_Suketto on Sep 22, 2009 10:06 AM PDT
"huge intervals":
“huge” relative to what? and why is that bad? there is a wider range of plausible outcomes for pitchers/younger players/inconsistent players. I don’t understand what the problem is with those percentiles. Predicting what baseball players will do in the future seems pretty uncertain. So, why shouldn’t projections demonstrate the uncertainty?
The analogy to political polls does not make any sense. With their margin of error, political polls are saying that there is a 95% (or whatever) chance that their numbers are within a certain range of the “real” number, where the “real” number is what you would get by interviewing everyone. They are not projecting anything. No poll purports to predict what will happen in a year. That is completely beside the point of polls. It’s a completely different thing…
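For reference, here is a minimal sketch of the sampling margin of error being described, assuming a simple random sample and the usual normal approximation. The function name and the sample size of 1,000 are made up for illustration; real pollsters adjust for weighting and design effects.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    p: observed share for a candidate (0..1)
    n: number of respondents (hypothetical here)
    z: critical value; 1.96 corresponds to roughly 95% confidence
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, candidate at 50%
moe = margin_of_error(0.5, 1000)
print(f"margin of error: about {moe * 100:.1f} points")
```

Note that this only measures sampling noise around a fixed, present-day quantity. Nothing in it describes how opinion (or a player's talent) might change in the future.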
Since Stair’s median OPS is within Utley’s 80% confidence interval, statistically speaking, Pecota cannot be used to conclude that Utley is a better hitter than Stairs.
This does not make any sense. What does “statistically speaking” mean or “conclude.” Why couldn’t you conclude that Utley is very likely better than Stairs?
Projections are not a sampling of data.
With stout hearts, and with enthusiasm for the contest, let us go forward to victory. ----Hero Defector Montgomery
by mikeA on Sep 21, 2009 6:56 PM PDT 1 recs
to clarify:
the baseball projection equivalent of “actual opinion” in a poll is a player’s “true talent.” But players’ true talents change, and even if you were to get the “true talent” exactly right, there’s a reasonably broad range of outcomes that can flow from any given true talent level. So, there’s way more uncertainty projecting future performance of baseball players than past, static, political opinion… That is not surprising to anyone…
With stout hearts, and with enthusiasm for the contest, let us go forward to victory. ----Hero Defector Montgomery
by mikeA on Sep 21, 2009 7:10 PM PDT 1 recs
I think that
if you concede these models have such huge uncertainty, you should not bandy the model about as a good decision-making tool. Your argument is that performance is hard to predict, and I concede that. My point is that, given that the models cannot give specific projections, we should not use them to differentiate finely separated players.
AN attributes a false precision to these models. We act as though a .5 difference in Zips ERA or a .050 difference in OPS is huge. These models are not precise enough to support that kind of argumentation (in my opinion).
I don’t think that these models are wrong for not being precise (they represent the very cutting edge, and I couldn’t have developed them), but that given their lack of precision, we shouldn’t misuse them.
This does not make any sense. What does "statistically speaking" mean or "conclude." Why couldn’t you conclude that Utley is very likely better than Stairs?
Projections are not a sampling of data.
I think you got me here. I just realized that the whole notion of a Confidence interval is not usually applied to predictive models. Thanks. This was a fairly quick rant on the issue (started as a comment on the Cahill thread)…my mistake.
I do stand by the original thesis, which is that AN attributes false precision to the models.
What is the alternative to reliance on models and public quotes by scouts?
We could and do use our own eyes but that isn’t evidence that’s likely to convince anyone else. When other posters simply state their observation, I don’t attach much value to it, because I don’t consider them experts.
If their observation happens to coincide with mine, it’s the same as if I just relied on my own observation.
It's not the results, it's how you look going about those results -- Tim McCarver
by WaddellCanseco on Sep 21, 2009 11:01 PM PDT
This is pretty much the point
Most advanced stats (pointedly not including projection systems) exist because they correlate strongly year to year. Thus, someone with a FIP of X is much more likely to repeat that FIP than someone with an ERA of Y is to repeat that ERA.
The r-squared values aren’t anything like 1, of course, but they’re the best we’ve got. That’s why I focus on what a player has done this year or over his career rather than on PECOTA or other projections. It is also why I tend to dismiss arguments that Player Z is going to dramatically improve next season as wishcasting (but am happy on the rare occasions that it actually happens with an A’s player).
"There's never enough time to do all the nothing you want" -Bill Watterson
The question is
If the range of outcomes is that reasonably broad, what do statistics bring to projecting player performance that makes them any better than observation? Indeed, the eyes may well capture something qualitative about a player, like length of swing, attitude and work ethic, or some other thing that PECOTA would never find.
Clearly, statistics bring a lot to the table for evaluating player performance, by helping to shed light on what the eyes can’t see.
"PECOTA can pretty much kiss my ass."-Nico
The reason is that eyes only see a small sample
And brains only remember things that stand out.
So you get “Jeter is a great shortstop because he makes flashy plays” instead of “Jeter has weak range, but makes easy plays look hard.” You get “Player X never hustles because this one time he jogged to first on an infield popup.” You get “Player X’s swing is too long because I saw some long swings yesterday.”
Some of those observations may be true, but if they are having an impact on performance they’ll show up in the stats.
"There's never enough time to do all the nothing you want" -Bill Watterson
Agree and disagree
The fact that these confidence intervals (if you want to call them that) intersect means nothing. All it shows is that there is some low probability that Stairs’s projected performance could outperform Utley’s worst performance.
I very much disagree with the paragraph on political polls and confidence intervals. While it is true that a projection will not fall into a confidence interval, since a projection is by no means a fixed parameter, prediction and coverage intervals are widely used in the profession. So asking for such intervals is perfectly valid.
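To put a rough number on why overlapping intervals do not mean the two players are indistinguishable, here is a sketch that treats each projection as an independent normal distribution. That is an assumption made for illustration, not how PECOTA or ZiPS actually works, and the means and spreads below are invented rather than taken from either system.

```python
import math

def prob_a_beats_b(mean_a, sd_a, mean_b, sd_b):
    """P(A > B) when A and B are independent normal variables."""
    z = (mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    # Standard normal CDF written with the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical hitters: medians 100 OPS points apart, each with an
# 80% interval of about +/-100 points (sd of roughly 78 points).
p = prob_a_beats_b(0.900, 0.078, 0.800, 0.078)
print(f"chance the higher projection ends up better: {p:.0%}")
```

Under these made-up inputs the intervals overlap heavily, yet the player with the higher median still comes out ahead in a clear majority of outcomes.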
by Pucking Insane on Sep 24, 2009 10:07 AM PDT
Warning: I'm about to make the single nerdiest analogy ever made on AN
Nonetheless, here goes.
These projections seem to have a fairly uniform margin of error (about 100 points in either direction). One could conceptualize it as roughly like rolling a d10 and adding 20 OPS points times the face of the die to the low number.
Now, the uber-nerdy analogy. Imagine you’re playing a roleplaying game and you have a Sword +2 and your adversary just has an ordinary old Sword. You’re going to win a combat round (for simplicity let’s assume it’s just a straight roll-off to see who wins) a disproportionate number of times— even though it’s clearly within the margin of error for him to beat you. In fact, if you roll two d10s and add 2 to one of them, that die will:
Win the roll 64 times
Tie the roll 8 times
Lose the roll 28 times
And by analogy, a player with a projected OPS 40 points higher will be the better player in the upcoming season roughly 2/3 of the time.
It’s pretty poor prediction, but it is SOME prediction… it seems better than no prediction at all.
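The 64/8/28 split above is easy to check by brute force over all 100 combinations of two d10s. This is just a quick sketch; the function name is mine.

```python
from itertools import product

def roll_off(bonus=2, sides=10):
    """Count wins, ties and losses for a die with a flat bonus
    against a plain die, over every possible pair of faces."""
    wins = ties = losses = 0
    for a, b in product(range(1, sides + 1), repeat=2):
        if a + bonus > b:
            wins += 1
        elif a + bonus == b:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses

print(roll_off())  # (64, 8, 28)
```

With each face worth 20 OPS points, the +2 bonus corresponds to the 40-point gap in the analogy. The uniform die is only a stand-in for however the projection errors are really distributed.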
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
by PaulThomas on Sep 21, 2009 7:00 PM PDT 3 recs
But if the prediction arrived at through statistics, dice rolling, etc.
is not more fine-tuned than a simple educated guess, then why bother? At that rate it’s all onanism. Puffery.
by Rancho Canseco on Sep 21, 2009 7:04 PM PDT
It is more fine-tuned than a simple educated guess...
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
by PaulThomas on Sep 21, 2009 7:29 PM PDT 1 recs
How could we verify this claim, Paul?
Suppose a scout picks future stars out of AA at a rate 10% better than random.
Would a model that has error ranges so broad do any better?
Your statement that some prediction is better than none at all assumes that scouting is literally a 100% random guessing system, with no prediction whatsoever.
I don't think it's random, I think it's systematically biased in certain problematic ways
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
I can understand that position...
but it’s not the same position you took before. Obviously scouting has some nasty biases (as evidenced by player comps, “grittiness” etc.).
But the real question is whether models are better at picking out success than your average MLB scout. The scout might be able to pick the better player 2/3 of the time too (as the model does in your example).
You're right
The biases thing only came to me just as I was reading that last post. I was trying to figure out why I didn’t have much confidence in scouts, and I think it has to be bias— there are certain players (let’s say 10% each, for argument’s sake) they will predictably overrate and underrate, while probably not doing any better than the projection systems at measuring the other 80%.
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
got it
I don’t know how we’d measure the success rate of scouts and compare to projection systems, so we’ll have to leave this issue here.
One other question. On an ethical level, I certainly dislike umpire bias, but on a substantive impact on team success level, is scouting bias any worse than model bias? Many models are not as good at evaluating certain types of players (such as fast players) and thus don’t capture a predictor of performance. One could argue that the Angels are exploiting the fact that other teams’ models are not capturing the value of speed/aggressiveness.
So if the model has biases against types of play, while scouts have biases against types of people, I know which one I prefer on an ethical level, but why should we prefer the model on a “likelihood of team success” level?
As one who has beaten the Angels-Pythag drum a few times
I’d like to elaborate on this a bit. I do believe that the Angels beating their Pythagorean consistently is more than a coincidence, but I don’t believe this means the Angels are pursuing some sort of Moneyball tactic whereby they gather up players who are undervalued by the standard sabermetric model used by all other teams.
Rather, I think the Angels are pursuing a strategy designed to win games, and the specifics of that strategy just happen to be such that many of their players outperform their projected numbers in certain ways. It does not necessarily follow that these players are undervalued by the market nor that the Angels acquired them as such.
Bill James’ Pythagorean theorem is simply his discovery that past history of runs for and against, when formulated in a certain way, is a better predictor of future win-loss than is past history of wins and losses. This doesn’t mean that it’s a perfect predictor, nor does it imply that a team’s Pythagorean record is the "true" measure of how good it is or what the team’s W-L record "ought" to be.
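For anyone who hasn't seen it written out, the classic form of the formula is win% = RS^2 / (RS^2 + RA^2). Here is a minimal sketch; the exponent of 2 is the original version (refinements use slightly different exponents), and the run totals are invented for illustration.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expected winning percentage.

    exponent=2.0 is the classic form; it is a parameter here only
    so that other commonly cited exponents can be tried.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Hypothetical team: 750 runs scored, 700 allowed, 162 games
pct = pythag_win_pct(750, 700)
print(f"expected win%: {pct:.3f}, expected wins: {pct * 162:.1f}")
```

"Beating your Pythag" just means the actual win total comes out higher than the number this produces from the same season's runs.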
Given that the Pythagorean formula is not a perfect predictor, it would be no surprise if certain styles of play will lead the formula to slightly overpredict W-L and certain other styles will lead the formula to slightly underpredict. As years of data build up and some teams follow distinct strategies, we may start to see some teams consistently overperform or underperform their Pythagorean record. Though it’s still a little early to declare it conclusively, I think the Angels’ recent history is an example of that.
But to characterize it as "outperforming" the Pythagorean, as if the Pythagorean were an absolute standard, is potentially misleading. If a team’s W-L is well over its Pythag one year due to "luck", you might reasonably say that the team is not as good as its record. But if a team’s W-L is over its Pythag because it’s playing in a way that causes the Pythag to mismeasure it, then the team is simply as good as it is.
A strategy which wins by beating the Pythag is not inherently better than a strategy which doesn’t. If some other team were to come up with a strategy that caused it to finish with 90+ wins for six seasons in a row but to do so while underperforming its Pythag, that would be just as good a strategy.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
by iglew on Sep 22, 2009 2:01 PM PDT 3 recs
Rec'd
Not because I necessarily agree with you that the Angels are likely to continue beating their pythag going forward (I’m just not sure), but because you’re thinking about it exactly right.
"There's never enough time to do all the nothing you want" -Bill Watterson
fundamentals of Pythag
I’ve been thinking for a while about writing a fANpost describing how you can derive the formula for Pythagorean wins, but a) I don’t have free time right now, b) it probably exists somewhere on the internet with a clearer explanation than I will provide, and c) it would be a fANpost full of equations that no one would want to read.
But, for a discussion of how teams “beat” Pythag, a crucial point is that the formula isn’t really based on anything specific to baseball. Rather, the key assumption underlying the formula is that the runs you score and allow in a game are uncorrelated. This leads to the most well known feature of teams that beat their Pythagorean records, which is having bullpen aces. If the A’s are in a close game, then Bailey, Wuertz, and Ziegler will be pitching the late innings. If it’s a laugher (in either direction), then you are more likely to see EGon or Casilla on the hill. This does break the key assumption that stands behind the formula.
Not that I have any answers regarding the Angels. Also, I think that a lot of the LAAA discussions this season have been about how they got like 5 or 6 of their players to have career years simultaneously.
There is probably something to that
I was debating the Pythag record argument with a couple of people last year and went as far as looking at the score of every game the A’s and Angels had played up to that point, noting how many times each team scored a certain number of runs and so on. It was clear that the Angels played more close games and had fewer real blowouts, while the A’s had a lot of very low-scoring games that skewed some of their totals, along with a few blowouts.
Either way, it was clear that it takes more than just looking at runs scored and runs allowed. As you noted, the closer a game is, the less likely you are to see the lower-quality pitchers who are mainly reserved for games that are out of reach one way or the other; when those pitchers give up runs it may not matter to the game, but it does affect the Pythag record.
Last of the Ninth - Photography
Funny that you give the A's as an example
If the way to beat one’s Pythagorean record is with bullpen aces, the Oakland A’s stand as a counter-example.
I haven’t looked up Oakland’s pythag numbers, but given that as of this Sunday the A’s had scored more runs than they had given up, and they most assuredly have not won more games than they’ve lost, I think it’s a safe guess that the team is not even remotely beating its Pythagorean record.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
It is more common that teams that beat their pythag have really terrible backs of their bullpens as opposed to aces
I haven’t looked at it this year, but last year the reason they beat their pythag by so much (and I think it was the most of the decade) was clutch hitting, much more than anything about their bullpen. They had an .814 team OPS in high leverage situations, and a .677 team OPS in low leverage situations, which is pretty amazing. The league as a whole typically hits better in low leverage situations, for obvious reasons.
With stout hearts, and with enthusiasm for the contest, let us go forward to victory. ----Hero Defector Montgomery
You're absolutely right to distinguish
the Angels’ win-loss record relative to their Pythagorean from the fact that they’ve had several players outperform their projections.
I think the two are separate questions, though perhaps questions with some similarities. In both cases it is commonly attributed to luck, but it might not be luck.
If Angels players continue to beat their projections, I’ll start to wonder if the organization doesn’t simply have a better system of evaluating players. Even then, I don’t think that’s so much a statement on the Angels reading talent better than every other team so much as a statement that the projections that have wide currency on the Internet are not the best projections in existence. I don’t think any team uses them exclusively, as evidenced by the fact that there is sometimes consensus among many observers that so-and-so is better than the projection model says he is. That simply demonstrates that the model is known to be imperfect.
We already saw this in the BABIP discussion. It is known that some of the commonly used formulas don’t give proper value to speed, so if an organization is in the habit of acquiring guys with speed, many of them will “beat their projections”.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
One of the problems that often comes up in these discussions
is that people keep confusing strategies for beating your Pythagorean record with strategies for beating your third-order record (predicted wins and losses based on expected runs scored and expected runs against).
If you somehow perfected some sort of smallball strategy which led to a dramatically higher percentage of your baserunners scoring than do on a normal team, those runs would show up in RS and thus in Pythagorean record. Where they would not show up is in third-order record, which just looks at singles, doubles, walks, etc etc etc and ignores things like “clutch hitting” and “productive outs.”
I can see how a team with the Angels’ philosophy could, if they really execute it perfectly, consistently beat third-order record. I get the mechanism; I just question whether they’re really executing it as well as people think. (There also might be some strategies for beating your expected runs allowed— perhaps unusually incisive platooning of relievers by the manager?)
But it’s almost impossible to conceive of a “strategy” that involves beating your Pythagorean record. That strategy requires that your players play well when games are close and (relatively) poorly when they are not close. If your players are capable of playing well when games are close, why are they not doing that all the time? The only easy answer to that involves bullpen usage patterns (where different players are playing when games are close and not close). When it comes to position players, I’m completely at a loss— teams don’t, as a general rule, sub in lots of backup position players in blowouts, and the number of at-bats involved in that “strategy” would be quite minimal anyway.
Someone (Steven Pinker, I think) once talked about the difference between a problem and a mystery. Beating third-order record is a problem: there seem to be some strategies, and the question is whether you can execute them. Beating Pythag is a mystery: why and how would anyone ever end up there other than luck?
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
I think the only way to start this
would be massive number-crunching. There’s an incredible amount of season-data on pythag, and although I can’t rattle off all the stats you need for 3rd-order, there must be at least a thousand seasons of full data, and many thousands if you’re willing to do without things like CS.
Just crunch all the numbers, find every team-season that’s way beyond some number of standard deviations, and look for all the similarities.
I’m pretty sure Bill James did a version of that in the 1980s, and said he couldn’t find anything that clearly led to under-performance or surpassing pythag projections. I don’t know whether he did that for 3rd-order though.
"And Julio Franco is batting right-handed!" -- Wayne Hagin, A's radio play-by-play, mid-80s
The key point here, if I may emphasize it again,
is that (in my opinion) the Angels aren’t trying to beat their Pythag; they’re simply trying to win. But they’re doing it in a way that coincidentally happens to break the Pythagorean correlation. (And by “break” I mean only that it dents it enough to show a small but significant discrepancy; on the whole, the Pythagorean theorem is still very robust.)
I think you’re probably on the right track when you suggest that players play relatively better when the game is close and relatively poorer when it is not. Some others who have studied the Angels numbers have found that the main component of the team’s Pythag-W/L discrepancy is attributable to the team’s hitters hitting better in high-leverage situations.
As for why the hitters don’t simply hit better all the time, I don’t know. I addressed the question in the earlier thread, including suggesting a few possible explanations. Not that I’m actually saying any of them really is the cause, but just to illustrate the fact that it’s not an inherently nonsensical possibility. Yes, players will try to play as well as they can all the time, and I don’t think they’re deliberately sandbagging in low-leverage situations in order to “save it” for higher-leverage situations. But there are a lot of things that can affect how well a hitter performs. Is it so inconceivable that there could be something going on that, on balance, tilts the entire team toward hitting better in high-leverage situations?
Just because you can’t pinpoint a cause doesn’t mean one doesn’t exist, nor even that it’s improbable. How many times have you faced a puzzle where you say, “I can’t think of anything that could possibly explain that,” but then later you find out what it is and you think, “Oh, ok, that makes sense.”
I believe that there’s a reason the Angels are beating their Pythag that is related to their style of play. Perhaps some day it will be explained, or perhaps it won’t. But I think there probably is a logical explanation besides just statistical noise.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
This would be well and good if we knew nothing about clutch hitting
The problem is, we do know a lot about it— specifically, we know that it basically doesn’t exist. People have attacked it from a whole variety of methodological angles and about the best thing they can come up with is that maybe players with good BB/K ratios might, maybe, be like 1% better than average in high-leverage situations. And the Angels don’t even particularly have good BB/K ratios.
It’s not like there’s some wide-ranging unexplained mystery about clutch hitting. There’s literally nothing to explain.
Now, I suppose there could be something like sign-stealing going on, which might make this a different situation than your average “clutch hitting” game, but I kind of have a hard time seeing how the Angels could have successfully hoodwinked every other team in MLB for this long. Teams have been trying that trick since the invention of the curveball. On top of that, I have to think someone leaving the team would have leaked it by now, if only to help his own new team accomplish the same thing.
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
Given that the Pythagorean formula is not a perfect predictor, it would be no surprise if certain styles of play led the formula to slightly overpredict W-L and certain other styles led it to slightly underpredict.
I like this. And that strategy could be finding players that are more consistent, so that the distribution of runs scored and the distribution of runs not surrendered lead to fewer blow-out type games where good performances are often insignificant [e.g. several 6-2 wins combined with several (plus one) 5-3 losses will net far fewer wins than several one-run decisions that still produce a .600 winning clip; yet the Pythagorean formula forecasts more wins in the first scenario than in the second].
A year or three ago, someone here did up a useful frequency table (with graphic) showing the runs scored and runs allowed by the Athletics over the whole season. It led to a pretty good discussion IIRC.
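The bracketed example can be checked with a quick sketch. This assumes the classic exponent-2 form of the Pythagorean formula (other exponents are in use), and the two game logs are made-up illustrations of the "blowout" and "one-run" scenarios, not real team data.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expected winning percentage (exponent 2 is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def summarize(games):
    """games: list of (runs_for, runs_against). Returns (actual, pythag) win pct."""
    wins = sum(1 for runs_for, runs_against in games if runs_for > runs_against)
    total_for = sum(runs_for for runs_for, _ in games)
    total_against = sum(runs_against for _, runs_against in games)
    return wins / len(games), pythag_win_pct(total_for, total_against)


# Hypothetical team 1: five 6-2 wins and six (several plus one) 3-5 losses.
blowout_team = [(6, 2)] * 5 + [(3, 5)] * 6
# Hypothetical team 2: all one-run decisions at a .600 clip.
close_team = [(4, 3)] * 6 + [(3, 4)] * 4

blowout_actual, blowout_pythag = summarize(blowout_team)
close_actual, close_pythag = summarize(close_team)

print(f"blowout team: actual {blowout_actual:.3f}, pythag {blowout_pythag:.3f}")
print(f"close team:   actual {close_actual:.3f}, pythag {close_pythag:.3f}")
```

With these made-up logs the first team wins less often but gets the higher Pythagorean forecast, which is the discrepancy described above.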
by LowcountryJoe on Sep 26, 2009 7:43 AM PDT
finding players that are more consistent
1. For pitchers, inconsistency would be better.
2. For hitters, the relevant consistency would be performing about the same against good and bad pitchers, which would seem to be tough to find. Arguably, that would be true of very low-K/low-BB/low-HR guys, but trying to find it would be unlikely to be fruitful.
With stout hearts, and with enthusiasm for the contest, let us go forward to victory. ----Hero Defector Montgomery
"figuring out a way to beat the pythag"
This question seems to assume that the Angels beat their Pythag because of something positive they do. Could it be that their runs scored should be higher, but they actively do something that suppresses the runs?
If you know of something that actively suppresses the runs,
could you shoot me a private email ASAP? kthanxbye.
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
Why private?
I mean, in spite of the nature of the crappy topic, wouldn’t you want others to benefit from the knowledge? Why not make a diary out of it?
by LowcountryJoe on Sep 27, 2009 9:28 AM PDT
Kissing should be done in private, not on the bases
And LowcountryJoe, I’m too busy working on a different fanpost called “Once AGAIN, Can The A’s Get To .500??”
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
You know, 6 and 0 is all it would take
and then that guy is going to be somewhat justified in writing “I told you all so!”
by LowcountryJoe on Sep 28, 2009 3:09 AM PDT
On my fantasy team,
there is making out on the bases.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
On your fantasy team, sometimes you even get to 3B
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
There is even occasional scoring,
but we do so enjoy the station-to-station strategy.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
Station-to-station?
Is that some Kurt Cobain/Red Hot Chili Peppers reference? If so, I finally get the spheres part of it?
by LowcountryJoe on Sep 28, 2009 3:16 AM PDT
Unless it is Queen or Billy Joel
it’s a pretty safe bet that I’m never making reference to any pop music after 1970.
I’m just saying we like to spend some time at 1B, then at 2B, then at 3B, before finally continuing home.
Touch ’em all, baby!
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
No one cares about your fantasy team!
There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"
by designatedforassignment on Sep 27, 2009 10:53 AM PDT
The thing is, I think scouts do have biases about types of play as well as types of players
Certain kinds of plays are extremely impressive (diving catches, dazzling speed on the basepaths, etc) while not actually contributing that much to team victory.
I think in order to really come up with a truly predictive model based on scouting, you’d have to packetize the analysis and give a dozen or two dozen scouts little pieces of a player to analyze, like “footwork when fielding grounders.” Asking guys to make accurate overall evaluations of a player is, I think, asking the near-impossible.
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
Where is this scout? If AN wants to interview one that would be great.
All we have access to is the scouts interviewed by Goldstein, BA, Rosenthal et al, and the publicly available projections.
It's not the results, it's how you look going about those results -- Tim McCarver
by WaddellCanseco on Sep 21, 2009 10:58 PM PDT
WC
I’m not sure I get your argument. Is the point that because we don’t have access to info other than statistics, we should privilege those models? It seems to me that we have access to plenty of sources of info. Maybe instead of making fun of Keith Law, we can treat him like a slightly miscalibrated statistical model (I’m laughing as I type this).
To be honest, my position is not that we stop using the models. I just don’t think they’ve earned the privilege of being held above all other forms of analysis.
I'm totally in favor of using expert scouting opinions if we can find them
It’s just that we don’t often have access to them. Other than professional scouts and statistical projections, what else is there?
It's not the results, it's how you look going about those results -- Tim McCarver
by WaddellCanseco on Sep 22, 2009 1:41 AM PDT
Right
It’s one thing to say “stats aren’t everything” (a true statement). It’s another thing to say that anyone’s observations can be summoned to trump stats (false). The question is what the practical effect of that is when most professional non-stat opinions are not publicly available.
"There's never enough time to do all the nothing you want" -Bill Watterson
I suspect there's quite a bit of
professional stat opinion that’s not publicly available either.
When a top sabermetrician gets hired by a team, he doesn’t stop working, he just stops publishing.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
There's certainly some, just a whole lot less
Scouts don’t get hired via free publication of their ideas. Sabermetricians generally do.
There certainly isn’t sophisticated and comprehensive scouting available online to match what FanGraphs et al. provide.
"There's never enough time to do all the nothing you want" -Bill Watterson
+5!
PT breaks out the vorpal sword of analogies.
and then makes saving throw versus nerdiness.
by LowcountryJoe on Sep 21, 2009 8:13 PM PDT
reced for title alone.
I guess I should read the post now.
More Rajai Davis & less mount Davis
Does Rajai Davis know Al Davis?
by Athletics fan and runner on Sep 21, 2009 8:26 PM PDT
PT hits the nail on the head
76 ± 8 percent of the time… and this is definitely one of them.
Yes, some of the projections have laughably large margins of error, but the fact that the error estimates exist represents a clear advantage over a scouting report. Variance is a fact of life in baseball (and pretty much everywhere else). Any team will have breakout performers and under-achievers in any given year — and there is no way to predict these with 100% confidence. However, there are 25 players on a baseball team (and many more if you count everyone who shuffles through a roster over a given season). A projection system might have a 100 point margin of error for OPS predictions, but if you construct a lineup of nine guys with projected OPS of 700 and I construct a lineup of nine guys with projected OPS of 750, then my lineup will beat yours something like 85% of the time, even though the difference is only half a standard deviation per player.
That 85% number came from assuming Gaussian statistics on this unnamed projection system, and that’s almost certainly not right. But the advantage of using a projection system based in statistics is that we have some hope of understanding the distribution of outliers, biases in the distribution, higher moments, and so on. This could be an incredibly complex problem, but at least it is one that can be posed.
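For anyone who wants to see where a number like 85% comes from, here is a rough sketch of the Gaussian version. It assumes the 100-point margin of error is one standard deviation per player, that the nine players' errors are independent, and that "beat" means the lineup ends up with the higher average OPS. The function names are made up for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lineup_win_probability(projected_gap, sigma_player, n_players):
    """Chance the higher-projected lineup posts the higher average OPS.

    Assumes independent Gaussian errors with the same sigma for every player.
    """
    # Averaging n players shrinks the error on the lineup mean by sqrt(n).
    sigma_lineup = sigma_player / math.sqrt(n_players)
    # The difference of two independent lineup means has sqrt(2) times that error.
    sigma_diff = math.sqrt(2.0) * sigma_lineup
    return normal_cdf(projected_gap / sigma_diff)


# 750 vs 700 projected OPS, 100-point error per player, nine-man lineups.
p = lineup_win_probability(projected_gap=50, sigma_player=100, n_players=9)
print(f"{p:.3f}")
```

Under those assumptions the gap works out to a bit more than one standard deviation of the lineup-vs-lineup difference, which lands between 85% and 86%. Fatter tails or correlated errors would change the answer.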
I would propose that you should rely on subjective scouting judgements only when those judgements are significantly more precise than projection systems. Probably this happens more than people give credit to it. The evaluative process that goes into a scouting report is probably orders of magnitude more complex than the spreadsheet formulas. But when you get down to the low signal-to-noise regime (and remember, the noise level is very high in baseball), then you have to deal with a model where uncertainties can be quantified.
to put it another way
if I were writing this fANpost instead of ohmangoas, then instead of saying something like “why use these prediction systems when they have such large margins of error?” (not a direct quote, just my paraphrasing), it would be “why use these scouting reports that make no effort to quantify their uncertainties?”
There is no way to quantify uncertainties in scouting
(besides, ironically, to keep statistics on the scout over a long period of their career)
but the lack of precision in the projection systems is really large. There’s no reason why any MLB scout couldn’t have told you that Rajai would be between a 625 and 800 OPS. You should justify why a known imprecise system is automatically better than a less known imprecise system. (I say less known because a scout’s track record might tell us some things about their typical error margin.)
Your argument that a team with nine 750 players will beat a team with nine 700 players is certainly true. But I could construct an identical argument for a skilled scout.
A team put together based on a scout’s projections will beat a team picked without said projections most of the time.
Certainly the scout will bust on some of his projections, but since he’s better than a random player selector, he’ll improve the team on balance. Granted, I can’t quantify how much better he is than random, but hopefully, his track record of skill in player evaluation is the reason the A’s hired him.
I'm not as down on scouting as you suggest
I said above that you should base decisions on scouting when they are more precise than the model. As you point out, it’s pretty easy to be more precise than a prediction of “anywhere between 625 and 800 OPS”.
poor planning on my part, thinking i could escape the horrors of the stat midterm i just took
by coming here
Save Rajai Davis
haha
don’t learn about confidence intervals from me.
I rec'd it because I want to learn.
I love it!
Hey, Raburn! YOU ever dive into the shallow end of a pool?--noava22
Just to clarify,
your argument only applies to projection systems. The complex stats that are used by us statheads on A’s nation most of the time (WAR, wOBA, FIP, tRA, etc.) are not projections, but simply means of classifying data. There’s no “margin of error” for WAR, because WAR is just a data set of already-completed events put through a formula to produce a number.
Sure there are problems with all these stats in their ability to show skill levels, but there’s no margin of error in WAR’s ability to show Wins Above Replacement.
"Life is a horizontal fall" -Jean Cocteau
by King Richard on Sep 21, 2009 8:54 PM PDT
it isn't updating the title
At least it isn’t on my comp.
But you are correct, all those metrics are just that- evaluations of past outcomes.
Yup,
it changed on my computer.
Definitely agree with you when it comes to projections, that the margin of error should be factored in. I wouldn’t call projections “meaningless” without them, as .660-.860 is still different than .750-.950. I actually really love looking at those percentile projections, because it really shows how there’s a certain percent chance that a player will figure things out, or break down, or stay the same, etc. and here are the possibilities, sorted by outcome.
It also lets me be optimistic about the A’s every year by saying “they might all break out!” and then looking at everybody’s 90th percentile projections…
"Life is a horizontal fall" -Jean Cocteau
by King Richard on Sep 21, 2009 9:59 PM PDT
As others have stated already,
the confidence level of the various sabermetric stats is 100%. Aside from the possibility that the person compiling them typed a wrong number into the spreadsheet, they are a perfect measure of what they seek to measure. They are no more uncertain than, say, batting average. You don’t ask what’s the margin of error on a guy’s batting average.
The question is what predictive power they have. In the list of ZiPS score, there’s no margin of error. That really is the correct ZiPS scores for those guys. How likely it is that the guy with the best ZiPS score will be the best pitcher next year is another question entirely.
I would also add that some of the stats are rather aggressively named. WAR, for instance, stands for “wins above replacement”. A guy’s WAR number is a completely reliable measurement of his skills as defined by the WAR formula. But if you proceed on the assumption that it represents that many actual wins over an actual replacement player, then you’re making an interpretation equal to your faith in the formula. The formula itself is based on analysis of past data, so it’s not just arbitrary or subjective, but even so it’s just measuring what numbers have correlated with wins in the past.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
by iglew on Sep 21, 2009 10:32 PM PDT
Since you've quoted my post, I want to clarify....
I don’t believe that Cahill is one of the five most talented pitchers on the A’s solely because of the ZiPS projection. It’s my opinion based on a combination of scouting opinions as conveyed by Sickels and Goldstein and BA, projections including ZiPS and PECOTA and CHONE, the judgement of the A’s, and my own eyes.
The fact that ZiPS thinks he’s ready to be one of the five best pitchers on the team based on his record up through the middle of 2009, means that there’s a strong argument that he should be in the rotation right now, no matter how bad he was in the first half of 2009.
Since ZiPS doesn’t provide confidence intervals or standard deviations, I choose not to worry about them, since it is the most respected mid-season projection that I’m aware of.
Since I assume that other posters don’t care about my amateur scouting or gut feeling on Cahill’s talent or readiness, I believe the best way for me to contribute to the discussion is to cite whatever evidence that I have access to.
If someone chooses to ignore evidence just because it doesn’t come with a disclaimer or confidence intervals that’s their decision.
It's not the results, it's how you look going about those results -- Tim McCarver
by WaddellCanseco on Sep 21, 2009 10:55 PM PDT
It's hardly meant to call you out
by the way. I hope that was clear.
Since ZiPS doesn’t provide confidence intervals or standard deviations, I choose not to worry about them, since it is the most respected mid-season projection that I’m aware of.
Since I assume that other posters don’t care about my amateur scouting or gut feeling on Cahill’s talent or readiness, I believe the best way for me to contribute to the discussion is to cite whatever evidence that I have access to.
I agree with your second paragraph. I think we should link to any analysis we find interesting or compelling. But I think that “respect” you refer to in paragraph one is worthy of question.
Well there aren't any other midseason projections with any level of respect in the
statistical community, I think I’m on solid ground in saying ZiPS is the most respected. Now whether ANY projection is worthy of respect is another matter entirely.
It's not the results, it's how you look going about those results -- Tim McCarver
by WaddellCanseco on Sep 22, 2009 1:44 AM PDT
You used ZiPS because it is respected, but I think that’s just groupthink. We should actually think about the metrics we cite, and how valid they are for what we are using them for.
I’m really sorry you were the only example in the fanpost, because I hardly think you were wrong or even unusual for using ZiPS that way. I didn’t cite the quote as being you because I didn’t want it to be about the validity of your argument, but rather, I wanted it to be about the validity of this type of argument.
Projections have value, but only to certain extents
That was easy. :-D
Last of the Ninth - Photography
Political polling and baseball projections
have basically nothing in common other than the fact that Nate Silver’s made money off of both of them.
The data involved are massively different. Poll aggregation (like what fivethirtyeight.com did during the 2008 election) is collecting data on future performance, i.e. a voter says “I’m definitely voting for Obama”. What baseball projections are usually doing is taking the past performance of a player, comparing it to players who had similar past performances, and using the continued performances of those players to predict the future performance.
When you get players who have a small number of comps like Giambi, such systems are going to give much less definite results because of a much smaller pool of data. PECOTA always chokes on Ichiro. Silver and the rest say it flat out, there’s no good comps out there for him. He is a truly unique player in the history of baseball.
I heard Nate Silver talk
at a UChicago alumni event (before taking in a White Sox game). He actually did claim that there are some similar players to Ichiro, but that they tend to be from the early 20th century.
Which, I'm sure he noted, was its own issue
Comparing Rogers Hornsby to Chase Utley is fraught with peril.
Yeah, I acknowledge this above in the comments
I screwed up the analogy, since polls are attempting to measure something in the current population, not a projection.
I think the principle still applies (we should know the error intervals on any statistical tool we use)
Whether political polls are or aren't a projection
is really at the center of the issue of political prediction. The polls as published are simply a snapshot of what the electorate is saying right now. They are generally published in the news and treated by journalists as if they are a projection even though they aren’t.
For example, if in the few days after the Republican convention the Republican candidate’s poll numbers move up, all the journalists will say that that candidate has “moved ahead”, even though the change has negligible effect on the probability that he’ll win the election. Some journalists will acknowledge this by conceding that we expect that after the convention effect wears off his numbers will “move” back down again, but they still persist in treating it as a swing in momentum rather than a measure of the inadequacy of the polls.
Pollsters and statisticians who study political polling are well aware of all this, and it has been widely understood for decades that if you’re more than a few weeks out from an election there are other metrics which are far better predictors of the final result than a poll of people telling you how they would vote “if the election were held today”, but until recently there was little demand for such analysis in the media.
What Silver and others like him have done in recent years is bring that distinction out in the open, by publicly “correcting” for the various “errors” inherent in the polls.
Incidentally, I was once told by a professional statistician who did some sabermetric work that that too is only a measure of past data and he was emphatic that he is not in the business of prediction. If the client chooses to draw predictive conclusions, that’s on them.
"Go ahead and overachieve, you scrappy Brett-Favre-colored walk-takers." —Rev Halofan
This
I was once told by a professional statistician who did some sabermetric work that that too is only a measure of past data and he was emphatic that he is not in the business of prediction. If the client chooses to draw predictive conclusions, that’s on them.
FIP is not a prediction, it just does a better job than ERA (or pitcher wins) of measuring a pitcher’s skill. As a result, it correlates year-to-year better because the pitcher’s true talent (usually) doesn’t change (very much).
"There's never enough time to do all the nothing you want" -Bill Watterson
by nevermoor on Sep 22, 2009 3:07 PM PDT
And there's still another way to slice that data conceptually
“Which A’s player produced the most for the A’s this year?” is not the same as, “Which A’s player played the best this year?”
The first question will ask you to factor in all kinds of accomplishments that we might attribute to luck (e.g., a very high BABIP). The second question will ask you to exclude those from consideration.
Quick SSS example: if Player A goes 0-4 with 3 screaming liners and a HR that’s caught over the fence, and his teammate Player B goes 0-3 with 3 Ks, all on swinging strikes on sliders that bounced low and away, but also has a HBP, then player B produced more for the team in that game, since a .250 OBA is better than a .000. But Player A actually played better.
"And Julio Franco is batting right-handed!" -- Wayne Hagin, A's radio play-by-play, mid-80s
I don’t directly provide the margin of error, but choose a different approach of having my projections available to any and all interested parties that want to evaluate how projections fare (and there are a lot of them floating around on the web). Simply put, the differences between ZiPS and PECOTA and CHONE are so minute and there are enough legitimate methods for determining accuracy that one can always make one of these systems “win” over another one.
Even if I go into that with the best of intentions, it still involves people, 99.999% of them not knowing me on a personal level, having to take my word for something and even if I don’t consciously pick the best method for making ZiPS look good, I’m also a human with a human ego and my brain will likely hint to me that an approach that makes me look better is better than an approach that makes me look worse.
I do post error ranges, but I present them slightly differently as a series of odds rather than percentiles (so BP would list the BA that the player has 10%, 20% and so on chance of achieving and I list what chance specific BA thresholds have of being surpassed, etc.)
In-season projection margins are really tricky to do and present, simply because it’s not fixed. As the season advances, your sample size to base a projection off of increases, but the actual future play decreases, which results in the odd situation of rest-of-season projections being the most accurate in May (after game 161, you have the most information about a player to-date, but only 1 game left, so projections have a ridiculous error!).
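The trade-off described above can be sketched with a toy model. To be clear, this is not how ZiPS works: it is a simple normal-normal Bayesian update with invented numbers (`prior_sd` for how well the preseason projection pins down talent, `per_game_sd` for single-game noise), chosen only to show the shape of the curve.

```python
def rest_of_season_variance(games_played, prior_sd, per_game_sd, season_games=162):
    """Toy error variance of a rest-of-season rate projection.

    Two pieces: uncertainty about true talent (shrinks as games are played)
    plus sampling noise in the remaining games (grows as fewer are left).
    """
    remaining = season_games - games_played
    if games_played < 0 or remaining <= 0:
        raise ValueError("games_played must be between 0 and season_games - 1")
    # Posterior variance of talent after observing games_played games.
    talent_var = 1.0 / (1.0 / prior_sd**2 + games_played / per_game_sd**2)
    # Noise in the average over the games still to be played.
    noise_var = per_game_sd**2 / remaining
    return talent_var + noise_var


# Hypothetical values on an OPS-like scale; neither comes from a real system.
PRIOR_SD = 0.033
PER_GAME_SD = 0.300

best_n = min(
    range(0, 162),
    key=lambda n: rest_of_season_variance(n, PRIOR_SD, PER_GAME_SD),
)
print("most accurate after game", best_n)
print("variance after game 161:", rest_of_season_variance(161, PRIOR_SD, PER_GAME_SD))
```

In this toy model the minimum sits at N = (162 - s²/τ²)/2, where s is the per-game noise and τ the prior spread, so it always falls before the season's midpoint. The invented values above put it around game 40, and the error blows up as the remaining sample shrinks to a single game.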
For in-season, I do a relatively simple model, as I’ve noted. It has to be as simple as possible so that Appelman can have everything automatically update on FanGraphs, so it’s kind of using Ghetto Bayesian Inference on Excel.
--
Dan Szymborski
dan@baseballprimer.com
by D.Szymborski on Sep 22, 2009 5:57 PM PDT
thanks for posting here
If you have error estimates for full season projections, it would be awesome to provide rough estimates of the in-season results.
I’m thinking that you could just scale the full season uncertainty by sqrt(162/N), where N is the number of games played (maybe you actually want a number somewhat less than 162). Then, if you have rate stat predictions, you could put an additional error bar on counting stats for the remaining games with Poisson statistics. Finally, just add the two errors in quadrature. I’m sure it’s not the most technically correct thing to do, but it would definitely put you in the ballpark.
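Roughly, the recipe above could look like this. It is a back-of-the-envelope sketch that takes the suggestion literally: the sqrt(162/N) scaling uses games played as written, the rate stat is expressed per game, and the function and parameter names are invented for the example.

```python
import math


def rest_of_season_count_error(full_season_rate_sd, rate_per_game,
                               games_played, season_games=162):
    """Rough error bar on a counting stat over the remaining games.

    Combines a scaled rate uncertainty with Poisson counting noise,
    added in quadrature. All inputs are hypothetical.
    """
    if not 0 < games_played < season_games:
        raise ValueError("games_played must be between 1 and season_games - 1")
    remaining = season_games - games_played

    # Step 1: scale the full-season rate uncertainty by sqrt(162/N).
    scaled_rate_sd = full_season_rate_sd * math.sqrt(season_games / games_played)
    # Convert the rate uncertainty into counts over the remaining games.
    rate_term = scaled_rate_sd * remaining

    # Step 2: Poisson noise on the expected count for the remaining games.
    expected_count = rate_per_game * remaining
    poisson_term = math.sqrt(expected_count)

    # Step 3: add the two errors in quadrature.
    return math.hypot(rate_term, poisson_term)


# Made-up example: 0.20 HR per game, full-season rate error of 0.03, 81 games in.
error = rest_of_season_count_error(0.03, 0.20, games_played=81)
print(f"about ±{error:.1f} HR over the remaining games")
```

The Poisson term sets a floor on the error no matter how well the rate is known, which is one reason late-season counting projections stay noisy.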
