Athletics Nation: An SB Nation Community


salb918 article at Beyond the Boxscore

i don't know why he hasn't posted it here yet (out of modesty, maybe?), but our fellow ANer sal wrote a pretty interesting guest article over at beyond the boxscore.  the more stat-oriented folks here at AN should definitely check it out.  we're lucky to have such smart people as a's fans!

i've only read 1/3 of it so far, and there are tons of excellent points that i had never even thought about...

http://www.beyondtheboxscore.com/story/2005/7/1/17483/60819


Comments


i'm glad i posted this
right before i got to point 3: "Here is where it gets tricky."

i think my head's going to start hurting pretty soon...

by xbhaskarx on Jul 1, 2005 5:45 PM PDT

Damn!
Interesting stuff... I understand just enough statistics to nod while reading this and say to myself "hmm, sounds like he knows what he's doing..."

Recommending this though - this is the kind of thing we should all read. Not necessarily because we'd understand it, but this is the kind of thinking that goes on in our front office all the time.

by RickeySteals on Jul 1, 2005 6:04 PM PDT

I know just enough statistics
that if someone who actually knows statistics read it, he or she would say, "This guy is a blowhard."
Fearing Mecir since 2000.

by salb918 on Jul 1, 2005 8:42 PM PDT

I swear...
I read almost the same thing on bp earlier this year when they were trying to find out exactly how much more important OBP was than SLG, and if the quote from DePodesta in Moneyball about OBP was accurate...

Not taking anything away from that post though.  Very well written.

by chri5 on Jul 1, 2005 6:28 PM PDT

Can you give me
a link to this article?  I'm interested.
Fearing Mecir since 2000.

by salb918 on Jul 1, 2005 10:40 PM PDT

needle in a haystack
I'll see if I can find it for you.

by chri5 on Jul 1, 2005 10:47 PM PDT

Thanks
No rush.
Fearing Mecir since 2000.

by salb918 on Jul 1, 2005 11:03 PM PDT

can't find it
These are the closest I could find that are relevant to what your article is about.  The first looks at run scoring for bad teams, and which components best predict the percentage of runners on base that will score.  The bottom two look at lineup optimization.

http://baseballprospectus.com/article.php?articleid=4004
http://baseballprospectus.com/article.php?articleid=3766
http://baseballprospectus.com/article.php?articleid=3779

I swear I read something where they looked at a better number than OPS, and used similar methodology to your project.  hmmm...

by chri5 on Jul 1, 2005 11:03 PM PDT

I read the
lineup optimization ones, and contacted Mr Click (the author) about them, since his work was similar to what I was doing.  He was kind enough to exchange ideas about my project. Good folks at BP, they are.
Fearing Mecir since 2000.

by salb918 on Jul 1, 2005 11:10 PM PDT

good job sal!
Interesting stuff! You should send a resume to Billy.

One tweak you might consider is your model assesses the predictive power of different coefficients by referring to a reference model of 9 clones batting.  It seems likely to me that you would get something different if there were different random mixes of players hitting around a player with different profiles... e.g. 8 Ichiros with 1 power hitter would probably score more runs than the weighted average of 8 all-Ichiro teams and 1 all-slugger team.

Of course, I'm not the one who has to mess with Matlab...

by Apricot on Jul 1, 2005 6:47 PM PDT

Good idea
And I would like to do something in line with that.  See my reply to andeux.
Fearing Mecir since 2000.

by salb918 on Jul 1, 2005 8:41 PM PDT

interesting
Interesting. I like the article, but here are a few other things you might want to consider.
  • There's no good mathematical reason to demand that the two variables you use in the regression be minimally correlated. For example, when you run your regression using AVG and ISO, you could get an identical answer using AVG and SLG instead. If it's not a lot of work, I'd like to see what you get using OBP and SLG on the same data. I suspect it would be as good or slightly better a fit as the one using OBP and ISO. (Geek talk: the reason to avoid variables that are too highly correlated with each other is numerical stability. If two variables are close to being linearly dependent, you end up trying to invert a matrix that is close to being singular, which can end up giving the wrong answer. But I don't think that will happen here, and trying to avoid "double counting" is really an aesthetic issue, not a mathematical one.)
  • If you're trying to judge a player's true value, it's probably better to see how a lineup of that player + 8 average players (or that player + a typical lineup of 8 other players) would perform. That's not really an issue for this study, though, and probably only makes a big difference in extreme cases. For example, as you note, a lineup full of players who got on base with a walk or a single every time would score an infinite number of runs, but if you add such a player to a more typical lineup, his actual value would depend mostly on the ability of others on the team to drive him in, though it would still be very high.
  • Again, it probably doesn't make much difference in this case, but as we discussed before, if you're looking for smaller effects (on the order of a few runs over the course of a season) 1500 games still turns out to be way too small a sample size. matlab=slow.
  • Miggy had 150 RBI! You need to add a clutchness variable. ;-)
Wasted? What about our staring contests? And the way we always knew what football coaches should have done? - Homer

by andeux on Jul 1, 2005 7:13 PM PDT
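
The first bullet in the comment above (AVG and ISO versus AVG and SLG giving an identical answer) can be illustrated with a small sketch. Because ISO = SLG - AVG, both pairs of variables span the same space, so a least-squares fit produces the same fitted values and only the coefficients get re-expressed. Everything below is an assumption for illustration: the data are randomly generated, the "runs" target and its weights are invented, and `solve` and `ols` are hypothetical helpers, not code from sal's article.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(columns, y):
    """Least-squares fit of y on the given columns plus an intercept (normal equations)."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    coef = solve(XtX, Xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    return coef, fitted

# Synthetic "players": every number here is made up.
random.seed(0)
n = 200
avg = [random.uniform(0.220, 0.330) for _ in range(n)]
iso = [random.uniform(0.050, 0.300) for _ in range(n)]
slg = [a + i for a, i in zip(avg, iso)]  # definition: ISO = SLG - AVG
runs = [2.0 + 15.0 * a + 9.0 * i + random.gauss(0, 0.3) for a, i in zip(avg, iso)]

coef_iso, fit_iso = ols([avg, iso], runs)  # intercept, AVG weight, ISO weight
coef_slg, fit_slg = ols([avg, slg], runs)  # intercept, AVG weight, SLG weight

print("AVG+ISO coefficients:", [round(c, 3) for c in coef_iso])
print("AVG+SLG coefficients:", [round(c, 3) for c in coef_slg])
print("largest difference in fitted values:",
      max(abs(p - q) for p, q in zip(fit_iso, fit_slg)))
```

The ISO weight and the SLG weight come out the same, and the AVG weight in the first fit equals the sum of the two weights in the second, which is the sense in which "double counting" changes how the answer is written rather than the answer itself.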

Good thoughts
  1. You're right.  I did do the regression with OBP and SLG and you get a pretty good r^2 value, around what you get for OBP and ISO.  I need to look at the results more closely, but I still think that avoiding the double counting is a good reason not to use SLG.
  2. You're right again.  I have tried this and it is difficult to implement.  The result, a sort of MLVr, is hard to interpret because, honestly, the values for many players are bunched together and differ only at the third decimal place.  I would like to refine this and get an MLVr out of it.
  3. You're right again, again.  MATLAB is slow, and is a big reason I chose only 1500 games.  ANer Genaro tried converting my program to a faster language.  We are still working on this front.
  4. Don't even get me started.
Thanks for your comments.  If I could quit my day job (school) I would work full time on stuff like this.
Fearing Mecir since 2000.

by salb918 on Jul 1, 2005 8:40 PM PDT

keep it up
and you might get drafted right off this website.  Just be sure to keep a few ideas in your back pocket for the non-disclosure agreement.

by anahola fan on Jul 1, 2005 10:46 PM PDT

::Rummaging through back pocket::
Hmmm...here's an idea on bunting philosophy...plans for cold fusion reactor...wallet...and my ass.

That's all I got.

Fearing Mecir since 2000.

by salb918 on Jul 1, 2005 11:03 PM PDT

too bad...
hope you didn't have to resort to using both hands to find it.

by anahola fan on Jul 2, 2005 12:48 AM PDT

Stats
I've always had problems with OPS as a statistic because the AVG is counted twice.  Therefore, hitters with lower batting averages get punished by the statistic more than higher batting average hitters.  I like the idea of modified OPS, though I think it's unnecessary to go scrounging around for ISO (isolated power) in order to calculate it (OBP+ISO) when all one really has to do is go to web sites that give OPS (I believe everyone provides it as a main statistic now) and just subtract AVG - it's the same thing, right?
http://www.cafehayek.com ~ a blog for classical liberals

by LowcountryJoe on Jul 2, 2005 7:09 AM PDT
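
The identity asked about in the comment above (OPS minus AVG equals OBP plus ISO) follows directly from ISO = SLG - AVG, and can be checked with a few lines of arithmetic. The sketch below uses the standard definitions of AVG, OBP, and SLG; the stat line is invented purely for illustration and the function name `rate_stats` is hypothetical.

```python
import math

def rate_stats(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return AVG, OBP, SLG, ISO, and OPS from counting stats (standard definitions)."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return {"AVG": avg, "OBP": obp, "SLG": slg, "ISO": slg - avg, "OPS": obp + slg}

# Invented stat line, not a real player's season.
s = rate_stats(ab=550, h=160, doubles=30, triples=3, hr=25, bb=70, hbp=5, sf=5)

ops_minus_avg = s["OPS"] - s["AVG"]
obp_plus_iso = s["OBP"] + s["ISO"]

print(round(ops_minus_avg, 3), round(obp_plus_iso, 3))
print("same thing:", math.isclose(ops_minus_avg, obp_plus_iso))
```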

OPS makes no sense
as a stat, in my mind. It's like wanting to know the weather of a place and adding the humidity and temperature. Huh? Yeah, high is probably uncomfortable, but so much is lost.

I like the attempts by people like Sal to come up with a reasonable way to measure hitting both in consistency and power.  Most current attempts, like this one, involve setting up target data like actual runs (or in this case a run simulation) to predict using this magic new statistic.

It seems weird and unlikely to me that this new statistic would be a linear combination of OBP and SLG, but this is a good first shot at it. I'd imagine it's more complicated and is coupled with the performance profile of the rest of the lineup in some nonlinear way.

Hey Sal, have you thought about crazier curve fits, like quadratic regressions, etc etc?

by Apricot on Jul 2, 2005 9:43 AM PDT

Thought about crazier fits, yes
Desire to do it?  Not really.

It seems to me that while baseball has a lot of sample space, most of the interesting phenomena occur over a small range of that space and a first-order Taylor expansion (i.e., linear fits) is sufficient.  The effort expended in getting an incremental increase in information is, well, let's just say there are diminishing returns.
Using a linear combination is just the easiest way to go.

Like I said in my article, using the run simulator as my target data is wide open to criticism, since I have no real defense of that.  But I do think it is better than say, Runs Created.  RC uses what I feel like is an arbitrary set of weights and factors.  EqA and VORP are kind of inaccessible.  It would be nice if there were a simple stat, based on easily available information, that could be manipulated and understood by the casual fan.  That's what I was after, I don't know if I succeeded, but I think I took baby steps in the right direction.

"I'd imagine it's more complicated and is coupled with the performance profile of the rest of the lineup in some nonlinear way."  --- I would like to look into this further.  If only I had more time...

Fearing Mecir since 2000.

by salb918 on Jul 2, 2005 11:49 AM PDT

linear in one variable probably enough
but I'm thinking more about multiplicative effects. Like it's possible that the Holy Stat that Models Runs Created is actually something like OBP + SLG*2 + .5 OBP * SLG

(off the top of my head).

And after all, The James Pythagorean Thm is an approximately quadratic relationship between Runs Scored, Runs Allowed and Winning Pct.

by Apricot on Jul 2, 2005 4:17 PM PDT
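
For readers who have not seen the Pythagorean relationship mentioned in the comment above, here is a minimal sketch of it in its classic form, where expected winning percentage is RS^2 / (RS^2 + RA^2). The function name and the run totals are assumptions chosen for illustration; the exponent is left as a parameter because 2 is only the original approximation.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Made-up season totals for a hypothetical team.
expected = pythagorean_win_pct(900, 700)
print(round(expected, 3), "expected wins over 162 games:", round(expected * 162))
```

The point of the comparison is that the relationship is nonlinear in its inputs: a team's expected record depends on the ratio of runs scored to runs allowed, not on a weighted sum of the two.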

Could be.
If I knew more about regressions, I might try that.  But then again, I'm not after the holy stat, I'm after a good dashboard metric, easily calculable by the public, that gives a good idea as to who the most valuable players are.
Fearing Mecir since 2000.

by salb918 on Jul 2, 2005 4:27 PM PDT

Ow! My head hurts....
I find these statistical analyses a bit over my head, but I really appreciate your interest in this side of baseball and our team specifically.....

I'm glad to have you on my team.

Now THIS is Billy Ball

by Masaryk on Jul 2, 2005 9:54 AM PDT

OPS
The thing about OPS is that it correlates more closely with runs scored than any other easily available statistic.  The other stats which fit runs scored better are much more difficult to compute/understand.  OPS is a good approximation of how well a player/team is hitting.

by skwid on Jul 2, 2005 10:23 AM PDT

someone who isn't in calculus
isn't supposed to understand this at all, right?

good. [feels less stupid]

let's go oakland [clapclap clapclapclap]

the a's fan lj community.

by Jjjsixsix on Jul 2, 2005 11:26 AM PDT

the basic idea is simple
A statistic is only helpful if it helps you predict something that's going to happen.

So in theory BA is helpful if it helps you decide the probability that a hitter is going to hit safely.

OBP and SLG are helpful in that they help predict how often the batter will be safe, and how many extra base hits the hitter gets.  But they don't quite predict how many runs a batter contributes, which is after all the whole point.

Some people use OPS = OBP + SLG to account for both skills. It doesn't predict that well either, mainly because, when used in models, it over-counts OBP.

So Sal has decided to see if he can tweak the formula to predict better by multiplying the SLG by some magic number "S" (for Sal) to help count SLG like it should be.  How to do this? He said, I'm going to use a computer program to run simulated games where 9 clones of a batter hit for a game. Then I'm going to see which number S helps me predict the number of simulated runs best. He came up with a number.

At that point Billy Beane said, Sal why don't you come and program on our Linux cluster which is located right behind Our Black Muslim Bakery in the Coliseum. But Sal refused because he knew he would be distracted with the smell of the fish sandwiches.

The end.

by Apricot on Jul 2, 2005 4:24 PM PDT

a couple corrections
My last post was written just after waking up...

First "it over-counts OBP" should be "under-counts"...

Second, I think I simplified Sal's work too much. The key idea (I think) is that instead of adding multiples of OBP and SLG, let's find two measures of hitting that are less related, since if you have high SLG, you tend to have high OBP.  So he compared how related a bunch of different measures were. He decided that AVG and ISOlated power (SLG - AVG) and OBP and ISO were least related. Then he fussed to find the right numbers that linearly related them.

by Apricot on Jul 2, 2005 4:44 PM PDT

Hey!
How did you know I like fish sandwiches?
Fearing Mecir since 2000.

by salb918 on Jul 2, 2005 5:14 PM PDT

you look like that kind of guy
Also, I happen to always get those fish sandwiches, back when I wasn't a shut-in. My wife would get their tofu sandwiches. We used to get the lemonade down the hall, until we realized we were paying $5 for about one inch of actual lemonade.

by Apricot on Jul 2, 2005 5:16 PM PDT

more on OPS (and similar stats)
Here are a couple more articles that explain why OPS is a good way of measuring offensive value

http://remarque.org/~grabiner/baseball.html

This one, by David Grabiner, gives a good mathematical justification for why taking a linear combination of OBP and SLG really is a sensible thing to do.

http://baseballprospectus.com/article.php?articleid=2596

This one, from the BPro Basics series compares the correlation of a number of common stats to team run scoring (similar to what Sal is doing, but using historical data rather than simulations). OPS does much better than AVG, OBP, or SLG by themselves. Stats like EqA and RC/27 are only a little better than OPS, at the expense of much greater complexity. So for most purposes, OPS strikes the right balance between simplicity and accuracy.

Wasted? What about our staring contests? And the way we always knew what football coaches should have done? - Homer

by andeux on Jul 2, 2005 12:16 PM PDT

eh
Yeah, I think OPS has its purposes. However, it feels to me that there is a level of useful precision missing. I agree that someone with an OPS of 1.000 is better than someone at .700.

But if OPS approximates the 'real' statistics off by a factor of .2 * OBP, that means that comparisons of straight OPS can typically be off by .080 for each batter in 'real' weight... it just makes OPS fuzzier to calculate comparisons with, I think.

by Apricot on Jul 2, 2005 4:53 PM PDT

a couple more articles
Aaron Gleeman on his "Gross Production Average" which is (1.8*OBP + SLG)/4:
http://www.aarongleeman.com/2003_11_23_baseballblog_archive.html#106974007971391611

Tangotiger on optimal weights:
http://www.tangotiger.net/ops.html
http://www.tangotiger.net/ops2.html
His articles on linear weights and "How are runs really created" are also relevant.

Wasted? What about our staring contests? And the way we always knew what football coaches should have done? - Homer

by andeux on Jul 2, 2005 5:12 PM PDT
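
As a quick illustration of how the Gross Production Average formula quoted above, (1.8*OBP + SLG)/4, differs from plain OPS, the sketch below compares two hitters with the same OPS but different shapes. Both hitters are hypothetical and their numbers are invented; only the GPA formula itself comes from the comment.

```python
import math

def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg

def gpa(obp, slg):
    """Gross Production Average as quoted in the thread: (1.8*OBP + SLG) / 4."""
    return (1.8 * obp + slg) / 4

# Two invented hitters with the same OPS but different profiles.
on_base_guy = {"obp": 0.400, "slg": 0.450}
slugger = {"obp": 0.330, "slg": 0.520}

for name, p in (("on-base guy", on_base_guy), ("slugger", slugger)):
    print(name, "OPS:", round(ops(**p), 3), "GPA:", round(gpa(**p), 4))
```

OPS treats the two as equal, while GPA's extra weight on OBP ranks the on-base hitter higher, which is the kind of gap the optimal-weights discussion in this thread is about.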

Thanks for all the articles
I haven't gone through all of them, but they make for useful reading.  I guess the moral is that I'm a couple of years too late to be original?  Oh well, it was a good exercise.

Thanks again.

Fearing Mecir since 2000.

by salb918 on Jul 2, 2005 5:16 PM PDT

From what I see
You aren't really behind. GPA is one number that tries to equalize it to the same scale as average, and you're not including slugging, as stated before. I personally think this stat seems like it could be useful. Maybe it needs some tweaking, but statistics don't just come out of nowhere. If you do any additional work on it I'd love a followup at BtB.

by Marc Normandin on Jul 3, 2005 9:16 AM PDT

Way over my head
But good that some people think about baseball in an intelligent way.... not just "how many beers can I drink before the game is over?"
Not that there's anything wrong with that but I like to watch the game when I pay to watch it :)

BTW, there is an article on Kielty being some kind of math whiz over at the official site. They mentioned him doing calculus etc. but then he said something about not using math much anymore for playing baseball.... this reminded me of that article a little.

by streetfan on Jul 2, 2005 11:26 PM PDT

Bobby Kielty
Having read the article on Kielty at the A's website I can see why BB loves him so much. On a small market team you need guys who can crunch numbers by day and play outfield at night.

by Larry E on Jul 3, 2005 9:50 AM PDT

Comments For This Post Are Closed

