Need help! A's-related regression model
So here's the deal... the main project in my Decision Sciences class (basically just a fancy stats class) is to create a regression model. The assignment sheet is in this link. I want to do an A's-related regression model so it's more interesting than some boring crap about sales or population figures. Any ideas for my regression model? Some ideas I came up with myself:
Pitchers' K/9 ratio as they age
The "clutch" stat...analyzing hitting with RISP and RISP/2 out over time
Consistency of closers
Consistency of quality starts or game scores for a select group of pitchers
Etc.
Help! Let me know what you think would make a good project. Thanks guys!
25 comments
Comments
If you want to help out the site, I'd suggest trying to figure out MLB ERA from minor league stats
3 variables… you could go with K rate, BB rate and GB rate, or you could throw in something that we would expect to have minimal explanatory power, like BABIP, as a change of pace.
Actually, I’d be interested to know if minor league BABIP actually DOES mean anything.
Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving
That's an outstanding idea
The only issue I can see running into is getting 50 observations per variable from minor league stats. For 50 observations, I’d have to be able to get at LEAST monthly stats, and it’s hard enough to find reliable minor league YEARLY stats. Honestly based on the amount of time most players spend in the minors I don’t know if I’d have enough observations per variable.
Unless I’m completely misunderstanding the directions and it’s supposed to be at least 50 total between all the players I choose…that’d be a LOT easier.
WordUpThome: "TRENIDAD HUBBARD WENT TO HIS CUPBOARD TO FEED HIS POOR DOGS AND PETS...WHEN HE GOT THERE, THE CUPBOARD WAS BARE, AND THEY TRADED HIS DOG TO THE METS"
by CaliforniaJag on Oct 27, 2009 7:03 PM PDT up reply actions
I was operating under the impression that using 50 different minor leaguers would be 50 observations of the variable
I could be wrong. I’ve never taken a true stats course and don’t know any of the terminology— basically everything I know about regression analysis is stuff I’ve absorbed by osmosis through baseball discussions.
I was figuring what you could do is take, say, Sickels’ top 50 pitchers from 2003 and 2004, get rid of the guys who never made it to the majors (which, along with the overlap of guys who are on both lists, will probably put you at about 50 guys) and then see how much of their career ERAs can be explained by those variables.
Guess that makes sense
So, for example…
Joe Pitcher: MLB ERA 3.45, MiLB ERA 2.89, MiLB K/9 8.7, MiLB BB/9 3.1, MiLB BABIP .275
John Pitcher: MLB ERA 4.50, MiLB ERA 3.89, MiLB K/9 6.8, MiLB BB/9 4.2, MiLB BABIP .305
etc.?
by CaliforniaJag on Oct 27, 2009 7:31 PM PDT up reply actions
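For anyone who wants to see what the one-row-per-pitcher layout looks like when it is actually fit, here is a minimal sketch in plain Python. It is only an illustration: the first two rows are the hypothetical Joe Pitcher and John Pitcher from the comment above, the remaining rows are made-up placeholders rather than real players, and the helper names (`solve`, `fit_ols`, `r_squared`) are invented for this example. A class package such as Statgraphics would do the same fit through its own menus.

```python
# Each row: (MiLB K/9, MiLB BB/9, MiLB BABIP, MLB ERA).
# Rows 1-2 are the hypothetical pitchers from the thread; the rest are
# made-up placeholders so the fit has more rows than coefficients.
rows = [
    (8.7, 3.1, 0.275, 3.45),
    (6.8, 4.2, 0.305, 4.50),
    (7.5, 2.8, 0.290, 3.90),
    (9.4, 3.9, 0.310, 3.70),
    (5.9, 3.5, 0.280, 4.80),
    (8.1, 2.4, 0.300, 3.60),
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows):
    """Least-squares fit via the normal equations; returns [intercept, b1, b2, ...]."""
    X = [[1.0, *r[:-1]] for r in rows]
    y = [r[-1] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(rows, coefs):
    """Share of the variance in the last column explained by the fit."""
    y = [r[-1] for r in rows]
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r[:-1])) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

coefs = fit_ols(rows)
print("intercept, K/9, BB/9, BABIP:", [round(c, 3) for c in coefs])
print("R^2:", round(r_squared(rows, coefs), 3))
```

With only a handful of invented rows the coefficients mean nothing; the point is the shape of the data, where each pitcher contributes one observation.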
Though I was going to go further back and get some pitchers who have had significantly long careers, like Tim Hudson, C. C. Sabathia, etc.
by CaliforniaJag on Oct 27, 2009 7:32 PM PDT up reply actions
I am not sure this would help
but I found this website in my journeys and it has minor league stats and some college stats too
"The trouble with baseball is that it is not played the year round." Gaylord Perry
by BERRYJO on Oct 27, 2009 8:06 PM PDT up reply actions
Well, the issue isn’t the lack of minor league stats OVERALL; it’s the lack of MONTHLY minor league stats. I’ll have to ask my teacher. If each player I choose counts as an observation, I’ll just choose enough pitchers to make there be over 50 observations. If I have to have 50 observations per player, I’m SOL.
by CaliforniaJag on Oct 27, 2009 8:22 PM PDT up reply actions
Monthly splits are too small to be useful anyway
and especially so in the case of starting pitchers, who pitch 5 or 6 times a month at most. One unusually bad or good game— indeed, one unusually lucky or unlucky game— will throw a pitcher’s monthly ERA into a tizzy.
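A quick bit of arithmetic shows how much one game can move a monthly ERA. This is a sketch with hypothetical starts, not real game logs, and it uses decimal innings (3.0 means three full innings) rather than box-score notation.

```python
def era(starts):
    """ERA = 9 * earned runs / innings pitched over a list of (IP, ER) starts."""
    innings = sum(ip for ip, _ in starts)
    earned = sum(er for _, er in starts)
    return 9 * earned / innings

# Hypothetical month: five starts of 6 innings and 2 earned runs each.
typical_month = [(6.0, 2)] * 5
# Same month, but one start replaced by a 3-inning, 8-run blowup.
one_bad_start = [(6.0, 2)] * 4 + [(3.0, 8)]

print(round(era(typical_month), 2))  # 3.0
print(round(era(one_bad_start), 2))  # 5.33
```

One bad outing out of five moves the monthly figure by more than two runs, which is the kind of noise a regression on monthly splits would have to absorb.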
Not sure monthly stats are the way to go with pitchers, either
but if you decide to do so, minorleaguesplits.com is where you can find them. Here is an example for Gio Gonzalez in 2008
To be hit by Moriyama's fastball is an honor exceeded only by being crushed under the wheels of the imperial carriage
Just be careful to avoid selection bias when you're picking the sample
It might be even better to take an entire year of minor league play— let’s say 2003 at the AA level— and then pick 50 random pitchers from among the subset of those players who’ve seen significant time in the majors.
Alternatively, you could junk ERA as the dependent variable and replace it with IP, which would allow you to use a truly random sample of pitchers. In other words, you’d just be asking “given this group of pitchers, which ones are actually going to see big-league time?” That’s not quite as useful for us— because IP is a fairly poor proxy for pitching skill— but it might make a cleaner experiment for your class.
I was thinking of picking a year of big-league play and taking the top 50 in terms of ERA, or just RA, or ERC, or something useful, and going back and finding their minor-league stats.
by CaliforniaJag on Oct 28, 2009 1:11 AM PDT up reply actions
So basically...
…here are some moderately successful major league pitchers; did their minor league stats have anything in common that was significant in making them successful?
by CaliforniaJag on Oct 28, 2009 1:13 AM PDT up reply actions
My "advice."
My guess going into this project is that it will be difficult to choose independent variables that do not covary with each other and, in turn, with your residuals. If they do covary, that covariance could be used to improve your prediction, so it should be eliminated. If it is within the scope of the course, I’d suggest an instrumental variables or two-stage least squares approach, which should be readily available in programs such as SAS, Stata, Excel, etc.
by Pucking Insane on Oct 27, 2009 10:58 PM PDT reply actions
You're over my head
It’s a beginner’s DS class, which is just a step up from Statistics 1. We use “Statgraphics” to do all the analyses. I just wanted to do something baseball-related so it wouldn’t be super boring.
by CaliforniaJag on Oct 28, 2009 1:08 AM PDT up reply actions
Here's an idea
When I had enough time I was planning to do a FP on this. I’ve done a fair amount of work on it already, but I might have bitten off more than I can chew while simultaneously doing a FP series (that I’m a week behind on; thank you midterms).
Take your favorite projection system. Say CHONE. Compare a couple of predicted stats to the actual stats players put up. The obvious/simplest would be BA/OBP/SLG/OPS.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
by vignette17 on Oct 28, 2009 3:26 AM PDT reply actions 1 recs
I like this one...
The foundational Western philosophical quote; "I think, therefore I am..." applies to everyone except Booby "the joke" Crozby
rec'd!!!
Good request for ANers…
Moneyball follow-up
In the book Moneyball, there’s a discussion about which factors help an offense score the most runs. Many people talk about OPS, i.e. On Base Pct + Slugging Pct. But Paul DePodesta indicates that the A’s used a slightly different formula, one that weighted OBP more heavily than just 50:50. So, if you gather the data for each Major League team over a period of several years, you should be able to determine which factors correlate with the most runs scored. What’s the proper weighting of OBP vs Slugging Pct? Has it changed since Moneyball came out in 2002?
Regression isn't the right way to work out offensive run values
and they’ve been done to death anyway using historical analyses of what happens after various offensive events occur. (I.e. take all the innings where you have a certain base/out situation and see how many runs scored in those innings. Then take all the innings where that state was followed up by, say, a walk, and see how many runs scored in THOSE innings. The difference is the run value of the event.)
It’s just a better way of doing things.
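The procedure described above can be sketched in a few lines. This assumes each inning has already been reduced to a record of the base/out state, the event that followed it, and the runs scored in the rest of the inning; the records below are made up for illustration and `run_value` is a hypothetical helper name.

```python
from statistics import mean

# Each record: (base/out state, event that followed, runs scored in the
# rest of the inning). All values are invented placeholders.
innings = [
    ("runner on 1st, 0 out", "walk", 2),
    ("runner on 1st, 0 out", "walk", 1),
    ("runner on 1st, 0 out", "strikeout", 0),
    ("runner on 1st, 0 out", "single", 1),
    ("runner on 1st, 0 out", "strikeout", 1),
    ("bases empty, 2 out", "walk", 0),
]

def run_value(records, state, event):
    """Mean runs after `event` in `state`, minus mean runs for `state` overall."""
    baseline = [runs for s, _, runs in records if s == state]
    after_event = [runs for s, e, runs in records if s == state and e == event]
    return mean(after_event) - mean(baseline)

print(run_value(innings, "runner on 1st, 0 out", "walk"))
```

A full treatment would weight the result across all 24 base/out states by how often each occurs; this sketch only handles one state at a time.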
Amen
Phil Birnbaum offered up a very nice analysis on this very subject.
by CapgrasDelusion on Oct 28, 2009 12:49 PM PDT up reply actions
Daily Home attendance as the Y
possible Xs:
- Winning percentage of visiting opponent (at time of visit) – does this help the draw?
- The ‘delta’ or change in winning percentage in last x number of games – does an uptick/downtick in play (winning) over a certain number of games create an uptick/downtick in attendance?
- Promotional day at park (or weekend game or both) – satisfies a dummy variable requirement; if it’s really required.
- Player payroll of opponent – do the freely spending teams draw more because of name recognition (in spite of team record)?
- Record, ERA, K per 9 of opposing starter; same for Oakland starter (or starters from both teams)
Just some thoughts. Take and run or ignore at your discretion.
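Two of the suggested Xs take a little preparation before they can go into a regression. The sketch below shows one possible reading of the 'delta' (winning percentage over the last x games minus winning percentage over all earlier games) and a weekend dummy variable. Both definitions are assumptions made for illustration, and the results list is hypothetical.

```python
from datetime import date

def win_pct(results):
    """Winning percentage for a list of results (1 = win, 0 = loss)."""
    return sum(results) / len(results)

def recent_delta(results, x):
    """Winning pct over the last x games minus winning pct over all earlier games."""
    if len(results) <= x:
        raise ValueError("need more than x games played")
    return win_pct(results[-x:]) - win_pct(results[:-x])

def weekend_dummy(game_date):
    """Dummy variable: 1 for a Saturday or Sunday game, 0 otherwise."""
    return 1 if game_date.weekday() >= 5 else 0

# Hypothetical results in game order, oldest first (1 = win, 0 = loss).
results = [1, 0, 1, 0, 1, 1, 1, 1]

print(recent_delta(results, 4))           # 0.5
print(weekend_dummy(date(2009, 10, 24)))  # 1 (a Saturday)
```

A promotional-day dummy would work the same way as `weekend_dummy`, given a list of promotion dates to check against.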
Is there a place to easily find the attendance of a game for every game without having to click through 162 different box scores? It’s a good idea, but I don’t want to spend a week doing it.
by CaliforniaJag on Oct 28, 2009 10:23 PM PDT up reply actions
I actually really like this idea, and it’d be easy to run with, so if you have a site that has this kind of info grouped together please send me a link. Thanks man!
by CaliforniaJag on Oct 28, 2009 10:26 PM PDT up reply actions
DURR
Found it. Here. Thanks.
by CaliforniaJag on Oct 28, 2009 10:27 PM PDT up reply actions
Thanks again Joe
I am going to use this. It’s simple and easy to find. I found all the data I needed on ESPN.com. I’m going to use attendance as my dependent variable, and month, day of the week, and opponent as explanatory variables. Thanks for your help everyone!
Honestly, this is an interesting concept for the whole board: members doing some sort of statistical analysis for the rest of the board. I know I for one enjoy reading the analyses posted by some of you.
by CaliforniaJag on Oct 28, 2009 10:49 PM PDT up reply actions
