Athletics Nation: An SB Nation Community


Need help! A's-related regression model


So here's the deal: the main project in my Decision Sciences class (basically just a fancy stats class) is to create a regression model.  The assignment sheet is in this link.  I want to do an A's-related regression model so it's more interesting than some boring crap about sales or population figures.  Any ideas for my regression model?  Here are some I came up with myself:

Pitchers' K/9 ratio as they age

The "clutch" stat...analyzing hitting with RISP and RISP/2 out over time

Consistency of closers

Consistency of quality starts or game scores for a select group of pitchers

Etc.

 

Help!  Let me know what you think would make a good project.  Thanks guys!

Comments

If you want to help out the site, I'd suggest trying to figure out MLB ERA from minor league stats

3 variables… you could go with K rate, BB rate and GB rate, or you could throw in something that we would expect to have minimal explanatory power, like BABIP, as a change of pace.

Actually, I’d be interested to know if minor league BABIP actually DOES mean anything.

Linda's in the cold ground, won't see her anymore
Somewhere out on the highway tonight, the drunken engines roar
It's just one of those things, one of those things
-- Al Stewart, "Accident on 3rd St."
In memory of Nick Adenhart and all victims of drunk driving

by PaulThomas on Oct 27, 2009 6:53 PM PDT reply actions   0 recs

That's an outstanding idea

The only issue I can see running into is getting 50 observations per variable from minor league stats. For 50 observations per player, I’d have to be able to get at LEAST monthly stats, and it’s hard enough to find reliable minor league YEARLY stats. Honestly, based on the amount of time most players spend in the minors, I don’t know if I’d have enough observations per variable.

Unless I’m completely misunderstanding the directions and it’s supposed to be at least 50 total between all the players I choose…that’d be a LOT easier.

WordUpThome: "TRENIDAD HUBBARD WENT TO HIS CUPBOARD TO FEED HIS POOR DOGS AND PETS...WHEN HE GOT THERE, THE CUPBOARD WAS BARE, AND THEY TRADED HIS DOG TO THE METS"

by Player To Be Named Later on Oct 27, 2009 7:03 PM PDT up reply actions   0 recs

I was operating under the impression that using 50 different minor leaguers would be 50 observations of the variable

I could be wrong. I’ve never taken a true stats course and don’t know any of the terminology— basically everything I know about regression analysis is stuff I’ve absorbed by osmosis through baseball discussions.

I was figuring what you could do is take, say, Sickels’ top 50 pitchers from 2003 and 2004, get rid of the guys who never made it to the majors (which, along with the overlap of guys who are on both lists, will probably put you at about 50 guys) and then see how much of their career ERAs can be explained by those variables.


by PaulThomas on Oct 27, 2009 7:27 PM PDT up reply actions   0 recs

Guess that makes sense

So, for example…

Joe Pitcher: MLB ERA 3.45, MiLB ERA 2.89, MiLB K/9 8.7, MiLB BB/9 3.1, MiLB BABIP .275
John Pitcher: MLB ERA 4.50, MiLB ERA 3.89, MiLB K/9 6.8, MiLB BB/9 4.2, MiLB BABIP .305

etc.?
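For what it's worth, that is the usual layout: each pitcher is one row, so the number of observations is just the number of pitchers. Here is a minimal Python sketch using only the two made-up pitchers above. The `PitcherRow` class and its field names are my own invention for illustration, not anything from a stats package.

```python
from dataclasses import dataclass


@dataclass
class PitcherRow:
    """One observation: a single pitcher's career-level numbers."""
    name: str
    mlb_era: float      # dependent variable (Y)
    milb_era: float     # listed in the example, not used as an X below
    milb_k9: float      # explanatory variable
    milb_bb9: float     # explanatory variable
    milb_babip: float   # explanatory variable


# The two example rows from the comment above; a real data set needs 50+.
rows = [
    PitcherRow("Joe Pitcher", 3.45, 2.89, 8.7, 3.1, 0.275),
    PitcherRow("John Pitcher", 4.50, 3.89, 6.8, 4.2, 0.305),
]

# Split into the response vector and the matrix of explanatory variables.
y = [r.mlb_era for r in rows]
X = [[r.milb_k9, r.milb_bb9, r.milb_babip] for r in rows]

n_observations = len(rows)   # one per pitcher, not one per month
n_variables = len(X[0])
print(n_observations, n_variables)
```

With 50 or more rows in this shape, any regression package should be able to take `y` and `X` directly.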


by Player To Be Named Later on Oct 27, 2009 7:31 PM PDT up reply actions   0 recs

Though I was going to go further back and get some pitchers who have had significantly long careers, like Tim Hudson, C. C. Sabathia, etc.


by Player To Be Named Later on Oct 27, 2009 7:32 PM PDT up reply actions   0 recs

I am not sure this would help

but I found this website in my journeys and it has minor league stats and some college stats too

The Baseball Cube

"The trouble with baseball is that it is not played the year round." Gaylord Perry

by BERRYJO on Oct 27, 2009 8:06 PM PDT up reply actions   0 recs

Well, the issue isn’t the lack of minor league stats OVERALL; it’s the lack of MONTHLY minor league stats. I’ll have to ask my teacher. If each player I choose counts as an observation, I’ll just choose enough pitchers to make there be over 50 observations. If I have to have 50 observations per player, I’m SOL.


by Player To Be Named Later on Oct 27, 2009 8:22 PM PDT up reply actions   0 recs

Monthly splits are too small to be useful anyway

and especially so in the case of starting pitchers, who pitch 5 or 6 times a month at most. One unusually bad or good game— indeed, one unusually lucky or unlucky game— will throw a pitcher’s monthly ERA into a tizzy.
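To put rough numbers on that, here is a small Python sketch. ERA is earned runs per nine innings (9 × ER / IP). The two months below are hypothetical, chosen only to show how much one blowup start moves a monthly figure.

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def month_era(starts):
    """ERA over a list of (innings_pitched, earned_runs) starts."""
    total_ip = sum(ip for ip, _ in starts)
    total_er = sum(er for _, er in starts)
    return era(total_er, total_ip)


# Hypothetical month: five starts, each 6 innings and 2 earned runs.
steady = [(6.0, 2)] * 5
# Same month, except one start is a 2-inning, 8-run blowup.
one_bad = [(6.0, 2)] * 4 + [(2.0, 8)]

print(round(month_era(steady), 2))   # 3.0
print(round(month_era(one_bad), 2))  # 5.54
```

Four identical starts and one bad one, and the monthly ERA goes from 3.00 to 5.54.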


by PaulThomas on Oct 27, 2009 9:02 PM PDT up reply actions   0 recs

Not sure monthly stats are the way to go with pitchers, either

but if you decide to do so, minorleaguesplits.com is where you can find them. Here is an example for Gio Gonzalez in 2008

To be hit by Moriyama's fastball is an honor exceeded only by being crushed under the wheels of the imperial carriage

by elcroata on Oct 27, 2009 11:21 PM PDT up reply actions   0 recs

Just be careful to avoid selection bias when you're picking the sample

It might be even better to take an entire year of minor league play— let’s say 2003 at the AA level— and then pick 50 random pitchers from among the subset of those players who’ve seen significant time in the majors.

Alternatively, you could junk ERA as the dependent variable and replace it with IP, which would allow you to use a truly random sample of pitchers. In other words, you’d just be asking “given this group of pitchers, which ones are actually going to see big-league time?” That’s not quite as useful for us— because IP is a fairly poor proxy for pitching skill— but it might make a cleaner experiment for your class.
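If it helps, the random draw itself is easy to do so that nobody (including you) gets to hand-pick favorites. A minimal Python sketch follows. The pitcher IDs and the pool size of 180 are placeholders I made up; the real pool would be whatever list of 2003 AA pitchers with significant MLB time you can assemble.

```python
import random


def pick_sample(eligible_ids, k=50, seed=2003):
    """Draw k pitchers uniformly at random, without replacement."""
    rng = random.Random(seed)   # fixed seed so the draw can be reproduced
    return rng.sample(eligible_ids, k)


# Placeholder pool: stand-in IDs, not real players.
eligible = [f"pitcher_{i:03d}" for i in range(180)]

sample = pick_sample(eligible)
print(len(sample), len(set(sample)))
```

Fixing the seed means your teacher (or anyone here) can rerun the draw and get the same 50 names.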


by PaulThomas on Oct 27, 2009 9:17 PM PDT up reply actions   0 recs

I was thinking of picking a year of big-league play and taking the top 50 in terms of ERA, or just RA, or ERC, or something useful, and going back and finding their minor-league stats.


by Player To Be Named Later on Oct 28, 2009 1:11 AM PDT up reply actions   0 recs

So basically...

…here are some moderately successful major league pitchers; did they have anything in common in their minor league stats that was significant in making them successful?


by Player To Be Named Later on Oct 28, 2009 1:13 AM PDT up reply actions   0 recs

My "advice."

My guess going into this project would be that it will be difficult to choose your independent variables such that they do not covary with each other and, in turn, with your residuals. If they do, then this covariance could be used to improve your prediction and should thus be eliminated. If it is within the scope of the course, I’d suggest using an instrumental variables or two-stage least squares approach, which should be readily available in any program such as SAS, Stata, Excel, etc.
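A much simpler first check than instrumental variables (and not a substitute for them) is to look at how strongly the candidate predictors correlate with each other before running the regression. Here is a rough Python sketch; the K/9 and BB/9 values are hypothetical numbers for five pitchers, made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical minor league rates for five pitchers (not real data).
k9 = [8.7, 6.8, 9.5, 7.2, 10.1]
bb9 = [3.1, 4.2, 2.8, 3.9, 2.5]

r = pearson(k9, bb9)
print(round(r, 2))
```

A value near +1 or -1 would suggest the two predictors carry mostly the same information, so their individual coefficients would be hard to interpret.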

by Pucking Insane on Oct 27, 2009 10:58 PM PDT reply actions   0 recs

You're over my head

It’s a beginner’s DS class, which is just a step up from Statistics 1. We use “Statgraphics” to do all the analyses. I just wanted to do something baseball-related so it wouldn’t be super boring.


by Player To Be Named Later on Oct 28, 2009 1:08 AM PDT up reply actions   0 recs

Here's an idea

When I had enough time I was planning to do a FP on this. I’ve done a fair amount of work on it already, but I might have bit off more than I can chew while simultaneously doing a FP series (that I’m a week behind on; thank you midterms).

Take your favorite projection system. Say CHONE. Compare a couple of predicted stats to the actual stats players put up. The obvious/simplest would be BA/OBP/SLG/OPS.
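One way to frame that as a regression is to regress actual on projected: a slope near 1 and an intercept near 0 would mean the projections are about right on average. Below is a minimal Python sketch. The OPS numbers are hypothetical and are not real CHONE projections.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(projected, actual):
    """Root mean squared error between projections and results."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / len(actual))


# Hypothetical OPS values for four hitters (made up for illustration).
projected_ops = [0.800, 0.750, 0.700, 0.850]
actual_ops = [0.780, 0.790, 0.650, 0.850]

slope, intercept = simple_regression(projected_ops, actual_ops)
error = rmse(projected_ops, actual_ops)
print(round(slope, 3), round(intercept, 3), round(error, 4))
```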

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Oct 28, 2009 3:26 AM PDT reply actions   1 recs

I like this one...

The foundational Western philosophical quote; "I think, therefore I am..." applies to everyone except Booby "the joke" Crozby

by MMunoz33 on Oct 28, 2009 6:26 AM PDT up reply actions   0 recs

rec'd!!!

Good request for ANers…


by MMunoz33 on Oct 28, 2009 6:26 AM PDT reply actions   0 recs

Moneyball follow-up

In the book Moneyball, there’s a discussion about which factors help an offense score the most runs. Many people talk about OPS, i.e. On-Base Pct + Slugging Pct. But Paul DePodesta indicates that the A’s used a slightly different formula, one that weighted OBP more heavily than just 50:50. So, if you gather the data for each Major League team over a period of several years, you should be able to determine which factors correlate most strongly with runs scored. What’s the proper weighting of OBP vs Slugging Pct? Has it changed since Moneyball came out in 2002?
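As a rough sketch of what that fit would look like, here is a plain-Python least-squares regression of runs on OBP and SLG; the ratio of the two coefficients is the implied weighting. Everything numeric here is made up: the team lines are invented, and the runs are generated from an arbitrary formula (with a 2:1 weight) so the fit has a known answer. It says nothing about the real weighting.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """Least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(xtx, xty)


# Invented (OBP, SLG) team seasons; NOT real MLB data.
teams = [(0.320, 0.400), (0.340, 0.430), (0.310, 0.420),
         (0.350, 0.410), (0.330, 0.450), (0.325, 0.390)]
# Runs built from an arbitrary formula so the "true" weights are known.
runs = [-900 + 3000 * obp + 1500 * slg for obp, slg in teams]

intercept, w_obp, w_slg = ols(teams, runs)
print(round(w_obp / w_slg, 2))   # about 2.0, only because the fake data was built that way
```

With real team seasons there would be noise, so you'd also want to look at the standard errors before reading much into the ratio.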

by A'sFan36Yrs on Oct 28, 2009 10:05 AM PDT reply actions   0 recs

Regression isn't the right way to work out offensive run values

and they’ve been done to death anyway using historical analyses of what happens after various offensive events occur. (I.e. take all the innings where you have a certain base/out situation and see how many runs scored in those innings. Then take all the innings where that state was followed up by, say, a walk, and see how many runs scored in THOSE innings. The difference is the run value of the event.)

It’s just a better way of doing things.
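A bare-bones Python sketch of that calculation, for one base/out state. The records are hypothetical (six made-up innings), and the tuple layout is just my assumption about how you might store play-by-play data.

```python
def mean(values):
    return sum(values) / len(values)


def run_value(records, state, event):
    """Average runs scored after `event` in `state`, minus the average
    runs scored from `state` across all innings that reached it."""
    from_state = [runs for s, _, runs in records if s == state]
    after_event = [runs for s, e, runs in records if s == state and e == event]
    return mean(after_event) - mean(from_state)


EMPTY_NONE_OUT = "bases empty, 0 out"

# Hypothetical records: (state, next event, runs scored in the rest of the inning).
records = [
    (EMPTY_NONE_OUT, "walk", 1),
    (EMPTY_NONE_OUT, "walk", 2),
    (EMPTY_NONE_OUT, "out", 0),
    (EMPTY_NONE_OUT, "out", 0),
    (EMPTY_NONE_OUT, "out", 0),
    (EMPTY_NONE_OUT, "single", 1),
]

print(round(run_value(records, EMPTY_NONE_OUT, "walk"), 3))
print(round(run_value(records, EMPTY_NONE_OUT, "out"), 3))
```

With real data you'd repeat this over every base/out state and average the results, weighted by how often the event happens in each state.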


by PaulThomas on Oct 28, 2009 12:39 PM PDT up reply actions   0 recs

Amen

Phil Birnbaum offered up a very nice analysis on this very subject.

by CapgrasDelusion on Oct 28, 2009 12:49 PM PDT up reply actions   0 recs

Daily Home attendance as the Y

possible Xs:
- Winning percentage of visiting opponent (at time of visit) – does this help the draw?
- The ‘delta’ or change in winning percentage in last x number of games – does an uptick/downtick in play (winning) over a certain number of games create an uptick/downtick in attendance?
- Promotional day at park (or weekend game or both) – satisfies a dummy variable requirement, if it’s really required.
- Player payroll of opponent – do the freely spending teams draw more because of name recognition (in spite of team record)?
- Record, ERA, K per 9 of opposing starter; same for Oakland starter (or starters from both teams)

Just some thoughts. Take and run or ignore at your discretion.
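In case it's useful, here is a rough Python sketch of how a couple of those Xs could be turned into one row per home game. The definition of the 'delta' (recent win pct minus season-to-date win pct) is just one possible reading, and the function names and sample results are made up for illustration.

```python
def win_pct(results):
    """Winning percentage for a list of game results (1 = win, 0 = loss)."""
    return sum(results) / len(results)


def recent_delta(results, window):
    """Win pct over the last `window` games minus win pct for the season so far."""
    return win_pct(results[-window:]) - win_pct(results)


def game_row(results_so_far, opp_win_pct, is_promo, is_weekend, window=10):
    """Explanatory variables for one home game; the last two are 0/1 dummies."""
    return [
        opp_win_pct,
        recent_delta(results_so_far, window),
        1 if is_promo else 0,
        1 if is_weekend else 0,
    ]


# Hypothetical home team results before the game in question.
results = [1, 0, 0, 1, 0, 0, 1, 1, 1, 1]

row = game_row(results, opp_win_pct=0.550, is_promo=True, is_weekend=False, window=4)
print(row)
```

Daily attendance for the same game would be the Y value that goes with this row.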

by LowcountryJoe on Oct 28, 2009 4:57 PM PDT reply actions   0 recs

Is there a place to easily find the attendance of a game for every game without having to click through 162 different box scores? It’s a good idea, but I don’t want to spend a week doing it.


by Player To Be Named Later on Oct 28, 2009 10:23 PM PDT up reply actions   0 recs

I actually really like this idea, and it’d be easy to run with, so if you have a site that has this kind of info grouped together please send me a link. Thanks man!


by Player To Be Named Later on Oct 28, 2009 10:26 PM PDT up reply actions   0 recs

DURR

Found it. Here. Thanks.


by Player To Be Named Later on Oct 28, 2009 10:27 PM PDT up reply actions   0 recs

Thanks again Joe

I am going to use this. It’s simple and easy to find. I found all the data I needed on ESPN.com. I’m going to use attendance as my dependent variable, and month, day of the week, and opponent as explanatory variables. Thanks for your help everyone!

Honestly as a whole this is an interesting concept for the entire board; make some sort of statistical analysis for the rest of the board. I know I for one enjoy reading the analyses posted by some of you.


by Player To Be Named Later on Oct 28, 2009 10:49 PM PDT up reply actions   0 recs

