Athletics Nation: An SB Nation Community


Making Sense of the Fangraphs Game Graph: A Basic Look at Win Probability


This is a Fanpost aimed at those of us who may not be as stat-literate as others, but want to know what the Fangraphs game graph is all about. I’ll start from the very beginning, hopefully passing along useful knowledge about run expectancy and win probability, among other things. And if you already have a good handle on the principles, hey, it’s good review. Or something.



There’s a fundamental concept in this area of baseball research called base/out states. It’s simple, really. If we’re just talking about one inning, what’s the least information you need to understand the situation? All you really need is which bases are occupied and how many outs there are. It turns out that there are eight different ways to have men on base: no one on, a guy on first, a guy on second, a guy on third, guys on first and second, first and third, second and third, and bases loaded. Each of these can occur with zero, one, or two outs, which gives 8 × 3 = 24 total base/out states.
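If it helps to see the counting spelled out, here is a minimal sketch that enumerates the states. The names `BASES` and `base_out_states` are made up for this illustration and are not from The Book or Fangraphs.

```python
from itertools import product

# Hypothetical names for illustration only.
BASES = ("first", "second", "third")

def base_out_states():
    """Return every (runners, outs) pair; runners is a tuple of occupied bases."""
    occupancies = []
    # Each base is either empty or occupied, so there are 2 * 2 * 2 = 8 combinations.
    for flags in product((False, True), repeat=3):
        occupancies.append(tuple(b for b, on in zip(BASES, flags) if on))
    # Pair each of the 8 combinations with 0, 1, or 2 outs.
    return [(runners, outs) for runners in occupancies for outs in (0, 1, 2)]

states = base_out_states()
print(len(states))
```

The empty tuple stands for "no one on", and the full tuple for bases loaded.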


Most of the data that I'll talk about in this post comes from The Book: Playing the Percentages in Baseball (Tom Tango, Mitchel Lichtman, Andrew Dolphin). In The Book, the authors created a run expectancy table. They threw all of the games played from 1999-2002 into a massive database (over 85,000 innings!) and crunched the numbers. What they primarily looked at was each base/out state, and how many runs the average team scored from that state until the end of the inning. For example, they were able to show that, on average, a double that starts the inning raises a team’s expected runs for the inning by 0.634 compared with where they stood before the hit. And a strikeout with the bases loaded and no outs costs the team 0.767 runs. Over a full season, we can add up a player’s contributions (positive and negative) and come out with a number that expresses the total offense that player produced. Fangraphs tracks this stat under the name RE24.
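The arithmetic behind those two examples can be sketched roughly like this: the value of a play is the run expectancy after it, minus the run expectancy before it, plus any runs that scored on the play. The four table entries below are assumptions, chosen only so that they are consistent with the 0.634 and 0.767 figures quoted above; they should not be read as a copy of the published table, and `run_value` is a hypothetical helper name.

```python
# Assumed run expectancy values for four base/out states (illustrative only).
# Keys are (runners, outs).
RUN_EXPECTANCY = {
    ("empty", 0): 0.555,
    ("second", 0): 1.189,
    ("loaded", 0): 2.417,
    ("loaded", 1): 1.650,
}

def run_value(before, after, runs_scored=0):
    """Change in expected runs caused by a play, plus runs scored on the play."""
    delta = RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored
    return round(delta, 3)

# Leadoff double: bases empty, no outs -> runner on second, no outs.
leadoff_double = run_value(("empty", 0), ("second", 0))

# Strikeout with the bases loaded: no outs -> one out, runners stay put.
bases_loaded_strikeout = run_value(("loaded", 0), ("loaded", 1))

print(leadoff_double, bases_loaded_strikeout)
```

Summing `run_value` over every play a hitter is involved in is the basic idea behind a season total like RE24.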


To take run expectancy a step further, Tom Tango decided to expand the context. Sure, run expectancy is fine for an inning, but what if we wanted to look at a full ballgame? By crunching the numbers again, Tango created a set of tables that express the win probability for both teams at any point in the game. In other words, if team A was up by four runs in the bottom of the sixth, and team B had a man on first with two outs, there’s a section of the table that gives the probability of either team pulling out a win. As you can imagine, because the tables have to cover the inning number and the score difference between the two teams on top of the 24 base/out states, it’s a really large set of tables.
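To get a feel for why the tables get so big, here is a rough sketch of what one lookup key might contain and how many keys there would be. The `GameState` fields, the nine-inning limit, and the cap of a ten-run lead are all assumptions made for this illustration; the real tables may be organized differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameState:
    """Hypothetical lookup key for a win probability table."""
    inning: int
    bottom_half: bool
    batting_team_lead: int  # negative when the batting team trails
    runners: tuple          # occupied bases, e.g. ("first",)
    outs: int

def table_size(innings=9, max_lead=10):
    """Count the keys, assuming leads are capped at +/- max_lead runs."""
    halves = 2
    score_diffs = 2 * max_lead + 1  # -max_lead .. +max_lead, including ties
    base_out_states = 8 * 3
    return innings * halves * score_diffs * base_out_states

# The situation from the text: team B bats in the bottom of the sixth,
# down by four, with a man on first and two outs.
example = GameState(inning=6, bottom_half=True, batting_team_lead=-4,
                    runners=("first",), outs=2)

print(table_size())
```

Because the dataclass is frozen, a `GameState` can be used directly as a dictionary key, which is one simple way to store a probability per situation.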


It’s this win probability that appears on the game graphs generated by Fangraphs. Here’s an example.

[Figure: Fangraphs game graph, 20090720_twins_athletics_0, via www.fangraphs.com]


Yeah, I chose a good one. Ignore the bar graph at the bottom; I’ll get to that in a bit. For now, focus on the crazy line graph. Remember, in win probability, the numbers assume perfectly average teams. So at the beginning of the game, both teams have exactly a 50% chance of winning. In this game, by the top of the second, after Justin Morneau’s grand slam made the score 7-2, the Twins had an 87.6% chance of winning the game. And by the middle of the third inning, the Twins had a 98.9% chance of winning, assuming perfectly average teams. Of course, we know what happened: the A’s came roaring back. Most game graphs aren’t nearly this dramatic, but that’s the beauty of the model. When the graph takes crazy dives and turns, it’s visually obvious that it was a great game to watch.


Also, much like RE24, we can tally up a player’s contributions to win probability. This stat is known as Win Probability Added (WPA). In WPA, 0.500 corresponds to one full win, because winning moves a team’s win probability by 0.500 (from 50% at the first pitch to 100% at the end). In the aforementioned game, Gio Gonzalez was tagged with -0.587 WPA, more than enough to singlehandedly lose the game. Thankfully, Holliday and Cust combined for a positive 0.638, enough to overcome Gio’s poor pitching. The unique thing about WPA is that, unlike most other stats, it’s not context-neutral. Almost every other statistic out there (batting average, OPS, wOBA, whatever) deals with players in a theoretical vacuum. But WPA deals with actual game occurrences. Give up a solo home run when your team is up by ten? Not a big deal, and WPA shows that. Give up a bases-loaded walk in a tie game in the bottom of the ninth? That’s far worse, and WPA reflects that too.
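The tallying itself is just bookkeeping. Here is a minimal sketch, assuming each play is recorded as the player credited plus his own team's win probability before and after the play. The player names, the numbers in `plays`, and the `tally_wpa` helper are all hypothetical, made up for the example.

```python
from collections import defaultdict

def tally_wpa(plays):
    """Sum win probability changes per player.

    plays: iterable of (player, wp_before, wp_after), where both
    probabilities are from the point of view of that player's team.
    """
    totals = defaultdict(float)
    for player, wp_before, wp_after in plays:
        totals[player] += wp_after - wp_before
    return dict(totals)

# Hypothetical plays, not taken from any real game log.
plays = [
    ("Batter A", 0.50, 0.62),  # helps his team: +0.12
    ("Batter B", 0.62, 0.55),  # hurts his team: -0.07
    ("Batter A", 0.55, 0.80),  # helps again: +0.25
]

totals = tally_wpa(plays)
print(totals)
```

Applied to the totals quoted above, Gonzalez's -0.587 plus the 0.638 from Holliday and Cust nets out to +0.051.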


As an extension to this, Tom Tango created what he called a Leverage Index (LI). Leverage Index basically explains how critical a situation is. To put it in more technical terms, Leverage Index gives a numerical representation of how big the possible swings in win probability are in a given situation compared to an average situation, where 1.0 is the average. To use my previous examples, the LI in a game situation where you’re up by ten is very low. If you give up a home run, the win probability isn’t going to change much. But the LI in the bottom of the ninth of a tie game with the bases loaded is off the charts. The high LI shows that any little mistake could be devastating. On a Fangraphs game graph, Leverage Index is always shown as the bar graph underneath the regular win probability graph. In that crazy Twins game I mentioned earlier, the LI sinks to near-zero levels when the Twins are up 12-2. And yet, when the A’s came roaring back, the LI started to rise again. The last play of the game had a huge LI, as Michael Wuertz had runners on first and second with two outs in the bottom of the ninth of a one-run game.
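One way to read "where 1.0 is the average" is as a ratio: the swing you can expect in this situation divided by the swing you can expect in a typical one. The sketch below follows that reading. It is a simplification and not necessarily how Tango actually computes LI, and every number in it (the outcome probabilities, the win probability changes, and the 0.035 average swing) is invented for illustration.

```python
def expected_swing(outcomes):
    """Average size of the win probability change in one situation.

    outcomes: list of (probability, wp_change) pairs covering what can happen.
    """
    return sum(p * abs(wp_change) for p, wp_change in outcomes)

def leverage_index(outcomes, average_swing):
    """Expected swing here, relative to the swing in an average situation."""
    return expected_swing(outcomes) / average_swing

# Invented numbers: in a blowout almost nothing moves the needle,
# while late in a tie game every outcome moves it a lot.
AVERAGE_SWING = 0.035  # assumed league-wide average, not a published figure
blowout = [(0.7, -0.001), (0.3, 0.002)]
tie_game_late = [(0.7, -0.10), (0.3, 0.30)]

li_blowout = leverage_index(blowout, AVERAGE_SWING)
li_late = leverage_index(tie_game_late, AVERAGE_SWING)
print(round(li_blowout, 2), round(li_late, 2))
```

With these made-up inputs the blowout lands well below 1.0 and the late tie game well above it, which is the pattern the bar graph shows.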


I admit I’ve gotten into the habit of keeping the game graph open in another tab alongside MLB Gameday (and AN, of course) during a game. It’s pretty addicting to watch the graph unfold in real time, and wonderfully cathartic to see a crazy game in graphical form.


Comments


Nice explanation, I’m going to start tabbing the game graphs now. Hopefully more of these graphs will end up with 100% Oakland in 2010, with a couple like the one above :)

ESPN also had some percentage thing going on, wonder if it’s the same.

100% Athletics, 100% Baseball. 2009 Athletics, 40% Baseball.

by fruitattack on Oct 28, 2009 9:56 AM PDT reply actions   0 recs

one warning

I’ve found that certain people (ahem, my Red Sox fan girlfriend) don’t appreciate it when you constantly update them on run expectancy values while they are watching the game… especially when it’s the postseason and the game in question is this one. The part where Boston’s win expectancy dropped from 0.759 to 0.193 with one swing of Vlad’s bat? She wasn’t so happy to hear about the incredible 9.05 Leverage Index at that point.

by colin on Oct 28, 2009 12:08 PM PDT up reply actions   0 recs

Nice explanation

Fangraphs rocks. Here is a game which I took my family to see, which was very much the opposite of that A’s/Twins thriller. The LI chart is very illustrative. Note that although Reds SS Paul Janish gave up 6 runs in the 8th inning as a pitcher, his WPA score was 0.000, since it was already 16-1 when he was thrown to the wolves, er, put in to mop up.

Hey Al, just go away, baby.

by doctorK on Oct 28, 2009 11:06 AM PDT reply actions   0 recs

Wow.

The Phillies had a 100% WE in the 7th inning. Or, I guess, if not 100%, it’s at least 99.95%, rounded.

And your dream, absolve.
And your path, dissolve.

by danmerqury on Oct 28, 2009 1:37 PM PDT up reply actions   0 recs

Also

There’s an app for that.

"There's never enough time to do all the nothing you want" -Bill Watterson

by nevermoor on Oct 28, 2009 11:31 AM PDT reply actions   0 recs

I can't wait until they put out one for Android.

They call their best player "Kung Fu Panda" and they complain that people aren’t taking them or the game seriously enough? -Nick

by mikev on Oct 28, 2009 1:54 PM PDT up reply actions   0 recs

did you get your htc hero?

do you like it? I heard that there were issues with lag.

"Since other people actually read these threads, though, probably best that your particular brand of wrongness not go completely unchallenged." - PT

There are differing opinions on me. According to Iglew "DFA is PT with a sense of humor. PT is DFA with introspective self-doubt. I like them both" but according to sirbed Im "The Stats Killer"

by designatedforassignment on Oct 28, 2009 4:12 PM PDT up reply actions   0 recs

Yes. Yes. No lag.

It’s fucking awesome.


by mikev on Oct 29, 2009 7:57 AM PDT up reply actions   0 recs

interesting

I might have to investigate further.


by designatedforassignment on Oct 29, 2009 10:16 PM PDT up reply actions   0 recs

I get the notion of Win Probability

But I do have a question:

They threw all of the games played from 1999-2002 into a massive database (over 85,000 innings!) and crunched the numbers.

Looking at league-wide stats such as runs per game, home runs per game and OPS, it seems like 1999 and 2000 (what we might call the “McGwire/Sosa/Palmeiro seasons”) were extreme offensive years, producing numbers generally not seen since the 1930s. 2001 and 2002 do seem relatively similar to the last couple years.

I wonder if the charts are skewed a bit by using 1999-2002 as the base data for WPA. It seems like a base on balls drawn by a batter is a bit less valuable if the chance that it will be followed by a home run later in the inning is actually 7% or 8% lower than what the 1999-2002 data tells us. Do the authors of The Book have any plans to make updated calculations, due to the possibility that the data set they used was an outlier?

by Soaker on Oct 28, 2009 4:22 PM PDT reply actions   0 recs

I suppose I should have mentioned this, but I took it out for simplicity's sake.

The tables that Fangraphs uses in their run expectancy/win probability calculations aren’t the original 1999-2002 tables, but their modern counterparts. Tom Tango used the original RE/WE tables and adjusted them for different run environments. Fangraphs looks at the league-average runs/game, then uses that set of tables. They have tables for 3.5 runs/game, all the way to 6.0 runs/game. So, long story short, if offense is down this year compared to 1999-2002, Fangraphs uses a modified version of the base data to represent the cooler run-scoring environment.

If you really want to get into the super-technical aspect, they used Markov chains in the run environment conversions.
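As a rough picture of the table selection described above (not the Markov chain conversion itself), here is a sketch that picks the closest run environment for a given league average. The 3.5 and 6.0 endpoints come from the comment above; the 0.5 spacing between tables, the nearest-match rule, and the function name are assumptions for illustration.

```python
def pick_run_environment(league_runs_per_game, low=3.5, high=6.0, step=0.5):
    """Return the available runs/game setting closest to the league average.

    The spacing between tables (step) is an assumption; only the
    3.5 and 6.0 endpoints are mentioned in the discussion.
    """
    n_steps = round((high - low) / step)
    environments = [low + i * step for i in range(n_steps + 1)]
    return min(environments, key=lambda env: abs(env - league_runs_per_game))

# A league averaging 4.6 runs/game would use the 4.5 tables under these assumptions.
print(pick_run_environment(4.6))
```

Values outside the range simply fall back to the nearest endpoint.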


by danmerqury on Oct 28, 2009 4:43 PM PDT up reply actions   0 recs

Thanks, I'll accept that explanation

Econometrics (Econ 141 at Cal, at least back in the day) was the toughest course I ever took and was really a bit beyond the limits of my mental capabilities. It might have been a bit easier if the personal computer had been invented at that time; we had to enter all of our data onto those damned punch cards and then run the stack of cards through the scanner. Anyway, I realized at that point that I was pretty much done with college mathematics and statistics and that helps to explain why I’ve never done more than scratch the surface as far as learning about these advanced baseball metrics.

by Soaker on Oct 28, 2009 4:57 PM PDT up reply actions   0 recs

