Winexp 6: Pitching 2005 & Comparing Win Shares

As part of this ongoing series on win expectancy (search for AN diaries on winexp), I computed win expectancy automatically from play-by-play data for all of 2005. I think (as do many other folks, notably the folks at Hardball Times) that win expectancy is fascinating and thought-provoking, but there is a question of what it really indicates. This is too big a topic to tackle at once, so I'm going to address just a little bit of it here, on pitchers.

In the automated way I computed win expectancy, pitchers got credit for every play that happened while they were pitching, except for plays involving errors. Those got credited to the fielders and tracked separately. It's a weakness of the methodology that positive fielding plays (e.g. incredible dives) are not credited to fielders. This is a limitation of the automation, not of the win expectancy framework: places like Lookout Landing have painstakingly assigned fielding credit for great plays by hand, and there is no automated way to do this. Win Shares estimates that fielding is about 20% of the battle, so this is not a trivial amount to overlook. But keeping that in mind, one can still get interesting insights into pitching and hitting contributions. The calculations were done using software dubbed Baby Winexp, the 'baby' in recognition that there are probably relevant bugs to work out but that she does pretty cool and interesting things already.
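The crediting rule described above can be sketched roughly like this. To be clear, this is not Baby Winexp's actual code: the record layout, the field names (`we_before`, `we_after`, `error_by`), and every number in `plays` are made up for illustration. The only thing taken from the text is the rule itself: the change in win expectancy on a play goes to the pitcher, unless the play involved an error, in which case it goes to the fielder and is tracked separately.

```python
from collections import defaultdict

# Hypothetical play records: win expectancy for the pitching team before and
# after each play. All names and numbers here are invented for illustration.
plays = [
    {"pitcher": "Pitcher A", "we_before": 0.50, "we_after": 0.54, "error_by": None},
    {"pitcher": "Pitcher A", "we_before": 0.54, "we_after": 0.47, "error_by": "Fielder X"},
    {"pitcher": "Pitcher B", "we_before": 0.47, "we_after": 0.52, "error_by": None},
]


def assign_credit(plays):
    """Split win expectancy changes between pitchers and erring fielders."""
    pitcher_credit = defaultdict(float)
    fielder_credit = defaultdict(float)
    for play in plays:
        # Change in the pitching team's win expectancy on this play.
        delta = play["we_after"] - play["we_before"]
        if play["error_by"] is not None:
            # Plays involving an error are charged to the fielder.
            fielder_credit[play["error_by"]] += delta
        else:
            # Everything else goes to whoever was pitching.
            pitcher_credit[play["pitcher"]] += delta
    return dict(pitcher_credit), dict(fielder_credit)


pitchers, fielders = assign_credit(plays)
print(pitchers)
print(fielders)
```

Note what the sketch cannot do: a great diving catch shows up only as a positive change on a play with no error, so it lands in the pitcher's column. That is the weakness mentioned above.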

In any case, here is the win expectancy contributed (WXC) for each of the A's pitchers this year. It corresponds to the number of games over .500 that a player contributes. The team's WXC sums to within .5 of the number of games above .500. The number is probably only meaningful rounded to the nearest .5 (i.e. we probably shouldn't make too much of differences of less than .5), but for convenience, I'm reporting them to 3 decimal places.

Win Expectancy Contributed

Huston  Street       3.656
Rich    Harden       2.829
J       Duchscherer  1.59
Joe     Blanton      1.448
Kiko    Calero       0.514
Barry   Zito         0.397
Ron     Flores       0.064
Britt   Reames       0
Jai P.  Garcia      -0.003
Keiich  Yabu        -0.072
Tim     Harikkala   -0.203
Ricardo Rincon      -0.205
Jay     Witasick    -0.217
Seth    Etherton    -0.222
Dan     Haren       -0.246
Kirk    Saarloos    -0.397
Octavio Dotel       -0.657
Joe     Kennedy     -0.822
Juan    Cruz        -1.068
Ryan    Glynn       -1.307

Now for comparison's sake, let's look at Pitching Win Shares for A's pitchers. In principle, Win Shares tries to assign credit for actual team wins, so if Winexp is meaningful, there should be some relationship, right?

Pitching Win Shares

B   Zito         14.5
J   Blanton      14.4
D   Haren        14.3
H   Street       13.3
R   Harden       12.9
J   Duchschere   10.6
K   Saarloos     9.5
K   Calero       5.5
J   Kennedy      2.9
K   Yabu         2.8
R   Rincon       2
O   Dotel        1.9
J   Witasick     1.8
R   Flores       1.1
J   Garcia       0.2
T   Harikkala   -0.1
B   Reames      -0.5
R   Glynn       -0.9
J   Cruz        -1.9

I found this initially surprising. I don't know about you, but I see no relationship between the rankings of Winexp and Win Shares. In particular, Winexp declares Street and Harden to be clear #1 and #2, followed by Duke and Blanton, followed at a distance by Calero and Zito, and a bit further back Haren and Saarloos. This is NOT AT ALL what Win Shares says, which is that Zito, Blanton, and Haren are basically equivalent and Street and Harden are close behind.

My surprise went away when I recalled that Win Shares is biased towards players who play more. In principle, because Zito pitched a ton more than Harden, he could catch up in Win Shares even if the rate of his performance was poor.

Hardball Times also provides an adjustment called Expected Win Shares (EWS), which is the Win Shares an average player would get in the same playing time. One can compute "Win Shares Above Average" (WSAA) by subtracting EWS from WS. This is more comparable to Winexp, which is measured relative to an "average" player in the sense that a team whose WXC totals 0 would have a .500 record. Here's the list, side by side with WXC. Let's see what we get...

Win Shares Above Average        Win Expectancy Contributed

H   Street       6.3            Huston  Street       3.656
R   Harden       5.9            Rich    Harden       2.829
J   Duchschere   4.6            J       Duchscherer  1.59
J   Blanton      3.4            Joe     Blanton      1.448
B   Zito         2.5            Kiko    Calero       0.514
D   Haren        2.3            Barry   Zito         0.397
K   Saarloos     1.5            Ron     Flores       0.064
K   Calero       1.5            Britt   Reames       0
J   Garcia       0.2            Jai P.  Garcia      -0.003
R   Flores       0.1            Keiich  Yabu        -0.072
B   Reames       0              Tim     Harikkala   -0.203
J   Kennedy     -0.1            Ricardo Rincon      -0.205
O   Dotel       -0.1            Jay     Witasick    -0.217
J   Witasick    -0.2            Seth    Etherton    -0.222
R   Rincon      -1              Dan     Haren       -0.246
T   Harikkala   -1              Kirk    Saarloos    -0.397
R   Glynn       -1              Octavio Dotel       -0.657
K   Yabu        -1.2            Joe     Kennedy     -0.822
J   Cruz        -2              Juan    Cruz        -1.068
                                Ryan    Glynn       -1.307
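Since Win Shares Above Average is just WS minus EWS, the subtraction can be run backwards on the tables above to see how much playing time is baked into raw Win Shares. Here is a small sketch for the top five pitchers by Win Shares. The WS and WSAA values are copied from the tables; the EWS values are implied by them and are not numbers I looked up at Hardball Times.

```python
# Values copied from the Pitching Win Shares and WSAA tables above.
win_shares = {"Zito": 14.5, "Blanton": 14.4, "Haren": 14.3, "Street": 13.3, "Harden": 12.9}
wsaa = {"Zito": 2.5, "Blanton": 3.4, "Haren": 2.3, "Street": 6.3, "Harden": 5.9}

# WSAA = WS - EWS, so the implied EWS is WS - WSAA.
# Rounding to one decimal hides floating-point noise in the subtraction.
implied_ews = {name: round(win_shares[name] - wsaa[name], 1) for name in win_shares}

# Print from the largest implied EWS (most playing time) to the smallest.
for name, ews in sorted(implied_ews.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} WS={win_shares[name]:5.1f}  EWS={ews:5.1f}  WSAA={wsaa[name]:4.1f}")
```

On these numbers an average pitcher with Zito's playing time would have picked up about 12 Win Shares, against about 7 for Harden's. That gap is the playing-time bias in raw Win Shares, and it is what WSAA removes.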

It's so beautiful when you get similar results from two completely different computations. Winexp and Win Shares use none of the same numbers to compute a pitcher's contribution, and yet they give nearly the same pitcher rankings. (Keep in mind that differences of less than one Win Share and .5 WXC are probably not meaningful.) At this point, I am convinced Win Expectancy Contributed measures something interesting.
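One way to put a number on "nearly the same pitcher rankings" is a Spearman rank correlation between the two columns. This is an extra check, not something Baby Winexp does, and Spearman is my choice of yardstick rather than anything either stat prescribes. The sketch below uses only the standard library and the values from the side-by-side table, with tied values sharing the average of their ranks.

```python
from math import sqrt

# (WSAA, WXC) pairs copied from the side-by-side table above. Etherton is
# left out because he has a WXC value but no WSAA entry.
pitchers = {
    "Street": (6.3, 3.656), "Harden": (5.9, 2.829), "Duchscherer": (4.6, 1.59),
    "Blanton": (3.4, 1.448), "Zito": (2.5, 0.397), "Haren": (2.3, -0.246),
    "Saarloos": (1.5, -0.397), "Calero": (1.5, 0.514), "Garcia": (0.2, -0.003),
    "Flores": (0.1, 0.064), "Reames": (0.0, 0.0), "Kennedy": (-0.1, -0.822),
    "Dotel": (-0.1, -0.657), "Witasick": (-0.2, -0.217), "Rincon": (-1.0, -0.205),
    "Harikkala": (-1.0, -0.203), "Glynn": (-1.0, -1.307), "Yabu": (-1.2, -0.072),
    "Cruz": (-2.0, -1.068),
}


def average_ranks(values):
    """Rank values ascending; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


wsaa = [pair[0] for pair in pitchers.values()]
wxc = [pair[1] for pair in pitchers.values()]
rho = spearman(wsaa, wxc)
print(f"Spearman rank correlation, WSAA vs WXC: {rho:.2f}")
```

A value of 1 would mean identical rankings and 0 would mean no relationship. On these 19 pitchers it comes out at roughly 0.74, with the top four in exactly the same order and most of the disagreement coming from a handful of names such as Haren, Saarloos, and Yabu.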

Now the next step would be to investigate the differences between Winexp and Win Shares Above Average rankings. Recall, we investigated two wacky results from winexp before: (1) Baby Winexp thought Crosby was a huge albatross on the team, despite him being our lucky charm; a close look at the statistics convinced me that Winexp was absolutely correct; (2) Baby Winexp spat on Miggy. Again, a close look at the stats in meaningful situations (runners on, game within three runs) convinced me she was right again.

The case study that might be worth doing would be on Haren (who impressed my eyeballs... it has annoyed me that Baby W thinks he's noticeably worse than the top three starters) and Calero. Baby W thinks Calero is worth a swing of one and a half games more than Haren, while WSAA tells us Haren was about a third of a win better. WSAA and Winexp differ so little on the other players that there aren't any huge discrepancies like the Crosby numbers. The only other notable exception is Dotel, and I think we remember why Baby W thinks he was a huge loser for us. I vote with Baby W on Dotel.

