As someone who lives for baseball analysis, I find the first few weeks of the season both euphoric and incredibly frustrating. On one hand, after 6 long months of waiting, real live the-games-actually-count baseball is back, and it's heavenly! On the other hand, for the first few weeks of the season, the statistics rest on samples so small that they're nearly devoid of any meaning whatsoever. You have things like Alberto Callaspo hitting .538/.600/.615 (seriously... what the hell?!) while All-Star Jonathan Lucroy hits .050/.136/.150.
That leaves me with two options: either to try not to do any analysis for another month or so, or to try to make do with the puny bit of numbers we've got. Since you're currently reading an article I wrote, I'm guessing you can figure out which one I chose.
The A's, due to some very frustrating losses, currently sit with a record of 4-4. That's not terrible, but not great. However, through their first eight games they've outscored their opponents by 24 runs. Should we be frustrated by the A's 4-4 record? Maybe. But on the flip side, is a run differential of +24 through 8 games something that should make us optimistic, or should we just write it off as small-sample madness, like we do everything else in the first week?
Over the course of a season, it's pretty well established that a team's run differential is a better predictor of its future record than even its current record. But how many games do we have to play before we can start to trust run differential as something meaningful? Obviously, one or two games isn't enough. But is 10 games enough? 20? 40?
Interestingly enough, Russell Carleton of Baseball Prospectus actually tried to answer this very question last year, with regard to our very own Oakland A's. (For those of you who won't trust run differential even after 162 games, you need only look to those hard-luck 2014 A's, who had baseball's best run differential but won only 88 games.)
Here's a quick explanation of Carleton's study. He took every team for every season from 1962 to the present and calculated each team's run differential after Game 1, Game 2, and so on through Game 162. For each game number, he checked how strongly the teams' run differentials at that point correlated with their records at the end of the season. Some of the results were unsurprising: run differential after 1 game tells us very little, and run differential after 162 games tells us a lot. But just how quickly the numbers become (somewhat) meaningful might surprise you.
Here is the graph from Carleton's study, and where the A's stand right now (don't worry, you non-mathematically-inclined folk! I'll explain it):
The x-axis of the graph is the number of games the team has played, and the y-axis is a value called r² (r-squared). An r² basically tells us how closely two things are correlated (in this case, run differential and eventual team record). An r² falls between 0 and 1, and the higher the number, the stronger the correlation. So an r² of 0.99 means that the two things are almost perfectly correlated (like, say, a player's batting average and his number of hits), and an r² of 0.1 means that they have almost nothing to do with one another (like a player's number of home runs and the amount of bacon he ate for breakfast).
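If you're wondering where an r² actually comes from, here's a minimal Python sketch: it's the square of the Pearson correlation coefficient, built from the means, covariance, and variances of two lists of numbers. The hitters below are hypothetical. Because each of them gets exactly 500 at-bats, batting average is just hits divided by a constant, so the r² comes out at (essentially) 1.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, then sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)

# Hypothetical hitters who each had exactly 500 at-bats.
hits = [150, 120, 180, 135]
averages = [h / 500 for h in hits]
print(round(r_squared(hits, averages), 3))  # 1.0
```

Give the hitters different numbers of at-bats and the value slips below 1; pair two lists that have nothing to do with each other and it heads toward 0.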
As you can see from the chart above, after only 8 games, a team's run differential is correlated with its eventual record with an r² of 0.42. Now, to be fair, an r² of 0.42 isn't great. But it's far from telling us nothing, either.
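To put that 0.42 in slightly more familiar terms (these are standard textbook readings of r², not figures from Carleton's write-up): the correlation coefficient itself is the square root of r², and r² is the share of the team-to-team spread in final records that a straight-line fit on 8-game run differential accounts for.

$$
r = \sqrt{r^2} = \sqrt{0.42} \approx 0.65
$$

So roughly 42% of the variation in eventual records lines up with run differential after 8 games, and the other 58% is left unexplained by it. The correlation is positive, since better run differentials go with better records.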
Here's an example to put this in perspective: let's say you wanted to look at the correlation between batting average and on-base percentage (OBP). OBP is made up of hits, but also of walks and, on a much smaller scale, hit-by-pitches (HBP). So you'd expect OBP to be correlated with batting average, but only somewhat, since hits are just one of a few contributing factors. Sure enough, the r² if you regress batting average against OBP is about 0.5. This means that batting average is a significant contributing factor to OBP, but only one piece of the puzzle.
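For reference, the standard formulas are AVG = H / AB and OBP = (H + BB + HBP) / (AB + BB + HBP + SF), where SF is sacrifice flies. The short Python sketch below applies them to two made-up hitters with identical batting averages but very different walk rates, which shows why batting average can only explain part of OBP.

```python
def batting_average(h, ab):
    """AVG = hits / at-bats."""
    return h / ab

def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

# Two hypothetical hitters: same hits and at-bats, very different walk totals.
contact_hitter = dict(h=140, bb=20, hbp=2, ab=500, sf=4)
patient_hitter = dict(h=140, bb=90, hbp=5, ab=500, sf=5)

for name, line in [("contact", contact_hitter), ("patient", patient_hitter)]:
    avg = batting_average(line["h"], line["ab"])
    obp = on_base_percentage(**line)
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

This prints `contact: AVG 0.280, OBP 0.308` and `patient: AVG 0.280, OBP 0.392`: the same batting average, with an OBP gap of more than 80 points that walks alone account for.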
The same is true of the A's 8-game run differential. While only about 5% of the season has been completed, an r² of 0.42 tells us that run differential at this point can actually be a decent clue to the team's eventual record. With an 8-game run differential of +24, the A's show a slight tilt toward a stronger record over the course of the season. At this early juncture, that tells us nothing conclusive, but it's just what I said: a piece of the puzzle, and a bigger one than you might think. And you can count on this: it tells us a whole lot more than 4-4 does.
In other words, the A's run differential after 8 games doesn't mean we should expect them to win 110 games. But it does mean that we can be somewhat optimistic that they just might be a bit better than their record indicates. Maybe, just maybe, there's a good team here.