WAR Sucks

By Daniel Waldman | July 17, 2020

Baseball is a gorgeously paced game, to an extent that almost defies description. Its drama operates in negative space, the slow-burning tension that matures and warps and snaps into action. The ball moves faster than in any other game this side of jai alai, tearing through still air and distilling the game into a series of rapid, beautifully unfair uncertainties. Wins are the 0-1 binary outcome the best run-creating and run-preventing efforts of 50 players are reduced to, and it is in their service that baseball, at least when the world is spinning, is played.

For those of you fortunate enough to have stayed clear of insufferable debates about baseball statistics, WAR stands for Wins Above Replacement. The idea is that this number, more so than the humble home run or on-base percentage could ever hope to, tells you how many more wins a player gets you every year than a generic 25-year-old playing in Triple-A Wichita would, a lovely quantity to be able to know for sure. It’s a seductive concept, a number that tells you the value of a player in the only units that matter, and one it would undoubtedly be nice to have a great estimate of.

And it turns out our estimates are pretty good! The total WAR on your team’s roster correlates extremely well with the number of games your team wins, better than any other stat we can think of.

This point alone is usually enough to facilitate intelligent discussion of baseball. And like any baseball nerd, I like WAR for this reason! Once everyone understands and agrees on its usefulness, it is a sound and robust measure of baseball performance that puts discussions across eras on temporally adjusted terms. Gone are the days when imagining 2018 Lorenzo Cain’s impact on the 1971 Dodgers was a fully abstract exercise.

WAR has a tragic flaw, though: it sucks. And not for the reasons that usually make this argument a nauseatingly boring one. It is also killing baseball.

So why the controversy? For the grouchy columnist on your local sports page, and for dads everywhere, it’s easy. We watch baseball games all the time, and we have our own ideas of who to credit for wins and losses. Sometimes players have great games and lose. Sometimes one player completely and unilaterally blows it. Sometimes it seems like the starting pitcher deserves credit for the entire win. WAR, offering a neat sum of these contributions, philosophically cuts through these subtleties entirely. And it’s easy to see why this might offend the sensibilities of baseball fans, notoriously rational and unromantic as we are. You thought Ichiro Suzuki ever had a better season than Marcus Semien’s 2019? Guess again, dumbass!

Sure, those of us who watch a lot of games can reason about apportioning this credit with a little more complexity than that, but the cavemen who announce these games certainly can’t, and they love little more than making this known to the viewer. Baseball has spent decades centering its business model and TV contracts on people watching their local team every day, and the casual fan who is supposed to buy what MLB is selling for the next 40 years gets to hear 5 times a week about how confusing these new stats are, and how WAR just doesn’t sit right with the voice of their favorite team. That is terrible for baseball, for the same reason it took me 7 paragraphs to even get this far. For their part, though, baseball’s more careful thinkers are not doing WAR any favors.

Here’s Tom Tango, a baseball thinker I have intense respect for, posting an endlessly complex analogy that will surely clear up the WAR confusion once and for all:

And here’s Bill James, father of thinking seriously about sports, deciding on what by my count was Day 63 of quarantine to once again attempt to put this debate to bed:

Just look at how inane the responses are,[1] especially to Bill’s tweet. This discussion never, ever changes. Instead of letting us argue about literally anything else, WAR’s great gift to baseball has been an eternal, insufferable argumentative carousel where everyone is always right and nobody is ever making a point. That I can feel myself hopping on now, getting sucked into the same dumb argument about whether this one number is an affront to everything beautiful about life, is part of the reason we have to get rid of it.

The larger reason is that, for most purposes, it is useless.

I’ll quote the Wikipedia entry for WAR. Here’s the first paragraph:

“The basis for a WAR value is the estimated number of runs contributed by a player through offensive actions such as batting and base running, and runs denied to opposition teams by the player through defensive actions like fielding and pitching. Statistics such as weighted on-base average (wOBA), ultimate zone rating (UZR), ultimate base running (UBR), and defense independent pitching statistics (DIPS) measure the effectiveness of a player at creating and saving runs for their team, on a per-plate appearance or per-inning basis. These statistics can be multiplied by the playing time of a player to give an estimate of the number of offensive and defensive runs contributed to their team.”

Crazy, right? Sounds complex, looks like a lot of effort went into making this a sound estimate! Anyway, here’s the next sentence:

"Additional runs contributed to a team lead to additional wins, with 10 runs estimated to be equal to roughly one win."

I mean, are you kidding me? All that math to tell you how many runs a player gave their team, and then we just divide it by 10 and call it wins? This is the stat people tear their hair out over?

Mike Trout didn’t win you 9.6 games last year; he netted you 96 runs. Framing this debate in terms of what WAR is actually measuring, the run value a player creates for you, makes the noise associated with these numbers bearable too. It is obvious that this number, Runs Above Replacement (RAR), will bounce around season by season, and that a 48 RAR season isn’t all that different from a 50 RAR season. Maybe the 76 runs Marcus Semien created for the A’s in 2019 translated to more wins than the 60 runs Ichiro gave the 2001 Mariners. The inferences that go into those numbers, however, are far more believable as inputs to a running tally of runs added and subtracted than as a request that fans buy the idea that Semien won the A’s 7.6 games while Ichiro won the Mariners 6.0. Which is insane, because they are the same inferences, except for the part where we divided by 10!
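To make the arithmetic concrete, here is a minimal Python sketch of the runs-to-wins step, assuming the flat “10 runs = 1 win” rule of thumb from the Wikipedia passage quoted above. Actual WAR implementations derive runs-per-win from each season’s run environment, so the fixed default is a simplification, and `rar_to_war` is a hypothetical helper, not anyone’s published code. The inputs are the run totals cited in this piece.

```python
# Hypothetical helper: converts Runs Above Replacement to Wins Above
# Replacement using the flat 10-runs-per-win rule of thumb. Real WAR
# systems compute runs_per_win from each season's run environment.

def rar_to_war(runs_above_replacement: float, runs_per_win: float = 10.0) -> float:
    """Divide runs by runs-per-win; nothing else happens here."""
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    return runs_above_replacement / runs_per_win


# The run totals used in this piece ("last year" meaning 2019).
seasons = {
    "Mike Trout, 2019": 96,
    "Marcus Semien, 2019": 76,
    "Ichiro Suzuki, 2001": 60,
}

for player, rar in seasons.items():
    # Prints e.g. "Mike Trout, 2019: 96 RAR -> 9.6 WAR"
    print(f"{player}: {rar} RAR -> {rar_to_war(rar):.1f} WAR")
```

Every number the sketch prints is just its input with the decimal point moved one place. The `runs_per_win` parameter is the only spot where any era adjustment would enter; the run estimates themselves never change.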

Again, this wouldn’t be a problem if everyone who wanted to talk baseball did their homework all the time, but that will never happen. When we push away the people who don’t want to think about the epistemological rigors of the numbers that have taken over the sport they’ve loved since they were 6, we shove them backwards into a comfortable stone age that is harming the product for today’s 6-year-olds. RBI, which depends almost entirely on factors outside the batter’s control (chiefly how often his teammates are on base ahead of him), is, to a well-documented extent, as stupid and arbitrary as it gets. Yet cavemen in broadcast booths still turn to it as their preferred measure of offensive output, because those of us who can produce real statistics have dramatically failed to give it any competition. When we spit out decent guesses of idealized quantities, we obscure the accessible intermediate conclusions that our hard work generates and then throws away, all for a marginal abstract benefit to us eggheads who think we know better.

With one quick change, so stupid it’s almost cosmetic (reporting runs instead of wins), we can get rid of the actually dumb stats like RBI that lead to actually dumb conclusions. If you want to convert RAR to WAR, you can just imagine a decimal point in front of the last digit.

Runs are concrete and small. Everyone knows the pieces that go into making and preventing runs, and that’s what everyone on a baseball diamond is always trying to do. Everyone understands that they don’t all come at the same time, or when you necessarily want them, that a handful of players are responsible to some degree for each of them, and that having more of them is always nice. This is the kind of thing we should be measuring, and it is a little absurd that we take a rigorous yet digestible stat like RAR and decide to make it a headache by dividing it by 10.

Over very long periods of time, this denominator does actually vary a fair amount, as league-wide run scoring rises and falls from era to era, but the contextual adjustment RAR would need to compare Tris Speaker and Andruw Jones is one we already make in our heads all the time for home runs, triples, strikeouts, and every other counting stat in baseball. The overwhelming bulk of baseball discussion does not need this adjustment, because normal people having a conversation between the third and fourth innings of a ballgame on a sunny day don’t generally try to compare a statline from 1916 to one from 2005; we compare one statline from 2019 to another from 2019. There is no need to translate a concept as accessible as “runs added” into the highest possible abstraction of outcome, and dooming ostensibly fun baseball discussion to statistical semantic hell is not worth the tidy universality of a contextual, intertemporal run-value adjustment. WAR, which outsources this adjustment to the computer, is great for rarer exercises in baseball discussion like comparing careers, long-separated seasons, and overall team strength. It should stick to those.

As neat as it would be to be able to portion them out for sure, wins are stochastic and ugly. Their unpredictability on an individual level gives baseball its beauty. Great teams lose to bad teams every day, and a game of baseball is never, ever over until it is actually over. As long as the game is still going, the win has not been decided even if several runs have, and if your team plays perfectly going forward you will win the game. To the extent that a win is a product of constraint, forcing the brilliant, complex mess of a ballgame into a 1 and a 0, it only ever occurs because one team played well enough as a unit to win on their own merits. There is no kneeling out the clock, no garbage time, no reason to give up. I might just be a romantic who misses baseball, but there is something real and magical here that we lose when we run in circles arguing about whether this one number just ended our argument.

The great part is that, whether or not anyone actually takes it as the be-all and end-all, the toothpaste is never going back in the tube on this one. Your team’s 67-year-old color commentator will never get sick of talking about the people who do, real or not, or about anyone who dares imagine they can think about baseball as intelligently as your team’s first baseman from 1979 to 1985.

Because he’s the one who’s right. Winning is the be-all and end-all of sports, and the promoters of WAR frankly should’ve realized their stat would have this effect. It is obviously and uninterestingly impossible to perfectly represent a baseball season or career in one number, and none of us nerds who sit behind computers will ever understand the game the way your team’s color commentator had to in order to make it out of Double-A Huntsville in 1991. We didn’t spend years looking out of bus windows, thinking about laying off 1-2 sliders on hot nights in front of a couple thousand people, and intentionally or not we project a kind of arrogance when we imply that we know what a win is anyway. When every action of every player can be isolated, chopped into its constituent parts and reassembled largely according to the whims of the hardy researcher, we behind the computers have some responsibility to make sure our output is productive beyond idealized abstraction. Yes, winning is everything, but a statistic will never be everything, and a statistic that takes an idealistic stab at everything, with an implied nod to 5 caveats that the fan is responsible for appreciating, is worse than nothing.

How many WAR was that catch worth? What about that slide? That bad call?

This isn’t a problem unique to baseball (any number of spreadsheet dweebs are trying their best to reduce talk about the NBA to a set of interacting VORPs), but a sport that increasingly leverages its business on fans having a relationship with their team, and by extension their local broadcasts, just cannot afford mutual contempt between the reality of the game and the language it is spoken in. Nobody likes arguing about WAR, and nobody will ever win the argument when both sides are arguing that the argument is over. “We” on the numbers side will never convince the people who actually play baseball that our number knows better than they do, and they will never take us seriously as long as we even hint that this is what our number is trying to do. Our efforts to take baseball as seriously as is computationally possible ultimately undermine a game that sparks joy in its unseriousness.

Baseball is a beautiful thing. Devoting oxygen to picking it apart is a waste of everyone’s time, and nobody likes wasting everyone’s time like the cranky old men who don’t seem to have enough of it. I certainly don’t have the energy to litigate this philosophical battle forever, especially when we don’t have to. Just give us RAR, leave WAR to the historians, and let us have our game back. It’s more fun this way.

Daniel Waldman is President Emeritus of the Sports Analytics Group at Berkeley.

1. If you have the commendable patience to wade into the replies, you’ll see me there too, using all 280 characters, as every bad tweet should.