The story has essentially attained the level of holy writ, at least to those committed to data and evidence, such that it now seems almost too good to be true. The quick-and-dirty version of the tale is that stats geeks with computers, like those former player and broadcaster Tim McCarver called “drooling eggheads,” outsmarted and outmaneuvered the stupid yet arrogant “traditional baseball men” who ran our most traditional sport at the professional level and who thought they knew all there was to know about the game. Thus, it is said, everything the old-time baseball men thought they knew about evaluating players and teams has been found wanting, not that those whose day has passed, committed to wizardry and witchcraft as they are, have recognized it.
This revolution – as shocking as it has been comprehensive – is said to have brought about the ultimate revenge of the nerds. The geeks now run the baseball show, having moved the level of analytical precision involved in running teams and evaluating players from zero-to-sixty in a flash. The new breed of “baseball men” aren’t grizzled scouts looking for “five tool guys” but, rather, Ivy League educated experts in computer modelling and statistical analysis who use those skills to determine who to scout, who to sign, who to play and how to play. The prevailing narrative describes this new contingent as dominating professional baseball at every level, down to the depths of the independent minor leagues.
Is the analytics overhaul of baseball proper as complete and comprehensive as the telling claims? No. The real story is much more interesting and enlightening than that.
Baseball is particularly amenable to the use of statistical analysis because it offers large sample sizes, discrete individual-performance measures (such as plate appearances, pitches, and the like), and ease of identifying positive results (such as winning, home runs, and the like). However, when humans are involved – and baseball is as human as can be – interpretation of the underlying data is highly complicated.
Great interpretation of difficult data sets, especially those involving human behavior, involves more sculpting than tracing. It requires great skill, imagination and even a bit of whimsy as well as collaboration as to whether the various interpretive choices are acceptable (not to say the right) ones. That’s why we understand reality better with respect to the natural sciences than in the social sciences. As ever, information is cheap but meaning is expensive.
To lead off, let’s recall that if it seems too good to be true, it usually is. To see what I mean by that in the context of our story will require some in-depth analysis of its own, starting with more than a bit of background information and history.
This past Sunday was the last day of the Major League Baseball season. The post-season has begun and the World Series awaits, but for fans of teams like my San Diego Padres, who haven’t been a serious contender all season or in years, who finished last yet again and who have never won a World Series, next season is already here. I was at the home finale last week (we lost, of course) at Petco Park, the “undisputed” best ballpark anywhere, to give a happy send-off into retirement to our Hall of Fame announcer, Dick Enberg, as well as to see some of our bright prospects, and the “next year” talk was in full swing even then.
Of course, anytime is a good time to talk baseball. If you’re watching a game, its pace is perfectly conducive to discussing (arguing about) players, managers, strategy, tactics, the standings, the pennant races, the quality of ballpark peanuts, and pretty much anything else. In the off-season, the “hot stove league” allows for myriad possible conversations (arguments) about how to make one’s favorite team better. Every year, around February 1, when “pitchers and catchers report” to Spring Training, baseball talk about the upcoming Major League season and its prospects officially begins again in earnest, at least for those who took a break after the season. But I don’t. In fact, the “winter meetings” (when player transactions move center stage) are a crucial part of the annual baseball year. The coming of spring means the return of hope — maybe this will finally be the year — which of course means talking (arguing) about it some more.
Our neighborhood quarrels about the National Pastime when I was a kid were incessant and invigorating, and didn’t have to include the vagaries of team revenues and revenue-sharing, player contracts, free agency and the luxury tax, as they do now. We could focus on more important stuff. Who should be the new catcher? Who should we trade for? Do we have any hot phenoms? Who’s the best player? The best pitcher? The best hitter? The best third baseman? Who belongs in the Hall of Fame? Which team will win it all next year? How do the new baseball cards look? Is the new Strat-O-Matic edition out yet?
Early on, my arguments were rudimentary and, truth be told, plenty stupid. They were ideological (the players on my team were always better than those on your team, no matter what the standings might have said), authority-laden (“The guy in the paper says…”), narrative-driven (“Remember that time…”), overly influenced by the recent (“Did you see what Jim Northrup did last night?”) and loaded with confirmation bias.
Quickly I came to realize that it’s really hard to change an entrenched opinion, and not just because I was arguing with dopes. Slowly it became clear that if I wanted to have at least a chance of winning my arguments, I needed to argue for a position that was reality-based. I needed to bring facts, data and just-plain solid evidence to the table if I wanted to make a reasonable claim to being right, much less of convincing anyone (the current political campaign notwithstanding). Arguments and beliefs that are not reality-based are bound to fail, and to fail sooner rather than later.
Fortunately, I had a few weapons for this fight, even as a kid. I pulled out my baseball cards and pored over the data from previous seasons on the back. I subscribed to the then unquestioned “Bible of baseball,” The Sporting News – my local paper didn’t even carry box scores, much less comprehensive data – to get current statistics and detailed information. I kept checking out The Baseball Encyclopedia and other books from my local library and studied them intently. And I paid careful attention to what real experts said.
The good news was that there was knowledge to be had. Baseball has more statistics than any other sport, after all. And those statistics are relatively easy to understand, at least at an elementary level. Clearly, a .320 hitter is better than a .220 hitter, a 20-win Cy Young Award winner is better than a journeyman who goes 2-7, and 40 home runs are better than 10.
But the bad news was the knowledge base’s remarkable limitations. It didn’t take any great insight even for a kid to figure out that a pitcher on a good team ought to have better stats than one on a poor team, or that right-handed power hitters for the Red Sox had a significant advantage over their counterparts on the Yankees on account of their respective stadium configurations, or that fielding percentage alone didn’t tell much about a player’s defensive value.
RBI opportunities would obviously impact RBI totals. Players at the top of the order would score more runs. Pitchers in big ballparks would benefit therefrom. My Dad always insisted that “a walk’s as good as a hit.” But was he right? Examining issues and concepts like these and what they actually meant with respect to players, wins and losses was not possible with the information that was then available to me. As the great scientist Lord Kelvin put the problem, “When you cannot measure it and express it in numbers, your knowledge is of a very meagre and unsatisfactory kind.” We simply didn’t have enough good numbers.
In those days, I was always disappointed at how difficult it was to get down to the “real nitty-gritty” of player analysis. It was easy to show that Johnny Bench was better than Joe Azcue. But objectively differentiating between great performers like Tony Perez and Brooks Robinson, for example, or any set of roughly equivalent players – always bound to be exceedingly difficult — remained essentially impossible (but Brooks was just a bit better, if you want to know). The tools simply weren’t available to make the fine distinctions that are required. There was more than enough basis to argue (there always is), but the body of available evidence and analysis remained tiny and limited. Good, solid, objective conclusions were few and hard to come by.
A Really Smart Security Guard
In a fascinating bit of historical serendipity, these problems – which had seemed insoluble to me as a kid – began to be met head-on, and the answers widely disseminated, on account of the work of a really smart security guard in Kansas. In 1977, Stokely-Van Camp (the pork & beans people) night watchman Bill James created a 68-page “book” of mimeographed sheets stapled together that he called a Baseball Abstract (featuring “18 Statistical Categories That You Can’t Find Anywhere Else”). Only 75 people responded to his first ad for it, although Norman Mailer was one of them. It was hardly an auspicious beginning. But I was an early adopter. So was famous commodities trader John Henry, who would later go on to buy the Boston Red Sox.
James marketed his Abstract via small classified ads in the back pages of The Sporting News. This seemingly minor event was actually the dawning of a new era in baseball research. James’ approach, which he termed sabermetrics in reference to the Society for American Baseball Research (SABR), set out to analyze baseball scientifically through the use of objective evidence in an attempt to determine why teams win or lose. Along the way, he got to debunk lots of traditional baseball dogma, what James called “baseball’s Kilimanjaro of repeated legend and legerdemain.”
James not so modestly wrote that he wanted to approach baseball “with the same kind of intellectual rigor and discipline that is routinely applied, by scientists great and poor, to trying to unravel the mysteries of the universe, of society, of the human mind, or of the price of burlap in Des Moines.” That approach was in stark contrast to that of the traditionalists, “an assortment of half-wits, nincompoops, and Neanderthals…[who] are not only allowed to pontificate on whatever strikes them, but are actually solicited and employed to do this.” Best of all, James made careful statistical analysis accessible and interesting while also seeking to be practical in a way fans could appreciate.
James’ pioneering approach began inauspiciously, but slowly built a following. He clearly demonstrated that many of the supposed truisms about baseball weren’t actually true. He showed, for example, that starting pitchers have no effect on attendance, that catchers have a great effect on base stealing, that sacrifice bunts are usually counterproductive and that ballplayers peak in their late 20s (rather than the early 30s as had been supposed). Sabermetrics allowed the young professional adult me to build better-informed arguments and beliefs about baseball by marshaling far more pertinent facts and data, even as the “traditional baseball men” ignored it.
In fact, many such “insiders” and their media enablers often ridiculed those with analytical acumen who put their skills to work on the national pastime. For example, consider this gem from journalist Mike Judge: “Tell a sabermetrics guy that Getz is a better all-round ballplayer than Billy Butler and he’ll have an asthma attack and ask his mom to bring him a fresh box of Pop Tarts.” Or take Royals manager Ned Yost, who acknowledges that he doesn’t even understand how pitchers are evaluated by baseball front offices. Imagine calling a Major League general manager with brains and analytical skills “Google Boy,” a “computer GM,” or “geek.”
Some of these old-time insiders still ridicule statistical reality. Perhaps Hall-of-Famer Goose Gossage said it best.
“I’ll tell you what has happened, these guys played rotisserie baseball at Harvard or wherever the [bleep] they went and they thought they figured the [bleeping] game out. They don’t know [bleep].
“A bunch of [bleeping] nerds running the game.”
That’ll show ’em, Goose.
The Baseball Abstract books by James – which I read as carefully as I read the backs of baseball cards as a kid – were the modern predecessors to the sports analytics movement, which has since spread to all major sports. More importantly, this sort of analysis eventually went mainstream and came to be used by the teams themselves, allowing all of us who engaged in such arguments to begin “checking our work” by seeing if and how what would come to be known as “moneyball” played out in the real world. James’ innovations – such as runs created, range factor, and defensive efficiency rating – were significant. Beyond that, his approach has led to many other noteworthy sabermetric developments, some proprietary to individual teams, which now often have entire departments dedicated to data analysis.
On account of the success of Moneyball (both the book and the movie), the Bill James approach was put on display by Michael Lewis for everyone to see. The success of Moneyball (and the Oakland A’s) meant that the “traditional baseball men” could no longer readily ignore it, especially because Wall Street people (like John Henry) dedicated to using analytics and data-driven processes started buying baseball franchises.
Moneyball focused on the 2002 season of the Oakland Athletics, a team with one of the smallest budgets in baseball. Sabermetrics was then pretty much the sole province of stats geeks. At the time, the A’s had lost three of their star players to free agency because they could not afford to keep them. A’s General Manager Billy Beane was in a very tough spot, which both allowed and spurred him to go “all in” on the James approach. Beane armed himself with reams of performance and other statistical data, his interpretation of which, heavily influenced by James (although taught to him by A’s consultant Eric Walker[1]), was routinely rejected by “traditional baseball men,” including many in his own organization. Not coincidentally, Beane was also armed with three terrific – according to both traditional and newfangled measures – young starting pitchers, not to mention AL MVP Miguel Tejada, Eric Chavez and Jermaine Dye. Beane used that data to complete his roster on the cheap with a number of otherwise undesirable players, creating a team that proceeded to win 103 games and the division title largely because the newly acquired players’ true value was actually much higher than traditional measures recognized.
The crucial insight of Moneyball, which came from Walker, was a “Mungeresque” inversion. In baseball, a team wins by scoring more runs than its opponent. Walker’s epiphany was to invert the idea that runs and wins were achieved by hits to the radical notion that the key to winning is avoiding outs (bonus points to you if you’re now thinking about the Charley Ellis classic book on investing, Winning the Loser’s Game). That led Beane to “buy” on-base percentage cheaply because the “traditional baseball men” overvalued hits but undervalued OBP even though it doesn’t matter how a batter avoids making an out and reaches base. Of course, as with investing, on account of the success of the A’s, it didn’t take long for other GMs to drive up the price of OBP.
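To make the inversion concrete, here is a minimal sketch comparing batting average with on-base percentage. The OBP formula is the standard one (it is not spelled out in the text above), and both hitters and all of their numbers are invented purely for illustration.

```python
def batting_average(hits, at_bats):
    """Hits per official at-bat; walks are invisible to this measure."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Standard OBP: times on base divided by the plate appearances that count."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


# Two hypothetical hitters (invented numbers, not real players).
slap_hitter = {"hits": 165, "walks": 25, "hit_by_pitch": 2, "at_bats": 550, "sac_flies": 3}
patient_hitter = {"hits": 135, "walks": 95, "hit_by_pitch": 5, "at_bats": 500, "sac_flies": 5}

for name, stats in (("slap hitter", slap_hitter), ("patient hitter", patient_hitter)):
    avg = batting_average(stats["hits"], stats["at_bats"])
    obp = on_base_percentage(**stats)
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

On these made-up numbers the patient hitter trails by 30 points of batting average yet avoids outs far more often, which is the kind of gap a hits-focused market would tend to misprice.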
Therefore, the crucial lesson of Moneyball was that Beane was able to find value via underappreciated player assets (some assets are cheap for good reason) by way of an objective, disciplined, data-driven (Jamesian) process. In other words, as Lewis explained, “it is about using statistical analysis to shift the odds [of winning] a bit in one’s favor” via market inefficiencies. As Assistant GM Paul DePodesta said, “You have to understand that for someone to become an Oakland A, he has to have something wrong with him. Because if he doesn’t have something wrong with him, he gets valued properly by the marketplace, and we can’t afford him anymore.” Accordingly, Beane sought out players that he could obtain cheaply because their actual (statistically verifiable) value was greater than their generally perceived value. Despite the now widespread use of James’ approach, broadly construed, smart baseball people are still finding underappreciated value and lunkheads are still making big mistakes.
After the 2002 season, sabermetrics went mainstream in a hurry, despite some very prominent detractors among traditional baseball men and the media. Winning does that. In fact, the 86-year-old “Curse of the Bambino” was lifted, largely on account of sabermetrics (Henry had hired Theo Epstein, today’s leading practitioner of the James approach,[2] in late 2002 when Beane turned the job down; the Red Sox went on to win the 2004 World Series). Nearly every Major League team emphasizes the use of advanced statistical measures now. And today, MLB general managers and executives don’t look like the baseball men of yesteryear. They look like investment bankers in chinos. Some even used to be investment bankers.
By just 2006, Time magazine named James as one of the 100 most influential people in the world. Epstein described James’ impact on the game to Time. “The thing that stands out for me is Bill’s humility. He was an outsider, self-publishing invisible truths about baseball while the Establishment ignored him. Now 25 years later, his ideas have become part of the foundation of baseball strategy.” In fact, Epstein hired James to a consulting position he still holds, as a kind of chief ideas and culture officer for the Red Sox, to keep the team pointed in the right direction. It shouldn’t be a surprise, then, that even though Epstein moved to the Cubs after the 2011 season, the Red Sox won the American League East again in 2013 and yet again this season.
The bottom line is that baseball analysis has become far more objective, relevant and useful largely because of Bill James. Most importantly, the James approach works (if not as well as some might hope – there are other factors involved, as James concedes). Evidence-based baseball has won the war (and championships). It has even made its way into baseball broadcasting (finally). Despite decades of ostracism and ridicule, the reality revolution was accomplished with shocking speed once it started and today seems complete.[3]
The Power of Science
James contrasted the right approach (his) with the wrong approach (not his) in his 1985 Abstract within the context of a discussion about then-Red Sox outfielder Jim Rice. “Virtually all sportswriters, I suppose, believe that Jim Rice is an outstanding player. If you ask them how they know this, they’ll tell you that they just know; I’ve seen him play. That’s the difference in a nutshell between knowledge and bull****; knowledge is something that can be objectively demonstrated to be true, and bull**** is something that you just ‘know.’ If someone can actually demonstrate that Jim Rice is a great ballplayer, I’d be most interested to see the evidence.”
The Oxford English Dictionary defines the scientific method as “a method or procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement and experiment, and the formulation, testing, and modification of hypotheses.” Science is about making observations and then asking pertinent questions about those observations. What it means is that we observe and investigate the world and build our knowledge base on account of what we learn and discover, but we check our work at every point and keep checking our work. It is inherently experimental. In order to be scientific, then, our inquiries and conclusions need to be based upon empirical, measurable evidence. We won’t just “know.”
The scientific method, broadly construed, can and should be applied not only to traditional science, but also, to the fullest extent possible, to any sort of inquiry into or study about the nature of reality, like baseball and investing. The great physicist and Nobel laureate Richard Feynman even applied such experimentation to hitting on women. To his surprise, he learned that he (at least) was more successful by being aloof than by being polite or by buying a woman he found attractive a drink.
Science progresses not via verification (which can only be inferred) but by falsification (which, if established and itself verified, provides relative certainty only as to what is not true). That makes it unwieldy. Thank you, Karl Popper. In investing and in baseball, as in science generally, we need to build our processes from the ground up, with hypotheses offered only after a careful analysis of all relevant facts and tentatively held only to the extent the facts and data allow. Yet the markets demand action. Running a baseball team requires action. There is nothing tentative about them. That’s part of what makes running a baseball team and good investing so difficult.
Ultimately, then, the Jamesian approach is a scientific one. It seeks to be objective and fact-based. Such data-driven (reality-based) analysis is crucial, most fundamentally, because it works. Indeed, when done well it is even self-correcting. Science works far better than any alternative approach, which is why similar data-driven approaches are increasingly gaining influence in other disciplines, including investing.
The financial services industry (at least at the retail level[4]) is behind baseball in terms of analytical engagement and adoption, but so-called evidence-based investing (“EBI,” which is analytically driven and has much in common with evidence-based medicine as well as sabermetrics) is becoming an important and vital movement. As Robin Powell describes the problem, “[a]ll too often we base our investment decisions on industry marketing and advertising or on what we read and hear in the media.” The investment world is run by too many “traditional baseball men.”
By contrast, EBI is the idea that no investment advice should be given unless and until it is adequately supported by good evidence. Thus evidence-based financial advice involves life-long, self-directed learning and faithfully caring for client needs. It requires good information and solutions that are well supported by good research as well as the demonstrated ability of the proffered solutions actually to work in the real world over the long haul.[5]
In essence then, as he had always intended, what Bill James did was to make baseball more scientific. He did so using the common scientific tools of investigation, reason, observation, induction and testing with an attitude of skepticism. The bottom line is that James tested a variety of traditional baseball dogmas and found them wanting and demonstrably so. The “demonstrably” ultimately comes from the reality that the body of evidence now available to examine how baseball is played is exponentially greater.
The great baseball writer Roger Angell said that traditional baseball people didn’t want anyone to shine any light in the darkness of baseball. But the sports analytics guys did and, today, every MLB team and, indeed, every major professional sports team uses analytics. Many have substantial analytics departments. Moreover, MLB now uses Statcast and its Pitchf/x system in every ballpark. These interactive tools track the baseball and the players on the field in minute detail. They measure factors like the trajectory of the ball as well as the speed and angle at which the player runs after it. “With radar guns we’ve focused on velocity because we had a tool to measure it,” said Sandy Alderson, general manager of the New York Mets (and the guy who made Billy Beane the GM of the A’s). “Now we can measure spin rates and suddenly we have a handle on something other than velocity.” Indeed, they have a handle on a lot of such somethings.
The force of the evidence with respect to the importance of the analytics revolution is inferentially stronger on account of the successes of highly analytical teams such as the Boston Red Sox and New England Patriots, the San Francisco Giants and 49ers, the Dallas Mavericks and the San Antonio Spurs. Perhaps most remarkable is that the Chicago Cubs, the perennial doormat of a team now run by analytics wunderkind Theo Epstein, have the best record in baseball by a mile and seem poised to win the World Series for the first time since 1908.
If the Cubs are able to avoid the upset – far from a sure thing (as the play-offs began, their World Series win likelihood was the best of all postseason teams at 26 percent) due to the power of randomness in a short series – and capture the long-elusive championship their long-suffering fans so desire, it would seem time to declare absolute victory for the analytics movement and evidence-based baseball. Many have already made that declaration. In one sense, that’s surely accurate. Baseball analytics have helped smart users of such tools tremendously. But it is crucial to note that even the best practitioners of baseball analytics are still prone to the all-too-human foibles that plague us all. In related news, note that professional investors are as prone to these foibles as anyone else.
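A rough sketch of why a short series is so unforgiving, even for the best team, may help here. It assumes (purely for illustration) that every game is an independent coin flip weighted 60/40 in the favorite's favor, and that the path to a title runs through one best-of-five and two best-of-seven rounds. It makes no attempt to reproduce the 26 percent figure cited above.

```python
from math import comb


def series_win_probability(p_game, best_of):
    """Chance of winning a best-of-N series when each game is won
    independently with probability p_game (a simplifying assumption)."""
    wins_needed = best_of // 2 + 1
    total = 0.0
    # The series ends on the clinching win; sum over the losses taken before it.
    for losses in range(wins_needed):
        total += (comb(wins_needed - 1 + losses, losses)
                  * p_game ** wins_needed * (1 - p_game) ** losses)
    return total


# Hypothetical: a strong team that beats playoff-caliber opponents 60% of the time.
p = 0.60
division_series = series_win_probability(p, 5)
league_series = series_win_probability(p, 7)
world_series = series_win_probability(p, 7)
title = division_series * league_series * world_series
print(f"best-of-5: {division_series:.3f}, best-of-7: {league_series:.3f}, title: {title:.3f}")
```

Under those assumptions, a team that is a clear favorite in every single game still wins it all only about a third of the time.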
As a matter of fact and belief, a smart baseball team will be committed to evidence-based baseball in the same way our businesses should be committed to evidence-based investing. Doing so requires being data-driven. Ideology alone is not enough. Being sold on a story isn’t enough. A good idea isn’t enough. A good investment strategy, like the proper evaluation of baseball players and teams, will be – must be – supported by the data. Reality must rule.
Yet most investors, even professional investors, are frequently motivated by hope, fear, greed, ego, recency, narrative and ideology; baseball GMs too. We would all be far better off if our processes were reality-based and data-driven at every point. If only our human make-up didn’t make it so difficult for us to do that.[6]
“A man of genius makes no mistakes”
Honoring our limitations is particularly difficult because we so readily “see” more than is really there. As Charlie Munger famously said to Howard Marks, “none of this is easy, and anybody who thinks it is easy is stupid.” A key reason investing is so hard is that the data tells us so much less than we’d like it to (similarly, that may well explain why MLB teams today value prospects more highly than current major-leaguers of similar stature and age, a huge change from when I was a kid).
Being truly data-driven also requires that we go no further than the data allows. Huge quantities of data do not necessarily correlate to huge quantities of insight. Remember, information is cheap but meaning is expensive. In James-speak, “A statistician is concerned what baseball statistics ARE. I had no concern with what they are. I didn’t care, and I don’t care, whether Mike Schmidt hit .306 or .296 against left-handed pitching. I was concerned with what the statistics MEAN.” (Emphasis in original).
Good player evaluation, like good investing, demands humility. We need to be humble so as to be able to recognize our mistakes and correct our errors. We need to remember that we don’t know everything (or even necessarily all that much). And we need to be able to recognize what and when we just don’t know. But that’s not something humans tend to be very good at. When we make big breakthroughs and have significant success, we tend to think that we’ve got things pretty well figured out across the board, that we’ve arrived.
It is utterly human to be or to become full of ourselves and overconfident rather than humble. We tend to believe, like the pretentious Stephen Dedalus in James Joyce’s overrated masterpiece, Ulysses, that “[a] man of genius makes no mistakes. His errors are volitional and the portals to discovery.” But human geniuses, including the geniuses leading the baseball analytics movement (and some of them are real geniuses), still make plenty of mistakes and, in something akin to bias blindness, often fail to recognize their foibles.
Drunk with success as well as righteously indignant with the “traditional baseball men” who had ignored them and denigrated their work, many in the sabermetrics movement are now eager to ridicule their enemies, real and imagined. But sometimes even the traditional baseball men get things right.
Among the favorite targets of the sabermetrics crowd was a former big league catcher (for 21 years) and long-time (but now retired) television baseball announcer, the “worthless pontificator” and quintessential “traditional baseball man” Tim McCarver. The entertaining and analytics-heavy site Fire Joe Morgan (run by some very funny television comedy big-wigs, who used analytics largely as a basis to criticize stupid announcers and announcing), which burned brightly over its lifespan from 2005 to 2008, described McCarver as “the worst color commentator in the history of the world, in any sport.”
Among McCarver’s ridiculous statements is this gem: “I only care about on-base percentage if you can run. If you can’t run, I could care less about on-base percentage.” Of course, it shouldn’t be surprising since McCarver “knows exactly nothing about” baseball (per Fire Joe Morgan, emphasis in original), demonstrated by his shocking insistence that lead-off walks lead to more big innings than lead-off homers (even though teams hitting lead-off home runs already have a run instead of just a runner on first).
Framing the Debate
Long among Tim McCarver’s favored ideas (shared by baseball old-timers) has been the practice of pitch-framing. For example, Tim McCarver’s Baseball for Brain Surgeons & Other Fans, published by McCarver in 1998, extols pitch-framing in significant detail. When a catcher (as McCarver was) frames a pitch, he tries to give the umpire as clear and as focused a view of the ball as possible so that a pitch in the strike zone is not called a ball. Framing also includes creating the illusion for the umpire that a ball just off the plate actually crossed it, in order to “steal” strikes. The catcher even wants to create the impression that a pitch several inches off the plate just missed, so that the umpire gets the idea that the pitcher has very good control, which can influence his calling of future balls and strikes. Old-timers swear by this technique. It has been written about instructionally since at least the 1950s. But sabermetricians were not impressed.
In 2000, Baseball Prospectus published what purported to be a comprehensive analysis of “whether a catcher influences the rates of hits, walks, and extra bases a pitcher surrenders to the opposition” and concluded that he does not, at least in any meaningful way. Because of the apparent meaninglessness of catcher defense, the study went so far as to suggest that it might be possible to hide good hitters who couldn’t play defense elsewhere by using them as catchers. However, the absence of evidence is not evidence of absence: failing to find an effect does not show that no effect exists. In this case, the discovery of additional evidence, made possible by technological innovation (most prominently, Pitchf/x, which captured pitch speed, movement and location with great precision), blew the idea that catchers don’t matter much defensively completely out of the water.
Take a look at the two pitches below, as shown and described in Grantland. Each is a four-seam fastball thrown by a right-handed pitcher to a left-handed hitter, with umpire Sam Holbrook behind the plate, that passes through the strike zone 21 inches off the ground, between 11.7 and 12.9 inches from the center of home plate. Each pitch hits the target, so each catcher knows where the ball is headed and has time to prepare.
There are only two significant differences between these scenarios. One is that the top pitch, thrown by James Shields in July of 2012, is called a strike while the bottom pitch, thrown by Liam Hendriks in June of 2012, is called a ball. The other is that the pitch on the top was caught by the Rays’ Jose Molina, now retired but always one of baseball’s best receivers while the pitch on the bottom was caught by the Twins’ Ryan Doumit, also now retired, but then one of the worst. The essence of pitch-framing is that depending on how the catcher goes about his job, two essentially identical pitches look a lot different once they get to the glove, allowing good catchers both to “save” strikes and “steal” strikes.
Concentrate on the catchers in those clips. Molina sets up farther outside, so even though the pitch to him is farther from the plate, he catches it in the center of his body. Doumit has to reach for the ball, drawing attention to its distance from the strike zone. The bases are empty in both clips, giving the catchers the freedom to set up any way they want without worrying about base runners. But only Molina goes down to one knee to present a lower, more stable target. Doumit’s head jerks sharply downward the instant after he catches the pitch. Molina’s remains still. And Doumit’s glove, descending to meet the pitch, dips even more after he catches it. This sends the ball farther outside the zone and forces him to jerk the glove back up in an exaggerated fashion. Molina’s glove never gets any lower than it is when he receives the pitch. He makes a much more subtle upward movement, and it takes about half as much time for his glove to come to rest.
The full body of video evidence across players, teams, leagues and seasons now shows that these differences in technique are persistent in players, as constant as a pitcher’s motion or a batter’s stance.
A careful 2011 analysis using this new evidence examined where strikes are typically called and determined which pitchers were getting more or fewer strikes than they “should” have, given where their pitches were located. This study allowed researchers to isolate the effect of the catcher and conclude that pitch-framing does in fact have an astonishingly large and persistent impact, more consistent from year to year than even reliable offensive metrics like on-base percentage and slugging percentage. Thus Molina, a really good receiver, was worth 35 runs above average per 120 games and Doumit, among the worst, was worth 26 runs below average. In layman’s terms, because ten extra runs are generally worth about one extra win, Molina’s defense was worth about 3.5 extra wins per year while Doumit’s cost his team about 2.5 extra losses.
One sophisticated model for pitch-framing that accounts for more factors, including umpire, ballpark, batter, count, pitch location and pitch type showed that Molina added 0.50 runs and Doumit subtracted 0.55 runs per 100 pitches, which is an enormous difference when you consider that the best Major League pitchers (AL/NL) average about 15 pitches per inning (average pitchers of course average more). This research checked to see whether pitch-framing skill was reflected in the performance of each catcher’s battery mates and found that it was. Pitchers had significantly higher strikeout rates and lower walk rates when throwing to skilled pitch-framers.
When you consider that Barry Bonds was “only” worth about 0.78 extra runs above average per game during the 2001-2004 seasons, the height of his insane, steroid-fueled, record-breaking production streak, the potential value of pitch-framing comes into sharp relief. Indeed, the 2016 MLB offensive leaders were Mike Trout (AL) and Joey Votto (NL), worth 56.2 and 51.2 extra runs per season (around one-third of a run per game), respectively. According to the data, a good pitch-framer is worth a good bit more than that. Pursuant to further analysis by Dan Turkenkopf, now Director of Baseball Research & Development for the Milwaukee Brewers, the average value of turning a single ball into a strike is 0.13 runs. Do that a few times per game, as good receivers do, and a surprising number of losses turn into wins.
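The arithmetic behind these comparisons can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not the method used in any of the studies cited: the function names are made up for this example, the nine-inning game and the 15 pitches per inning come from the rough figures above, and the "two extra strikes per game" input is purely hypothetical.

```python
# Rules of thumb quoted in the text (not derived here).
RUNS_PER_WIN = 10.0            # about ten extra runs per extra win
RUNS_PER_STOLEN_STRIKE = 0.13  # Turkenkopf's estimate for turning a ball into a strike


def runs_to_wins(runs):
    """Convert runs above (or below) average into approximate wins."""
    return runs / RUNS_PER_WIN


def framing_runs_per_game(runs_per_100_pitches, pitches_per_inning=15, innings=9):
    """Scale a per-100-pitch framing value up to one full game.

    Assumes the catcher receives every pitch of a nine-inning game thrown at
    15 pitches per inning, which is the low-end figure mentioned in the text.
    """
    pitches_per_game = pitches_per_inning * innings
    return runs_per_100_pitches / 100 * pitches_per_game


def stolen_strike_runs(extra_strikes_per_game, games=120):
    """Runs gained over a season from a given number of extra strikes per game."""
    return extra_strikes_per_game * RUNS_PER_STOLEN_STRIKE * games


if __name__ == "__main__":
    print(f"Molina, 35 runs per 120 games: {runs_to_wins(35):+.1f} wins")
    print(f"Doumit, -26 runs per 120 games: {runs_to_wins(-26):+.1f} wins")
    print(f"0.50 runs per 100 pitches: {framing_runs_per_game(0.50):.2f} runs per game")
    print(f"2 extra strikes per game (hypothetical): {stolen_strike_runs(2):.1f} runs per 120 games")
```

Under those assumptions, half a run per 100 pitches works out to roughly two-thirds of a run per game, which is the same neighborhood as the Bonds figure and well above the one-third of a run credited to Trout and Votto. Because average pitchers throw more than 15 pitches per inning, the per-game number is, if anything, on the low side.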
Before pitch-framing became statistically quantifiable, Molina’s receiving skills were generally known and valued (to a point – he never made big money), but nebulous. His weak bat, on the other hand, was easier to see, and almost as easy to quantify. On account of this second factor, a good pitch-framer is still an undervalued asset to be exploited by smart general managers and an especially valued commodity to teams that don’t have the financial resources to pay for superstars by more traditional measures.
On the other hand, as more teams recognize the value of pitch-framing, we can expect both that the cost of a good pitch-framer will go up and that the advantage of having a good receiver will go down as teams focus on developing good ones, driving the value of framing down (in much the same way that, in the investing world, good trades get crowded and, on account of the paradox of skill, investors can get better yet find it harder to outperform). In other words, since we’re talking about zero sum games (both baseball and investing), if everybody’s good then nobody’s good. Indeed, there is some evidence, as yet inconclusive, that pitch-framing may already be a diminishing advantage.
That said, there will almost surely continue to be inefficiencies to exploit for those smart and creative enough to find them and hold onto them. “I hear people say silly things like, ‘What happens when everybody in baseball is smart?’” Bill James says. “That’s just ridiculous. One form of ignorance will always replace another. There will always be smart teams and dumb teams. As [comedian] Ron White said: ‘You can’t fix stupid.’” In the meantime, it’s “a lot easier to teach somebody how to frame a pitch than it would be to teach them how to hit homers and drive in runs,” as Toronto Blue Jays catcher Russell Martin notes.
When pitch-framing video analysis was in its infancy, it was discounted because the impact found was so large: “the results just seem too outlandish to be correct.” And it remains uncertain if pitch-framing is truly as significant as current analysis suggests. It remains unclear, for example, how to divide the credit between pitcher and catcher for pitch-framing (the umpire and the hitter have roles to play in this drama too). Surely the catcher deserves plaudits for framing the pitch but the pitcher had to hit that spot.
The discussion and the debate are far from over. But we do know that old-timers like Tim McCarver were a lot closer to being right than the analytics geeks, at least about pitch-framing. Even so, the easy answer for baseball front offices at this point is something like what Aristotle called the “golden mean,” what Confucius called the “doctrine of the mean” and what Buddhists call the “middle way,” which is the desirable middle between two extremes (or at least a happy medium), including both traditional scouting and analytics.
Leave it to a stats guy, Dayn Perry, to offer up a great description of the ideal marriage between sabermetrics and traditional thinking: “A question that’s sometimes posed goes something like this: ‘Should you run an organization with scouts or statistics?’ My answer is the same [as] it would be if someone asked me: ‘Beer or tacos?’ Both, you fool. Why construct an either-or scenario where none need exist?” Happily, in many places today, both sides are increasingly getting along. “If you ask all the nerds in all the front offices, there’s a lot of respect for what people who played the game know about this stuff,” ESPN’s Sam Miller says (note that Miller is the other co-author of the great new book, The Only Rule Is It Has to Work).
Still, the people supposedly so committed to good analytic processes and data-driven decisions were wrong and almost surely really wrong for a surprisingly long time about pitch-framing (and may still be wrong on any number of matters). Even worse, they were often arrogantly wrong. Sabermetricians went around “[f]looding the inboxes of sports writers with VORP-laden snarky commentary,” while denying that baseball geeks suffer from groupthink because they’re above such things. What they’re all about is “critically testing, analyzing, and evaluating ideas” (cough, cough). Consider too “the stat geeks [who] carry themselves with a pomposity that says to disagree with them is to challenge the very basis of science itself,” or who express “disdain for” old-time baseball men, who are seen as irredeemably “wrong.”
But why were these really smart, data-driven people so wrong? Regular readers should immediately intuit that even the most rigorously analytical stat-heads are as prone to human foibles, mistakes and biases as old-time baseball men (and everyone else). Daniel Kahneman, the world’s leading authority on human error, readily acknowledges making the same sorts of mistakes the rest of us do. “I never felt I was studying the stupidity of mankind in the third person. I always felt I was studying my own mistakes,” he says. Thus the haughty arrogance of the analytics experts was as badly misplaced as that of the traditionalists, as many now concede. “We were overcorrecting,” Fire Joe Morgan founder Mike Schur (a writer and producer for The Office and later the co-creator of Parks and Recreation and other television shows), told Slate. “We went too far in the direction of, screw all your traditional thinking. We have all the answers.” (Emphasis in original).
The most extreme practitioners of sabermetrics (and science generally) seem to think that it is possible precisely to measure everything or nearly everything and that anything that can’t be measured isn’t worth talking about. Some have grandiose visions of how complete our knowledge might be or become. But baseball, baseball analysis and investing all require as much art as science, with a certain level of subjective appreciation and understanding mixed in with hard data. And there is no good reason to think we’ll ever understand nearly as much as we hope and expect. The sad truth is that none of us is as good as we think we are.
Near the end of her wonderful novel, Housekeeping, Pulitzer Prize winner Marilynne Robinson notes the following: “Fact explains nothing. On the contrary, it is fact that requires explanation.” The great historian and philosopher of science Thomas Kuhn pointed out that “the so-called facts proved never to be mere facts, independent of existing belief and theory.” These are telling observations, observations that those who are overly enamored with and optimistic about science are prone to ignore or forget. Science is a fabulous tool but also merely a tool. It is not a be-all nor is it an end-all. Brute fact requires both meaning and context in order to approach anything like truth or understanding. Information requires meaning to become actionable. Moreover, the interpretation of difficult data sets, especially those involving human behavior, is difficult indeed.
To be clear, I don’t mean that our proffered solutions should be any less evidence-based because of our human frailties. Indeed, they should be more evidence-based by also considering the science of human behavior and how that behavior manifests itself, especially under stress. They should be more realistic too.
Even when performing careful analytical, evidence-based work, we are and remain biased, ideological and inherently tribal. For example, alleged scientific authorities have predicted the end of the world and civilization as we know them at the hand of pandemics, environmental catastrophes and otherwise over and over again. Yet we are still here, at least for now, in defiance of Thomas Malthus’s eighteenth-century warnings about overpopulation and ecologist Paul Ehrlich’s 1968 prophecy (with proposed authoritarian solutions) in The Population Bomb that “[i]n the 1970s and 1980s hundreds of millions of people will starve to death in spite of any crash programs embarked upon now.”
Despite the clarity and specificity of both his predictions and his errors, Ehrlich went on to compound his mistakes by trying to deny that they were really errors. The Stanford biologist and MacArthur Fellow sounded like a foolish market analyst insisting he was “right but early” when he asserted last year, “[o]ne of the things that people don’t understand is that timing to an ecologist is very, very different from timing to an average person.” To be sure, I take climate change and other potential environmental calamities most seriously indeed, but the odds are strongly against our ever having everything (and especially future events) all figured out with great specificity, much less soon. And we should all recall the insidious incentives supporting bold predictions, most notably potential fame, funding and influence, especially because there seems to be no accountability for being wrong. The list of alleged scientific “facts” that turned out not to be is a distressingly long one.
Our inherent tribalism also at least partly explains the frequent animosity between stat-heads and old-time baseball people, irrespective of who might be right in any particular instance. It is a constant threat for all of us. To take one extreme example, Sam Harris, a polemicist who is also a scientist, has no difficulty imagining a nuclear first strike against those who believe differently than he does (most notably, Muslims) in the event that they acquire nuclear weapons: “it may be the only course of action available to us, given what Islamists believe.” Indeed, “[s]ome propositions are so dangerous that it may even be ethical to kill people for believing them.” It can be dangerously hard to bear in mind that we are rarely as right and as pure as we tend to assume. There is usually more than enough error to go around.
Underestimating the Density of the Fog
As Ben Lindbergh pointed out in Grantland, Adam Dunn and Juan Pierre came to embody the great sabermetrics debate back around the turn of the century. At first (and second) glance, they look about as different as can be. The stats guys loved Dunn, the huge but nonchalant power hitter who struck out a lot, couldn’t run and couldn’t play defense but drew a lot of walks and hit a lot of home runs. The baseball traditionalists preferred the passionate Pierre, the speedy singles hitter who played decent defense. But if we look at them today using more current measures (such as wins above replacement, a metric that hadn’t been invented at the height of the debate), it appears that these two very different players were of just about equal value in the aggregate. Dunn was a much better hitter, and hitting was the focus of much early baseball research. But Pierre was a better baserunner and fielder, and he played a much more valuable position, all areas that sabermetrics only began to analyze properly and thus value appropriately more recently. “We now know that both sides were right in one sense and wrong in another,” Lindbergh wrote, which is a polite way of saying that nobody knew nearly as much as they thought all along.
As I have said before, on our best days, when wearing the right sort of spectacles, squinting and tilting our heads just so, we can be observant, efficient, loyal, assertive truth-tellers. However, on most days, all too much of the time, we’re delusional, lazy, partisan, arrogant confabulators (even the best rationalists among us).
That unfortunate reality is becoming more and more accepted by analytics-types all the time while, in some respects at least, Bill James knew about it all along. When James filled out an All-Star ballot that included a disproportionate number of Red Sox players, his son asked, “Dad, is that because you really believe it or because they’re Red Sox?” James replied, “I really believe it because they’re Red Sox.” James is also cognizant of the value of the “outside view,” the perspective he has long tried to provide, and seeks to apply his outside view to the stat-heads and to the “traditional baseball men” alike.
Yet as compelling as the behavioral story is – which is really compelling and one I tell often and passionately – it’s not the whole story. Not by a long shot. We must also consider the world as we experience it and have come to understand it. Despite our best efforts to make it predictable and manageable, that world (the investment world/the baseball world) is too immensely complex, chaotic and chance-ridden for us to do so with nearly the precision we’d expect. We can often get to (an often data-mined) correlation, but correlation does not imply causation.
As with investing, James recognizes that finding baseball anomalies – potential advantages for players and teams, such as on-base percentage in 2002 – that exist and persist can be exceedingly difficult. Accordingly, “a wide variety of supposed ‘skills’ of baseball players were actually just random manifestations of luck, and many other people have done the same.” However, some of the conclusions drawn about these supposed skills (or non-skills) are deemed true merely because no evidence of the skill could be found rather than because, in good scientific manner, its existence has been falsified. Thus the “zero persistence equals luck” type of study is inherently dangerous. In baseball, as in investing, “[i]t is [really, really] hard to distinguish the luck from the real skill” because “randomness is operating on a vastly larger scale” than most statistical analysis can accommodate. “In effect, we are asking a Volkswagen engine to pull a semi.” This is Nassim Taleb’s “fooled by randomness” applied to baseball.
This is a crucial point with respect to misinterpretation of data despite a real commitment to it. “We ran astray because we have been assuming that random data is proof of nothingness, when in reality random data proves nothing.” Put colloquially, James is saying that the absence of evidence is not evidence of absence (a frequently uttered phrase for which we should perhaps thank William Cowper, Martin Rees, Carl Sagan, Donald Rumsfeld or somebody else). But James offers an even better description.
“In a sense, it is like this: a sentry is looking through a fog, trying to see if there is an invading army out there, somewhere through the fog. He looks for a long time, and he can’t see any invaders, so he goes and gets a really, really bright light to shine into the fog. Still doesn’t see anything.
“The sentry returns and reports that there is just no army out there — but the problem is, he has underestimated the density of the fog.”
To be clear, James heartily disagrees with those who discount things like leadership, team chemistry or heart because there isn’t data to measure them (yet?). “[S]kepticism should be directed at things that are actually untrue rather than things that are difficult to measure.” Therefore, and crucially, “Let’s look again; let’s give the fog a little more credit. Let’s not be too sure that we haven’t been missing something important.”
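A toy simulation makes the danger of the “zero persistence equals luck” study concrete. The sketch below is my own illustration, not James’s analysis: every player is given a real, fixed skill, but each season’s observed result adds noise that is much larger than the spread in skill. The function names, the number of players and the standard deviations are all assumptions chosen only to make the point.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year_correlation(n_players, skill_sd, noise_sd, seed=0):
    """Correlate two simulated seasons for players whose skill is real and fixed.

    Each observed season equals the player's true skill plus independent noise.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, skill_sd) for _ in range(n_players)]
    season_one = [s + rng.gauss(0, noise_sd) for s in skill]
    season_two = [s + rng.gauss(0, noise_sd) for s in skill]
    return pearson(season_one, season_two)


def expected_correlation(skill_sd, noise_sd):
    """Long-run year-to-year correlation implied by the model above."""
    return skill_sd ** 2 / (skill_sd ** 2 + noise_sd ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: noise five times as large as the spread in skill.
    print("expected persistence:", round(expected_correlation(1.0, 5.0), 3))
    for seed in range(5):
        print("300 players, seed", seed, "->",
              round(year_to_year_correlation(300, 1.0, 5.0, seed), 3))
```

With noise five times the spread in skill, the expected year-to-year correlation is only 1/26, or about 0.04. With a few hundred players the measured value bounces around zero from one run to the next, so a study of that size could easily report “no persistence” even though the skill is built into the simulation by construction. That is the fog: the light was simply not bright enough for the question being asked.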
It is axiomatic that the more strongly one is convinced of something, the harder it is to change his mind. Accordingly, scientists – despite their ostensible commitment to the idea that new or better evidence can always suggest a different conclusion – can be as prone to holding on to a mistaken view as any ideologue. Their determination that their interpretation of the evidence is objectively established (together with standard issue optimism bias, confirmation bias and the like) can make them as hardened against reality as any other zealot. As the great physicist and Nobel laureate Richard Feynman stressed, “The first principle is that you must not fool yourself — and you are the easiest person to fool.” I am not espousing any sort of relativism here. There are a very large number of things that science knows with a high degree of probability and a large list of things that science readily admits it doesn’t know (right now, at least). The problem is that the demarcation line between those two categories is much less definitive and far wider than scientists will readily admit.
As Feynman described it, “In the South Seas there is a Cargo Cult of people. During the war [World War II] they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas – he’s the controller – and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.”
As with science generally, there are surely plenty of “cargo cult” conclusions in both sabermetrics and investing — approaches, models and systems that are said to work somehow without adequate analysis, testing and safeguards. A data-mined but insufficiently authenticated “solution” will almost surely be a disaster for everyone but those collecting a fee for it (sadly, integrity can usually be purchased).
The problem here relates both to good logic and good practice. Good logic requires our recalling at all times that correlation does not mean causation, as the islanders above saw (if not quite learned). It requires recalling that the absence of evidence is not evidence of absence – the problem of induction. It demands we remember that meaning is expensive, even for those most committed to and immersed in the available data.
The world we live in is profoundly complex and is much more difficult for us to navigate than we usually think or assume. According to Nobel laureate Daniel Kahneman, “[w]e systematically underestimate the amount of uncertainty to which we’re exposed, and we are wired to underestimate the amount of uncertainty to which we are exposed.” Accordingly, “we create an illusion of the world that is much more orderly than it actually is.” It’s that illusion that causes many mistaken conclusions, conclusions we hold onto far too long.
The best data analysis won’t necessarily mean winning the pennant. Data isn’t everything, but good investing and good baseball management are impossible over the long haul today without its careful analysis and use. Good analytics is necessary if not sufficient for ongoing baseball and investment success, as the philosophers would have it. Luck can work in the near-term.
When I was a kid in the 1960s, obsessing over baseball players and stats, plenty of people were telling me to question everything, but the implicit (and erroneous) suggestion was that I reject everything. Instead, I suggest honoring the past without being bound by it. Sabermetrics doesn’t eliminate the need for old-fashioned scouting. Both sides have important insights to offer. Consistent with Robert Hagstrom’s idea that investing is the last liberal art, we should always explore and learn, combine thoughts from multiple sources and disciplines, try to think nimbly because the need for new approaches is ongoing, and we should test and re-test our ideas. That idea applies to baseball too.
If we are going to succeed, we’re going to have to ask hard questions, keep asking questions, and question our answers aggressively even (especially!) when we think they’re right. Data won’t give us all the answers, but all of our good, objective, actionable answers – upon which we should build our investment (and baseball) beliefs – will be consistent with the data. Thus our processes should be data-driven at every point. Smart investing is evidence-based and reality-based. So is good sabermetrics. But we must not underestimate the density of the fog either. As James Thurber (and later Casey Stengel) would have it, “You could look it up.”
2 Epstein discovered James’ research when he was still in elementary school, went to Yale and was the sports editor of The Yale Daily News. But he never played college or professional baseball.
3 Not coincidentally, prominent political analytics guru Nate Silver got his start with sabermetrics. Alphabet (Google) even uses baseball analytics to illustrate the data-driven decision-making process to which the company aspires.
4 At the institutional level, analytics adoption has been universal for a long time, but often to poor effect by trying to salvage a lousy process, to avoid a better process or to justify excessive fees (see footnote 6 below).
5 For many, EBI focuses on indexing (of various potential sorts) because of the poor track records of active investors. For many of those (and others), because certain investment factors (such as size, value, quality, low volatility and momentum) repeatedly crop up over time in various global markets and different market conditions as indicators of investment success, they will be sought out for investment. Low fees will be sought out as the best indicator of good performance. Diversification will also be required on account of rampant randomness and our inherent human limitations and weaknesses. Asset allocation will trump security selection both because it has more influence on performance and because it enhances a portfolio’s risk/reward characteristics. And investment choices based upon predicting the immediate future are avoided because it simply can’t be done. That is the science of investing, broadly construed (I generally agree with the above despite a few qualifications and adjustments), which is why I would prefer to describe the EBI approach as science-based investing.
6 If only business realities weren’t in on the conspiracy. To quote Tadas Viskanta (and myself), investing successfully is really hard. But we can see generally what works and what doesn’t work. That we see and don’t do (or try to do) what works is partly due to poor analysis and partly due to cognitive biases that limit our success, but it’s also partly a commercial judgment. In the words of Upton Sinclair, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” Sadly, scientific reasoning isn’t often practiced (or is often ignored) in the investment world because there isn’t always money in it. The money follows and is in magical thinking (“You can outperform in every environment!”). Since the primary goal of the vast majority of people in the investment industry is to make money (irrespective of what they tell their clients and customers), magical thinking carries the day.