Underestimating the Density of the Fog

The story has essentially attained the level of holy writ, at least to those committed to data and evidence, such that it now seems almost too good to be true. The quick-and-dirty version of the tale is that stats geeks with computers, like those former player and broadcaster Tim McCarver called “drooling eggheads,” outsmarted and outmaneuvered the stupid yet arrogant “traditional baseball men” who ran our most traditional sport at the professional level and who thought they knew all there was to know about the game. Thus, it is said, everything the old-time baseball men thought they knew about evaluating players and teams has been found wanting, not that those whose day has passed, committed to wizardry and witchcraft as they are, have recognized it.

This revolution – as shocking as it has been comprehensive – is said to have brought about the ultimate revenge of the nerds. The geeks now run the baseball show, having moved the level of analytical precision involved in running teams and evaluating players from zero to sixty in a flash. The new breed of “baseball men” aren’t grizzled scouts looking for “five-tool guys” but, rather, Ivy League-educated experts in computer modelling and statistical analysis who use those skills to determine who to scout, who to sign, who to play and how to play. The prevailing narrative describes this new contingent as dominating professional baseball at every level, down to the depths of the independent minor leagues.

Is the analytics overhaul of baseball proper as complete and comprehensive as the telling claims? No. The real story is much more interesting and enlightening than that.

Baseball is particularly amenable to the use of statistical analysis because it offers large sample sizes, discrete individual-performance measures (such as plate appearances, pitches, and the like), and ease of identifying positive results (such as winning, home runs, and the like). However, when humans are involved – and baseball is as human as can be – interpretation of the underlying data is highly complicated.

Great interpretation of difficult data sets, especially those involving human behavior, involves more sculpting than tracing. It requires great skill, imagination and even a bit of whimsy, as well as collaboration over whether the various interpretive choices are acceptable (not to say the right) ones. That’s why we understand reality better in the natural sciences than in the social sciences. As ever, information is cheap but meaning is expensive.

To lead off, let’s recall that if it seems too good to be true, it usually is. To see what I mean by that in the context of our story will require some in-depth analysis of its own, starting with more than a bit of background information and history.