At first glance, the idea that teams that get crushed won’t win it all — perhaps they have a key weakness that might be exploited or problems with certain kinds of match-ups — isn’t a bad one. That it seems to be backed up by the data props it up even more.
But there are some crucial problems with the hypothesis. For one thing, we may not really be talking about the same team. For example, Ryan Kelly was injured when Duke got rolled by 27 points in January against Miami, and all four of Duke’s losses came with Kelly out of the line-up.
But the biggest problem with Chairusmi’s “analysis” is his use of data. As regular readers are more than well aware, I am a big proponent of data-driven analysis. If one’s hypothesis may be subjected to analytical scrutiny, it can and should only stand if and when the data supports it. Significantly, however, the data needs to be used both accurately and fairly. Chairusmi’s data is accurate as far as it goes, but it is hardly fair because it is so incomplete.
Note that Chairusmi cites data “since 1994.” That’s 19 years, an odd span for a data set. That made me suspicious. And, as it turns out, I was suspicious for very good reason.
In 1993, the year before Chairusmi’s arbitrary data cut-off, North Carolina (ugh!) won the championship by defeating Michigan in the title game. But back in January of that year, the Heels got rolled by Wake Forest, 88-62. That’s a 26 point deficit. And in 1991, just two years earlier, my beloved Blue Devils won their first title, paced by Christian Laettner, Bobby Hurley and Grant Hill, by upsetting the seemingly invincible UNLV Runnin’ Rebels in the semi-final before besting Kansas in the final. However, in the last game of the regular season, Duke was trounced by Carolina (ugh!), 96-74, a 22 point margin. Clearly, Chairusmi’s hypothesis isn’t nearly as powerful as his article suggests. I suspect (without checking) that looking back further into history will disclose other champions who suffered big defeats earlier in the season.
When trying to ferret out a new idea in the investment world, it makes sense to mine the available data to see where opportunity may reside. However, correlation is not the same as causation (past performance may not be indicative of future results) and the data mined may not tell the whole story.
Data is imperative for good investment analysis, of course. But data must be interpreted to be useful and, as I often emphasize, data is cheap while meaning is expensive. It should be axiomatic that the data sets used need to be fully representative and that any argument based upon data needs to disclose any difficulties with the data. When a fuller data set is examined, Chairusmi’s argument is shown to be deceptive at best. This is a classic example of shaping the data to support an argument rather than following the evidence wherever it leads. If March Madness results in Indiana or Louisville winning it all this year, Jim Chairusmi may well crow about how his data-driven analysis nailed it.
Just don’t you believe it.