Mathematical Malpractice

The Wall Street Journal is promoting what it purports to be a “hot tip” for the upcoming NCAA basketball tournament. According to the Journal’s Jim Chairusmi, second-ranked Duke (full disclosure: I’m a Duke alum) won’t win the national championship this season. Neither will No. 5 Georgetown, No. 6 Michigan or No. 7 Kansas. That’s because each of these four teams had one game this season in which it got demolished, and that rarely happens to championship teams. As the Journal points out, only one team since 1994, the 2001-02 Maryland Terps, lost a game by more than 20 points and went on to win the title. Moreover, three of the five most recent NCAA champions went through the season without losing a single game by double digits, and third-ranked Indiana and No. 4 Louisville fit that mold this season. If this holds true again, that’s bad news for the Blue Devils, Hoyas, Wolverines and Jayhawks and potentially good news for the Hoosiers or the Cardinals.

Maybe you should fill out your NCAA bracket next week accordingly.  But then again, maybe not.

At first glance, the idea that teams that get crushed won’t win it all (perhaps because they have a key weakness that can be exploited, or trouble with certain kinds of match-ups) isn’t a bad one. That the data seem to back it up makes it even more appealing.
But there are some crucial problems with the hypothesis. For one thing, we may not really be talking about the same team. For example, Ryan Kelly was injured when Duke got rolled by 27 points by Miami in January, and all four of Duke’s losses came with Kelly out of the lineup.
But the biggest problem with Chairusmi’s “analysis” is his use of data. As regular readers are well aware, I am a big proponent of data-driven analysis. If a hypothesis can be subjected to analytical scrutiny, it should stand only if and when the data support it. Significantly, however, the data need to be used both accurately and fairly. Chairusmi’s data is accurate as far as it goes, but it is hardly fair because it is so incomplete.
Note that Chairusmi cites data “since 1994.” That’s 19 seasons, an oddly arbitrary starting point. That made me suspicious. And, as it turns out, I was suspicious for very good reason.
In 1993, the year before Chairusmi’s arbitrary cut-off, North Carolina (ugh!) won the championship by defeating Michigan in the title game. But back in January of that year, the Heels got rolled by Wake Forest, 88-62. That’s a 26-point loss. And in 1991, just two years earlier, my beloved Blue Devils, paced by Christian Laettner, Bobby Hurley and Grant Hill, won their first title by upsetting the seemingly invincible UNLV Runnin’ Rebels in the semifinal before besting Kansas in the final.
However, in the ACC tournament final, Duke was trounced by Carolina (ugh!), 96-74, a 22-point margin. Clearly, Chairusmi’s hypothesis isn’t nearly as powerful as his article suggests. I suspect (without checking) that looking further back into history would turn up other champions who suffered big defeats during their title seasons.
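To see how much the choice of starting year drives the conclusion, here is a rough Python sketch. It uses only the three counterexamples discussed in this post (Duke 1991, North Carolina 1993, Maryland 2002), so it is not a complete list of champions and every count it prints is a lower bound. The end year of 2012 (the last tournament before this post) and the names `BLOWOUT_CHAMPIONS`, `margin` and `counterexamples_since` are my own assumptions for illustration.

```python
# Rough sketch: how the start year of a data window changes the count of counterexamples.
# ASSUMPTION: the data below holds only the cases mentioned in this post, not every
# champion, so each count is a lower bound. LAST_SEASON = 2012 is also an assumption.

LAST_SEASON = 2012  # assumed most recent completed tournament at the time of writing

# Champions cited here as having lost a game by 20+ points in their title season:
# year -> (champion, opponent_score, champion_score) for that loss.
BLOWOUT_CHAMPIONS = {
    1991: ("Duke", 96, 74),            # lost to North Carolina
    1993: ("North Carolina", 88, 62),  # lost to Wake Forest
    2002: ("Maryland", None, None),    # "more than 20 points"; score not quoted here
}


def margin(opponent_score, champion_score):
    """Point margin of a loss, given the opponent's and the champion's scores."""
    return opponent_score - champion_score


def counterexamples_since(start_year, last_year=LAST_SEASON):
    """Years in [start_year, last_year] with a known 20+ point loss by the champion."""
    return sorted(y for y in BLOWOUT_CHAMPIONS if start_year <= y <= last_year)


if __name__ == "__main__":
    # Margins for the losses whose scores are quoted in the post.
    for year, (team, opp, own) in sorted(BLOWOUT_CHAMPIONS.items()):
        if opp is not None:
            print(f"{year} {team}: lost by {margin(opp, own)}")

    # Same question, three different start years.
    for start in (1994, 1993, 1991):
        seasons = LAST_SEASON - start + 1
        hits = counterexamples_since(start)
        print(f"since {start}: {seasons} champions, at least {len(hits)} "
              f"with a 20+ point loss {hits}")
```

Starting in 1994 leaves exactly one known exception in 19 champions. Moving the start back a single season doubles the known exceptions, and moving it back three seasons triples them, which is the whole problem with an unexplained cut-off.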
When trying to ferret out a new idea in the investment world, it makes sense to mine the available data to see where opportunity may reside. However, correlation is not causation (and past performance may not be indicative of future results), and the data mined may not tell the whole story.
Data is imperative for good investment analysis, of course. But data must be interpreted to be useful and, as I often emphasize, data is cheap while meaning is expensive. It should be axiomatic that the data sets used must be fully representative and that any argument based on data must disclose any problems with that data. When a fuller data set is examined, Chairusmi’s argument is shown to be deceptive at best. This is a classic example of shaping the data to support an argument rather than following the evidence wherever it leads. If March Madness ends with Indiana or Louisville winning it all this year, Jim Chairusmi may well crow about how his data-driven analysis nailed it.
Just don’t you believe it.
