# CFA Conference: Nate Silver

The Signal and The Noise

Nate Silver runs the political website FiveThirtyEight.com, where he publishes a running forecast of current elections and hot-button issues. Formerly published in The New York Times and recently relaunched in partnership with ESPN, FiveThirtyEight.com has made Mr. Silver the public face of statistical analysis and political forecasting. His latest book is titled The Signal and The Noise: Why Most Predictions Fail—But Some Don’t. Before coming to politics, Mr. Silver established his credentials as an analyst of baseball statistics, developing a widely acclaimed system that predicts player performance, career development, and seasonal winners and losers. He has written for ESPN.com, Sports Illustrated, Slate, and the New York Times. Mr. Silver received his BA in economics from the University of Chicago.

• Overfitting: The reason noise is mistaken for signal
• Closing the gap between what we know and what we think we know: One giant Bayesian leap—and some small steps—forward
• Improving forecasts and models: Think probabilistically, stop and smell the data, know your biases, and add a dose of humility

My session notes follow. As always, these are contemporaneous notes. I make no guarantee as to their accuracy or completeness.

• Big data isn’t “the” answer – it requires interpretation; failures of prediction are inevitable; forecasts of impossibility or certainty often wrong; we forecast poorly overall
• 2012 election – his method was pretty simple: average the polls; count to 270; account for the margin of error
• That his election analysis was so controversial shows how badly we misunderstand data and how fractured our political system is
• We tend to ignore and/or misunderstand the margin of error
• Media wants certainty and big day-to-day changes; more certainty = greater error
• Problem #1: Big data…big bias – only 5% of Fox viewers are Democrats; only 1% of MSNBC viewers are Republicans; perverse incentives to skew or misapply the data; more data worsens the signal-to-noise problem; lots of variables = lots of difficulty (correlation does not imply causation)
• Problem #2: Desperately seeking signal – more data = more disagreement; we take random data and build narratives; limits of artificial intelligence
• Problem #3: Feature or bug? – There’s always someone on the other side of the trade (and s/he’s often really smart); Bayes’s theorem offers a way out based upon probabilities, but it isn’t a cure-all: (1) think probabilistically; (2) know where you’re coming from; (3) trial-and-error testing.
• Think probabilistically – Grand Forks flood: a predicted crest of 49 feet against a 51-foot levee…BUT…a 9-foot margin of error; almost a 40% chance of a big problem, yet the NWS didn’t want people to disregard the prediction because of the uncertainty
• Know where you’re coming from – Pearl Harbor was as shocking at the time as 9/11; Japan’s carrier group sailed right through the “blind spot” of our vantage points
• Try, and err – by getting halfway decent you can look really, really smart (it’s often not too hard to get to 80% success, but it gets really hard as you keep moving up); still, competitive advantage is made near the top; fitting data to past samples isn’t the same as making successful predictions
• The road to wisdom is plain: err and err and err again, but less and less and less (Piet Hein)
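The 2012 method described above (average the polls, count to 270, account for the margin of error) can be sketched as a small Monte Carlo simulation. The state names are real, but every number below – poll margins, margins of error, safe-state electoral votes, and the sd = MoE/2 convention – is invented for illustration:

```python
import random

# Hypothetical swing-state polls: (electoral votes, poll-average margin
# for candidate A in points, margin of error in points). Invented numbers.
states = {
    "Ohio":     (18, 2.0, 3.5),
    "Florida":  (29, 0.5, 3.0),
    "Virginia": (13, 1.5, 3.2),
}
SAFE_A, SAFE_B = 237, 241  # electoral votes assumed locked up (illustrative)

def simulate(n=20_000, seed=1):
    """Monte Carlo: treat each poll margin as normal with sd = MoE / 2."""
    random.seed(seed)
    wins = 0
    for _ in range(n):
        ev_a = SAFE_A
        for ev, margin, moe in states.values():
            # Draw a plausible outcome within the margin of error
            if random.gauss(margin, moe / 2) > 0:
                ev_a += ev
        if ev_a >= 270:  # count to 270
            wins += 1
    return wins / n

print(f"Candidate A win probability: {simulate():.1%}")
```

The point of the exercise matches the talk: even small, consistent poll leads translate into a win probability well above 50% but comfortably short of certainty.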
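The Bayesian updating in the “way out” bullet above reduces to one line of arithmetic. The prior and likelihoods here are made up purely to show the mechanics:

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Bayes's theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Hypothetical: prior 50% that a candidate truly leads; a favorable poll
# appears 75% of the time when she leads, 40% of the time when she doesn't.
posterior = bayes_update(0.50, 0.75, 0.40)
print(f"{posterior:.3f}")  # → 0.652
```

One favorable poll moves the belief from 50% to about 65% – an update, not a verdict, which is the probabilistic habit the talk recommends.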
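The Grand Forks figures above can be checked directly. Assuming the 9-foot historical forecast error behaves like one standard deviation of a normal distribution (my assumption for illustration, not the NWS model), the chance of the river topping the 51-foot levee comes out just over 40%:

```python
from math import erf, sqrt

def p_exceed(threshold, forecast, sigma):
    """P(crest > threshold) when the forecast error is normal(0, sigma)."""
    z = (threshold - forecast) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))  # 1 - normal CDF at z

# 49 ft predicted crest, 51 ft levee, 9 ft error treated as one sd
p = p_exceed(51, 49, 9)
print(f"Chance of topping the levee: {p:.0%}")  # ≈ 41%
```

A 2-foot cushion sounds safe until it is expressed as a fraction of the forecast error – which is exactly the “almost 40% chance” in the bullet above.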

Q&A…

• Miami Heat look good, especially to get out of the East (the East is weak and lots of good teams are in the West)
• Financial crisis – not enough data (using only 20 years in the models and a relatively stable period) and misinterpretation of the data
• Often difficult to convey uncertainty; data visualization can help
• Economic data is really noisy, especially in real time
• Chinese data is too smooth, makes him suspicious that some of the data is fake
• Threat detection teams should be diverse; different perspectives vital to noise reduction
• Move from NYT to ESPN – after the Senate predictions, it was like Bizarro World after 2008, with Harry Reid calling him an idiot and Rush Limbaugh giving him shout-outs; diversifying away from politics; mix of fun and interesting stuff; the “audience” is an overlapping series of people; nobody gets news in one place anymore
• People react to one another in ways machines don’t; leads to fat tail risks
• Reaction to data-driven journalism – need to look at competition at least a bit; but key is to look at the data carefully and well; contrasts data journalism and explanatory journalism
• What intrigues him about the US economy? He tries to avoid forecasting; concerned about rise in billionaires and certain assets (e.g., increase in sports franchise values); believes in mean reversion
• 20 people at 538; his job is different from when he was a one-man-band; wants people with work ethic and smarts; looking for upside; programming ability; trying to improve every day; 2/3 time producing content, 1/3 managing and coaching
• Data that’s too messy or noisy? Obamacare; the rise of inequality – not sure a daily publication can deal with that; that leads to other publications leading with their beliefs; will need to be tackled at book length
• Remain skeptical of the evidence; internet view – 15% BS; 5% insightful; 80% in the middle
• In light of Piketty, much more discussion of inequality; perhaps a retreat from the Clinton neoliberal consensus