Readers may recognize Silver's name from his rise to prominence as the stat dude who accurately called Obama's presidential victory while more established pollsters screwed up. To be fair to Silver, his site, fivethirtyeight.com (now owned by ESPN), is one of the more literate domains in American sports and politics.
Silver's team makes a real effort at transparency; after the UK election, fivethirtyeight published a useful postmortem acknowledging that they had "missed badly" before dissecting the reasons. But of greater interest is Silver's latest piece, "Polling Is Getting Harder, But It’s A Vital Check On Power," which admits that despite our sophisticated whiz-bang tools, predicting political outcomes is getting tougher. But why?
"Random sampling" becoming a problem
As Silver tells it, he recently got back from the annual conference of the American Association for Public Opinion Research, where much hand-wringing occurred. Evidently the tools of the polling trade have fallen on hard times. The problem Silver sees is "simple but daunting": the key to accurate opinion research is the ability to poll a random sample of the population, and in the U.S. that's getting harder to do.
Response rates to telephone surveys continue to decline and are frequently in single-digit percentages, even for high-quality polls. Silver posits that "The relatively few people who respond to polls may not be representative of the majority who don’t." New FCC guidelines on automated dialers could make polling even more difficult.
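Why a low response rate matters even when the sample is large can be shown with a toy calculation. The sketch below assumes a hypothetical electorate split into two equal groups that differ both in their support for a candidate and in their willingness to answer the phone. The group labels, support levels, and response rates are invented for illustration (the response rates are kept in the single digits, as Silver describes); this is not Silver's model or any pollster's method.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A hypothetical slice of the electorate (all numbers are illustrative)."""
    name: str
    population_share: float  # fraction of all voters in this group
    support: float           # fraction of the group backing candidate X
    response_rate: float     # fraction of the group that answers a poll call


def true_support(groups):
    """Support for candidate X across the whole electorate."""
    return sum(g.population_share * g.support for g in groups)


def respondent_estimate(groups):
    """Expected unweighted poll result when only people who respond are counted."""
    responders = [g.population_share * g.response_rate for g in groups]
    total = sum(responders)
    return sum(r / total * g.support for r, g in zip(responders, groups))


groups = [
    Group("more willing to respond", 0.5, 0.60, 0.09),
    Group("less willing to respond", 0.5, 0.40, 0.05),
]

print(f"true support:        {true_support(groups):.3f}")
print(f"poll of respondents: {respondent_estimate(groups):.3f}")
```

With these assumed numbers the electorate is split 50/50, yet the poll of respondents lands near 52.9%. The bias comes from who answers, not how many, so simply making more calls would not remove it.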
Can Internet polling make up the shortfall? Silver isn't optimistic, since best practices for online polling aren't yet established. "It’s fundamentally challenging to 'ping' a random voter on the Internet in the same way that you might by giving her an unsolicited call on her phone," says Silver. That may be why online pollsters typically use a panel approach: they recruit a panel of respondents and claim it represents the broader population.
But here's the curve ball: none of these obstacles is particularly new. Silver explains that despite these flawed methods, pollsters were relatively accurate in the American election cycles of 2004, 2008, and 2010. The polls for Obama's 2012 re-election missed, but not by a wide margin. Since then, however, pollsters have been whiffing:
- Polls of the U.S. midterms last year badly underestimated the Republican vote.
- UK election polls (most conducted online) predicted a dead heat for Parliament instead of a Conservative majority.
- The polls also performed badly in last year’s Scottish independence referendum and this year’s Israeli general election.
Nor are pundits and talking heads a predictive substitute: in the case of the UK election, it seems only Mick Jagger knew what he was talking about.
Herding, fat tails and other mathematical misbehaviors
In addition to flaws in data gathering, another disturbing trend may be contributing: a conformist tendency known as herding, in which poll results start to resemble one another, particularly at the end of election cycles. Silver cites a recent UK example:
The U.K. pollster Survation, for instance, declined to release a poll showing Conservatives ahead of Labour by 6 points — about the actual margin of victory — because the results seemed “so ‘out of line’ with all the polling,” the company later disclosed.
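Herding can be pictured as a release rule applied after the fieldwork is done. The sketch below is a hypothetical rule, not Survation's or any other pollster's actual procedure: a new result is published only if it falls within a chosen tolerance of the median of the polls already out. The margins (Conservative lead over Labour, in points) and the 3-point tolerance are invented for illustration.

```python
from statistics import mean, median


def herd(published, new_poll, tolerance):
    """Hypothetical herding rule: publish `new_poll` only if it lies within
    `tolerance` points of the median of the polls already published."""
    if not published or abs(new_poll - median(published)) <= tolerance:
        return published + [new_poll]
    return published  # the "out of line" poll is quietly shelved


# Invented Conservative-minus-Labour margins, in points (not real poll data)
published = [0.0, 1.0, -1.0, 0.5]
after = herd(published, new_poll=6.0, tolerance=3.0)

print("polls published:", after)
print("polling average:", mean(after))
```

The shelved 6-point result never reaches the average, which stays near a dead heat. The published polls look reassuringly consistent precisely because the dissenting one was filtered out.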
Lest we dismiss these issues as an inconvenience for political cable channels, Silver also points out that numerous organizations rely on random-sample polling, from the Census Bureau to the Centers for Disease Control and Prevention to the economists who publish monthly jobs reports and consumer confidence figures.
So does the solution lie in better algorithms? Silver warns that moving away from random sampling might bring new error problems that skew the results, such as fat-tailed error distributions. As the Wikipedia entry gently puts it:
Traumatic 'real-world' events (such as an oil shock, a large corporate bankruptcy, or an abrupt change in a political situation) are usually not mathematically well-behaved.
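A fat-tailed error distribution assigns far more probability to extreme misses than the bell curve behind textbook margins of error. As an illustration, the sketch below compares the chance of an error beyond k standard deviations under a normal distribution and under a Student's t distribution with 3 degrees of freedom, rescaled to the same variance. Using t with 3 degrees of freedom as the fat-tailed stand-in is an assumption for illustration, not a claim about how real polling errors are distributed.

```python
import math


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2))


def t3_two_sided_tail(k):
    """P(|X| > k) where X = T / sqrt(3) and T is Student's t with 3 degrees
    of freedom, so X has unit variance like the normal it is compared with."""
    # P(|X| > k) = P(|T| > k*sqrt(3)). The t3 CDF is
    # F(t) = 1/2 + (1/pi) * (u / (1 + u**2) + atan(u)) with u = t / sqrt(3),
    # and here u = k*sqrt(3) / sqrt(3) = k.
    u = k
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 2 * (1 - cdf)


for k in (2, 3, 4):
    normal = normal_two_sided_tail(k)
    fat = t3_two_sided_tail(k)
    print(f"miss beyond {k} SD: normal {normal:.5f}, "
          f"fat-tailed {fat:.5f}, ratio {fat / normal:.1f}x")
```

Under these assumptions the two curves look similar at two standard deviations (the t curve is actually slightly thinner there), but at four standard deviations the fat-tailed version makes a miss roughly a hundred times more likely.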
One way to compensate for the limits of random sampling is to apply heavy demographic weighting to those who do respond. But that can create its own mathematical misbehavior, a.k.a. a big miss. In a prior piece, Silver points out that demographic weighting doesn't compensate for phone samples that leave out cell-phone-only voters, whose voting behavior does not seem to match that of others in their demographic group.
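Demographic weighting (post-stratification) rescales respondents so that each demographic group counts in proportion to its share of the electorate, but it can only correct for the traits it weights on. The sketch below assumes a hypothetical electorate split by age and phone type, polled by a landline-only survey that never reaches cell-phone-only voters. All shares and support levels are invented, and in this made-up example young cell-only voters differ from young landline users.

```python
from collections import defaultdict

# Hypothetical electorate: (age group, phone type) -> (share of voters, support for X)
CELLS = {
    ("young", "cell-only"): (0.24, 0.65),
    ("young", "landline"): (0.16, 0.50),
    ("old", "cell-only"): (0.06, 0.50),
    ("old", "landline"): (0.54, 0.40),
}


def true_support(cells):
    """Support for candidate X across the whole electorate."""
    return sum(share * support for share, support in cells.values())


def landline_poll(cells, weight_by_age):
    """Expected result of a poll that only reaches landline households."""
    reached = {key: val for key, val in cells.items() if key[1] == "landline"}
    if not weight_by_age:
        total = sum(share for share, _ in reached.values())
        return sum(share * support for share, support in reached.values()) / total

    # Post-stratification by age: average support among reached voters in each
    # age group, then weight each group by its share of the full electorate.
    population_by_age = defaultdict(float)
    for (age, _phone), (share, _support) in cells.items():
        population_by_age[age] += share
    reached_share = defaultdict(float)
    reached_support = defaultdict(float)
    for (age, _phone), (share, support) in reached.items():
        reached_share[age] += share
        reached_support[age] += share * support
    return sum(
        population_by_age[age] * reached_support[age] / reached_share[age]
        for age in population_by_age
    )


print(f"true support:           {true_support(CELLS):.3f}")
print(f"landline, unweighted:   {landline_poll(CELLS, weight_by_age=False):.3f}")
print(f"landline, age-weighted: {landline_poll(CELLS, weight_by_age=True):.3f}")
```

Here weighting moves the estimate from about 42.3% to 44.0%, closer to the true 48.2%. The remaining gap of about four points comes entirely from the unreachable cell-only voters, which weighting by age cannot see.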
In retrospect, this happy blogger's 2012 review, "How Nate Silver won the election with Data Science," reads more like a cautionary tale than a triumph. The piece extols the "victory for Data Science versus 'gut instinct', 'philosophy' and 'fundamentals'" and, with it, a victory for the statistician over the bloviating talking heads.
The author gushes over Silver's statistical approach, including his incorporation of historical data and his extraction of data from "every source" rather than excluding sources based on bias.
But the author underestimated one thing: even if Silver had superior or broader data sources, how those sources are weighted in his algorithms makes all the difference. When Silver assessed his failure in the UK general election, he primarily blamed which sources were selected for his predictive model.
Final thoughts: enterprise lessons
Some enterprise lessons are obvious here; I'd put them in two buckets. First, enterprises try to make sense of the market based on external surveys, articles and reports, and the surveys we rely on are often prone to flawed results. The usual culprits include the limitations of online polling that Silver cautions about (most enterprise software surveys are conducted online).
Other problems include small sample sizes (getting a large group of enterprise decision makers to take a survey is an expensive undertaking) and the "follow the money trail" problem: surveys with large samples are typically funded by vendors with a vested interest.
This does not mean the surveys are useless or the results irrelevant. But it does require us to take "factual" presentations (such as sponsored benchmarking tests) with several grains of salt. We would also be wise to treat the statements of enterprise gurus and pundits with similar caution, and always keep a wary eye out for the "herding" Silver refers to. The enterprise media has its own herd fashion, anointing some vendors as cool and others as "legacy." But the herd can't lead you to a business result.
The second bucket is the challenge facing the predictive industry as a whole. We now have the technical chops to crunch massive amounts of data based on algorithmic parameters. Usually, when we get an inaccurate result, we question the veracity of the underlying data, or we adjust the algorithm in search of a more accurate weighting. We always assume there is a glorious business benefit on the other side. But data sources with weighted scores contain one other variable: an assumption, namely whatever logic assigns the weight to each score. And that assumption could be dangerous for our predictive accuracy.
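The point that weights are assumptions can be made concrete. In the sketch below, three hypothetical sources each supply an estimate, and two plausible-sounding weighting schemes (one trusting current polls, one trusting the prior election) are applied to identical data. The source names, estimates, and weights are all made up; the only thing that changes between the two forecasts is the weighting assumption.

```python
def weighted_forecast(estimates, weights):
    """Weighted average of source estimates; `weights` encodes the modeller's
    assumption about how much to trust each source."""
    total_weight = sum(weights[name] for name in estimates)
    return sum(estimates[name] * weights[name] for name in estimates) / total_weight


# Hypothetical vote-share estimates for candidate X (not real data)
estimates = {"phone_poll": 0.47, "online_panel": 0.51, "prior_election": 0.53}

# Two alternative weighting assumptions applied to the same data
trust_polls = {"phone_poll": 3.0, "online_panel": 3.0, "prior_election": 1.0}
trust_history = {"phone_poll": 1.0, "online_panel": 1.0, "prior_election": 3.0}

for label, weights in (("trust polls", trust_polls), ("trust history", trust_history)):
    share = weighted_forecast(estimates, weights)
    call = "wins" if share > 0.5 else "loses"
    print(f"{label:13}: {share:.3f} -> candidate X {call}")
```

Same data, opposite calls: about 49.6% under one assumption and 51.4% under the other. Neither weighting is wrong on its face, which is exactly why the assumption deserves as much scrutiny as the data.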
As Vijay Vijayasankar warned in the Slippery Slope of Predictive Analytics:
Predictive models make assumptions – and these should be explained to the user to provide the context. And when the model spits out a result – it also comes with some boundaries (the probability of the prediction coming true, margin of error, confidence, etc.). When those things are not explained – predictive analytics start to look like reading palms or tarot cards. That is a disservice to Predictive Analytics.
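Reporting a prediction together with its boundaries need not be elaborate. As a minimal sketch, the functions below attach a normal-approximation 95% margin of error to a poll proportion. They assume simple random sampling, which is precisely the assumption Silver says is eroding, so the interval covers sampling error only and says nothing about nonresponse bias, herding, or weighting choices. The function names and example numbers are illustrative.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a
    proportion; z = 1.96 gives roughly 95% coverage under simple random sampling."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def describe_estimate(p_hat, n):
    """State the estimate together with its boundaries and its main caveat."""
    moe = margin_of_error(p_hat, n)
    return (f"{p_hat:.1%} ± {moe:.1%} (95% interval, n = {n}; "
            f"sampling error only, assumes a random sample)")


print(describe_estimate(0.52, 1000))
```

For a 1,000-person poll at 52%, this reports a margin of about ±3.1 points, along with a plain statement of what that margin does not cover.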
Vijayasankar argues that it's not about blaming or idealizing predictive; it's about giving predictive its own proper weighting and context in the decision-making process:
The additional context provided by predictive analytics enhances the manager’s insight and over time will trend to better decisions. The idea definitely is not to over rule the intuition and experience of the manager. Of course the manager should understand clearly what the model is saying and use that information as a factor in decision making.
Approaching statistical problems dispassionately, with rigorous methods and a diversity of sources, will yield more accurate results. Whether that can be turned into a business outcome remains to be seen, and probably varies by industry. When variables are more controllable, as with fraud behavior patterns or machine maintenance, predictive may be a business asset. But if our profits come down to predicting the weather, I don't like our chances.
Image credit: Man in despair covering his face with his hands © Picture-Factory - Fotolia.com
Updated with a few cosmetic improvements (no text changes) early morning, June 4.