Spreadsheets have been consistently found to be costly, error prone and liable to significant financial burden. Research on this topic, which goes way back to the late 1970s consistently shows this characteristic. The recent case where Goldman Sachs screwed up a spreadsheet used to calculate a share valuation when TIBCO was acquired cost stockholders $100 million. That was a rounding error in a $4 billion deal but it is still a large sum. What's the problem?
The Panko dimension
In a 1998 research paper (updated 2008), prof Panko of the University of Hawaii said this:
In general, [spreadsheet] errors seem to occur in a few percent of all cells, meaning that for large spreadsheets, the issue is how many errors there are, not whether an error exists. These error rates, although troubling, are in line with those in programming and other human cognitive domains. In programming, we have learned to follow strict development disciplines to eliminate most errors. Surveys of spreadsheet developers indicate that spreadsheet creation, in contrast, is informal, and few organizations have comprehensive policies for spreadsheet development.
[All emphasis mine]
In short - the spreadsheet is a development tool that should be subject to the same level of testing rigor as any other programming language or development environment. Except that it isn't. In reality, managing spreadsheets is much worse than Panko describes and continues to cause a variety of problems, the outcomes of which are evident from the TIBCO case.
There are numerous reasons why spreadsheets are error prone but in my experience, the principle problem lies in a failure to understand the need for detailed documentation that adequately explains the code used to build a number and/or an inability to understand what a specific formula means.
The next question comes - how bad is the incidence of error? In 2003, Louise Pryor pointed to aggregate Panko research that showed error rates of 86-100%, depending upon the study in question. She said:
It’s difficult to know how to interpret these results. They are certainly very high numbers, and send a chill down my spine.
One of the outcomes of these research was the formation of EuSpRIG - an acronym for the European Spreadsheet Risk Interest Group. It has just concluded its 16th annual conference. If you want to read some of its record of horror stories then this link is instructive.
The practical problem
I first became interested in the topic around 1991-2 when I was preparing cashflow forecasts for funding operations. At the time, there were no tools available other than the spreadsheet and it used to drive me nuts that I'd have to fiddle about with templates that then had to be checked and rechecked for integrity. There were no linking mechanisms back to financial statements so there was always a risk that errors would creep in. So when I came across my first budgeting, planning and forecasting tool it was like arriving in heaven. I successfully dumped the spreadsheet - mostly - in favor of a tool that was build for the purpose and became much more efficient and confident in the information I was producing.
When the business intelligence industry took off in the mid 1990's my interest was re-ignited as someone who was testing the big tools of the time: Cognos, Brio, Hyperion, TM/1 and many others which have since gone by the wayside or been acquired. Fast forward t0 today and we see a landscape where there are many new tools, each of which is trying to find a fresh take on analytics. This time though we have the added nuance of predictive capabilities.
However, none of that prevents the spreadsheet from rearing its ugly head, with many tools having the capability to suck in a spreadsheet induced CSV file. Many of those same tools attempt to ring fence the spreadsheet so that the kind of integrity required to provide assurance is guaranteed. I see that as a kludge but I 'get' why it happens.
For my part, I wish that business leaders and CFOs in particular would simply mandate the outlawing of the spreadsheet for anything other than for pure ad hoc work. Alas, and despite all the historical evidence and weighty discussion, I'm not winning that argument.
Vijay's arguments - the 'defeated perspective'
For his part, Vijay Vijayasankar offers what I term 'the defeated perspective,' arguing (among other things) that:
What is the most common data source used by the new generation BI tools ? Excel ! People dump data from other systems to excel , add formulas etc, and put a nice visualization on top via the slick BI tool. Just that they don’t talk about excel in the scenario explicitly.
On the facts, he is right of course saying that convenience trumps functionality and concluding:
When you see customers who say “we have displaced excel” , at best it means one department for one use case has been using a new tool instead of excel . What would be great to know is if there are entire companies who have completely gotten rid of spreadsheets as a BI, EPM and ETL tool. I have not seen any in the nearly twenty years I have spent in this field.
The smart thing in my opinion is to find the best co-existence strategy for excel and all the other tools. Spreadsheets are invaluable when used reasonably – please just don’t paint it evil with broad strokes.
The problem I have with Vijay's argument is exactly the same one I have regarding email.
Continued and ubiquitous spreadsheet use in the face of long run studies that consistently show high and dangerous levels of error does nothing to address the real problems in the same way that email is an incredibly inefficient and costly time sink for internal communications. In short - there is nothing other than a shrug of the intellectual shoulders to support arguments that may be rooted in fact but which have no intellectual or logical merit.
One point Vijay makes that is worth emphasizing: for all the claimed gorgeousness of the old and new classes of BI application, many if not all have roots that go back to the spreadsheet. In that sense, some vendors are claiming freedom from spreadsheet hell when in reality all they're doing is putting lipstick on a pig. Cynical perhaps but a reflection of a failure to look at root causes and solve for those.
Over arching all this though is the fact that whenever I discuss this topic, almost no-one seems to have understood the problem and have scant knowledge of the research or what it means. I guess that's what happens when you have been brought up to believe a lie to the exclusion of alternative truths.
Solving for spreadsheet hell
Is there a solution for the problem of spreadsheet hell? Yes.
There is nothing fundamentally wrong with the spreadsheet metaphor of row and column presentation. But, you have to go back to first principles of business application design to understand what it means to articulate that for reporting and everything that flows from that activity. It is pointless attempting to build an accounting solution if reporting and BPF are an afterthought yet that's what the current state of the industry reflects.
So for example one of the main reasons I have been so impressed with Workday over the years is that they designed the system with analytics at the center, not as an afterthought. As far as I know, they're the only vendor that has done so at scale. Even then it has proven to be a very difficult and costly problem to overcome. Workday's investments in both developing BI/BPF and investing in predictive analytics bodes well for the future of its customers. Whether it gets them out of spreadsheet hell remains to be seen.
Disclosure: Workday is a premier partner at time of writing.