Excel and COVID-19, a marriage made in hell

Profile picture for user gonzodaddy By Den Howlett October 5, 2020
Summary:
Spreadsheet error hits the headlines - but it remains widely misunderstood.

failure
(Pixabay )

WARNING - some finance types might get offended by this story and there are sections that are decidedly not safe for work. I don't care - this is too important to kick the can down the road one more time. 

I've been writing about spreadsheet error since 1996. It's a favorite rant of mine which I normally pen when there's nothing better to do these days. Why? Because despite the long history of cock-ups using spreadsheets, the vast majority of those who swear by them - mostly finance types - don't care enough to take the problem seriously.

Paradoxically, the UK government's botched COVID-19 reporting might just be the equivalent of a burning platform that changes the view about where and, more importantly where not, to use a spreadsheet. What happened?

On Sunday 4th October it emerged that the UK government had under-reported the number of COVID-19 cases by some 16,000 and some. Ars Techinica reported the story in some detail but as is often the case with spreadsheets the story was both confusing and downright wrong - not in its reporting - but in the variety of explanation. 

At first, Public Health England (PHE) put the error down to an 'IT glitch.' Next, the Guardian reported about problems with Excel row limitations, but it was the BBC that finally got to the bottom of what happend, reporting that:

PHE had set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system and made available to the NHS Test and Trace team as well as other government computer dashboards.

The problem is that the PHE developers picked an old file format to do this - known as XLS.

As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of.

And since each test result created several rows of data, in practice it meant that each template was limited to about 1,400 cases. When that total was reached, further cases were simply left off.

As you might expect, Twitter was well ahead of the game on this debacle. Matt Parker said:

When I first heard that cases were missing I thought “can’t be an Excel problem, the rows run out at 1,048,576 which is bigger than even total cases”. I never thought they’d have a case per COLUMN. Unbelievable. And yes: Excel columns end at 16,384 aka “XFD”.

My response to this is not safe for a family audience so I will paraphrase:

What f*^king muppet did this and how much were they paid?

Here's the problem I've been yammering on about since 1996. It comes down to two basic problems that are nothing to do with spreadsheets per se but the reality of what they are:

  • Spreadsheets are general purpose programming environments.
  • As such they need to be tested and documented.

That almost never happens. In this case, whoever said muppet designed the spreadsheets clearly didn't understand spreadsheet first principles. In all the years I've looked at spreadsheet failure - and yes, there used to be conferences on this topic organized by the major accounting firms - this is one for the ages. Confusing rows and columns is spreadsheet 101. If that is indeed what happened then...you just can't make this shit up. 

As recently as 2016, PWC reminded us of a 2008 report that said:

More than 90% of spreadsheets contain serious errors, while more than 90% of spreadsheet users are convinced that their models are error-free. As the spreadsheet is the favourite software tool of a great many accountants, this could be a recipe for disaster” – ACCA: Talking Technology

Much earlier, Prof Ray Panko of the Shilder College of Business undertook foundational work on this topic, documenting cases and circumstances where spreadsheet error could occur. Earlier still, I recall speaking with Dan Bricklin, the inventor of Visicalc the first commercially available spreadsheet on a microcomputer. He readily acknowledged that in the wrong hands, spreadsheets could be dangerous.

For my own part I've documented many cases, perhaps the most spectacular being a spreadsheet error by PG&E which saw said firm make a multi billion dollar bidding error it tried to roll back through the courts but without success. (Lesson - you screw up, you pay.) I remember the case of a large European brewery whose name starts with an H compounding a monthly interest calculation to the tune of millions per month, in error, following a change of personnel (and no documentation.) Then there was the case of the UK building society that passed on a mortgage book it saw as over valued by £135 million that a since defunct society willingly picked up. My interviewee felt they'd dodged a bullet. It turns out they were right. The mortgage book was almost 100% subprime and near worthless. If you want to see a rogues gallery of these issues, check out this from Oracle. I particularly like this:

Fannie Mae. While in the midst of changing its accounting system, the company’s finance team relied on spreadsheets to make some needed calculations required by a new accounting standard. One problem. The spreadsheets contained errors that skewed results by over $1.1 billion. The company discovered a $1.136 billion error in total shareholder equity due to “honest mistakes made in a spreadsheet used in the implementation of a new accounting standard," and it had to restate its 2003 third-quarter financials.

And just for fun and giggles - how about this?

London Olympics. When a staffer accidentally inserted “20,000” into a cell instead of “10,000, the London Olympic Committee ended up selling 10,000 tickets for non-existent seats at four minor heats of synchronized swimming. When the mistake was caught, the committee had to upgrade ticket holder to tickets for major events—at a loss.

There are so many more of these stories. The Oracle report dates from 2019 and yet, as we see, and despite the protestations of beancounters determined to keep a grip on spreadsheets, the crap keeps a'coming. 

In the COVID-19 case we're talking about a reporting point that doesn't have direct or obvious financial consequences - well maybe it does - in another spreadsheet. Instead, it points towards something far more visceral - life and death. Is this the spark that will finally get people off their addiction to the spreadsheet - or at least consign its creation to those who actually know WTAF they're doing?

In the meantime, I confidently expect the UK's self declared Chief Tech Wonk aka Matt Hancock to take what seems to be his role in life - a daily beating for lauding technology about which he knows Jack shit.

PS - there are oodles of better ways to do this sort of analysis...ping me if you'd like to know more - den [at]diginomica[dot]com