I originally wrote a version of this article in 2006. With the pace of digital technology in the ensuing seventeen years, one would assume that the issues and suggestions would be hopelessly dated, but the French writer Jean-Baptiste Alphonse Karr wisely observed in 1849, “Plus ça change, plus c'est la même chose" – the more things change, the more they stay the same.
Re-reading this piece, I was surprised how little things have changed in what we now refer to as analytics. Today, while the focus of the enterprise is wrapped around the axle of digital transformation, data mesh, cloud, lakehouses, AI and chatGPT, the work that people do in organizations to understand events, plan and evaluate is more vital than ever. Just before I wrote this article, Tom Davenport published a related article, Competing on Analytics. He subsequently a book of the same title, popularizing the use of the term "analytics."
Rather than repeating the entire article, I’ve chosen excerpts to which I added current comments.
Shadow IT, those people performing IT functions but not part of the mainstream IT organization is as much as 78% of the size of the total official IT staff [as of 2004, CIO.com]. The existence of Shadow IT implies a failure on the part of IT to provide all of the services to meet their client's needs, and the problem is universal. This is especially acute in Business Intelligence (BI), the largest segment of Shadow IT. Most knowledge workers perform their modeling, reporting, number crunching and sharing with spreadsheets and personal databases, often despite IT-sponsored BI efforts.
Shadow IT doesn't usually attempt to do the core IT processes, like networking or security, or even the core applications, like ERP or CRM. For the most part, Shadow IT groups fill in the blanks left by IT, such as reporting, specialized modeling, and data capture from external sources. Shadow IT also covers analytics, especially budgets, planning and other aspects of what has come to be called Business Performance Management (BPM) and BI. These areas have been historically neglected by IT and enterprise systems software vendors, who focus more on good architecture and scalability, often to the exclusion of functionality at the knowledge worker level. The evidence of this disconnect is the unabated proliferation of spreadsheets despite considerable investments in enterprise systems. The same result can be seen with data warehousing, where BI tools have had little impact on spreadsheet use.
This is a key point: data warehousing and BI never replaced spreadsheets for entering data not found elsewhere and constructing models and metrics. BI tools, which relied on data warehouses, were confined to reporting and dashboards with only limited capability to collect data and “write-back” data, derived values, scenarios and models. Spreadsheets filled in the blanks, and to a large extent, still do.
A coherent architecture is a desirable trait to technologists. Often, without articulating why, users will shun these enterprise solutions and good architecture in favor of the ability to get their work done. Still, knowledge workers are interested in tools that work and can perform for them. Queries per day, metadata and APIs are all peripheral issues to knowledge workers. It is important to them to save time and do so in an environment they can easily navigate. The requirements of time and simplicity translate into this: does the application have a graphic interface that is simple and consistent across all of the functions, and does it have the smarts and the horsepower to solve complex problems end-to-end?
It is clear that what is best for the organization long-term and what works for the people daily are very different. Employing a spreadsheet, namely Microsoft' Excel, is the convenient answer to that need. Excel resides on most users'' devices and is the natural environment for filling the gap known as Shadow IT.
Excel use is declining slowly, but is still widely used. An article published in the Enterprise Times reported that about 60 percent of businesses in the U.S. are still relying on Excel spreadsheets, while 21 percent are moving toward other spreadsheet software solutions such as:
- Zoho Sheet.
- WPS Spreadsheets.
- LibreOffice Calc.
- Google Workspace.
- PDF Tables.
- Apache OpenOffice Calc.
There are over 150 million business users of Excel worldwide.
It remain impossible to say how many because of various licensing and bundling schemes and significantly pirating.
Many Excel users are devoted, at least part of the time, to entering data by hand, extracting data manually from other systems and functioning as report servers. Despite intensive efforts to provide centralized BI solutions, the uptake of BI as a replacement for manual methods like spreadsheets has been insignificant. Today's BI tools are designed to operate as part of a data warehouse strategy and have inherited data warehousing's read-only approach. They are suitable for creating reports and analyses of historical data, but they are weak in modeling, data entry, robust calculations, and iterative and incremental calculations. Furthermore, they cannot handle real-time or right-time data. Spreadsheets pick up the slack in all these areas, but the solutions are often not very elegant.
Part of the problem is that BI is often performed in a perfunctory manner as part of data warehousing initiatives, focusing more on the data management and database aspects and giving only minimal thought to analytical work processes. From a user's point of view, there is little incentive to disassemble years of application development in personal tools only to migrate to a new environment that offers less than 100% of the replaced functionality. Translation: spreadsheets work for people even if they do not comply with standards, best practices or the mandated approach. In addition, IT organizations have been focused on data management from an enterprise perspective. They have not effectively understood the workflow and informational needs of knowledge workers.
This last part is changing, at least from the POV of industry analysts, vendors and media, though it is questionable how much IT is up to speed.
So why is this a problem? One reason is that collaboration, reuse and uniformity are challenging to implement because the tools are designed to be used differently. A broad range of skills and proficiency is possible with spreadsheets, meaning power users can create structures that others cannot maintain. The relatively unskilled users can create devastating errors that are difficult to trace. There is no concept of 'version control' or any other feature of a professional software engineering environment, even though the tools are being applied as development tools. This leads not only to potential error but, most often, long set-up times as new versions or scenarios of the models are introduced, such as copying and renaming files, rolling time periods and modifying reference information, such as hierarchies.
Progress is evident, though. There are software tools, including within the spreadsheet software itself to address some of these issues, but for functional applications that are complex, it is still a chore to keep things running smoothly. As recently a last year, I learned of an actuarial department with a massive Excel application that was published once per month, but required four weeks work to roll it over and proof it for another month.
Maintaining information and especially reference data (look-ups, hierarchies, product and customer masters) in tools that are, for example, positionally organized (spreadsheet notation AA221D$21) rather than referentially organized (WHERE date is 11/04/2004 and CUSTOMER_NAME is 'Yahoo') is devilishly difficult. This problem is complicated because people tend to be more cognizant of the cost and risk of development than ongoing maintenance. The continuing effort to support a complex system built with components never designed to operate collaboratively is significant.
Efforts in the enterprise to devise a more comprehensive data strategy for analytics have alleviated some of this effort.
All these factors affect the cost of Shadow IT in organizations, a problem measured in tens of millions of dollars or more.
Qualities of spreadsheets that make them indispensable include: expressiveness, inexpensive, universal, autonomous, fast and portable. However, each of these qualities carries with it some serious drawbacks:
Spreadsheets offer the broadest range of features and manipulations for defining calculations and models of any tool ever invented. In a single day, any literate person can learn how to create a spreadsheet of rows and columns, enter numbers, define calculations and follow wizards to format the output. The range of models possible in a spreadsheet is virtually limitless. Drawback: It is precisely because of their expressive qualities that spreadsheets are so often applied to problems beyond their scope, especially those that require systematic properties beyond the models themselves, such as security, integrity, version control and workflow, to name a few.
An entire industry of enhancements and extensions to baseline spreadsheet functionality emerged since I wrote the above. Tight integration between Excel and PowerBI provides visualization and presentation capabilities that are industry leading. Third party software add-ins in like PowerOn (an Insight Software company) fill many of the gaps of collecting, modeling, planning and forecasting.
Inexpensive start-up cost: Depending on how you look at it, the start-up cost of a spreadsheet is zero unless you include the developer's time, which may not seem significant until you evaluate it in total. Drawback: The problem is that these spreadsheet applications' long-term maintenance and support costs are pretty high. Forcing what is essentially a personal productivity tool into the role of departmental or even inter-departmental modeling, analysis and reporting tool causes the complexity of the application to rise rapidly. The cost of using spreadsheets is very low when the complexity is low but increases exponentially when the complexity increases.
Universally used: Spreadsheets have a very shallow learning curve, meaning it is easy to get started and use them for something of value. This is the reason they are so ubiquitous. Drawback: Because the functionality of spreadsheets is so vast and enhanced by third-party add-ins to expand the functionality, there is a great variety in the level of skill and technique in how they are used, making it extremely difficult to maintain standards.
Autonomy: the desire to operate apart from the organizational systems, free from restrictions and limited flexibility, is powerful. A spreadsheet provides considerable functionality for creating and distributing models, reports, data and commentary. Despite best intentions, there are always things that need to be done that either can't be done in the official workflow or are too time-consuming. Drawback: Because of its autonomous nature, any spreadsheet application, report, data or extract must be viewed suspiciously, as it exists beyond the reach of the operational security, standards and controls. Adding these controls is a complicated, time-consuming and imperfect process. In addition, because of the autonomous state, data extractions into spreadsheets are often hard to maintain, slow and unreliable.
As mentioned previously, much of the effort to move data into a spreadsheet has been relocated to departmental or even enterprise implementations of data management and governance. Most BI vendors now incorporate some level of assistance too.
Fast: there is no faster way to build an analytical model and perform calculations than with a dedicated spreadsheet. Drawback: The speed of creating spreadsheet applications has to be evaluated against the time it takes to prep and set up the models to run. Copying and renaming files, performing searches and global replaces on names in macros, and manually re-keying data take considerable time.
If anything, the problem is more acute now with the volume and variety of data available, but that problem is evident in every analytical system.
Portable: it can operate even when disconnected from the network. Drawback: The issues listed above in the section on Autonomy apply here. In addition, it is even more likely to involve duplication of work, handling of files and directories, out-of-date information and manual data entry.
“Disconnected from the network” is probably an absurd concern today.
Spreadsheet software was designed to be easy to use and expressive, allowing anyone to enter text, numbers and formulas. A drawback to this all-in-one architecture is that it is virtually impossible to expose, for example, just the model logic. Auditing a large, complex spreadsheet application, especially one of linked spreadsheets, is a manual, tedious process and a powerful motivator for performing poorly. Reports and analyses from a spreadsheet can only be validated with great effort; the validation has to be endlessly repeated because there is no version control.
The implications of spreadsheets in Shadow IT
Wasted time: Shadow IT adds a substantial hidden cost to organizations, composed mainly of non-IT workers in finance, marketing, human resources or sales administration, who spend a significant amount of time wrangling data. With the effort of manual keying, building and maintaining spreadsheets and personal databases, and filing, searching and disseminating data, these professionals may spend only a fraction of their day doing their 'actual' job. The rest is consumed by data activities that the data warehouse and BI should have largely eliminated.
Inconsistent business logic: Because each spreadsheet application encapsulates its own definitions and calculations, inconsistencies will likely arise from accumulating minor differences from one version to another and from one group to another, as spreadsheets are often copied and modified. In addition, large and small errors that occur from either lack of understanding of the concepts or incorrect use of the spreadsheet frequently go undetected due to a lack of rigorous testing and version control.
Inconsistent approach: Even when the definitions and formulas are correct, the methodology for doing analysis can be distorted by the arrangement and flow of linked spreadsheets, or the process itself can be wrong. For example, one department may feed assumptions into one set of variables while holding others constant, while another does the opposite. The results will differ even though the model was correct. This is often an inadvertent error but one of the worst because it appears correct when it is not.
Wasted investment: Shadow IT applications drain ROI from investments in systems designed to perform the functions Shadow IT assumes. This is most prevalent in reporting and analysis applications, such as data warehousing and BI, where projects are initiated with good intentions. Still, the propagation of BI in the organization never gains critical mass.
When we were implementing BI and data warehouses at the time of this article, one question universally asked by clients was, “Where is the ‘export to Excel button?”
Inefficiencies: Shadow IT can hinder innovation by blocking the establishment of more efficient work processes. The reluctance of spreadsheet users to quiesce their applications to make way for an IT-led initiative is well-documented. In the long run, it serves to postpone or avoid productivity gains.
That’s why it’s called Shadow IT.
The barrier to Enhancement: spreadsheet proliferation can be a brake on adopting new technology. Because spreadsheets are deployed to fill critical needs, they must be replaced carefully. But needing adequate documentation, controls and standards, that process is slow and error-prone. Initiatives with well-defined ROI are often held up because downstream spreadsheet operations may be adversely affected. Only a painstaking manual process of sorting through the spreadsheets and cataloging their flow and processing can ensure that the new technology can assume their function or that their operation will not be affected.
And that’s why IT still refers to everyone else as “the business.”
Enhancing the value of spreadsheets
Those aspects of spreadsheets that make them indispensable to users and the organization can be applied to solving problems systematically and consistently. There is a solution to the problem, which involves embracing spreadsheet use but adding an element of control. What needs to be improved is the application of a controlling environment that seamlessly provides all of the other services that spreadsheets require.
That was wishful thinking. The better approach is the embedding of spreadsheet functionality in a managing and controlling application, thereby preserving the unique and useable functions but relieving the organization and spreadsheet developers of the headaches.
A promising solution is the application of capabilities such as persistent connections to data, security, logging and centralized management of query logic. In addition, models, calculation rules, naming conventions, report templates and many other reusable objects can be managed at a governed level, ensuring adherence to standards while reducing the workload for IT and business users.
The benefits of spreadsheets are undeniable. However, the drawbacks can be significant. Optimizing the situation requires retaining as many benefits as possible while reducing or eliminating the drawbacks. On the plus side, spreadsheets provide superior capabilities as a user interface, a distributed calculation engine, read/write data access, personal exploration and declarative model building. They are often applied in areas where they are weak, however, such as application development, collaboration, report engines, databases and information integration and transport.
By separating the valuable qualities and replacing and upgrading the weak ones, a hybrid architecture can give rise to a whole new set of hybrid applications, such as the examples below:
Shadow IT is a barometer of how enterprise applications serve the organization's needs. The broad and deep extent of Shadow IT clearly shows that enterprise computing only partially meets all requirements. It is open to discussion whether or not these gaps should be addressed by more enterprise system functionality or by adapting the existing workarounds, which are devised primarily in spreadsheets and other personal productivity software. We firmly believe that logical partitioning of spreadsheet functionality, retaining its strengths and migrating weaker services to needed governance is the best decision and one that will succeed in the marketplace because of its perfect fit.
Once again, plus ça change, plus c'est la même chose
Shadow IT is a hidden productivity killer. It creates the false illusion that IT expenses are low while penalizing the Shadow IT departments with higher budgets and masking the extent of IT work they are forced to do. Shadow IT is inherently inefficient and brittle and poses a real threat to an organization's agility. Until now, deploying services separately across a network was not feasible, but today's technology and practices make the decision practical and imperative.
One final thought, though. Consider spreadsheet skills as an extension of a knowledge worker’s thinking and creativity, rather than a troublesome enterprise software issue,