I'm feeling sentimental about OLAP; you should too

By Neil Raden, May 7, 2019
Summary:
OLAP was big once; it still could be.
Nostalgia ain't what it used to be

In the fog of the Big Data era, OLAP (On-line Analytical Processing) gets little mention from the press or the industry analysts. The reality is, OLAP is quite vital today for a simple reason: most people in business still think about their business in OLAP terms: hierarchies and aggregations; drilling up, down and across the data; pivoting dimensions to see this thing by that thing, filtered this way.

But deploying and using OLAP has changed dramatically in the past few years. The key word here is “navigation.” OLAP gave people the ability to interact visually with the data and explore it using multidimensional models of attributes, metrics, filters and calculations, at minimum. Without code.

To understand why OLAP is important, it helps to understand its development over the past few decades. OLAP had lots of drawbacks, but many of them have been resolved by the advance of computing resources. A mid-90s OLAP implementation may have run on 1/100,000th of the resources routinely available now.

How did we get here?

From the middle of the 20th century until nearly its end, computers in business were mostly consumed with the process of capturing operational transactions for audit and regulatory purposes. Reporting for decision-making was repetitive and static. Some interactivity with the computer began to emerge in the eighties, but it was applied mostly to data input forms. By the end of the century, mostly as a result of the push from personal computers, tools for interacting with data, such as Decision Support Systems, reporting tools and Business Intelligence, allowed business analysts to finally apply computing power to analytical, as opposed to operational, purposes.

Nevertheless, these tools were under constant stress because of the cost and scarcity of computing power. The data repositories, mostly data warehouses, dwarfed the operational systems that fed them. As BI software providers pressed for “pervasive BI,” so that a much broader group of people in the organization would actively use the tools (and the vendors would sell more licenses, of course), the movement met resistance from three areas:

  1. Physical resources (CPU, RAM, Disk).
  2. IT concerns that a much broader user community would wreak havoc with the established security and control.
  3. People themselves who, beyond the existing users, showed little interest in “self-service” so long as there were others willing to do it for them.

Spreadsheets broke the ice. Once people saw they could do much more with a spreadsheet than with their limited reporting tools, a new kind of analytical work quickly evolved. It became apparent before long, however, that spreadsheets had some glaring deficiencies. The 1980s saw the emergence of some powerful tools that provided functions lacking in spreadsheets, such as connectors to databases, hierarchical/multi-dimensional modeling, “what-if” analysis and scenario modeling, and even statistical libraries.

The term OLAP was coined in 1993 under controversial circumstances, because it was defined to fit the characteristics of just one product, but the name stuck. In point of fact, the first real OLAP product was Express, developed by Management Decision Sciences at MIT in the 70s; it was sold to IRI and endured for decades until Oracle finally retired it not too many years ago.

What OLAP really means

As noted above, OLAP is an acronym for On-line Analytical Processing, a phrase that has little relevance today. In 1993, “on-line” meant interactive, usually with a mainframe. Today, of course, everything is “on line,” so it’s hardly descriptive. Processing? Who actually thinks about “processing” as a feature of BI anymore? Analytical is the only part of OLAP that tells some of the story, but the rest is missing. At its essence, OLAP features are:

  • Multi-dimensional model at its core: dimensions, hierarchies and attributes.
  • Drill: at least to drill into detail and drill back up if not across dimensions.
  • Cross-dimensional calculations.
  • Filters.
  • Collapsible browsing.
  • Flexible definitions of time.
  • Sparse matrix.
  • Read and write, though many OLAP tools cannot do the latter.
  • Query generation, no coding.
  • FAST.
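Several of the features above — dimensions, drill, filters, pivoting — can be sketched with ordinary dataframe operations. A minimal illustration using pandas, where the sales data and all dimension and member names are invented for the example:

```python
import pandas as pd

# Hypothetical sales fact table: three dimensions (region, product, quarter)
# and one metric (revenue)
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["Gadget", "Widget", "Gadget", "Widget", "Gadget", "Widget"],
    "quarter": ["Q1", "Q1", "Q1", "Q1", "Q2", "Q2"],
    "revenue": [100, 150, 200, 120, 110, 130],
})

# Pivot one dimension against another, aggregating the metric
by_region = sales.pivot_table(index="region", columns="quarter",
                              values="revenue", aggfunc="sum")

# "Drill down" from region into product detail within one region
east_detail = sales[sales["region"] == "East"].groupby("product")["revenue"].sum()

# Filter: restrict to Q1 before aggregating
q1_total = sales.loc[sales["quarter"] == "Q1", "revenue"].sum()

print(by_region)
print(east_detail)
print(q1_total)  # 570
```

A real OLAP engine adds hierarchies, calculation rules and a shared semantic layer on top of this, but the navigational gestures are the same.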

You may have heard the OLAP models referred to as “cubes.” This is actually a misnomer, for two reasons.

Firstly, a cube is a three-dimensional object. An OLAP model is multidimensional, meaning it can have any number of dimensions (though more than 10-20 can become cumbersome, and there are not many real-life business models that require more).

Secondly, multi-dimensional models are not symmetrical. Some dimensions can be very large and some very small. Another thing that differentiates OLAP from other forms of BI is dynamic calculation. Suppose you manipulate the model in such a way that certain values would have to be calculated (sums are a good example, especially in filtered queries). Many OLAP tools allow for very complex calculation rules to derive values that do not exist in the source data.
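Why derived values must be calculated at query time rather than stored can be shown with a tiny example: a ratio like margin percentage has to be computed from the aggregated base measures, because a ratio of sums is not the sum of ratios. A sketch with invented figures:

```python
import pandas as pd

# Hypothetical fact rows; note that "margin %" never exists in the source data
facts = pd.DataFrame({
    "product": ["Gadget", "Gadget", "Widget"],
    "revenue": [200.0, 100.0, 150.0],
    "cost":    [120.0,  60.0, 100.0],
})

# Aggregate the base measures first, THEN derive the ratio. Averaging
# pre-computed row-level margins would give the wrong answer, which is
# exactly why OLAP engines evaluate such calculation rules dynamically.
agg = facts.groupby("product")[["revenue", "cost"]].sum()
agg["margin_pct"] = (agg["revenue"] - agg["cost"]) / agg["revenue"] * 100

print(agg)  # Gadget margin: (300-180)/300 = 40%
```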

That’s really where OLAP begins. Piled on top are a multitude of features, and each vendor’s version is a little different. In some cases, OLAP is part of a broader BI package; in others, more common BI features are grafted onto a core OLAP engine. Perhaps you’ve heard the terms ROLAP, MOLAP, HOLAP and even VOLAP?

OLAP was delivered in quite a few versions, but the two major differentiators were:

MOLAP: Multi-dimensional OLAP, built on persistent data structures and access protocols, mostly on disk, with a few hybrid arrangements where the “cubes” were both physical and virtual, or transient in memory. The drawback of MOLAP is a lack of agility, as the cubes, real or virtual, must be defined in advance. For persistent cubes, data explosion was a lingering concern: a MOLAP cube of size X could expand to 1,000X with fully computed values, aggregations, etc.
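The explosion is easy to quantify on the back of an envelope. The sketch below uses entirely hypothetical hierarchy sizes and an assumed 1% leaf-level sparsity; the actual multiplier varies enormously with real data:

```python
from math import prod

# Hypothetical hierarchy sizes: member counts at each level, plus "All"
levels = {
    "product": [1000, 100, 10, 1],   # SKU -> brand -> category -> All
    "store":   [200, 20, 4, 1],      # store -> region -> country -> All
    "day":     [365, 12, 4, 1],      # day -> month -> quarter -> All
}

# Positions each dimension contributes once every level is materialized
positions = {dim: sum(counts) for dim, counts in levels.items()}

# Source facts: the leaf-level cube is sparse; assume only 1% of leaf
# combinations actually occur in the data (an illustrative guess)
leaf_cells = prod(counts[0] for counts in levels.values())  # 73,000,000
source_facts = int(leaf_cells * 0.01)                       # 730,000

# A fully pre-computed cube can populate cells at every combination of
# levels; aggregates are far denser than leaves, so this bound is indicative
computed_cells = prod(positions.values())

print(f"source facts:   {source_facts:,}")
print(f"computed cells: {computed_cells:,}")
print(f"explosion:     ~{computed_cells // source_facts}x")
```

Even with these modest, made-up hierarchies, the fully computed cube holds roughly two orders of magnitude more cells than the source facts.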

ROLAP: Relational OLAP, which relied on SQL to retrieve data from relational databases. Other than caching some dimensions and pick lists in memory, an analyst would use a graphical interface to create a query based on a mapping of source data (operational database, data warehouse, etc.) through the lens of a semantic model that was familiar and consistent. SQL would be generated dynamically to satisfy the request, in some tools with almost amazing ingenuity. It provided the ability to interactively query vast data sources with limited resources. The drawback was speed: MOLAP software was blindingly fast; ROLAP could be quite slow, though ingenious workarounds were developed.
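The heart of a ROLAP engine can be caricatured in a few lines: map business names through a semantic model to physical columns, then emit GROUP BY SQL on demand. A toy sketch, in which the star-schema table and column names are invented:

```python
# Toy semantic layer: business names mapped to physical warehouse columns
# (fact_sales, dim_store, dim_date are hypothetical star-schema tables)
SEMANTIC_MODEL = {
    "dimensions": {"Region": "dim_store.region", "Quarter": "dim_date.quarter"},
    "measures":   {"Revenue": "SUM(fact_sales.revenue)"},
    "from":       "fact_sales JOIN dim_store USING (store_id) "
                  "JOIN dim_date USING (date_id)",
}

def generate_sql(dimensions, measures, filters=None):
    """Emit a GROUP BY query from a drag-and-drop style request."""
    model = SEMANTIC_MODEL
    dim_cols = [model["dimensions"][d] for d in dimensions]
    meas_cols = [f'{model["measures"][m]} AS "{m}"' for m in measures]
    sql = (f"SELECT {', '.join(dim_cols + meas_cols)}\n"
           f"FROM {model['from']}\n")
    if filters:
        sql += "WHERE " + " AND ".join(filters) + "\n"
    sql += f"GROUP BY {', '.join(dim_cols)}"
    return sql

print(generate_sql(["Region", "Quarter"], ["Revenue"],
                   filters=["dim_date.year = 2018"]))
```

Real ROLAP engines do vastly more (aggregate awareness, join pruning, dialect-specific SQL), but the principle is the same: the analyst never sees the SQL.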

Which OLAP for today?

Today, the clear winner is ROLAP, except the “R” is extended well beyond relational databases to Hadoop (Hive, Impala, SparkSQL, etc.). Anything that can be accessed with SQL (or other methods) can be the feeder system for the OLAP models, which can in turn be accessed by any of the dozens of tools on the market that provide BI, including OLAP, services. This includes legacy BI tools, visualization tools and reporting tools, among many others.

The whole point is to avoid all of the problems with scaling and consistency of persistent OLAP cubes by creating virtual objects that can be queried natively by third-party tools. In the past, modifying OLAP models was tedious, as the data had to be shuffled or even reloaded to fit even minor changes. With virtual cubes, the changes are instantaneous and ready for analysis in real time.

My take

One final thought about the need for OLAP: all of the data science in the world can’t match the capability of OLAP to explain the status of an organization, because at the end of the day, someone still has to count the beans.

In a way, the concept of OLAP is still the same, but technology has made it possible to deploy and use it more efficiently and more effectively. The key to this is the semantic modeling that makes it all possible.

The “Father of the Relational Model,” E.F. Codd, published a paper detailing the twelve rules of OLAP (a term not heard before), only to be exposed: the paper had been written by the founders of Arbor Software, who paid Codd to publish it. The twelve “rules” were designed to fit Arbor’s product, Essbase, precisely. It nearly destroyed Codd’s reputation.