How we access and analyze data has been undergoing a revolution in recent years, as we become more digitally connected. The end result may be a form of Business Intelligence (BI) that actually works for business users, after more than 30 years of false starts. This is enabled by a convergence of technologies that have only become available in the past few years, according to GoodData CEO Roman Stanek. I caught up with him recently to get his take on this emerging trend in data analytics, why big data is shrinking, and why business people shouldn't build their own BI.
The big change enabled by today's connected technologies is that it's no longer necessary to create a copy of the underlying data before doing the data analysis. Prior generations of BI have been limited by technology that wasn't powerful enough to support multiple concurrent users, and therefore the transformations and analysis had to be done on copies of the data. Often these copies went through several iterations before ending up in front of the business user. As Stanek explains, this led to low confidence in the accuracy of the results:
By the time you get the data as an end user, it's been copied six times and been transformed six times, so you have zero trust in it — because you have no idea who made the copy, who made the transformation, who left something out and so on ...
The biggest problem that we had was always the very low trust in data, because people didn't see the original data, no one was able to actually find some lineage, find who massaged it and so on.
Recent cloud advances have completely changed this scenario. He continues:
Now we don't have a problem with low concurrency. You have another user, you add another CPU from Amazon ... This dream of one copy of data, everyone sees the data in real time, and so on, it's actually getting more and more real. But it was not possible only five years ago, because of the limitation of hardware concurrency.
New headless BI architecture
Taking advantage of this new landscape, there have been big changes at the data analytics vendor since diginomica last spoke to Stanek — who I first met twenty years ago, in his guise as founder of Systinet, a startup that specialized in enterprise governance of Service Oriented Architecture (SOA). After more than twelve years as a cloud-based provider of embedded BI, last year GoodData went 'headless', releasing an API-first BI platform made up of composable, cloud-native microservices, along with a UI toolkit for creating no-code modular blocks that users can assemble into their own BI dashboards.
This new BI architecture consists of four layers. There's a messaging system, such as event stream platform Kafka, that collects and combines the data. It's then stored in a business repository, such as cloud-based data warehouse Snowflake. The transformation and analytics are then done on-the-fly by a service such as GoodData, which presents its results headlessly via APIs. Finally, the results are shown to users in a presentation layer, which can be a dashboard, a mobile app, or embedded in an application or a workflow. This means that, instead of operating the whole stack, GoodData can focus on just the data analytics. Stanek says:
We leave hosting to AWS and database to Snowflake and most of the basic technologies to open source. So it's a very different world than it was five years ago, and maybe 10 years ago, when we had to own it all, because there was nothing really there.
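To make the four layers concrete, here is a minimal Python sketch of how they hand off to one another. It is an illustration only: the class and function names (`EventStream`, `Warehouse`, `AnalyticsService`, `render_text_dashboard`) are hypothetical in-memory stand-ins, not the Kafka, Snowflake or GoodData APIs, and the sample revenue data is invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventStream:
    """Layer 1: hypothetical stand-in for a messaging system that collects events."""
    events: list = field(default_factory=list)

    def publish(self, event: dict) -> None:
        self.events.append(event)


@dataclass
class Warehouse:
    """Layer 2: hypothetical stand-in for the business repository (the one stored copy)."""
    rows: list = field(default_factory=list)

    def load(self, stream: EventStream) -> None:
        # Move collected events into storage and empty the stream buffer.
        self.rows.extend(stream.events)
        stream.events.clear()


class AnalyticsService:
    """Layer 3: headless analytics that computes on request and returns plain data."""

    def __init__(self, warehouse: Warehouse) -> None:
        self.warehouse = warehouse

    def revenue_by_region(self) -> dict:
        # The aggregation runs against the stored rows each time it is called.
        totals = defaultdict(float)
        for row in self.warehouse.rows:
            totals[row["region"]] += row["amount"]
        return dict(totals)


def render_text_dashboard(result: dict) -> str:
    """Layer 4: one possible presentation layer; any other front end could use the same result."""
    return "\n".join(f"{region}: {total:.2f}" for region, total in sorted(result.items()))


if __name__ == "__main__":
    stream = EventStream()
    stream.publish({"region": "EMEA", "amount": 100.0})
    stream.publish({"region": "APAC", "amount": 50.0})

    warehouse = Warehouse()
    warehouse.load(stream)

    service = AnalyticsService(warehouse)
    print(render_text_dashboard(service.revenue_by_region()))
```

The point of the split is that the analytics layer returns data rather than a chart, so a dashboard, a mobile app or an embedded widget could each consume the same result.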
Every participant in this interdependent ecosystem has to play its part, because if one component fails, the whole edifice stops working. Yet at the same time, there's less visibility into what demands may come up. So it's important for the provider to be ready to perform reliably in various scenarios. Stanek sums up:
You cannot predict the load, you cannot predict who's going to be on. You need to be always-on, it always needs to be highly available, highly secure, and so on, because people will be picking and choosing and combining solutions.
Building blocks of composable analytics
The outcome is a much more flexible form of BI — one that delivers on the long-awaited promise of SOA — in which business users can assemble ready-made building blocks to deliver the analytics they need to make decisions. Stanek says:
I actually think that headless is another manifestation of SOA. Gartner calls it composable analytics, some people call it low-code/no-code, we call it headless — it's all the same.
It's all about giving the end user the freedom for business innovation, so they can actually take and combine these Lego blocks. But at the same time, these Lego blocks have to be highly available, they need to be flexible, they need to be secure, they need to be authenticated.
The key to making this work is to keep the transformation and analytics separate from the underlying data. Instead of copying the data and then manipulating the copy, this new approach applies the transformation each time a request is made through the API. Stanek says:
It's a completely opposite mindset. Instead of me pulling the data, I actually call some APIs. It's very new in the whole data space ...
We're redefining data, because now we are looking at like, how can we do the in-place transformations? How can we get data in real time to as many people as possible? How can we actually stop this copying? How can we provide this API so that people can combine it? That's the composability, that's the headless BI.
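The difference between copying data and transforming it in place can be shown with a small Python sketch. Everything here is assumed for illustration: `ORDERS` is invented sample data standing in for the single stored copy, and `handle_request` and `only_paid` are hypothetical names rather than part of any vendor's API. The transformation is applied lazily each time a request arrives, so no intermediate copy is ever written.

```python
from typing import Callable, Iterable

Row = dict
Transform = Callable[[Iterable[Row]], Iterable[Row]]

# The single copy of the data (hypothetical sample rows).
ORDERS = [
    {"customer": "acme", "amount": 120.0, "status": "paid"},
    {"customer": "acme", "amount": 80.0, "status": "refunded"},
    {"customer": "globex", "amount": 200.0, "status": "paid"},
]


def only_paid(rows: Iterable[Row]) -> Iterable[Row]:
    """A transformation expressed as a lazy filter; it stores nothing."""
    return (row for row in rows if row["status"] == "paid")


def handle_request(source: list, transforms: list, metric: str) -> float:
    """Hypothetical API handler: apply the transformations at request time and sum one field."""
    rows: Iterable[Row] = iter(source)
    for transform in transforms:
        rows = transform(rows)
    return sum(row[metric] for row in rows)


if __name__ == "__main__":
    print(handle_request(ORDERS, [only_paid], "amount"))  # 320.0

    # A new order lands in the source; the next request sees it immediately,
    # because there is no transformed copy that could have gone stale.
    ORDERS.append({"customer": "initech", "amount": 50.0, "status": "paid"})
    print(handle_request(ORDERS, [only_paid], "amount"))  # 370.0
```

Because every caller goes through the same handler against the same rows, two users asking the same question get the same answer, which is the trust problem the copying approach could not solve.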
Since the API is a shared service, there needs to be careful tracking of different versions. The principle is similar to how different versions of software code are tracked in repository services such as GitHub. Stanek explains:
We have to bring in some versioning system and lifecycle management. People have dealt with it in coding — we know how to use GitHub and check-out and check-in code and so on.
With data, and especially metadata, it's still new. Our vision is that there will be one copy of the data only, and everything else will be done through API manipulation.
But the biggest shift is actually we will not need to move the data from one system into another.
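What check-in and check-out might look like for analytics metadata can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of how GoodData or GitHub implement versioning: `MetricRegistry`, `MetricVersion` and the `net_revenue` metric are all hypothetical. Each change to a metric definition is appended as a new immutable version, so the history of who changed what remains visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricVersion:
    """One immutable revision of a metric definition (hypothetical structure)."""
    version: int
    expression: str
    author: str
    note: str


class MetricRegistry:
    """Hypothetical append-only store of metric definitions, keyed by metric name."""

    def __init__(self) -> None:
        self._history: dict[str, list[MetricVersion]] = {}

    def check_in(self, name: str, expression: str, author: str, note: str) -> MetricVersion:
        # Never overwrite: every change becomes the next numbered version.
        versions = self._history.setdefault(name, [])
        entry = MetricVersion(len(versions) + 1, expression, author, note)
        versions.append(entry)
        return entry

    def check_out(self, name: str, version: Optional[int] = None) -> MetricVersion:
        # Return the latest version by default, or a specific earlier one.
        versions = self._history[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise LookupError(f"{name} has no version {version}")
        return versions[version - 1]

    def lineage(self, name: str) -> list[str]:
        # A readable audit trail of who changed the definition and why.
        return [f"v{v.version} by {v.author}: {v.note}" for v in self._history[name]]


if __name__ == "__main__":
    registry = MetricRegistry()
    registry.check_in("net_revenue", "SUM(amount)", "dana", "initial definition")
    registry.check_in("net_revenue", "SUM(amount) - SUM(refunds)", "lee", "subtract refunds")
    print(registry.check_out("net_revenue").expression)
    print(registry.lineage("net_revenue"))
```

An API consumer could pin a dashboard to a specific version while analysts keep refining the latest definition, and the lineage list addresses the earlier complaint that nobody could tell who had massaged the data.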
A single version of the data
While streaming data is an important part of the architecture, the focus is on having a single version of the data, rather than on having data that's fully up to date. Stanek believes the use cases for analysis of real-time data in decision-making are limited. He explains:
If I'm looking for my results for the last 12 months, I don't need it a minute old. I actually think that if I look at my data that's 12 months old, and it has been six times transformed and combined and transformed again, I look at it and I say, 'Well, I don't recognize this. This doesn't look real, someone made a mistake in some transformations.'
I actually think the use cases for real time are still somehow limited. People still don't make decisions in real time, it's still kind of limited use, it's more for observability. Because when you make a decision, you want to see how it was a month ago or a year ago.
At the same time, there's no need to dig back into huge volumes of historic data. Events of the past two years have demonstrated the limitations of big data, he believes.
Now, you look at any data set older than two years, and it's like, 'Oh, it's pre-COVID, we cannot look at this. This was a different business, this was a different industry.' So I actually think that data is now having a shelf life of two years or something like that.
This new approach provides an alternative to data visualization tools such as Tableau, Power BI and Domo that have been adopted in recent years by power users. Stanek says it shouldn't be up to the head of sales to build their own mobile data app. Nor should BI go back to being something that can only be created by specialists with a PhD in data science. Instead, he argues for a more collaborative approach in which developers, data scientists, business analysts and visual designers work together on a set of composable building blocks that are easy to embed in applications for business users. He cites rideshare app Uber as an example of what this type of BI looks like: it pulls together all the data about vehicle locations and journey times and presents the result visually. This is how BI should be presented, he argues:
It still tells you all the information you need to know, on the map in real time with all the metrics, with all the visuals and so on. It's still BI, but it's actually embedded in your application.
Stanek's vision for BI is in line with my own view of how IT is evolving in a microservices-based, API-first world. I call this a Tierless Architecture, because rather than having data mediated by applications before it can be accessed, an API-first approach means that it can be accessed directly through the same platform that hosts functionality. Then it's up to developers (or the user, if it's a no-code platform) to blend the data and functionality they need. A headless approach, in which the data and functions are all APIs that can be called by any front end, means that the results can be presented either standalone or embedded in some other app, such as a sales dashboard or a chatbot in a messaging app.
Alongside this new architecture, application development becomes much more of a collaborative effort between IT and business users, along the lines that Stanek describes. There's a lot of talk about low-code and no-code development, but, as Neptune Software's Matthias Steiner wrote just a few days ago, this shouldn't be seen as shutting out professional developers. Instead, the pro-coders and low- and no-coders should collaborate (I prefer to call this co-coding), with IT providing the basic building blocks, governance and tooling that business users can then work with to create prototypes and ad hoc solutions. IT can then continue to work with them to refine these initial solutions for routine use, and stand ready to assist when modifications are needed to adapt to new circumstances.
All of these changes are combining to reshape Business Intelligence to provide the real-time information, rapid decision-making and constant adaptability that are characteristic of Frictionless Enterprise. This is a case study in how applications must adapt to the new digitally connected world.