Confluent wants to unite the two sides of the enterprise data house - operations and analytics - with Tableflow

Derek du Preez, March 20, 2024
Summary:
CEO Jay Kreps laid out his plans in London this week for how Confluent Cloud can now bring together the operational and analytical estates into one platform, unifying data products for the enterprise.

(Image of Jay Kreps, Confluent CEO, on stage - sourced via Confluent)

Confluent CEO Jay Kreps took to the stage at Kafka Summit London this week to explain how the company’s cloud platform will now be able to unite the two sides of the enterprise data house - operations and analytics - with the introduction of Tableflow. The aim is to unite operational and analytical data in a way that creates usable data products for the entire enterprise, without worrying about data pollution or breaking things downstream when changes are introduced. 

Tableflow essentially transforms Apache Kafka topics and their associated schemas into Apache Iceberg tables with a single click, making it easier to supply data lakes and data warehouses. This has historically not been a simple task, and Confluent sees it as a key enabler for processing data as a whole - using Confluent’s managed Flink service - which in turn, Kreps argued, is key to building a central data nervous system for organizations. During his keynote, Kreps said: 

In every company there's an operational estate - all the applications that run the business - and there's an analytical estate - where we do our data analysis, data crunching behind the scenes, and the data warehouse. I want to talk about how these areas are built and how they come together. 

On the operational estate, the area that Confluent has primarily focused on, Kreps added: 

This is that big data mess…all the SaaS apps, microservices and databases. And all the hard wiring that knits it together. But there is an emerging de facto standard - Kafka - which is an open way of integrating all these things, bringing them together around data streams. 

Data streams are the abstraction that unifies the operational estate, and Kafka is an open standard for data streaming. 

And on the analytical side of the house, there is a similar challenge. Kreps explained: 

What's happening over there? Well, it's a similar mess. There's a bunch of technologies, there's every type of data warehouse, data lake, new AI product, reporting layers, and vertical SaaS systems. 

How do they come together? Well, you can definitely feed these off streams of data from Kafka - but the actual way that systems here are integrated isn't as much around streams, it's around shared tables. 

Ultimately, that's what you have in the data warehouse: a bunch of tables that are populated with data. But today, in most companies, these tables are fragmented across a bunch of different technologies. 

Kreps believes there is a better way of standardizing these tables. He pointed to the increasing trend of putting them in some kind of shared object storage - typically S3 - and making them available across all systems, so that the whole analytical estate can access the same datasets. 

And increasingly, Kreps argued, there is an open standard for this as well - Apache Iceberg. He said: 

This is an open source project and a technology that exists to unify the analytics world…to make it so that you can have shared tables of data, stored in S3 or other object storage. It maintains the typing and structure of the data. And makes that available to the full ecosystem of analytics tools. They can all access the same shared tables of data.
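
To make the idea of shared tables concrete, here is a minimal sketch of what that looks like from the analytics side: a Spark session pointed at an Iceberg catalog, reading the same table that any other Iceberg-aware engine could read. The catalog name, endpoint and table are hypothetical, and it assumes the Iceberg Spark runtime package is available to the session.

    from pyspark.sql import SparkSession

    # Hypothetical Iceberg REST catalog and table names, for illustration only.
    # Assumes the iceberg-spark-runtime package is on Spark's classpath.
    spark = (
        SparkSession.builder
        .appName("shared-iceberg-tables")
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "rest")
        .config("spark.sql.catalog.lake.uri", "https://catalog.example.com")
        .getOrCreate()
    )

    # Any engine that speaks Iceberg (Spark, Trino, DuckDB, ...) can read the same table.
    orders = spark.table("lake.sales.orders")
    orders.groupBy("customer_id").sum("amount").show()

The point of the open format is exactly this: the table lives once in object storage, and every tool in the analytical estate reads the same typed, structured data.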

Tying it all together

So on the one side of the house, the operational side, you (in theory) could have Apache Kafka tying together all of your data through data streams. And then on the other side of the house, the analytical side, you could (in theory) have all of your data sitting in tables, pulled together by Apache Iceberg. 

But that still doesn’t solve the problem of uniting Apache Iceberg and Apache Kafka. As Kreps said: 

So in each of these worlds now, there's some way of sharing data across the systems that need it. But how do you actually connect them together? How do we connect the operational world and the analytical world? Well, this happens all the time today. The easiest way to do this is pump those streams of data from Kafka out into the data lake to fill up these Iceberg tables. And of course, this works. But there are some drawbacks. 

This is ultimately just kind of a surface integration. For each of these streams of data, we have to manually map it into some table. There's often some kind of hand-maintained job for each one of these that tries to extract the right schema and match up column by column and field by field where it goes. 

Keeping all that running is a lot of work. Somebody changes something upstream…and everything downstream breaks. Or even worse, it silently breaks and you don't know until days later, when all the data everywhere is polluted.

That’s not really the final form we want. Surely there's something better we could do. Well, what if we truly united these things? What if we really brought them together? What if we actually made this more like one system that was unified? 
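
For readers who have not lived this, the kind of hand-maintained job Kreps describes looks roughly like the sketch below: a consumer that maps each field of each event into a table column by hand before writing a file to the lake. The broker, topic and column names are hypothetical; the fragility is the point - rename or retype one field upstream and the mapping breaks, often silently.

    import json

    import pyarrow as pa
    import pyarrow.parquet as pq
    from confluent_kafka import Consumer

    # Hypothetical broker, topic and schema - an illustration of the manual
    # stream-to-table plumbing that Tableflow is meant to replace.
    consumer = Consumer({
        "bootstrap.servers": "broker:9092",
        "group.id": "orders-to-lake",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["orders"])

    rows = {"order_id": [], "customer_id": [], "amount": []}
    for _ in range(1000):  # small bounded batch, purely for illustration
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        # Field-by-field, column-by-column mapping, maintained by hand.
        # If a producer renames or retypes a field, this silently produces bad data.
        rows["order_id"].append(event["order_id"])
        rows["customer_id"].append(event["customer_id"])
        rows["amount"].append(float(event["amount"]))

    consumer.close()
    pq.write_table(pa.table(rows), "orders-batch-0001.parquet")  # then shipped to object storage

Multiply that by every topic feeding the lake and the maintenance burden Kreps describes becomes clear.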

This is what Confluent Cloud is hoping to achieve with the introduction of Tableflow, which turns topics and schemas into Iceberg tables in one click to feed any data warehouse, data lake, or analytics engine for real-time or batch processing use cases. Tableflow works together with the existing capabilities of Confluent’s data streaming platform, including Stream Governance features and stream processing with Apache Flink, to unify the operational and analytical landscape.

Using Tableflow, customers can, according to Confluent:

  • Make Kafka topics available as Iceberg tables in a single click, along with any associated schemas

  • Ensure fresh, up-to-date Iceberg tables are continuously updated with the latest streaming data from your enterprise and source systems

  • Deliver high-quality data products by harnessing the power of the data streaming platform with Stream Governance and serverless Flink to clean, process, or enrich data in-stream so that only high-quality data products land in your data lake
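
The third point is where Flink comes in. As a rough sketch of the 'clean in-stream' idea, the snippet below uses Flink SQL via PyFlink to filter malformed events before they ever reach a table - here with a self-managed Kafka connector rather than Confluent's serverless Flink, and with a hypothetical topic, broker and schema.

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Hypothetical topic, broker and fields; assumes the Flink Kafka connector
    # jar is available. Confluent's managed Flink would replace this local setup.
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    t_env.execute_sql("""
        CREATE TABLE orders_raw (
            order_id    STRING,
            customer_id STRING,
            amount      DOUBLE
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'orders',
            'properties.bootstrap.servers' = 'broker:9092',
            'format' = 'json',
            'scan.startup.mode' = 'earliest-offset'
        )
    """)

    # Only well-formed, positive-amount orders flow downstream, so the table that
    # eventually lands in the lake holds clean data rather than raw events.
    cleaned = t_env.sql_query("""
        SELECT order_id, customer_id, amount
        FROM orders_raw
        WHERE order_id IS NOT NULL AND amount > 0
    """)
    cleaned.execute().print()  # stream the cleaned records, just to show the output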

Commenting on the impact of this, Kreps said:

That allows us to do something really big. It allows us to actually unite these two worlds, in a really natural, native way. The operational estate and the analytical estate can all be united around these universal data products. 

Data products that are available across the entire company, in their native form, from both sides of the house. The operational applications use real-time streams that populate the data systems in that world and trigger actions in the microservices and applications there, while the same data shows up in shared tables out in the lake or warehouse. So this is a really big deal.

And the key focus is usability and ease of use. Kreps told media this week that whilst every organization aims to use data in real time, the reality for enterprises is that this has been too difficult to maintain. With Tableflow, combined with Flink, Confluent believes that organizations now have a data layer - a platform - for building data products that are usable across the enterprise, without tonnes of maintenance or things constantly breaking. Kreps said: 

For our customers, everybody wants data to be fresh and real time and in-sync. Everybody wants to build in this real time way. But it's really just a question of: is that easy to do? And so as there are platforms [Confluent Cloud] that bring those capabilities, it kind of becomes a no brainer to build that way.

We have an opportunity to really be the kind of central nervous system that all the real time data of the business flows through.

My take

Watching Kreps and Confluent over the past few years, it has been clear that this is a vendor that has slowly, methodically been pulling together the pieces of a jigsaw puzzle that enables a streaming data platform for the whole enterprise. Tableflow, alongside Flink, goes a long way to making the ‘data streaming central nervous system for the enterprise’ conversation a reality. 

Now, as Confluent has a fuller platform to go to market with, I do think it needs to start engaging further upstream with business users and explaining the business impact these use cases can have. I heard a number of interesting customer stories this week - it would be helpful to showcase what an enterprise-wide data streaming platform means for the overall business strategy. However, that is just a matter of time. Confluent couldn't do that before it had a few pieces in place - and it is in a much stronger position to do so today than it was 12 months ago. 

Kreps is passionate about building this platform and creating a new breed of data streaming enabled businesses - and the strategy is cogent and compelling. We look forward to hearing from customers using Tableflow at the vendor’s Current event in Austin, Texas later this year.
