Confluent aims to make it easier to set data in motion across hybrid and multi-cloud

By Derek du Preez, November 11, 2021
With the addition of cluster linking and reduced infrastructure mode for Confluent Health+, organizations should be able to better manage their ‘data in motion’ across all environments.


(Image of a hand touching a cloud, by Tumisu from Pixabay)

Confluent's primary ambition in the enterprise is to set ‘data in motion' - where its Apache Kafka platform connects real-time streaming data to drive insights and action. We've seen how the vendor is aiming to make this easier for buyers through its data governance features, but for those in hybrid or multi-cloud environments, some complexity remains. 

With its pitch to become the ‘central nervous system' of the enterprise, Confluent is recognizing that not all companies are fully in the cloud yet, and that many operate across multiple cloud environments. This creates complexity when it comes to moving data and limits some of the benefits that Confluent clusters in a single cloud environment enjoy. 

However, Confluent is aiming to make this easier for companies with some product updates this week, seeking to bridge the gap between multiple environments and make ‘data in motion' a more integrated experience for buyers. 

With this in mind, Confluent has announced Cluster Linking on Confluent Platform 7.0, which can be used in any environment where an enterprise's data and workloads reside. 

We sat down with Addison Huddy, Director of Product Management at Confluent, who explained that whilst replicating data between two environments isn't a new concept, doing so with batch processes or Kafka Connect has its limits. Commenting on using something like Connect to do this, Huddy said: 

In its crudest form, it's a consume/produce loop. It consumes, produces, consumes, produces the data. But there's a lot of moving parts in there. There's all these things to handle - things like offset translation, things that deal with consistency and failure states, all this stuff. It's very complicated.

It's really difficult to keep consistency because actually you're pulling the data, you're copying it, and some metadata actually changes. There's a concept of offsets in Kafka. Those change and are not preserved. And offsets to me are the most important thing, they're what make Kafka, Kafka. 

It's how you rationalize the state of your system. So as soon as you lose that guarantee, it's really difficult to architect your whole system. With cluster linking, we simplified it. So there's nothing in the middle. I now don't have to worry about all those crazy failure states and consistency woes that have plagued the typical way of doing it. You run a few commands, that's all you have got to worry about, then you just go off and write your application.
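Huddy's point about offsets can be made concrete with a toy sketch. The snippet below uses in-memory lists as stand-ins for Kafka topics (all class and function names here are hypothetical, not a Confluent or Kafka API): a naive consume/produce relay copies the record values, but the destination log assigns fresh offsets, so a consumer position checkpointed against the source cluster means nothing on the replica. Cluster Linking avoids this class of problem because the link replicates the topic such that offsets carry over.

```python
# Illustrative sketch only: toy in-memory "topics" showing why a hand-rolled
# consume/produce relay does not preserve Kafka offsets. Hypothetical names.

class Topic:
    """A toy single-partition topic: an append-only log with a base offset."""
    def __init__(self, base_offset=0):
        self.base_offset = base_offset  # first offset in the log (e.g. after retention)
        self.records = []

    def produce(self, value):
        self.records.append(value)
        # The log, not the producer, decides the offset of each record.
        return self.base_offset + len(self.records) - 1

    def consume_all(self):
        return [(self.base_offset + i, v) for i, v in enumerate(self.records)]


def naive_replicate(source, destination):
    """The consume/produce loop: values are copied, offsets are re-assigned."""
    offset_map = {}
    for src_offset, value in source.consume_all():
        dst_offset = destination.produce(value)
        offset_map[src_offset] = dst_offset
    return offset_map


# Source log starts at offset 100 (older records already removed by retention).
src = Topic(base_offset=100)
for v in ["a", "b", "c"]:
    src.produce(v)

dst = Topic()  # fresh destination cluster: its log starts at offset 0
offset_map = naive_replicate(src, dst)

print(offset_map)  # {100: 0, 101: 1, 102: 2} -- same data, different offsets
```

The data arrives intact, but a consumer that had committed "position 102" against the source has no valid position on the destination, which is exactly the offset-translation headache Huddy describes.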

And the direct benefit for customers should be clear: a system that seamlessly replicates data across all environments, whether that is in one cloud, across multiple clouds, or on premises. Huddy said: 

So just like Kafka unifies communication as a central nervous system in one data centre, it does the same thing between two data centres. It's one thing to deploy, one thing to secure, one thing to monitor, and it can now bridge all these different environments. So it can become that backbone, that central nervous system, not just in one data centre, but between two data centres.

Monitoring metadata

Another update from Confluent this week that ties directly into the ambitions behind cluster linking is the introduction of Reduced Infrastructure Mode for Confluent Control Center, which centrally manages and monitors key components across the Confluent Platform to increase visibility and reduce disruption and downtime. 

Historically, however, the tool required cluster monitoring data to be stored locally, and as clusters scaled, those storage requirements quickly became expensive and burdensome. 

Reduced Infrastructure Mode offloads monitoring to Confluent Health+, eliminating the need to store monitoring data on premises. Confluent claims that this can reduce a company's infrastructure monitoring costs by up to 70%. But equally, it once again bridges a gap between on-premises and cloud capabilities. As Huddy explains: 

One of the challenges that we had to solve, is that the metadata about all these clusters that we're running is sometimes larger than the data in the cluster, right? That's a typical distributed systems problem, right? Your metadata can be orders of magnitude larger than the data itself. So we've got all these metrics coming off of these. So we have a very, very robust, and very high scale, metrics pipeline, that you're feeding all these metrics in, and it's doing all these calculations, developing insights, alerts. And that's how we manage Confluent Cloud. So we want to give that same benefit to our customers that run in their data centers. 

And what you can do now is you can send us your metrics. We monitor it for you in our metrics pipeline. Because deploying a giant metrics pipeline in your on prem data space is a very difficult problem. So, leverage the cloud where you can. Then all of the actual metrics, the time series graphs which we are watching, the alerting, we take care of that and we feed it back to you. This is, I think, a very powerful trend, in that the whole system itself is becoming hybrid. It's your cloud and then an on prem deployment working in a very symbiotic relationship. 

And it really helps accelerate a customer's journey to the cloud. It's not going to happen overnight, this is a stepping stone for our customers that want to get there. Or that maybe want to run, but they've strategically chosen to keep an on prem presence and are moving on from there.