Kafka Summit - Confluent CEO Jay Kreps gives the lowdown on providing high-quality data streams
Summary: A careful blend of functionalities is the best way for businesses to ensure they can unlock the value of the data they hold.
Big enterprises are now alert to the importance of data streaming, but they still need the right strategies and tools in place to keep the quality of their information high.
Speaking with diginomica at Kafka Summit London, Confluent co-founder and CEO Jay Kreps said executives at an increasing number of organizations understand the importance of getting to grips with real-time data streams (or ‘data-in-motion’), although there’s still some work to do, especially when it comes to the finer details of the process:
It’s often the case that our customers find themselves doing this work at scale, and then they're just starting to think about how to do it right and how to take advantage of the technology across the organization. I think that's probably true of most technologies. But that's the role that we play – to steer towards the outcome.
Kreps says he’s already seeing a transition. In the company’s early days, many of the challenges that Confluent’s customers faced were focused on infrastructure and on processing transactions quickly and at scale. Now that they’ve seen first-hand how Confluent can help nail some of those issues, customers are turning to other data-related concerns, particularly cultural change and ongoing modifications to streams and processes:
Governance is an area that I think becomes very important as these platforms get to scale in organizations. If I'm producing data out to a number of other applications, how am I allowed to change that as my environment evolves? That’s a critical question to get right. And if you get it right, then parts of the organization can actually move pretty quickly and independently, and if you get it wrong, then you can create a mess.
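In Kafka terms, that question of changing data safely is usually answered with schema compatibility checks. As a rough illustration (not Confluent's own code), here is a minimal Python sketch that sets a BACKWARD compatibility policy for one subject via Confluent Schema Registry's REST API; the registry URL and subject name are placeholders:

```python
import requests

# Placeholders; point these at your own Schema Registry and subject.
REGISTRY = "http://localhost:8081"
SUBJECT = "orders-value"

# Enforce BACKWARD compatibility: new schema versions may drop fields or
# add optional ones, so existing consumers keep working as producers evolve.
resp = requests.put(
    f"{REGISTRY}/config/{SUBJECT}",
    json={"compatibility": "BACKWARD"},
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
)
resp.raise_for_status()
print(resp.json())  # e.g. {"compatibility": "BACKWARD"}
```

With a policy like this in place, the registry rejects incompatible schema versions at registration time, which is one way to let teams evolve streams independently without the "mess" Kreps describes.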
Ensuring data quality is high
Kreps recognizes that governance itself is a complicated area due to the ever-increasing volumes of data that businesses collect and the range of technologies that produce and consume information, whether that’s the Internet of Things or Artificial Intelligence systems:
A lot of the concerns around governance tend to happen when data moves from system to system or environment to environment or geography to geography. So, it’s very much related to the streaming problem. We've tried to provide facilities, so we can help support a larger strategy.
At the Summit, Confluent announced new Data Quality Rules, which allow organizations to resolve data-quality issues and deliver high-quality data streams using customizable rules that ensure data integrity and compatibility. According to Confluent’s new 2023 Data Streaming Report, released at the event, almost three-quarters (72%) of IT leaders cite inconsistent integration methods as a major hurdle to effective data streaming.
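Confluent's documentation describes Data Quality Rules as conditions written in Common Expression Language (CEL) and attached to a schema's rule set in Schema Registry. The sketch below registers a schema with one such rule; the registry URL, subject, schema fields, and rule are illustrative assumptions rather than code from the announcement:

```python
import json
import requests

REGISTRY = "http://localhost:8081"  # placeholder registry URL
SUBJECT = "orders-value"            # placeholder subject name

# A small Avro schema; record and field names are hypothetical.
schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

# A CEL condition attached as a domain rule: reject any record whose
# amount is not positive before it ever reaches the topic.
rule_set = {
    "domainRules": [
        {
            "name": "amount_is_positive",
            "kind": "CONDITION",
            "type": "CEL",
            "mode": "WRITE",
            "expr": "message.amount > 0.0",
        }
    ]
}

resp = requests.post(
    f"{REGISTRY}/subjects/{SUBJECT}/versions",
    json={
        "schemaType": "AVRO",
        "schema": json.dumps(schema),
        "ruleSet": rule_set,
    },
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": 1}
```

The idea behind a WRITE-mode condition like this is that a producer attempting to serialize a violating record gets an error instead of polluting the stream for every downstream consumer.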
Kreps says Confluent’s focus on governance is driven by discussions with customers, who tell him what functionality they need:
When we talk to customers, we hear all this stuff. And you start to listen, and you're like, ‘OK, governance is a big part of the challenge.’ It’s an area we put a lot of work into, but I think there's a lot of work left to do. Ultimately, any of these problems that touch on people and software-engineering practices are kind of gnarly. But I think it's actually pretty cool what we've enabled so far.
Governance isn’t the only important area. Confluent announced a range of other new functionality at the event, and Kreps said he wants to build an ecosystem of joined-up services. This includes an early-access program for managed Apache Flink, an open-source framework for stream and batch processing. Kreps says the aim across all these updates and releases is to create an integrated experience for customers:
There is a bit of an ecosystem developing and I think that's been true in stream processing for a while. I think that's inherently maybe true in any emerging technology space, but I think things converge over time as technologies get adopted. And we do see that happening with Flink, which is increasingly the de facto standard in the streaming space.
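For readers unfamiliar with Flink, the programming model is roughly this: declarative queries over unbounded tables. Below is a minimal PyFlink sketch using Flink's built-in datagen connector; the table and field names are invented for illustration, and it targets open-source Flink rather than the managed service announced at the Summit:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# A streaming TableEnvironment runs continuous queries over unbounded data.
env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# The datagen connector produces an endless stream of random rows, which
# makes it handy for self-contained examples. 'ts' is a processing-time
# attribute computed by Flink, not generated data.
env.execute_sql("""
    CREATE TABLE clicks (
        user_id INT,
        url STRING,
        ts AS PROCTIME()
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '5'
    )
""")

# Count clicks per user over one-minute tumbling processing-time windows.
result = env.sql_query("""
    SELECT
        user_id,
        TUMBLE_END(ts, INTERVAL '1' MINUTE) AS window_end,
        COUNT(url) AS clicks
    FROM clicks
    GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE)
""")

result.execute().print()  # prints a new row per user per window, forever
```

The same query could read from and write to Kafka topics by swapping the connector configuration, which is much of the appeal of pairing Flink with a streaming platform.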
Unlocking the value of data
More generally, Kreps recognizes that moving to on-demand services isn’t always straightforward. Going to the cloud is still a big step for many enterprises, such as when they adopt Confluent Cloud, a fully managed data-streaming service based on Apache Kafka. Kreps says Confluent wants to help companies that are going all-in on the cloud:
When we're approaching the cloud, we do think it makes sense for us to take a no-half-steps approach. You want a real cloud service – if you're going to do it, you want something that's done for real.
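In practice, moving a Kafka workload to a fully managed cluster often changes little more than client configuration. A minimal sketch with the confluent-kafka Python client follows; the bootstrap address, API key and secret, and topic name are all placeholders:

```python
from confluent_kafka import Producer

# Typical client settings for a managed cluster such as Confluent Cloud:
# TLS plus SASL/PLAIN authentication with an API key and secret.
conf = {
    "bootstrap.servers": "pkc-XXXXX.eu-west-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",
    "sasl.password": "<API_SECRET>",
}

producer = Producer(conf)

def on_delivery(err, msg):
    # Called once per message when the broker acknowledges or rejects it.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")

producer.produce("orders", key="order-1", value='{"amount": 9.99}',
                 on_delivery=on_delivery)
producer.flush()  # block until outstanding messages are delivered
```

The application code is identical to what would run against a self-managed cluster; only the connection and security settings differ.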
Kreps says systems integrators can help, and Confluent also puts a lot of effort into the lessons it learns from the customers it works with. What’s important to recognize is that going with Confluent Cloud doesn’t have to mean a clean break from the past:
I think that we have a real advantage in our space in that we're not telling people to just delete everything that they've built and rebuild it with us. I do think, by and large, it's about connecting the stuff you have, connecting into the older mainframes and the older environments, connecting into the older applications and relational databases, and then bridging into the cloud and some of the newer environments.
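That bridging pattern is typically implemented with Kafka Connect, which streams data out of existing systems without modifying them. As an illustrative sketch (the worker URL, connector name, and database details are placeholders), this registers Confluent's JDBC source connector through the Connect REST API:

```python
import requests

CONNECT_URL = "http://localhost:8083"  # placeholder Connect worker URL

# Stream rows from an existing relational database into Kafka topics.
# Connection details and the table name are placeholders.
connector = {
    "name": "legacy-orders-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://legacy-db:5432/erp",
        "connection.user": "<DB_USER>",
        "connection.password": "<DB_PASSWORD>",
        "table.whitelist": "orders",
        # Poll for new rows using a monotonically increasing id column.
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "legacy-",
    },
}

resp = requests.post(f"{CONNECT_URL}/connectors", json=connector)
resp.raise_for_status()
print(resp.json()["name"])  # "legacy-orders-source"
```

Once running, new rows in the legacy database appear on a Kafka topic (here, legacy-orders), from which cloud applications can consume without ever touching the source system directly.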
When it comes to the future, Kreps says there are a lot of buzzwords dominating the technology industry, such as data mesh and data contracts. However, he believes these constructs can be helpful, especially in an organization that has lots of teams with different types of data. He says Confluent is aiming to put its customers in a position where it’s easy to get access to a real-time stream of anything happening in the business:
That stream is trustworthy, it's well governed, you have access to the stuff you're supposed to, and you can trace it back to where it came from. A lot of organizations, as you pull the thread, realize that the data strategy is at the root of a lot of their big challenges, whether that's modernization or customer experience. And if you can get that strategy right, I think that's a big unlock.