Data-streaming specialist Confluent has announced a slew of new capabilities that give customers assurance that their data is trustworthy and can be shared and processed securely.
In a keynote speech at Kafka Summit London, Confluent co-founder and CEO Jay Kreps outlined a range of new features – including Data Quality Rules, new Custom Connectors and Stream Sharing functionality – and explained how his company, a commercial provider built around the open-source Apache Kafka project, continues to develop its Confluent Cloud platform.
Kreps said exponential growth in the Kafka community has been driven by key technology trends, such as increasing use of data and the cloud, and the recent shift towards AI and automation. He suggested this shift in technology trends is occurring alongside a change in the way people think about software:
I think increasingly, if you zoom out a little bit and you think about all the parts of software in a company, they're starting to become connected, and we’re moving from software as a single-celled organism to something multi-cellular, something where it's a sophisticated animal that actually connects all the parts of the company.
Kreps argued that data streaming is increasingly the connective tissue that brings organizations and their applications together. He sees Confluent Cloud as a platform that helps organizations shift away from data at rest and batch-processing tools towards real-time data streams, with an added layer of connectors, governance and processing:
What people want out of a cloud service is actually a very different experience. It's something where you can get this fundamental protocol and set of capabilities on-demand whenever you want it, as elastic as possible, where you can have flexibility and think about how that service scales.
Ensuring the trustworthiness of data
If companies are going to use their data to drive business decisions, then they need to know they can trust the data that they rely upon. However, 72% of IT leaders believe inconsistent integration methods and standards are a challenge or major hurdle to their data-streaming infrastructure, according to Confluent’s new 2023 Data Streaming Report.
Help should come in the form of data contracts, which are formal agreements between upstream and downstream components that include rules or policies to ensure data streams are high-quality, fit for consumption, and resilient over time. Yet Kreps said there's work to be done:
A lot of people are just starting to think about governance for any data at all, let alone streaming data. But for some reason, it's more important in the world of streaming, and I think it's ultimately because these streams go between systems, and if that data is not well-governed, if there aren't schemas and there aren't guarantees, and if you can't reason about compatibility, things are going to break.
Confluent's new Data Quality Rules, a feature of Stream Governance, address the need for more comprehensive data contracts. Data Quality Rules enable organizations to resolve data-quality issues and deliver high-quality data streams across the business, using customizable rules that ensure data integrity and compatibility:
This is the ability to take your schemas and extend them beyond just simple, semantic rules. There's a set of functionalities that we're releasing here that starts to broaden this notion of how you work with streams and that makes it easier for organizations.
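Conceptually, a data quality rule attaches a validation predicate to the records flowing through a stream, so that malformed events are caught before they reach downstream consumers. The sketch below is a minimal, stdlib-only Python illustration of that idea; the "order" fields and the rule itself are invented for illustration, and Confluent's actual feature is declared against schemas in its platform rather than hand-coded like this.

```python
# Conceptual sketch of a data-quality rule applied to a stream of records.
# The "order" fields and the rule are invented for illustration; this is
# not Confluent's API, which declares such rules against schemas.

def valid_order(record: dict) -> bool:
    """Rule: an order must have a positive amount and a non-empty customer id."""
    return (
        isinstance(record.get("amount"), (int, float))
        and record["amount"] > 0
        and bool(record.get("customer_id"))
    )

def apply_rule(records):
    """Route records that pass the rule downstream; quarantine the rest."""
    passed, rejected = [], []
    for rec in records:
        (passed if valid_order(rec) else rejected).append(rec)
    return passed, rejected

events = [
    {"customer_id": "c-1", "amount": 42.5},
    {"customer_id": "", "amount": 10},     # fails: empty customer id
    {"customer_id": "c-2", "amount": -3},  # fails: non-positive amount
]
good, bad = apply_rule(events)
```

The point of pushing such checks into the platform, rather than into every consumer, is that all downstream systems then share one enforced contract.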
Reducing the operational burden
The provision of accurate data relies on streaming-data pipelines that connect services across the globe. Many organizations have unique data architectures and need to build their own connectors to integrate homegrown data systems and custom applications to Apache Kafka.
The problem for many organizations is that custom-built connectors then need to be managed, which requires manual provisioning, upgrading, and monitoring. The result is that IT professionals spend more time on operations and less on other business-critical activities, explained Kreps:
Connectors are humble, but mighty. To get the data that's going to be replicated around the world in Kafka, you ultimately have to connect to something, and that's a problem that many people think is easy to solve. But to do it right is often a lot of work to actually get the scalability and the fault tolerance and really get the system right.
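Kreps' point about the hidden work in connectors can be seen in even a toy version of what one does: poll a source from a committed offset, deliver batches to a sink, and retry on failure so no data is lost or duplicated. The stdlib-only Python sketch below uses invented Source and Sink stand-ins, not a real Kafka Connect API.

```python
# Toy sketch of the core loop a source connector runs: poll from a saved
# offset, deliver to a sink, retry on transient failure, and only advance
# the offset after delivery. Source and Sink are invented stand-ins,
# not Kafka Connect classes.

class Source:
    def __init__(self, records):
        self.records = records
    def poll(self, offset, max_batch=2):
        return self.records[offset:offset + max_batch]

class Sink:
    def __init__(self):
        self.delivered = []
    def write(self, batch):
        self.delivered.extend(batch)

def run_connector(source, sink, max_retries=3):
    offset = 0
    while True:
        batch = source.poll(offset)
        if not batch:
            return offset  # caught up with the source
        for _ in range(max_retries):
            try:
                sink.write(batch)
                break
            except IOError:
                continue  # a real connector would back off here
        else:
            raise RuntimeError("sink unavailable; giving up")
        offset += len(batch)  # commit only after successful delivery

src = Source(["a", "b", "c", "d", "e"])
snk = Sink()
final_offset = run_connector(src, snk)
```

Scaling this across partitions, handling schema changes, and surviving restarts without reprocessing is the "lot of work" Kreps refers to, and what a managed connector absorbs on the customer's behalf.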
Confluent offers 70-plus fully managed connectors, but wanted to offer more for its customers – and that’s where its new Custom Connectors come in, said Kreps:
This is the ability to take any connector that you have and run it in Confluent. And this has the same ability to ensure the availability to drive high performance, and just take away the operational burden of connecting two systems, but it now opens it up to a much larger ecosystem.
Sharing data with enterprise-grade security
Businesses need to constantly exchange real-time data internally and externally across their ecosystem to make informed decisions, build seamless customer experiences, and improve operations, said Kreps. At the event, he announced Confluent’s new Stream Sharing functionality, which provides a secure way to share streaming data across and between organizations:
Everything that you have you can now share. You can now opt to open data up and securely share this with your partner organizations.
Today, many organizations rely on flat file transmissions or polling APIs for data exchange, resulting in data delays, security risks, and integration complexities. Stream sharing offers an enterprise-ready alternative, said Kreps:
This is something that is fully in your control. You can do this in a way that's safe and well-governed. It will use the same schemas that you apply to your data internally, you can check the load that this will put on your cluster. And we think that this allows the world of streaming to extend beyond just the internals of your organization.
Kreps also announced that Confluent is creating an early access program for managed Apache Flink, which is an open-source stream-processing and batch-processing framework. Now, select Confluent Cloud customers will be able to test the fully managed service, said Kreps:
This is a technology which I think has an incredibly important role in the streaming world. I think there's a role for technology that makes it easier to build a scalable application that is correctly fault tolerant, that works with streaming data and I think, in many ways, Flink can be the database layer for streaming.
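The kind of computation Flink specializes in, keyed and windowed aggregation over an unbounded stream, can be illustrated without the framework itself. Below is a stdlib-only Python sketch of a tumbling-window count; the event format and the 10-second window size are invented for illustration, and a real Flink job would use its DataStream or SQL APIs.

```python
from collections import defaultdict

# Toy tumbling-window count, illustrating the keyed, windowed aggregation
# that a stream processor like Flink performs continuously over unbounded
# data. Events and the 10-second window are invented for illustration.

def tumbling_window_counts(events, window_secs=10):
    """events: iterable of (timestamp_secs, key) pairs.
    Returns {(window_start, key): count} per tumbling window."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_secs) * window_secs
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "clicks"), (4, "clicks"), (9, "views"),
          (12, "clicks"), (19, "views"), (21, "clicks")]
result = tumbling_window_counts(events)
```

Unlike this batch-style sketch, a stream processor keeps such windows open as events arrive, handling out-of-order data and failures, which is precisely the operational burden a fully managed service takes on.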