Since CEO Chet Kapoor took the helm at DataStax just over two years ago, the company has been on a mission to redefine what it means to work with Apache Cassandra, the open source distributed database.
Cassandra has long been a database darling for companies that wanted internet scalability. But for a time it also grappled with a reputation for being neither the easiest to use nor the most manageable.
Whilst Astra enables developers to scale database resources up and down on demand to match application requirements, Stargate is a data gateway that allows developers to use any data store for apps.
And now the evolution of DataStax continues with the announcement of ‘change data capture’ (CDC) for Astra, a new capability powered by Apache Pulsar that processes and delivers database changes in real time via event streams. This essentially makes real-time data available for use across data lakes, data warehouses, search, artificial intelligence and machine learning.
But beyond the technical announcement, the move signals a future direction for DataStax, in which the company is putting a stake in the ground as the database to beat for real-time event streams. Event-driven architectures are fast becoming a popular operating model in the enterprise, as companies recognize that responding to data in real time should be the de facto standard for digital business.
However, legacy architectures hold this back, as data often sits at rest and insights are based on historical information that is batch processed only every so often. DataStax recognizes that as the world moves towards real-time insights, it will need systems and tools to enable that.
This week diginomica got the chance to sit down with DataStax CEO Chet Kapoor, who said:
It's all about real time. People sometimes get into ‘is it about nanoseconds and microseconds?’. No, it's about real time. Whether it's web apps or whether it's mobile apps, it doesn't matter. It's about apps and it's about doing things in real time. And you cannot do real time without streaming.
If you think about our heritage, it is all about a database-as-a-service or databases, which is data at rest. But what was very clear to us after about six months of being at DataStax is that developers don't just care about data at rest, sitting inside a database, they care about the stream of data coming out as well. And so we had to get into that business.
DataStax describes its current strategy as having five core pillars. These are:
High growth applications
Limitless data stack
Market leading unit economics
Open source data
With Astra, Stargate, and now CDC for Astra, it’s easy to see how a future of ‘real-time’ encapsulates all of these.
Kapoor said that DataStax did initially assess Apache Kafka, which does have a thriving ecosystem and is fuelling the growth at companies such as Confluent. But he explained why DataStax ended up pursuing Apache Pulsar, a cloud-native, distributed messaging and streaming platform that was originally created at Yahoo!, but is now an Apache Software Foundation project. He explained:
We had to get into [the streaming] business for three reasons. One was, we have this beautiful Cassandra database, but you want to put stuff in and out, how do you do that? You need CDC, right?
Our first thought was, why shouldn’t we use Kafka? We have a Kafka sink, we’ve had that for a long period of time, a lot of people use Kafka for CDC with Cassandra, which works really well.
But what we realized was there were two problems. From a total cost of ownership and resources perspective, Kafka is very expensive. It’s almost 50% more expensive. But also, Kafka was not built to be multi-tenant. So it is not cloud native, per se. Architecture is like DNA, you pick one for a decade.
So we thought really hard about it. And we said let’s look at Pulsar. It doesn’t have as big a community as Kafka does, but we are picking technology for the next decade, so we are picking Pulsar. We already have CDC for Cassandra available, but what we’re making available is CDC for Astra, which is our cloud product.
However, Kapoor has bigger ambitions for his ‘real-time’ ethos at DataStax, beyond getting information in and out of databases. In particular DataStax sees an opportunity to go after enterprises that have been using vendors such as Tibco for messaging. He said:
There are also two other use cases. There are large messaging environments in the world and it is now time for a replacement or refresh of those. A modernization of those. We are working on some very large implementations, where we are going to become the de facto streaming standard for the entire corporation.
Being the event fabric in the enterprise is definitely our intent. Just to be clear, JMS (Java Message Service) and Tibco have been around for a very long time, they have very large implementations. We absolutely plan to go and work on those…to replace them.
Then the third one is data science. How do you take those feeds and actually send them to a data scientist so that they can run their models on top of this, not just streaming, but stream processing. We wanted to go after all those use cases and make them happen, but we are starting with CDC.
Building CXO understanding
Kapoor acknowledges that part of DataStax’s job is educating organizations, specifically the C-suite, on real-time, data-driven operating models that use event streaming. This is why Kapoor uses the term ‘real-time’ instead of ‘event processing’, as it feels less esoteric and more tangible. He said:
I end up talking about real time to more CXOs and board members, than I do CIOs. CIOs intuitively get it. They have been doing it for a while. It’s the CEO and the board that doesn’t get it. Real time data is a lot like agile, it’s a mindset.
If you do not bring that mindset to an enterprise-wide initiative, you will never get there. You have to make sure you’re cool with being iterative, because you’re not going to get it right the first time. Number two, you have to make sure you have an enterprise-wide initiative. You have to have a data ops group that goes across the whole company.
The third thing, the most important one, you have to have what we call data pods. These are cross-functional teams. You have a business person, a tech person, a data scientist and you put them all together and say: you will live and die by this initiative.
And so we end up spending a lot of time talking to customers about things like this, because frankly, the tech is actually the easy part. Having the teams together - the tech person, the data scientist, the business person - all looking at the same data all the time, in real time, and saying ‘here's how we make the changes’.
It feels like the tide has turned at DataStax. That’s not to say that it was struggling in the past; it had a loyal customer base and an impressive set of use cases. However, with Kapoor’s new ‘real-time’ strategy, and the pieces that have been implemented over the past couple of years, it certainly feels like there is a much bigger opportunity in front of it. Astra has driven significant growth, and with the added real-time capabilities a whole host of new use cases should emerge. Stream processing and event-driven architectures also position DataStax to take on much broader implementations. But, as ever, execution is key. We look forward to speaking to customers that have embraced this idea over the coming year.