Confluent and the confluence of legacy, real-time microservices, the cloud and 'sympathetic reinvention'

By Martin Banks, October 29, 2020
Summary:
Confluent creates collaborations between established legacy applications and massively scalable, real-time cloud services. Now it holds out hope of extending that capability into the world of edge computing.

(Image of the world from space, surrounded by data - by Pete Linforth from Pixabay)

When a company uses a phrase like "the sympathetic reinvention of legacy applications", even a cynical old hack is likely to say "eh?? OK, I'll bite". So I did.

What I found was not just what the company involved, Confluent, wanted to tell me about - how event streaming with Apache Kafka can solve an emerging problem, giving old but still highly valuable legacy applications a new and broader lease of life in a widening world of real-time work alongside the growing army of cloud-native applications - but also how the same basic approach can provide a real-time collaboration link between existing IT environments and the coming world of edge computing.

As many will know, Kafka started life some 10 years ago as a message queuing system within LinkedIn, which then pushed its development out to the open source community, where it soon developed into a message and event streaming system.

Confluent then started to work with it to build a distribution that added a range of community, commercial and operational features, as well as application development capabilities, particularly for applications operating at massive scale.

According to Ben Stopford, lead technologist in the office of the CTO at Confluent, the company's base use case was as a tool for getting data into Hadoop, and that remains a major application today. But new use cases are now emerging, such as connectivity to data lakes and a range of different data warehousing tools, along with stream processing, which allows users to build aggregations by summarising data on the fly, plus joins and other SQL-like operations. There are also a lot of use cases emerging around microservices, both as request-response interfaces and as services that do back-end tasks.
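
To make the stream processing point concrete, here is a minimal sketch, assuming a hypothetical "payments" topic keyed by account ID with amounts (in pence) as values; the topic names and serialisation choices are illustrative rather than anything Confluent prescribes. It uses the Kafka Streams Java API to keep a continuously updated running total per account - the kind of on-the-fly aggregation Stopford describes.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class PaymentTotals {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-totals");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical topic: payments keyed by account ID, amount (in pence) as the value.
        KStream<String, Long> payments = builder.stream("payments");

        // Aggregate on the fly: a running total per account, continuously updated.
        KTable<String, Long> totalsByAccount = payments
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .reduce(Long::sum, Materialized.as("totals-by-account"));

        // Publish the changelog of running totals to a downstream topic.
        totalsByAccount.toStream()
                .to("account-totals", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```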

As Stopford put it: "You see the same kind of patterns in banks, where you have this rich split between event-based microservices and those that run your online business. So I think that the sympathetic reinvention thing is really about that. If you're an enterprise business, you've actually got a lot of value locked up in the systems that you've spent decades building. Often those systems do a pretty good job."

Mainframe as archetype

The obvious example here is the mainframe, which does an excellent job even though many software engineers would prefer someone else to program it. But there is a need to evolve these legacy systems forward, and in Stopford's view event streaming is a really good way to do that, because the microservice approach allows developers to evolve away from the mainframe monolith, maintaining the useful functions of the legacy application while extracting its data and using the events to drive processing.

He mentioned, as an example of what was now possible, UK challenger bank Monzo, which is based on an event-driven architecture similar to that of Netflix.

So this is about taking established legacy applications that businesses still rely on and don't want to change, but making more of them: developing off the back of them in different areas and adding something more to the business.

In his view, this makes a good alternative to the more traditional solution of "rip and replace", especially as many businesses find they cannot quite complete either phase and end up with two systems. Here, the legacy system continues doing what it does best, and the new systems form an external extension, working with data sourced from the legacy systems.

So the mainframe has one database, but the extracted data can then be used in different databases, in different locations around the world, that offer better caching, localisation, ease of access and high scalability.

The important underlying capability here is the way Kafka can be exploited to build new and more complex ways of sharing data. In most companies, data is locked up in many different databases and is often difficult to access. The Confluent goal is to allow users to extract data from multiple databases and put the results into a messaging system that can also store data for as long as required - permanently, if need be. The data can be extracted at will, and it is also possible to process it in situ.
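
As a hedged illustration of the "store data for as long as required" point, the sketch below uses Kafka's admin API to create a topic whose retention is disabled (retention.ms set to -1), so records are kept indefinitely and any new consumer can replay the full history as well as tail live updates. The topic name, partition count and replication factor are invented for the example.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class CreateAccountsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // A hypothetical topic that retains every change event indefinitely
            // (retention.ms = -1), so history can be replayed at any time.
            NewTopic accounts = new NewTopic("customer-accounts", 12, (short) 3)
                    .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG, "-1"));

            admin.createTopics(Set.of(accounts)).all().get();
        }
    }
}
```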

In practice, the challenge is that any larger company is likely to have thousands of databases and what is required is the ability for a new application to get hold of real time and historical data sets that have originated and are held in different parts of the company. The archetypal use case for this is a bank.

If someone is correctly permissioned, they can go to Kafka, get a stream of customer accounts, and put that straight into a database. If any of the accounts change, that should be updated inside the database. But triggers can then be added for, say, unusual payment changes which can start a fraud detection algorithm. This can be done at the application level or the database level, in both cases at high scale. This opens up a wide range of possibilities for search algorithms to be applied to the data stream.
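
A minimal sketch of that trigger idea, assuming the same hypothetical payments stream: a small Kafka Streams application routes unusually large payments to a "suspected-fraud" topic that a downstream fraud-detection service could subscribe to. The threshold and topic names are purely illustrative.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class PaymentAlerts {
    // Purely illustrative threshold for an "unusual" payment: 1,000,000 pence (£10,000).
    private static final long UNUSUAL_AMOUNT = 1_000_000L;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-alerts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // The same hypothetical change stream of payments, keyed by account ID.
        KStream<String, Long> payments = builder.stream(
                "payments", Consumed.with(Serdes.String(), Serdes.Long()));

        // Trigger: route unusually large payments to a topic that a
        // downstream fraud-detection service subscribes to.
        payments.filter((accountId, amount) -> amount != null && amount >= UNUSUAL_AMOUNT)
                .to("suspected-fraud", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```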

Meanwhile, out at the edge

That is the sympathetic reinvention part of what Confluent can offer, and it should prove to be of real value to CIOs looking to generate more value from legacy data - in particular data held in mainframe systems, which can now be re-used and exploited without getting to grips with wrangling the existing application code. But it may well prove to be only part of the story, for those same capabilities of building new applications around events triggered by streamed data could open up the "how to…" of the next big thing - edge computing.

This approach presents an opportunity to build new applications, services and microservices that diffuse the traditional roles of legacy applications around the virtualized business infrastructure. They can be triggered both by extracted data and by data generated and processed at the edge, returning processed "results" and "reports" back to the legacy applications as the keepers of the current state of business truth. The legacy applications' value is then maintained, and the need to re-engineer them to work in this new environment is greatly reduced or eradicated.

Stopford agreed: "I think you're bang on. If you were to tell that story, I would just add that when you're talking about extracting functionality and virtualizing it somewhere else, the most common example is to take functionality that runs on premise and virtualize it in the cloud."

"All our streaming does is basically make you location-agnostic. Once you have your data in these event streams, you're done. No matter where you are, you can be running an application right next to it; you can be running it in Tokyo, or it might be running on a cloud. You still end up with data in databases, but you often end up with databases in different places, or different repositories that have different views of the data to suit particular use cases."

If there is one underlying goal for Confluent, it is to make data available, self-service, to different users across a business so that new applications can be developed to exploit that availability in real time. And because edge computing has, at its heart, the virtualisation of traditional datacentres, it also provides the tools needed to build an event-driven process management environment that can work in real time, at scale, with large numbers of different databases. In that context, Stopford sees edge computing in a similar way to a retail chain of shops.

"Taco Bell is a good example. It has a Kafka cluster in every single store that manages its system, and manages the movement of data to and from the store. And if a store is disconnected, it can still operate. It can't do everything, because it can't talk to the central servers, but it is autonomous."

My Take

The idea of collaboration between widely different applications and services - old and new, singular or in their thousands, small or massively scaled out, old-style batch or cloud-native real time - becomes possible when the data, and the events that implicitly caused its creation, also become the communications medium. It then becomes possible for highly valued mainframe legacy applications to enhance and extend their value to a business without recourse to difficult and expensive re-engineering.

And exactly the same approach can be used to build the necessary bridges between the "back office" and the increasingly important edge where, day by day, ever more of the primary processing workload will be carried out.