KOR is hoping to change the way that regulators and the derivatives market think about trade reporting and compliance, thanks to its technology-driven approach. The data repository platform recently received CFTC approval and has only been live for a couple of weeks, but KOR is confident that its use of Confluent Cloud will lead to quicker and more compliant trade reporting.
Daan Gerits, KOR’s Chief Data Officer, speaking at Confluent’s user event in Austin this week, said that for the first time the derivatives market will have sub-second actionable analytics on submitted data, on-demand reports of full history, and insights to improve compliance processes.
In particular, Gerits believes that KOR’s use of Confluent, which has enabled the team to create a persistent state store (put another way, an immutable log of data), means that the platform will provide insights superior to what is currently on the market. He explained:
At KOR we have a very specific problem that we are trying to solve, which is collecting trading information for regulators. And we decided to do it in a totally different way to the way that most people are doing it. Where others would be using data storage or big data technologies, we decided to go all in on Kafka. We are building our system to store 160 petabytes in Confluent Cloud and then work on top of that. We don’t have any other database. So it’s a long retention use case.
When you look at how organizations work, there is one common denominator - something happens over here, you react over there. So we have been event-driven from the start, but we have become obsessed by just a state of things and being able to just roll up whatever happened, the behaviour that happened. And just look at the consequence and keep track of the consequence.
That is interesting to see because that has been the driver for the last 40 years. It’s not about the consequence, it’s about what led up to it? So we started asking ourselves questions like: we have this state of organization here, but how did we get to this point? So that’s when we started hiring data scientists, to investigate how we actually got to that point.
KOR offers self-service reporting, whereas its competitors traditionally rely on big data tools, such as Spark, which may require someone to go away and produce a report on the trade data as it stood at a given moment in time. More importantly, the reporting that KOR is able to do is more sophisticated because of its event-driven architecture. Gerits explained:
If you look at the other companies it’s not that different from what was happening in 2011. There is no conceptual main difference in how they’re treating data, they’re still storing it, they’re still treating it, they’re still doing ETL. But if you start looking at it from a totally different view, and you are able to look at it from an event driven mindset, then things change. Because all of a sudden you can actually answer questions that are very hard to tackle if you look at it from a state point of view.
For example, you can have a trade and then three months later they can ask, what happened at that point? They can then say ‘that isn’t correct, it needs to be this’. But you need to be able to report in both ways. You need to be able to report at this moment in time, but also at that moment in time. So you have to take the correction into account for some reports, but you also want to know what it was without the correction. That’s very hard to do in a traditional system, because it would mean that you need to have an event log or persistent transaction log, being able to go back in time and rematerialize the state as it was at that specific moment in time.
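The "go back in time and rematerialize the state" idea can be sketched as replaying an append-only log up to a chosen date. This is a minimal Python illustration and not KOR's implementation: the `TradeEvent` fields, the `materialize` helper and the sample figures are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class TradeEvent:
    """One immutable entry in the log (field names are illustrative)."""
    trade_id: str
    recorded_at: date          # when the event was appended to the log
    notional: float            # hypothetical reported value
    is_correction: bool = False


def materialize(log: List[TradeEvent], as_of: date,
                include_corrections: bool = True) -> Dict[str, float]:
    """Replay the log in order and rebuild the state as it was known on `as_of`."""
    state: Dict[str, float] = {}
    for event in sorted(log, key=lambda e: e.recorded_at):
        if event.recorded_at > as_of:
            continue  # this event was not yet known at that moment
        if event.is_correction and not include_corrections:
            continue  # report the value as originally submitted
        state[event.trade_id] = event.notional
    return state


log = [
    TradeEvent("T1", date(2022, 1, 10), 1_000_000.0),
    # Three months later the submitter corrects the trade.
    TradeEvent("T1", date(2022, 4, 12), 1_500_000.0, is_correction=True),
]

before = materialize(log, as_of=date(2022, 2, 1))
after = materialize(log, as_of=date(2022, 5, 1))
uncorrected = materialize(log, as_of=date(2022, 5, 1), include_corrections=False)
```

Because nothing in the log is overwritten, the same data answers all three questions: what was known in February, what is known now, and what the record would say today without the correction.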
Challenges of using a new model
KOR is the first CFTC-approved platform of this kind to operate fully in the cloud. However, Gerits admits that working with data streams in this highly regulated market hasn’t always been easy. For instance, he said:
There are some challenges when you have long retention and mild throughput, which is different from the usual use case of Kafka being used as a shipping vessel to get data from one place to another and then doing some analytics in between.
It tends to be more short term, you don’t have to take care of a legacy of schema changes. What we need to do is keep the data for up to 40 years, which is a huge challenge when it comes to evolution. We are making contributions to Kafka itself, we are building new technologies and open sourcing them, because the whole system isn’t really on that point yet. There are still gaps.
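One common way to cope with decades of schema changes on immutable records is to leave old records as written and upgrade them when they are read. The Python sketch below illustrates that general technique only; it is not a description of how KOR or Confluent handles schema evolution, and the `schema_version` and `currency` fields, the `USD` default and the `upcast` helper are assumptions made for the example.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def v1_to_v2(record: Record) -> Record:
    # Hypothetical change: version 2 introduced a `currency` field,
    # so version 1 records receive an explicit default on read.
    return {**record, "schema_version": 2, "currency": "USD"}


# Each upcaster lifts a record from version N to version N + 1.
UPCASTERS: Dict[int, Callable[[Record], Record]] = {1: v1_to_v2}
LATEST_VERSION = 2


def upcast(record: Record) -> Record:
    """Return a copy of `record` in the latest schema; the stored record is untouched."""
    current = dict(record)
    while current["schema_version"] < LATEST_VERSION:
        current = UPCASTERS[current["schema_version"]](current)
    return current


old = {"schema_version": 1, "trade_id": "T1", "notional": 1_000_000.0}
new = upcast(old)
```

The cost Gerits alludes to shows up here as a growing chain of upcasters: every schema version ever written has to stay readable for as long as the data is retained.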
However, the biggest challenge, and one we have heard time and time again at Confluent’s user event this week, is getting the organization used to working with data streams. Gerits said:
I think 90% of going into streaming is a mindset problem. It’s not a technology problem. Getting people to the point where they understand that they are dealing with immutable data - if you have written something, you can’t just go in and change it. That is something that is strange to them.
I’m not just talking about business people, I’m talking about developers as well. They can’t just go into the database and do an update to the table. You can’t do that - you have to do a counter operation, etc. We spent a huge amount of time thinking about how we were going to deal with schema evolution - if a new field is introduced, what should happen? What are the side effects? And it is more complex in that context, thinking about it in that global scope.
But the processes by themselves become very, very straightforward, because the applications that we build are very simple. And it allows you to have a very clean architecture.
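The "counter operation" Gerits mentions can be sketched as follows: rather than updating a stored value, a correction is appended as a reversal plus a new entry. This is a minimal Python illustration under assumed names (`Posting`, `AppendOnlyLog`, `balance`), not KOR's data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Posting:
    """An immutable entry; frozen=True prevents in-place edits."""
    trade_id: str
    amount: float
    reason: str


class AppendOnlyLog:
    """Exposes append and read operations only; there is no update or delete."""

    def __init__(self) -> None:
        self._entries: List[Posting] = []

    def append(self, posting: Posting) -> None:
        self._entries.append(posting)

    def entries(self) -> Tuple[Posting, ...]:
        return tuple(self._entries)

    def balance(self, trade_id: str) -> float:
        # Current state is derived by rolling up every entry for the trade.
        return sum(p.amount for p in self._entries if p.trade_id == trade_id)


log = AppendOnlyLog()
log.append(Posting("T1", 100.0, "initial report"))
# The initial figure was wrong. Instead of editing it, append a counter
# operation that cancels it out, followed by the corrected figure.
log.append(Posting("T1", -100.0, "reversal of initial report"))
log.append(Posting("T1", 120.0, "corrected report"))
```

The current value is correct, and the mistake and its correction both remain visible in the history.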
And this clean architecture has its benefits, once the organization is on board with working in this way. Gerits said:
We have built the platform to have flexibility. We are a start-up, the chances of having to do a pivot down the line are real. So we wanted to make sure that whatever happened, we will always have the flexibility to move in one direction or the other. That’s really crucial for us. But also if we develop new services, we want to be able to plug those services into the main platform.
That’s very different to a large organization that introduces a new product, it takes four months trying to figure out how it’s going to impact all the other applications. We don’t care. It’s just a matter of hooking into the right streams and detecting it.
If you want to do fraud detection, or analysis of your trades, or performance dashboards - all of that is so straightforward because the only thing that you need to do is listen for the events.
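The claim that a new service only needs to "listen for the events" can be illustrated with a small in-memory stand-in for a stream. This Python sketch does not use Kafka or any Confluent API; the `Stream` class, the `flag_large` handler and the one million threshold are hypothetical and exist only to show the shape of the idea.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class Stream:
    """In-memory stand-in for a topic: an ordered, append-only list of events."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._subscribers: List[Handler] = []

    def publish(self, event: Event) -> None:
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)

    def subscribe(self, handler: Handler, replay: bool = True) -> None:
        # A late-joining service can first replay the retained history.
        if replay:
            for event in self._events:
                handler(event)
        self._subscribers.append(handler)


trades = Stream()
trades.publish({"trade_id": "T1", "notional": 5_000_000})

# A new service is added later, without touching the code that publishes trades.
large_trades: List[str] = []


def flag_large(event: Event) -> None:
    if event["notional"] > 1_000_000:
        large_trades.append(event["trade_id"])


trades.subscribe(flag_large)
trades.publish({"trade_id": "T2", "notional": 10_000})
trades.publish({"trade_id": "T3", "notional": 2_000_000})
```

The new service sees the trade that was published before it existed as well as the ones that follow, which is what long retention buys in this model.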