How Confluent helped Rodan + Fields cut data latency from 30 minutes to seconds

By Phil Wainewright, October 14, 2021
Summary:
Beauty brand Rodan + Fields worked with Confluent and Google Cloud to tackle data latency with an event-based architecture. Here's its story and lessons learned.

Jason Mattioda (center) Rodan + Fields - screengrab from Cloud Next '21

The last day of every month is like Black Friday at skin care and beauty brand Rodan + Fields, as its army of 300,000 independent sales consultants rush to post their orders and hit monthly commission thresholds. The peak load posed an engineering challenge for its IT team, with data taking 30 minutes or more to update across all its systems. Jason Mattioda, Head of Enterprise Platforms & Data Engineering, took to the virtual stage at this week's Google Cloud Next '21 event to set out the challenge and explain how his team solved it. He says:

The business value we were trying to solve, first and foremost, is how do we get data to our consultants faster? As I mentioned, on the last day of every month, it's a really key time for them. They're trying to close sales, as many sales as they can, and enrol future customers ... They have till midnight to get everything in, and they need data to know how to operate.

In the old world, it could take anywhere from 30 minutes to an hour or more, from the transaction occurring on our website, to the time that a consultant could actually go into their back-office portal and see that data and know which actions to take next.

Held back by batch

The problem lay in a previously on-premises architecture that still relied on a lot of batch processes for passing data around. Rodan + Fields had successfully moved its SAP workloads and e-commerce applications out of a traditional data center and into Google's public cloud, but the architecture still involved a lot of database replication, data hoarding and batch interfaces. This back-end "spaghetti ball" was holding the company back, as Mattioda explains:

At the end of the day, no matter how agile we thought we were, or how much we were trying to improve front-end experiences, we had this back end — it was an Achilles heel that was slowing us down — which was our data architecture. We knew we needed to change that.

The solution the team settled on was an event-based architecture, running on the Apache Kafka platform. Rather than build and maintain their own Kafka setup, they turned to Confluent, which provides a managed Kafka service running on Google Cloud. As well as running the service, Confluent provided valuable expertise to help put the architecture in place to meet the company's goals, says Mattioda. The result has eliminated those month-end bottlenecks, as he explains:
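The session didn't walk through code, but the core pattern behind an event-based architecture — producers publishing order events to a topic, with multiple consumers reacting to the same event independently, so no batch job has to ferry data between systems — can be sketched with a minimal in-memory stand-in. In production this would be `confluent_kafka.Producer` and `Consumer` against a Confluent Cloud cluster; the topic, field and handler names below are purely illustrative:

```python
from collections import defaultdict

class EventBus:
    """Minimal in-memory stand-in for a Kafka-style topic/subscriber model."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Every subscriber sees the same event as soon as it is published --
        # the back-office portal and the warehouse feed update together,
        # rather than waiting on separate batch interfaces.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
portal_feed, warehouse_feed = [], []
bus.subscribe("orders", portal_feed.append)     # consultant back-office portal
bus.subscribe("orders", warehouse_feed.append)  # analytics/warehouse feed

bus.publish("orders", {"order_id": 1001, "consultant": "C-42", "total": 129.0})
```

The point of the sketch is the fan-out: one published event reaches every downstream system at once, which is what removes the 30-minute replication lag described above.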

Now we can get accounts and order information almost real-time — sub-second movement of data across our platform. Even on the data that requires more conceptual transformation — maybe it requires aggregates or KPI calculations — we're certainly getting that data to our consultants sooner, and also to our business partners internally, just using the power of BigQuery and Google Cloud and other modern data technologies that surround Kafka.

Internally, the data warehouse now updates frequently rather than just twice a day as before, while data flows directly into the CRM platform to trigger targeted customer actions. Next up, the team plans to move to more of a headless commerce architecture, able to do commerce anywhere, driven by microservices that will call on the data layer now in place. Mattioda highlighted four lessons learned from the project.

1) Build on your existing knowledge

It was important to harness the team's knowledge of how the existing systems worked, while relying on advice from Google Cloud and Confluent to help them become competent in the new technologies. He says:

This project was only going to be successful if we leveraged the existing team that had been around and knew all the ins and outs of our current data flows and our databases, and where all the skeletons were buried. So they had to come up to speed quickly on all these new technologies.

2) Be wise to governance

An important element of adapting to Kafka's data-in-motion environment is the need to do data governance on the fly. He explains:

When you move from this batch-oriented world to a real-time world, there's no time, literally, in the data latency to cover up your data quality and data integrity issues. So you really have to have the foresight of how the systems and your business processes are outputting data, because that's the data you're going to have to work with if you want to present it real-time.

Sure, you can clean it up and you can patch it up, but anything you do to touch that data is going to incur some sort of latency cost, and we wanted to avoid that as much as possible.
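One way to act on that advice — an illustration, not Rodan + Fields' actual pipeline — is to validate each event at the edge, before it is published, so bad records are rejected immediately at the source system rather than patched up in a latency-adding downstream cleanup step. The field names here are hypothetical:

```python
def validate_order_event(event):
    """Return a list of problems; an empty list means the event is clean.

    Checking at produce time keeps data-quality fixes out of the hot
    consumer path, where any patching would add latency.
    """
    problems = []
    if not event.get("order_id"):
        problems.append("missing order_id")
    if event.get("total", 0) <= 0:
        problems.append("non-positive total")
    return problems

clean = {"order_id": 1001, "total": 59.0}
dirty = {"order_id": None, "total": -5}

clean_problems = validate_order_event(clean)   # []
dirty_problems = validate_order_event(dirty)   # two problems flagged
```

Rejected events can be routed to a dead-letter topic for repair, so the real-time stream only ever carries data that is fit to present.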

3) Don't be afraid to fail and start over

The unfamiliar environment of an event-based architecture means that you won't know all the answers. So don't be afraid to try things out until you find what works best. He explains:

Most of these guys come from relational database backgrounds and traditional data warehouse and ETL backgrounds. As you move to the cloud, and you start working with these modern data technologies, it is a paradigm shift for a lot of folks. The way that you would have solutioned in the past is not how you would solution in the cloud.

So try things out. If you don't get the results you want, you can guarantee there's ten other ways that you could try it, and that might be better. Learn on the fly, and don't be afraid to back up a sprint and try again.

4) Do your performance engineering from the get-go

The whole approach to building this kind of system means addressing key elements at a different stage than you would have done in a classic project. He explains:

In the days of old for us, performance engineering was always the last step of our project. We'd get the code done, and then we'd go optimize the code to be efficient. You're going to be much better off to set those performance SLAs and do the performance engineering upfront in your project.

My take

Businesses everywhere are having to speed up their operations, and joined-up data is a huge part of that. Batch processes really have no place in this new world of frictionless enterprise, where data needs to be accessible anywhere, on-demand, in real-time, and in a context that's change-ready and supports collaboration. That's particularly true for consumer-facing brands like Rodan + Fields, which has grown massively and expanded internationally into Canada, Japan and Australia since we last looked at the company in 2013.


For more information on this week's Google Cloud Next '21 event, click here. Or for diginomica's full coverage from Next '21, see our dedicated events hub here.