Can big data pay off for retailers? Only if you get personal, says Rubikloud

By Jon Reed, January 5, 2017
Big data can be a very dull conversation. But extracting data value from verticals is a lot more interesting. Kerry Liu of Rubikloud told me why he's betting big on retail intelligence - and what he's learned about retail's data struggles. Spoiler: cloud, machine learning and personalization all factor in.

I'm not a fan of the "big data" PR pitch. But I made an exception for Rubikloud. Instead of a sleep-inducing big-data-is-wonderful message, they wanted to talk about why big data can change retail for the better. With a monster retail show looming in New York City shortly (NRF), I took them up on it.

I soon found myself on the phone with Kerry Liu, CEO of Rubikloud, to find out why he left other ventures to roll the dice on retail analytics.

Why do retailers struggle with data?

Though he started his career as a PWC consultant, Liu moved to Strangeloop - a web acceleration company for e-commerce retailers. This gave Liu a bird's eye view of the issues retailers face. So why do retailers struggle to derive value from data? Liu:

I would go into these meetings with our clients - all big e-commerce retailers. Every single one of them would ask us, "What the hell is this stuff? What do we do with it? What's Spark? What's Hadoop? What's cloud infrastructure? Why did Twitter buy this random company called BackType? Should we be using it? What's going on?" Suddenly the whole world was changing for these e-commerce companies. That was the genesis for Rubikloud.

Rubikloud's first goal? Productize data streaming technology for e-commerce. Things moved quickly. Though they started out crunching a Google Analytics stream, Rubikloud soon realized that external data held the key:

As we got more deeply integrated with these large retailers, we realized that the gold mine - and the holy grail of predictability - lay in the old traditional offline data.

So Liu and team re-architected their entire system to be able to automatically clean, move and add offline data. They integrated machine and human-readable variables, including offline point of sale (POS), CRM, and merchandising data. Soon they were dedicated to retail, a $1 trillion market:

We became this very vertically deep data company around the actual database layer. We basically moved all of this old traditional legacy database stuff into either Google, Microsoft, or Amazon, and put the Rubikloud product on top of it. That's how we got into the state we're at now.

Rubikloud bills itself as a "retail intelligence platform," with the intent of helping companies turn retail data into revenue. Their smallest client is a few hundred million in revenue, with fifty stores. Their largest client? A $50 billion retailer, tracked in Rubikloud with five years of historical data. Liu figures there are between 750 and 1,000 retailers globally in that general range, so they won't be moving downmarket anytime soon.

Retail and big data - good approaches and bad

What has Rubikloud learned about retail data so far? Or, as I put it to Liu: how do you avoid getting bogged down in retail data without ROI? Liu cites two principles:

  1. Combine data that you own and control with data from external sources (e.g. product reviews, Nielsen or weather data).
  2. Avoid the trap of grappling with all of your data. In most cases, you don't need all of it.

Liu explained:

If it's external data, we'll grab it in and we'll deal with it ourselves through our system. But if it's data that you control, our big push back to the retailer is: we really don't need all of it. It's already duplicated in a hundred different positions.

Liu gave two examples of streamlining data:

  • e-commerce data - Rubikloud doesn't need data from all the analytics providers (Google Analytics, Webtrends, etc). "If you just give us access to your actual e-commerce platform, that's all we'll ever need. Everything else is just a replica, or a derivative, off of that."
  • POS data - "If you gave us access to the POS systems data, that's good enough. We don't need access to the centralized databases that connect your POS system to your store associate scheduling data. That's not relevant."

Big data should still be selective data:

There is this tendency to say data overload is good, but in reality, if you are very targeted about the important data sources that you need, the rest of it is a waste of time, money and energy.

Okay, so we know the data sources we need - but that doesn't get you to ROI. How do you get there? For Rubikloud's customers, it's about moving to individualized triggers. Liu used the example of a loyalty program, where many shoppers might be dormant:

We always thought that if you organize the data properly, and you didn't just show people interesting analytics on their loyalty program, but you gave them personal recommendations, we'd see results. How do you activate those people who are largely dormant, but are still regular shoppers, on an individual level? What if you could use machine learning algorithms to actually give you a tactical campaign, and a tactical recommendation for each person? We'd see a significant uplift in the status quo.
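Liu's question about members who are "largely dormant, but are still regular shoppers" can be made concrete with a toy example. The sketch below is not Rubikloud's method, which the interview doesn't describe. It swaps the machine learning model for two simple rules, and every name and threshold in it (Member, dormant_regulars, recommend, the 180-day and 90-day windows) is an assumption made up for illustration:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Member:
    # Hypothetical record layout; a real loyalty system will differ.
    member_id: str
    last_engagement: date   # last time the member redeemed a loyalty offer
    purchases: list         # list of (purchase_date, product_category) tuples


def dormant_regulars(members, today, dormant_days=180, recent_days=90, min_purchases=3):
    """Return members who still buy regularly but no longer engage with the program.

    The thresholds are illustrative assumptions, not figures from the interview.
    """
    dormant_cutoff = today - timedelta(days=dormant_days)
    recent_cutoff = today - timedelta(days=recent_days)
    selected = []
    for m in members:
        recent = [p for p in m.purchases if p[0] >= recent_cutoff]
        if m.last_engagement < dormant_cutoff and len(recent) >= min_purchases:
            selected.append(m)
    return selected


def recommend(member):
    """Use the member's most frequently bought category as the offer theme.

    A stand-in for a per-person model: a plain frequency count, nothing more.
    """
    counts = Counter(category for _, category in member.purchases)
    category, _ = counts.most_common(1)[0]
    return f"offer:{category}"
```

Calling dormant_regulars on a member list and then recommend on each result gives one offer per person rather than one email for the whole segment. A production system would presumably learn both the segment and the offer from data rather than from fixed rules.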

Many companies have invested in loyalty programs, but the customer experience ends up feeling "dumb" and impersonal - like those frequent flyer generic emails. That's a big lost opportunity:

Loyalty programs are an underutilized asset. Your loyalty members have already gotten to a level where they care about the brand, and if nudged, or incentivized, or treated in the right way, they would reward you with a much more comprehensive purchasing process, and even more important, data. Today, loyalty programs are largely utilized in a dumb way.

Just for clarity: Liu is not calling retailers dumb. He's saying that the systems we use to engage our most valued customers have not been smart enough:

If you think about the last email you got from a loyalty company, a brand that you care about, it probably wasn't that good. Or the last in-app push, or in-store associate offer: it probably wasn't very good either.

Customer example - migrations and forecasting

I asked Liu to flesh this out with a customer example. He cited a "multi-billion dollar US mass beauty retailer." The company has high loyalty activation rates, with five million loyalty members. Liu told me Rubikloud solved three problems for them. The first was cloud migration:

They had spent about a year trying to migrate data into Microsoft Azure. They were getting nowhere with it.

This customer had burned through a lot of money with systems integrators and Azure. The cloud sheen was wearing off:

I believe they were thinking, "We need our data in the cloud. Everything's going to be great once we do it. The cloud is easy." Well, a year later, maybe cloud wasn't so easy.

The first step for Rubikloud: moving the customer data into the cloud. They moved five years of historical data, online and offline, into the cloud in two weeks. After the data migration, Rubikloud implemented their loyalty solution, achieving an uplift of more than 10 percent. They also implemented price forecasting. In this case, the beauty suppliers dictated limits on discounting, so other forecasting options were utilized:

We had to play with a lot of other predictive levers, like how much to order, which stores should start carrying it, and pricing decisions. Our product around predicting that helped their buyers and their category managers on a daily basis. The forecast was 9.3 percent better. More importantly, it saved them a lot of time.

The wrap - data issues include legacy systems

I'm a fan of a vertical data focus, but that doesn't mean smooth sailing. Liu acknowledged as much, pointing out that a good deal of data today is "stuck" in traditional on-premise systems from the likes of IBM, Oracle, SAP, Teradata, Informatica and so on. Liu believes that data is "very hard to do predictions and machine learning on." Why? Because cloud is key to what happens next:

It's not elastic. To use cloud computing terms, I can't spin up a model on your entire data set if all you have to offer me is the Oracle or SAP machine that data sits on. I need to move it into an elastic environment, so I can literally spin up a Spark job, and spin up a thousand servers for eight minutes, and spin them back down.

He says retailers are waking up to this now:

Retailers have realized that a lot earlier than we thought. Their data has to get replicated at the very least - if not one day replaced - with a cloud-based ecosystem of that data. Otherwise, their internal teams, their internal data scientists, their internal consultants, companies like Rubikloud, they can't even get started.

Liu says Rubikloud can ease those transitions with a migration product to Amazon, Google, and Microsoft clouds, with more cloud options to follow. That solves for about half the data retailers will care about. As for the other half:

They still need to find a way to move it into the cloud themselves. Not doing that means they have a failure to start with a lot of the machine learning stuff.

Sounds like a big and audacious road ahead.