How AdStage moved beyond startup scale with Apache Cassandra as a data service

Jon Reed, November 27, 2017
Startup growth is never a bad thing - but the strain it puts on an IT team is real. Jason Wu of AdStage told me how AdStage addressed its data scale issues, and why Apache Cassandra Database-as-a-Service was the right choice.

Sometimes you choose a startup; other times it picks you. That's how it was for AdStage co-founder and CTO Jason Wu.

While doing his homework for an entrepreneurship class at Carnegie Mellon, Wu interviewed Sahil Jain, who was to become a fellow co-founder at AdStage. As Wu told me, the interview took a surprising turn:

Sahil thought I was an MBA candidate, or some kind of business major, so he just kind of pushed me off for a bit. Then when we got on the call, I asked him all the questions that I had. Then he started asking me about myself. Once he realized I was actually an engineer, he just changed gears completely, and started pitching me on this idea that he had for AdStage.

AdStage and the problem of disparate ad campaigns

A month later, Wu made the startup jump. Today, AdStage is an established ad campaign management platform with 300+ live customers. What makes AdStage different? It's about integration. AdStage bills itself as "All your advertising data in one place. Bring your paid search, paid social, web analytics, and custom metrics together." Sounds cool enough, but as Wu told me, managing disparate campaigns is a tough marketing problem:

Sahil was actually the CMO of another startup. He was spending so much time and effort trying to manage all these different campaigns across these different networks, and still at the end of the day, wasn't really figuring out how to best increase his ROI, wasn't really figuring out tricks to optimize across different networks, because they're all very different, in terms of targeting, in terms of how to bid, and things like that.

To solve this issue, AdStage combines campaign management and reporting on the same platform. But there's one more component: education. They want to help smaller/growing businesses make sense of the complex ad placement options:

A big part for us was also education. We really wanted to help solve that problem, especially for small to medium businesses, of getting ramped up on these different networks. So pretty much everywhere in the product, you have tool tips that explain the options you are working with, or what a certain metric means - with links to our blog or tips and tricks.

The problem of massive ad data

Wu tells me AdStage is the only platform that allows companies to manage campaigns across all five of the biggest ad platforms: Facebook, Twitter, LinkedIn, Google and Bing. That volume of data has pushed Wu's team hard, and Apache Cassandra Database-as-a-Service has proven crucial to the scale they've achieved. Wu's data scale problem started several years ago, as they began to ingest customer data:

Our platform is designed for reporting and automation, so as part of that, we actually need to import a lot of our customer data, on a very regular basis, because we're constantly wanting to update data, to make sure that we automate campaigns successfully: on the latest number of clicks, or the latest ad costs, and we also want to be able to send out reports on a regular basis, with very up-to-date data.

That means big data pulls throughout the day:

Every two hours or so, we actually pull in all of the campaigns, basically account structure data for our customers, so if they have a Google AdWords account, we'll pull in the account, the campaigns, the ad groups, the ads, the keywords - all of that. Then we also pull in the metric data per day, and there's a lot of metrics too. Facebook and Twitter have hundreds of metrics combined.

AdStage ingests about four months' worth of data for each trial user. For customers, they pull in two years of data for historical reporting. Those campaigns add up:

If you think about the number of keywords that people can have, for example I think the max amount in AdWords is something like four million, that can end up being a lot of data over two years.
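To put Wu's numbers in perspective, here is a rough back-of-envelope sketch. It assumes one stored data point per keyword per day, and it takes Wu's "something like four million" at face value; the function name and constants are illustrative, not AdStage's actual figures or code.

```python
# Back-of-envelope estimate; every input here is an illustrative assumption.
KEYWORDS_PER_ACCOUNT = 4_000_000  # Wu's rough recollection of the AdWords maximum
DAYS_OF_HISTORY = 2 * 365         # two years of daily data for paying customers


def keyword_day_rows(keywords: int, days: int) -> int:
    """Count (keyword, day) data points, assuming one row per keyword per day."""
    return keywords * days


rows = keyword_day_rows(KEYWORDS_PER_ACCOUNT, DAYS_OF_HISTORY)
print(f"{rows:,} keyword-day rows")  # 2,920,000,000 keyword-day rows
```

That is nearly three billion rows for a single maxed-out account, before counting the many metrics attached to each row.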

Time for a change:

Our original relational database setup just wasn't working anymore. We were using Postgres at the time. The write workload itself was just bringing our one Postgres instance to a standstill, and querying the data out was also very slow during some of those heavy import times.

Why Cassandra-as-a-Database-Service?

So in 2013, Wu launched a search for better solutions. They evaluated several, but Cassandra jumped to the fore:

The NoSQL aspect of it came up because Cassandra is well-known as a great time series store, which is basically what we're trying to store for our customers. At the same time, it gave us the ability to horizontally scale out, so as our customer load continued to grow, we could just bring up new nodes, add them to our clusters, and then continue to scale out write and read throughput, with the write being a little bit more important for us, especially with those heavy imports.
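For readers who want a feel for what "time series store" and "horizontally scale out" mean in practice, here is a toy, in-memory sketch. It is not AdStage's schema and does not use a Cassandra driver: the monthly bucketing, the names `partition_key`, `node_for`, `write` and `read_range`, and the modulo placement are all assumptions made for illustration. Cassandra itself places partitions on a token ring rather than with a plain modulo.

```python
from collections import defaultdict
from datetime import date
import hashlib


def partition_key(account_id: str, entity_id: str, day: date) -> tuple:
    """Hypothetical partition key: one partition per entity per calendar month."""
    return (account_id, entity_id, day.strftime("%Y-%m"))


def node_for(key: tuple, node_count: int) -> int:
    """Toy placement: hash the partition key and map it onto one of the nodes."""
    digest = hashlib.md5("|".join(key).encode()).hexdigest()
    return int(digest, 16) % node_count


# partition key -> {day: metrics}; the day acts like a clustering column
table = defaultdict(dict)


def write(account_id: str, entity_id: str, day: date, metrics: dict) -> None:
    """Store one day of metrics; rewriting the same day overwrites it."""
    table[partition_key(account_id, entity_id, day)][day] = metrics


def read_range(account_id: str, entity_id: str, start: date, end: date) -> list:
    """Return (day, metrics) pairs in date order for start <= day <= end."""
    rows = []
    for key, days in table.items():
        if key[:2] == (account_id, entity_id):
            rows.extend((d, m) for d, m in days.items() if start <= d <= end)
    return sorted(rows, key=lambda row: row[0])


write("acct-1", "kw-42", date(2017, 10, 31), {"clicks": 12})
write("acct-1", "kw-42", date(2017, 11, 1), {"clicks": 9})
write("acct-1", "kw-42", date(2017, 11, 1), {"clicks": 10})  # later import wins
print(read_range("acct-1", "kw-42", date(2017, 10, 1), date(2017, 11, 30)))
```

Because each partition is keyed independently, writes for different entities and months can land on different nodes, which is the property that lets write throughput grow as nodes are added.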

Through their participation in the DataStax startup program, AdStage got more exposure to Cassandra. In late 2013, they made the switch, running Cassandra clusters on Amazon. But they weren't done yet. Their new Head of Operations, Gordon Worley, met up with Instaclustr at the Cassandra Summit, and learned about Instaclustr's Cassandra-as-a-Service offering:

We basically had a trial of their service, and really enjoyed working with them. They even helped us with some of our data modeling, and suggestions on tuning, so it ended up turning into a pretty great partnership.

So why DaaS instead of running their own clusters? Reason one: uptime pressures.

With the reporting and automation we're doing, we really want very high uptime. We don't want our customers to be without their data. We don't want our automated tasks that are actively running to optimize their campaigns, to fail for whatever reason. So that was certainly one part of it. We wanted a team, we wanted someone who really had our backs, who could watch out, give us guidance, and give us proactive alerts, before we actually ran into issues.

It was also an IT philosophy thing. AdStage wants to use its IT resources to differentiate, not to administrate:

Another aspect of it is that it took a lot of work off of Gordon. I think internally, we really want to focus on building our products, to differentiate ourselves from other solutions in the space. We don't necessarily want to have to rebuild monitoring tools for Cassandra, and things like that.

The wrap - beyond servers to results

AdStage went live with Instaclustr in September 2014. Since then, their main cluster has grown to 70 nodes, with more clusters being added for new initiatives.

I asked Wu: what is the "aha moment" when AdStage customers realize the value they're getting? He says on the reporting side, it's when they get the visuals rolling:

If they're using exclusively our reporting tools, it's really once they get some dashboards set up, that really have a lot of the data that they need on a daily or weekly basis, internally or for clients, or if they're using our automation tool, sometimes even instantly, they'll just see, "Oh, wow, I can save so much time using this tool."

We talked about some of the problems facing the ad industry. Wu agreed that issues of data privacy are heating up, with stronger European regulations upping the stakes. At the same time, consumers still seem to be of two minds about data privacy. Many seem to accept the tradeoffs between data sharing and convenience/value.

Wu brought up MoviePass, a booming $10 a month service that lets you watch a movie a day. The model doesn't work without a data-for-value swap:

Advertisers are going to know every movie that you watch, and where you're watching it, and maybe even more than that, and people are willing to give that up for $10 a month, because it's so cheap.

That's the tightrope those in the ad business will have to navigate, hopefully with transparency and relevance rather than the dreaded YouTube "pre-roll" ad that seems designed for everybody - and thus nobody.

Wu's team has new offerings in the mix, including "universal API" access that is proving popular with AdStage's enterprise customers, who want to ingest AdStage data for their own dashboards and metrics. I look forward to an update.

