Data expires in the real-time web – RethinkDB’s co-founder on the art of open source business

SUMMARY:

RethinkDB sees a chance to change the database market through a real-time, NoSQL approach. Here’s how they use an open source ecosystem to drive their business model.

RethinkDB is now the number one document database on Github. Ranked 51 on db-engines.com’s database popularity list, RethinkDB has surged from number 82 a year ago.

Billed as “the open-source database for the realtime web,” this NoSQL upstart took an unconventional path to prominence, relying on community-based development. As of today, the Mountain View-based company has 17 employees, but they support a community of 100,000+ developers, including 150+ global product collaborators.

I recently caught up with RethinkDB co-founder Michael Glukhovsky to get his take on the real-time web, and why he believes RethinkDB has what it takes. We discussed real world use cases, and the advantages – and perils – of staking your business to an open source community.

RethinkDB uses the ReQL query language, which is embedded in the developer’s own programming language and is officially available for Ruby, Python, and JavaScript developers. Given that RethinkDB is a document-based database backed by JSON, it sounds pretty similar to MongoDB. Is that true? “Yes,” says Glukhovsky, “but the similarities pretty much end after that.” So what’s the difference? That answer can get a bit technical, but the rationale behind it is simple: RethinkDB is designed for the real-time web.

Fair enough – but why does real-time matter? Glukhovsky:

Cisco put up a really interesting report on real-time data. When I first read it, I thought “That can’t possibly be true.” Their claim was that by 2019, about 98 percent of all data is going to be essentially real-time in nature. That’s a crazy number to think about. When you dig into the numbers, it turns out that most of it is IoT traffic from devices.

But Glukhovsky sees a problem. When it comes to business value, data has an expiration date:

Data has very transient value that expires in the moment. If you act on it, it has enormous value to your business. The thing is, the tools don’t exist yet to be able to deal with this. If you look at most databases, if you write a query and get a response, and you want to be able to update it in real-time on that query, you have to constantly poll the database. You have to ask it every five milliseconds, “What’s changed, what’s changed?”

The real-time imperative – “we’ve inverted the traditional database architecture”

RethinkDB was launched in 2009, but the big push came in November 2012, when the first open source version was released. Add a viral response on Hacker News, and RethinkDB was on its way. But it’s the Github community that has made the difference. That means building a database in the open. Glukhovsky:

Github is essentially the social network for developers. We decided just to put it on Github and work out in the open. We don’t have any private issue trackers. We do all the development in public. We interact with each other, as you would on any open source project. It’s just that we happen to be paid by the company. We care about the company, but we also care about the project. That’s a really powerful thing, because now we have hundreds of people who watch every interaction we have online.

Mike Glukhovsky, co-founder

But RethinkDB isn’t the only NoSQL database with an open source play. For Glukhovsky, the difference is the real-time architecture:

What we’ve done is invert that traditional database architecture, allowing you to open a stream on any query. What that means is that you can set up a number of streams – you can set up tens of thousands of streams in parallel on a single machine, and you can scale it out to other machines if you need. Whenever something gets modified, updated, added or removed, the database will send you a little push, saying the data’s changed. What this means is that you can build scalable real-time architecture by just telling the database “Here are the things that I care about,” just pushing the updates as they come.
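To make the inversion concrete, here is a minimal Python sketch of the push model Glukhovsky describes. It is not RethinkDB code: ToyTable and its methods are hypothetical, in-memory stand-ins, and the old_val/new_val shape of each notification is an assumption loosely modeled on how database change notifications are commonly reported. The point is only that the client registers interest once and is then told about every change, rather than asking repeatedly.

```python
from typing import Any, Callable, Dict, List, Optional

# A change carries the row before and after the write (None when absent).
Change = Dict[str, Optional[Dict[str, Any]]]
Listener = Callable[[Change], None]


class ToyTable:
    """Hypothetical in-memory table that pushes every change to subscribers."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        # Register interest once; no repeated "what's changed?" queries.
        self._listeners.append(listener)

    def upsert(self, key: str, row: Dict[str, Any]) -> None:
        old = self._rows.get(key)
        self._rows[key] = row
        self._notify({"old_val": old, "new_val": row})

    def delete(self, key: str) -> None:
        old = self._rows.pop(key, None)
        if old is not None:
            self._notify({"old_val": old, "new_val": None})

    def _notify(self, change: Change) -> None:
        # The "little push": every subscriber hears about the write.
        for listener in self._listeners:
            listener(change)


received: List[Change] = []
table = ToyTable()
table.subscribe(received.append)

table.upsert("a", {"score": 1})   # added
table.upsert("a", {"score": 2})   # modified
table.delete("a")                 # removed
```

After these three writes, received holds three change records in order, without the client ever having re-run a query. A real database has to do far more (filtering per query, network delivery, scaling across machines), none of which this toy attempts.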

Glukhovsky believes we are heading towards a loose, real-time stack, composed of tools at the middleware (Node.js), front end (AngularJS from Google, React from Facebook, etc.) and database levels. The end result? Empowering developers to build “dynamic” applications that can react to data as it comes in, “reshuffling” the UI as needed to present new information. Or, in buzzword bingo terms, end-to-end, asynchronous, event-driven development. Glukhovsky sees RethinkDB as the missing piece: a database designed to do real-time push.

But that doesn’t necessarily mean “rip and replace.” Glukhovsky sees plenty of scenarios where RethinkDB could work alongside existing SQL and NoSQL databases, as well as Hadoop clusters:

If you look at most business environments right now, one database fits them all is usually not the way people approach problems. They’ll have their transactions database, sometimes SQL, sometimes NoSQL. They’ll have their analytics database – this will be Hadoop or other analytics systems. Then they’ll have their graph database for graph data. In this case, we’re targeting the use cases where things are being updated in real time, and you need to have lots of devices connected. You can do MapReduce queries on RethinkDB, and it works great. You’ll be able to receive real-time updates on those aggregations.

Real-world use cases – real-time alerts and response

Real world use cases could include click-through rate comparisons. Example: you may want to compare two versions of the same campaign. If the click-through rate for a campaign drops below a certain threshold, you could be alerted and take an action such as an image swap or email follow-up. Vending machine notifications are another scenario: instead of receiving a ping every time a drink is sold, set a real-time alert to be notified when a particular item goes below a certain threshold.
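The vending machine scenario can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RethinkDB’s API: StockMonitor, on_update, and the threshold value are hypothetical names chosen for the example. The monitor sees every stock update but raises an alert only at the moment an item crosses below its threshold, so nobody is pinged for each drink sold.

```python
from typing import Callable, Dict, List


class StockMonitor:
    """Hypothetical monitor: alerts only when stock crosses below a threshold."""

    def __init__(self, threshold: int, alert: Callable[[str, int], None]) -> None:
        self.threshold = threshold
        self.alert = alert
        self._stock: Dict[str, int] = {}

    def on_update(self, item: str, new_count: int) -> None:
        previous = self._stock.get(item)
        self._stock[item] = new_count
        # Only the transition from "at or above" to "below" is interesting.
        was_ok = previous is None or previous >= self.threshold
        if was_ok and new_count < self.threshold:
            self.alert(item, new_count)


alerts: List[str] = []
monitor = StockMonitor(
    threshold=3,
    alert=lambda item, n: alerts.append(f"{item} low: {n} left"),
)

for count in [5, 4, 3, 2, 1]:      # five sales, one alert (at 2)
    monitor.on_update("cola", count)
monitor.on_update("cola", 10)      # restocked, no alert
monitor.on_update("cola", 2)       # crosses the threshold again
```

Seven updates produce two alerts here, one for each time the stock drops from three or more to fewer than three. In a push-based setup, on_update would be driven by change notifications from the database instead of a loop.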

Glukhovsky shared two live use cases:

  • airport security system alerts – in six U.S. airports, RethinkDB is used to monitor the system security software. When the equipment goes down, a mechanic is dispatched in real-time (otherwise, they’d have to pull the system and troubleshoot).
  • snow gun alerts – a company in Utah has deployed geosensors across their slopes. RethinkDB supports these geosensors, monitoring snow levels. When the numbers go too far down, someone is dispatched in real-time with a snow gun to refill the slope.

The art of open source business

We talked in detail about the pros and cons of working openly on a product in a community setting. Github has proven to be an ideal platform for RethinkDB, but they didn’t get there without some knee scrapes. Eventually, RethinkDB “issue etiquette” guidelines were created by co-founder Slava Akhmechet. These guidelines cover project communication protocols, such as owning up to your delays (e.g. “Inform everyone if you slip”).

Community at RethinkDB warrants a separate post, but the bottom line is you have to allocate time for listening, and responding in an effective way. Examples of community tactics employed by RethinkDB:

  • Projects are organized by discussion threads that contain hundreds of comments. Often, community members take the lead.
  • When people hit the low end of the “community contribution curve,” it’s up to RethinkDB to reach out and figure out how members got stalled out.
  • Shirts for stories – a program where those who write about how they’re using RethinkDB receive a t-shirt in the mail.
  • Local meetup groups – RethinkDB has encouraged the formation of local meetup.com groups for networking.
  • Give out the full version of your software. Glukhovsky doesn’t believe in withholding features for a “premium” version of the database at an extra price. (“You don’t provide value by withholding features. You want to provide value by actually providing value.”)

Last but not least, RethinkDB has an in-house illustrator, Annie Ruygt (the illustrations in this article were provided by Ruygt). Glukhovsky on the power of art:

Just like software is a vehicle for ideas, art is also an amazing vehicle for ideas. She’s able to engage with the community in a very deep way, because she just creates art around it non-stop, and it gets people to a very emotional place very quickly. Also, it’s great because it builds a lot of accessibility for the project. The documentation is full of characters, life and color; every release has a movie poster that goes along with it. People really get engaged when you have art become part of the process.

Glukhovsky advises that external collaborators be perceived as part of the company’s internal culture. The Github issue etiquette guide was one step towards that:

When someone comes on to our community, and they violate one of those rules, we can just say, “You violated this issue etiquette guideline.” By putting the rules upfront, you’re able to establish the ecosystem, the nutrients in the soil. It requires thinking a bit differently, but once you do, it’s amazing because everyone just ends up interacting with each other in an authentic and kind way. Which is what you really want when you’re trying to build something as a product.

RethinkDB still has mountains to climb. It will be interesting to see how their open source culture scales with their growth.

Disclosure: diginomica has no financial ties to RethinkDB. I was approached by their PR team and found the story interesting.

Image credits: illustrations by Annie Ruygt provided by RethinkDB.