Hadoop, Cassandra vendors see NoSQL applications emerge

SUMMARY:

Today’s experimentation provides some clues to the enterprise NoSQL applications that will emerge built on Hadoop, Cassandra and similar platforms.

Billy Bosworth, CEO of DataStax, took his audience on a trip down memory lane as he opened last month’s Cassandra Summit in London. He recalled that his career choice as a computer science major in 1992 had been between skilling up for the IMS database running on an IBM ES9000 mainframe or a Sun SPARCstation running Oracle: “I chose the route of the Oracle world and that was fantastic.”

More than two decades on, Oracle has replaced IMS as the legacy choice and the NoSQL generation believe it’s their turn to inherit the future. Said Bosworth:

It’s not that RDBMSs are bad, it’s just that they’re antiquated.

A radically connected world requires a distributed transactional database like Cassandra.

A day earlier I had met Herb Cunitz, president of Hadoop vendor Hortonworks, which the following week achieved a billion-dollar valuation in its NASDAQ IPO. He also referenced the emergence of SQL databases as a historical precedent for today’s Hadoop market [corrected: an earlier version of this paragraph incorrectly implied that Hadoop is a NoSQL database]:

This market will progress very similarly to how the database market progressed 20 to 30 years ago.

In that progression, we are currently in the early days of experimentation, when well-funded enterprise IT teams custom-build their own applications of the technology. If history is destined to repeat itself, later on we’ll see the emergence of packaged applications and development tools built on these new NoSQL-generation platforms.

As the market matures, it will converge on specific classes of applications that have been proven to work well with the technology.

Emerging applications

Bosworth identified three types of applications currently being developed by the DataStax customer base on Apache Cassandra:

  • A fifth are legacy relational applications rearchitected for the platform.
  • Another fifth he described as “game-changing” applications that find completely new uses for Cassandra’s time-series data, such as analyzing data streams from hospital patients to find early indicators of sepsis.
  • The remaining sixty percent he said are Internet enterprise applications that bring the nimbleness of Web players to traditional businesses. One example is a retail app that guides a shopper through a supermarket on the optimal path to collect their weekly shop.


Cunitz said there are two “very common” patterns of Hadoop use emerging among Hortonworks customers:

  • Applications that take advantage of real-time data, for example credit card fraud detection. The more data that a card issuer can analyze in a short time window, the faster it will be able to shut off fraudulent usage.
  • In-the-moment predictive analytics. There are examples of this across telecoms, retail and manufacturing. Analyzing real-time data streams from a jet engine, for example, makes it possible to detect patterns that indicate an imminent failure.
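
The fraud detection pattern above comes down to examining a card's recent activity inside a short time window. As a rough illustration only, here is a minimal single-process Python sketch of a sliding-window check. The class name, the 60-second window and the three-transaction threshold are all hypothetical choices made for this example. It does not use Hadoop, and it is not a description of how any Hortonworks customer or card issuer actually detects fraud.

```python
from collections import defaultdict, deque


class VelocityChecker:
    """Flags a card that makes too many transactions inside a short window.

    Hypothetical illustration: the rule and thresholds are assumptions,
    not a real issuer's fraud model.
    """

    def __init__(self, window_seconds=60, max_transactions=3):
        self.window_seconds = window_seconds
        self.max_transactions = max_transactions
        # One queue of recent transaction timestamps per card.
        self._recent = defaultdict(deque)

    def observe(self, card_id, timestamp):
        """Record one transaction; return True if the card now looks suspicious."""
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Discard timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_transactions
```

Each call to `observe` takes a card identifier and a timestamp in seconds. A real deployment would spread this state across a cluster and combine many more signals than transaction count, which is where the scale of the underlying platform comes in.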

He added that a number of customers are seeking to take capabilities they’ve developed and market them as a service to others in their industry:

If as a bank I become best at predicting fraud, I can provide a service to other banks and monetize it. That is another trend we’re seeing in industry after industry.

Hadoop for the enterprise

Herb Cunitz, Hortonworks

Hortonworks’ aim is to guide Hadoop into becoming the universal data platform of the NoSQL generation, said Cunitz:

Let Hadoop be the data storage layer and resource management layer that unifies the industry.

The key to this vision is the YARN resource management layer, introduced in Hadoop release 2 towards the end of 2013. It builds on the original capabilities of MapReduce to support a wider range of data access methods, making it possible to work with other data processing resources such as Apache HBase, Hive, Pig and Spark. The other notable component is the Hadoop Distributed File System (HDFS), which adds to scalability and reliability.
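
For readers unfamiliar with MapReduce, the programming model mentioned above can be sketched in a few lines of plain Python. This is a toy, single-process illustration of the map, shuffle and reduce steps applied to a word count. The function names are invented for this example and it is not Hadoop's API. In a real cluster the same three steps are spread across many machines, with the input typically read from HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `run_job` with a list of strings returns a dictionary of word counts. The point of the model is that the map and reduce functions hold no shared state, so a framework can run many copies of them in parallel.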

Hortonworks was founded in 2011 as a spin-out from Yahoo, where Hadoop was originally created. Its aim was to guide the continued development of the Apache standard for the enterprise market, and the team includes many of the key figures in the Apache Hadoop community. Cunitz explained:

It’s very difficult to influence an open source community unless you have committers. The reason Hortonworks spun out of Yahoo was so Hadoop would flourish as a standard in an independent company.

Hortonworks (which is named after the Dr Seuss character Horton the Elephant) follows a similar business model to Linux and JBoss vendor Red Hat. It derives revenues from providing support and professional services to the enterprise market for Hadoop. Unlike other Hadoop vendors such as Cloudera and MapR, it does not develop or market its own proprietary extensions.

Cunitz said the founders decided on this business model as the best way to build confidence in the platform’s future and encourage multi-vendor collaboration.

The best way to de-risk is to bring all the vendors into the family to drive and build the platform.

We felt we could drive this faster by bringing everyone into the fold rather than competing with them.

My take

When I think back to the 1990s, I can recall the bitter struggles for attention between Oracle, Informix, IBM, Microsoft and other SQL database vendors. I also recall the bewildering array of middleware and application server vendors that came in their wake. One of the big challenges that enterprise buyers faced then was knowing which platform to select. If you made the wrong choice, you could end up saddled with a product that was no longer supported.

Today the NoSQL market presents a similar challenge — and the fun has barely begun. Wait for a year or two and there will be a plethora of NoSQL-centered analytics toolboxes and applications competing for your attention and budgets. Selecting the likely winners will again be a tough call — with the added wrinkle, unheard of in the 1990s, that many of them will be provided as a service.

These promise to be interesting times.

Disclosure: Oracle is a diginomica premier partner.

Image credit: Elephants © kostin77 – Fotolia.com, headshot courtesy of Herb Cunitz.


    1. Phil Wainewright says:

      Shortly after I published this article, I had a note from Herb Cunitz’s PR rep to point out that I had somewhat misrepresented him by implying that he had described Hadoop as a NoSQL database. This was sloppy of me, as Hadoop is in fact a filesystem and framework that enables massively parallel computing but is not limited to NoSQL and supports pretty much all manifestations of ‘big data’.

      Evidently I need to drill into this a bit more. My immediate reaction is that we clearly have a case of ‘horseless carriage’ syndrome going on here in that NoSQL is clearly defining the new technology in opposition to what came before. Instead of describing these new platforms as (to coin a phrase) ‘SQL-less databases’ perhaps we should come up with a new term that more accurately portrays their potential. 

      1. says:

        Well – NoSQL is something of a red herring to the uninitiated. ‘Not only SQL’ is a lot easier to understand. When I first came across it, I asked: ‘Well if it ain’t SQL then what is it?’ A natural reaction, you’d think.