SAP HANA gets some Spark from Databricks

Den Howlett, July 1, 2014
Databricks certifying SAP to run Spark? Wow - that's one for the books which opens up interesting questions around SAP HANA, where it goes, how it fits and the future of development in a mixed open source, closed source world.

I would be remiss in discussing how open source is getting a higher enterprise profile if I didn't mention the announcement made by SAP's Aiaz Kazi at the sold-out Spark Summit event that SAP HANA has a ready-to-go, certified Databricks distribution.

Coming on the same day that Alteryx and Databricks announced a co-innovation and go-to-market strategy, this additional announcement provides an interesting backdrop against which to assess the progress of the open source 'big data' analytics movement alongside the part that established application vendors can play.

Ion Stoica, CEO of Databricks, positions this announcement as:

“SAP HANA is both an incredibly powerful and fast analytics engine, as well as a repository for some of the most valuable enterprise data by virtue of the enterprise applications that it helps run. This integration will help enable the large and growing community of Hadoop and Spark developers and applications to harness these capabilities immediately via Spark as well as extend the reach of SAP HANA.”

SAP for its part provides examples:

For example, they can span data domains, such as applications that integrate inventory analysis with social media trends for retailers; combine sensor data with billing systems to deliver personalized resource and cost-saving recommendations for utilities; or converge patient data with epidemiological information to construct better staffing decisions for healthcare providers.

I see a slew of immediate observations and questions:

  1. SAP has implicitly conceded that open source is the way to go for large datasets. Hadoop set the stake in the ground and the open source ecosystem has piled in. There is no way that SAP can now take those technologies to itself without playing nice with the open source community. This is a clear break from the 'not invented here' past.
  2. Databricks Spark can work in whatever configuration the customer chooses, on-premises or in the cloud. This meshes with SAP's core message of deployment choice but, for all practical purposes, it means on-premises in the vast majority of current use cases.
  3. Since Databricks supports multiple languages, this solves the developer 'bring your own language' concern that HANA does not currently address. That in turn should mean that more developers can access HANA data via the integration. This is already showing up:

As a free download with a relatively easy integration there is no reason why existing SAP HANA shops should not dip their toes into the large data set world. The question is not so clear cut for those SAP shops that have yet to bite the HANA bullet. In answering my question about the value proposition of SAP HANA for non-HANA shops in the context of SAP HANA licensing and hardware costs, Mrinal Wadhwa said,

That in turn opens the question about ongoing SAP HANA use cases.

Quo vadis SAP HANA?

To date, SAP has variously positioned HANA as an Oracle database alternative, an accelerator for its Business Warehouse and a developer platform. I currently argue that this announcement may well extend HANA usage, but at the expense of applications, which will be built where the data resides.

If the business critical data is considered to live in Databricks Spark AND it supports multiple languages with a relatively easy SQL on ramp then SAP HANA goes back to supporting BW and the Business Suite as its primary use cases.

Longer term, that means SAP HANA commoditizes itself in the face of open source development because that is where the developers will naturally go. They will therefore tend to see SAP HANA as just another data source.

SAP BW and Business Suite use cases will continue for some years to come but there will come a point where SAP will have some difficult decisions to make.

If, as seems likely, real-time or rather right-time decision making becomes the norm then it will be very difficult for SAP to maintain SAP HANA as what is now a premium product with champagne pricing to match. From conversations in the field this is already a major inhibitor among the bulk of SAP's customers.

In fairness to SAP, this is not just a software cost issue but a hardware cost problem over which SAP has no direct control. If SAP were prepared to bite the software cost bullet, then it would become increasingly difficult for hardware manufacturers to maintain a premium price position without the specter of someone piling in with commodity price offerings. In this scenario, SAP HANA would get much more widespread adoption and should therefore be much more attractive than is the case today. Taken with today's announcement, that would still represent an attractive proposition for non-HANA SAP shops and may even hold the potential to extend beyond to non-SAP shops.

Disclosure: SAP is a partner at time of writing.

Images via Databricks and Mike Olson
