I can't help wondering: will these two worlds ever converge?
Forklift to the cloud
Speaking at the SIIA All About the Cloud conference in San Francisco last week, Adrian Cockcroft, director of architecture for the Netflix cloud systems team, acknowledged that most organizations — and IT vendors — are still putting their energies into 'forklifting' existing applications into the cloud. "Most of the money is in that 70-80 percent of people trying to get forklifted stuff to work," he said. "But we think the future is in that 10 percent of cloud-native."
Is it realistic to imagine that one day, even hardcore enterprise applications will share the same architectural principles that Netflix espouses?
HANA but not cloud
Even as a 'forklifted' effort, SAP HANA Enterprise Cloud is barely off the ground. Putting cloud in the name is frankly a misnomer. It is a managed service, what would have been called an application service provider (ASP) offering, back in the days when ASPs were still fashionable. The fact that it runs on partially virtualized infrastructure using a sophisticated IT automation platform merely reflects good practice in any modern datacenter. The fact that it is hosted across a global network of (soon to be) seven datacenters makes it a suitably sophisticated service for SAP's large enterprise customer base.
There is nothing wrong with it as a managed service designed to help customers get started faster with their HANA implementations. In all these characteristics, it resembles many of the 'private cloud' implementations its customers are already building in their own data centers. Like them, it is barely recognizable as cloud to those who are steeped in public cloud architectures.
In his presentation, Cockcroft outlined what he sees emerging as the typical architecture of a non-native application that has been forklifted to the cloud. At the client layer, the architecture has been tweaked to support what he calls the "agile mobile mammals" of iOS and Android smartphones and tablets. In the middle tier, there's a "cloudy buffer" of application servers that handle the interactions with clients and hand off the results to the "datacenter dinosaurs" of the final tier, whether those are MySQL databases or legacy apps. Substitute HANA at that tier and you've just described SAP HANA Cloud Platform, the PaaS development and operating platform that runs on top of SAP HANA Enterprise Cloud. At least this platform looks more like a cloud offering to those who consume its services. But the trouble with this non-native architecture is that, because of its heritage, it's tightly coupled. "Typically, if anything goes wrong, you've lost the whole thing," said Cockcroft.
Architected for imperfection
Compare that to the Netflix architecture, which is designed to tolerate imperfection. In Cockcroft's words, it is "a highly agile and highly available service constructed from ephemeral and often broken components." It is a service-oriented architecture built out of micro-services, none of which is essential to the operation of the whole. When a user initiates an interaction with the Netflix application, that action typically invokes hundreds of connected services in the infrastructure. If any of them fails, that piece of functionality simply isn't offered. For example, the customer may get slightly worse recommendations for a moment but can carry on using the service. Netflix constantly tests for resilience: "We have chaos engines that are killing these services off to show that they're resilient."
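To make the idea concrete, here is a minimal Python sketch of that kind of graceful degradation. It is not Netflix's code: the service names, the fallback values and the `build_home_page` helper are all hypothetical, chosen only to show how a failing optional service can be swapped for a poorer default instead of taking down the whole response.

```python
from typing import Any, Callable, Dict, List, Tuple


def fetch_catalog(user: str) -> List[str]:
    # Hypothetical healthy service.
    return ["Title A", "Title B"]


def fetch_recommendations(user: str) -> List[str]:
    # Hypothetical service that is currently broken (or was killed on purpose
    # by a chaos test).
    raise TimeoutError("recommendation service unavailable")


def call_with_fallback(service: Callable[[str], Any], fallback: Any, user: str) -> Any:
    """Call one service; if it fails for any reason, return the fallback."""
    try:
        return service(user)
    except Exception:
        return fallback


# Each entry pairs a service with the degraded answer to use if it fails.
SERVICES: Dict[str, Tuple[Callable[[str], Any], Any]] = {
    "catalog": (fetch_catalog, []),
    "recommendations": (fetch_recommendations, ["Popular Title"]),
}


def build_home_page(user: str, services: Dict[str, Tuple[Callable[[str], Any], Any]]) -> Dict[str, Any]:
    """Assemble a page from independent services, none of which is essential."""
    return {
        name: call_with_fallback(service, fallback, user)
        for name, (service, fallback) in services.items()
    }


page = build_home_page("alice", SERVICES)
```

In this sketch the page is still built even though the recommendations call fails; the user simply sees a generic list for that section. A tightly coupled design would instead let the exception propagate and fail the whole request.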
The software is written to run across three separate Amazon datacenters, and will tolerate the loss of any one. To ensure data integrity, there's no master-slave arrangement: all reads and writes have to run in two out of the three datacenters using a quorum process. "We can lose a third of our infrastructure without our customers noticing and calling customer services," said Cockcroft. It's no idle claim; Netflix even tests this aspect of its infrastructure. A few weeks ago the team deliberately killed one of the three zones, knocking out 3,000 servers in one fell swoop, "just to prove that we could do it," said Cockcroft. "Unless you test this kind of thing in production, you don't know that it will work."
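The two-out-of-three rule can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not a description of Netflix's actual data store: the `Replica` and `QuorumStore` classes, the in-memory dictionaries and the simple version counter are all invented for illustration.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """One zone's copy of the data (hypothetical, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, Tuple[int, str]] = {}  # key -> (version, value)

    def write(self, key: str, version: int, value: str) -> None:
        if not self.up:
            raise ConnectionError(self.name)
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key: str) -> Optional[Tuple[int, str]]:
        if not self.up:
            raise ConnectionError(self.name)
        return self.data.get(key)


class QuorumStore:
    """No master: every read and write must reach `quorum` replicas."""

    def __init__(self, replicas: List[Replica], quorum: int = 2) -> None:
        self.replicas = replicas
        self.quorum = quorum
        self.version = 0  # simplistic single-writer version counter

    def put(self, key: str, value: str) -> None:
        self.version += 1
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, self.version, value)
                acks += 1
            except ConnectionError:
                pass  # a dead zone is tolerated as long as a quorum remains
        if acks < self.quorum:
            raise RuntimeError("write quorum not reached")

    def get(self, key: str) -> Optional[str]:
        answers = []
        for replica in self.replicas:
            try:
                answers.append(replica.read(key))
            except ConnectionError:
                pass
        if len(answers) < self.quorum:
            raise RuntimeError("read quorum not reached")
        found = [a for a in answers if a is not None]
        # Any two of three replicas overlap with the last write quorum,
        # so the highest version seen is the latest acknowledged write.
        return max(found, key=lambda a: a[0])[1] if found else None


zones = [Replica("zone-a"), Replica("zone-b"), Replica("zone-c")]
store = QuorumStore(zones)
zones[2].up = False          # lose one zone: writes and reads still succeed
store.put("profile", "v1")
value_with_one_zone_down = store.get("profile")
```

Because any two zones always include at least one that took part in the last successful write, the store in this sketch keeps answering correctly with one zone down, and refuses to answer if two are lost.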
This approach may work for online movie recommendations and rentals, but enterprise application architects will shudder at the thought of randomly killing services during mission-critical operations. The model needs further evolution before it can be trusted with, say, financial transactions. Would it do, for example, to tolerate a component failure that temporarily resulted in slightly less reliable credit rankings or stock-on-hand quantities while processing online credit card transactions? These and other trade-offs and granularity decisions would have to be considered. But that's not necessarily a reason for ruling it out for all time.
The trade-off that Netflix has made is to embrace chaos and imperfection in exchange for greater agility and responsiveness. "What we really want to do is think up a code feature and ship it in days. When we want some hardware provisioned, we do that in minutes. When something goes wrong, we need to detect that something went wrong in seconds and fix it in seconds," said Cockcroft.
Running on Amazon's platform rather than a Netflix datacenter is a similarly pragmatic choice. "We could run our own datacenters but we choose not to," said Cockcroft. "We'd much rather commission a new TV show rather than sinking our money in a new datacenter that we won't be able to use for another 18 months. We prefer the flexibility of pay-as-you-go infrastructure."
Disclosure: At the time of writing, SAP is a Premium Partner of diginomica.
Photo credit: © fergregory - Fotolia.com