Kubernetes and the misconception of multi-cloud portability

By Kurt Marko, November 21, 2019
The height of the fall event season coincides with the height of Kubernetes and multi-cloud hype. It's time to bust a few myths and misconceptions about multi-cloud portability.


Container news is flowing hot and heavy this week with the Linux Foundation KubeCon event, now 12,000 strong, serving as the backdrop for no fewer than 70 vendor and foundation announcements, by my count of the pre-event press packet.

Most of these are feature updates and enhancements to a product’s container support, i.e. routine vendor news piggybacking off a major conference to magnify their reach.

However, the overriding theme of the event is the expanding penetration of containers in general, and of Kubernetes management software in particular, as an application platform.

Indeed, as Martin Banks’ recent column on VMworld Europe illustrated, much of the Kubernetes enthusiasm comes from enterprises seeing it as an alternative to full VMs, particularly now that VMware has given its imprimatur by tightly integrating containers into its ubiquitous infrastructure management software.

There are several incentives for the transition from VMs to containers, including more efficient resource usage, the availability of sophisticated workload management software like Kubernetes, a robust and growing software ecosystem (as evidenced at KubeCon) and a platform that scales more rapidly. However, one of the oft-cited reasons for container adoption, easy workload portability between cloud platforms, seems based more in theory than in practice.

Banks states the commonly-held container portability case this way:

But in practice, the Kubernetes/container movement has already created an environment where it is possible to package up an application and its associated data and move it to a more suitable platform. In future years that is likely to become the common approach, a move made without even thinking or, perhaps, not knowing it has happened.

I contend that such transparent, incognizant workload movement is only possible on vanilla container platforms for the simplest of applications and that in actuality, the dream of automated, multi-cloud application migration depends on moving the platform lock-in risk up a level of abstraction, from infrastructure environments to managed container services and their accompanying workload management systems.

That is, Kubernetes, even with its associated ecosystem of cloud-agnostic add-ons, won’t be enough to provide transparent multi-cloud portability, particularly given the seduction of using managed container platforms and cloud-specific platform and application services, along with the friction of multi-cloud data movement and security policy enforcement.

The devil of container portability is in the details

With the understanding that most analogies are imperfect, here is one that illustrates an underlying concept of implementation-specific complexity: containerized workloads on Kubernetes are portable across clouds the same way Unix source code is portable between systems. As anyone who ported application code between Unix platforms in the era before the ascendancy of Linux on x86 can attest, there are plenty of devilish details to iron out before ‘make install’ actually works.

When it comes to container/Kubernetes usage for real-life applications, I see the following issues all thwarting the goal of transparent platform portability:

  • The use of managed cloud container/Kubernetes services.
  • The spread of composite, microservice-based application designs that augment containers with cloud-native services using proprietary APIs. Indeed, some of the most attractive new application services are the least portable, such as:
    • Serverless functions
    • Managed databases, particularly of the globally distributed variety (think Cloud Spanner and Cosmos DB).
    • Packaged AI services for things like image and speech recognition, natural language transcription and translation, recommendations, personalization, content analysis and moderation, anomaly detection, and machine learning workflows (development, optimization, deployment).
  • Data access, particularly for on-premises databases for which users haven’t yet provisioned network connections to an external cloud platform.
  • The difficulty of federating user identity and security policies across platforms and similar problems replicating or sharing directories and policies between cloud IAM services.

Banks acknowledges one of these issues, namely how data gravity promotes platform lock-in by encouraging the path of least resistance, when he writes:

The issues of extracting data and applications at the end of a [cloud service] contract and the possibilities that a move to another supplier will involve some degree of re-engineering – or at least re-optimising to suit the new environment – all threaten the possibility of an additional cost burden in making such a move, adding to the possibility of remaining locked in being seen as the safest option.

However, data friction is one of the easier problems to solve, and not what I see as the chief source of lock-in.

Lock-in sources abound: The ties that bind

Data movement and replication can be costly, but there are known solutions, so data isn’t the most forbidding problem, particularly for the new generation of cloud-native applications. Indeed, the term “cloud-native” captures a larger lock-in threat, once we clear up some confusion. Many people conflate “cloud-native” with containerized applications, particularly those using a microservice, i.e. disaggregated, design. That’s a constrained definition I dispute, since the primary advantage of cloud services is the opportunity to offload the implementation details of commodifiable functions to a service provider.

Such services naturally started at the lowest logical layers with infrastructure services like compute instances, object storage containers and network file shares, but have continually moved to higher levels of service abstraction; first to infrastructure applications like load balancers and nameservers, but later to application components like databases, message queues, notification systems, event-driven functions (serverless) and AI-based components.

Each of these cloud-specific features is a Lilliputian rope of lock-in, tying Gulliver, the enterprise developer, to a particular service provider and implementation through proprietary APIs and other non-portable cloud features. The point isn’t that the same functionality couldn’t be implemented on another cloud; it could, since any clever feature introduced by one is quickly mimicked by the others. It’s that the implementations are different, and thus require significant effort by both developers and cloud operations teams to change.

The notion of seamless Kubernetes container-based application portability requires:

  • Strict discipline by developers to encapsulate all application or microservice functionality within containers.
  • Use of standard Kubernetes implementations and configurations while carefully avoiding both cloud-specific container-as-a-service (CaaS) features and other cloud services using proprietary APIs.

Alternatively, it requires shifting the platform and vendor lock-in to another layer by adopting a multi-cloud PaaS or meta-container implementation that abstracts the management control plane from the infrastructure implementation.
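To see why that second requirement demands such discipline, consider a hedged sketch of how cloud-specific details leak into an otherwise standard Kubernetes manifest. The provider annotations below are real extension keys; the service name and ports are illustrative:

```yaml
# Hypothetical Service manifest: the resource is "standard Kubernetes,"
# but the load-balancer behavior is configured via provider-specific
# annotations that have no effect on any other cloud.
apiVersion: v1
kind: Service
metadata:
  name: web-frontend            # illustrative name
  annotations:
    # AWS-only: request a Network Load Balancer from EKS
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    # The equivalent intent elsewhere uses entirely different keys, e.g.:
    #   cloud.google.com/load-balancer-type: "Internal"                  (GKE)
    #   service.beta.kubernetes.io/azure-load-balancer-internal: "true"  (AKS)
spec:
  type: LoadBalancer
  selector:
    app: web-frontend
  ports:
    - port: 80
      targetPort: 8080
```

Move this manifest to another cloud and it will still deploy without error, but the unrecognized annotations are silently ignored and the load balancer behaves differently — exactly the kind of drift that makes "transparent" portability an illusion.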

Cloud convenience is already winning over container neutrality

Think my scenario is a stretch and that the Kubernetes cognoscenti have more discipline than that? Think again. Datadog used KubeCon as the backdrop for an update to its container orchestration and Docker research reports, which, while brief, offers some relevant insights. Notably, Datadog found that among the 45 percent of organizations running Kubernetes, those doing so on cloud platforms (likely most) are gravitating to managed Kubernetes services.

On Google Cloud, more than 90 percent run GKE, while on AWS, about a third use EKS. What’s the problem here? It’s standard Kubernetes, you say. Consider this from the blog of Pulumi, developer of a multi-cloud development platform, which summarizes the portability problem of CaaS products (emphasis added):

Kubernetes clusters from the managed offerings of AWS EKS, Azure AKS, and GCP GKE all vary in configuration, management, and resource properties. This variance creates unnecessary complexity in cluster provisioning and app deployments, as well as for CI/CD and testing. Additionally, if you wanted to deploy the same app across multiple clusters for specific use cases or test scenarios across providers, subtleties such as LoadBalancer outputs and cluster connection settings can be a nuisance to manage.

Irrespective of whether developers invoke cloud-native services from an application container, each managed container environment has different settings, cloud network interfaces and management interfaces. Sure, once you get them all set up it might be possible to move workloads between them, but what happens when you need to create a new cluster in a new region? Manual work recreating the configuration, unless you’ve taken the initiative to develop automation scripts on each cloud to do most of the drudgery.
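The "cluster connection settings" nuisance Pulumi mentions is visible in something as basic as a kubeconfig file. These hedged fragments show the authentication stanzas each managed service generates; user names are illustrative, but the plugin mechanisms are the ones each provider uses:

```yaml
# Hypothetical kubeconfig fragments: connecting to "standard Kubernetes"
# still requires provider-specific credential plugins.
users:
# EKS: credentials come from an exec plugin invoking the AWS CLI
- name: eks-user                  # illustrative
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws
      args: ["eks", "get-token", "--cluster-name", "my-cluster"]
# GKE: credentials come from the gcloud auth provider
- name: gke-user                  # illustrative
  user:
    auth-provider:
      name: gcp
      config:
        cmd-path: gcloud
        cmd-args: config config-helper --format=json
```

Neither stanza works against the other provider's cluster, so any tooling that moves workloads between clouds must also carry per-provider credential logic.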

Kubernetes services by cloud platform

Source: Datadog survey; 8 facts about the changing container landscape

Datadog also found that almost a fifth of its AWS users run containers on Fargate, AWS’s managed compute service that eliminates the need to provision EC2 instances as cluster nodes. Indeed, Fargate usage has almost quadrupled in the past year, and for good reason. Services like Fargate are incredibly convenient, but what happens when you want to shift workloads to a new cluster on Azure using Azure Container Instances (ACI)? How transparent will that be?

AWS Fargate adoption

Source: Datadog survey; 8 facts about the changing container landscape

Finally, Datadog’s survey found that 70 percent of Kubernetes users turn to NGINX for cluster traffic routing, but again, how will that change when DevOps teams get comfortable with service meshes and start using cloud services like AWS App Mesh, Azure Service Fabric Mesh and Istio on GCP? How easily will routing policies and configurations port between implementations, given that each is based on a different software platform and has different features?
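That NGINX routing configuration is itself a lock-in rope: routing rules are typically expressed as controller-specific annotations. A hedged sketch, with illustrative resource and service names (the rewrite annotation is a real NGINX Ingress Controller feature, circa the 2019-era v1beta1 Ingress API):

```yaml
# Hypothetical Ingress: the rewrite rule below only works with the
# NGINX ingress controller. A cloud ingress or service mesh expresses
# the same routing intent with completely different configuration.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: api-ingress               # illustrative
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  rules:
  - http:
      paths:
      - path: /api/(.*)
        backend:
          serviceName: api-service   # illustrative
          servicePort: 80
```

Swap the controller for a cloud-managed one and the annotations are ignored; the routing policy has to be rebuilt in the new platform's own idiom.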

My take

Many of the products announced at KubeCon address the portability issues identified above. For example, Datadog announced multi-cloud performance monitoring for Kubernetes clusters, Yugabyte released a distributed database that works across multi-cloud clusters and several companies updated multi-cloud configuration management and automation tools to support Kubernetes. Indeed, there’s a swarm of companies racing to solve the problem of multi-cloud infrastructure and application management, typically by introducing another level of software abstraction and dependence to handle the meta-level configurations.