Meltdown and Spectre underscore the ongoing need for infrastructure automation
- Summary: The Meltdown-Spectre incident demonstrates the need for rapid, repeatable infrastructure updates.
The cause is rooted in hardware design decisions made years ago to improve performance, meaning that a conclusive fix likewise requires redesigned hardware. However, the threats can be substantially mitigated through software patches to firmware, operating systems, libraries and compilers. Thus, following the unexpected, premature public disclosure of the vulnerabilities, hardware, OS and cloud vendors issued a flood of announcements detailing their responses and mitigation steps.
I've summarized these steps here, along with links to the original research detailing the threats and methods for neutralizing them via software patches.
Following an understandable period of fear and loathing, the crucial next step for IT organizations is hastening the job of fixing the mess. Martin Banks put it well, writing (emphasis added):
It is obviously important that CIOs ensure that all relevant patches are installed as soon as they are available, not least because now this story is out in the public domain, every cybercriminal will be considering an exploit or two before patching is completed. This may be one of those occasions when patching takes precedence over any other work in the IT department, including running normal production workloads.
Patching needs to be integral to the deployment process
While the scope of the Meltdown and Spectre vulnerabilities is unprecedented, affecting to varying degrees every server, PC and mobile device in use, significant holes in infrastructure and software security are so common that patch fests have become routine events. That these predictable circumstances still trigger fire drills in many organizations is as much a failure of management as of product quality, best summed up by the adage, "A lack of planning on your part does not constitute an emergency on mine." The chip flaws should be a wake-up call for organizations to build automation into their infrastructure to minimize or eliminate the overhead of software updates, because these won't be the last, nor likely the most serious, security patches IT organizations face this year.
Of course, patch automation software that essentially scripts manual processes has been around for years and remains the best method for traditional enterprise servers and PCs. However, modern infrastructure built from virtualized hardware resources and service abstractions such as container clusters, public cloud and other RESTful services offers a much better way to weave software updates into the fabric of application platforms and IT operations: infrastructure-as-code systems like AWS CloudFormation, BOSH (Cloud Foundry), Puppet or Terraform, and continuous integration and delivery (CI/CD) pipelines like Concourse (Cloud Foundry), Jenkins or Travis CI.
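To make this concrete, here is a minimal CloudFormation sketch (the parameter and resource names are hypothetical) that treats the base image as a versioned input, so picking up a patched image is a one-line parameter change followed by a stack update rather than a manual patching exercise:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Hypothetical sketch - rebuild an instance from a patched base image
Parameters:
  PatchedAmiId:
    Type: AWS::EC2::Image::Id
    Description: ID of the latest patched golden image
Resources:
  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref PatchedAmiId   # pass the new image ID and update the stack
      InstanceType: t2.micro
```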
Indeed, the immutable infrastructure deployment model folds patch updates into the application release cycle, so that each time an application is updated it automatically inherits security changes and bug fixes to the core OS. Immutable proponents sometimes call this the “made to order pizza” model, in contrast to the pre-made “frozen pizza” model of maintaining a VM image for each application: any change to an application module or dependency triggers an automatic rebuild of the entire stack from known-good components and a few golden OS images, often provided by the cloud service, which ensures that running instances always use the latest patched system software.
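Extending the sketch above (again with hypothetical names), an auto scaling group with a rolling update policy captures the immutable, made-to-order pattern: when the image parameter changes, CloudFormation launches fresh instances from the patched golden image and retires the old ones, rather than patching anything in place:

```yaml
Resources:
  WebLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref PatchedAmiId   # bumping the golden image triggers a rebuild
        InstanceType: t2.micro
  WebAsg:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "2"
      MaxSize: "4"
      AvailabilityZones: !GetAZs ""
      LaunchTemplate:
        LaunchTemplateId: !Ref WebLaunchTemplate
        Version: !GetAtt WebLaunchTemplate.LatestVersionNumber
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: 2   # keep serving while old instances drain away
        MaxBatchSize: 1
```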
PaaS stacks like Cloud Foundry lead the way
Having to manage millions of machines, hyper-scale cloud vendors pioneered the use of uber-automation through software such as Google’s Borg. Lessons and technology from these internal cloud systems have made their way into cluster managers like Kubernetes, PaaS stacks such as Cloud Foundry and many other open source CI/CD products. Indeed, Cloud Foundry’s release management tool, BOSH, exemplifies how to manage software releases and OS updates in the era of modular, distributed, cloud-native applications.
BOSH uses the concept of a stemcell, a versioned, bare-bones OS image with a few commonly used utilities, as the foundation for application deployments. A strict demarcation between OS and application code not only makes Cloud Foundry applications portable across infrastructure, easily moving between on-premises hardware and various cloud services, but also allows OS updates to be incorporated, tested and deployed rapidly and repeatably without disrupting production applications. For example, when updating Windows servers, instead of building patches into VM images for each application, BOSH incorporates them into a base stemcell used for all Windows applications. As Pivotal describes its release process,
This allows release creators to certify that their software runs as expected with the newest patches. For us [IT infrastructure admins], the workflow is [as] easy as changing the stemcell version in our manifest, and running `bosh deploy`. The security updates will roll through the system just like a release update — new, patched machines will be deployed, and old, unpatched machines will slowly transition their workloads over and disappear.
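In manifest terms, that workflow looks something like the following hypothetical excerpt of a BOSH deployment manifest; the stemcell name, versions and sizes are illustrative, but bumping the version line and running `bosh deploy` is the entire patching procedure:

```yaml
name: web-app
releases:
- name: web-app
  version: latest
stemcells:
- alias: default
  os: ubuntu-trusty
  version: "3468.21"        # bump this line when a patched stemcell ships
instance_groups:
- name: web
  azs: [z1]
  instances: 3
  vm_type: default
  stemcell: default         # every VM in this group is rebuilt on the new stemcell
  networks:
  - name: default
  jobs:
  - name: web
    release: web-app
update:                     # roll VMs gradually, canary first
  canaries: 1
  max_in_flight: 2
  canary_watch_time: 30000
  update_watch_time: 30000
```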
Shared PaaS services like Azure App Service and Google App Engine (GAE) go one step further and insulate the user from any need to manage the underlying OS or application environment. Indeed, in announcing its measures to thwart any potential Meltdown and Spectre attacks, Google stated that its PaaS and SaaS (G Suite) products were already protected.
Relationship to CI/CD pipelines
Immutable infrastructure uses continuous integration and delivery processes and automation software to provide structure, governance and consistency to application updates and deployments. In the Cloud Foundry scenario, these roles are filled by BOSH, which automates the configuration and deployment of infrastructure resources (VMs, container clusters, virtual storage and networks), and Concourse, which automates the development pipeline.
Together, these tools enable organizations to rapidly and consistently patch all applications using the PaaS environment. For example, patches can be applied within hours of availability, often, as Ford Motor Company does, during business hours. Indeed, as this presentation illustrates, a Concourse pipeline can even upgrade the underlying Cloud Foundry PaaS stack itself, whether it runs on an internal VMware server farm or AWS. As I detail here, because CI/CD tools facilitate code and patch versioning, they make it easy to deploy new software to test environments and, upon validation, automatically roll the tested bundles to production.
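A sketch of such a pipeline (the git repository and task files are hypothetical, though `bosh-io-stemcell` is a standard Concourse resource type) watches bosh.io for new stemcell releases, deploys each one to a test environment automatically, and promotes to production only the versions that pass:

```yaml
resources:
- name: ci-repo                  # hypothetical repo holding the task definitions
  type: git
  source:
    uri: https://github.com/example/ci-repo.git
- name: stemcell
  type: bosh-io-stemcell
  source:
    name: bosh-aws-xen-hvm-ubuntu-trusty-go_agent
jobs:
- name: deploy-test
  plan:
  - get: ci-repo
  - get: stemcell
    trigger: true                # fire whenever a newly patched stemcell appears
  - task: deploy-to-test
    file: ci-repo/ci/tasks/deploy-test.yml
- name: deploy-prod
  plan:
  - get: ci-repo
  - get: stemcell
    trigger: true
    passed: [deploy-test]        # promote only stemcells that survived the test deploy
  - task: deploy-to-prod
    file: ci-repo/ci/tasks/deploy-prod.yml
```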
My take
Most people think of PaaS as a convenience for developers. Pre-built, just-add-code services like GAE and App Service simplify application development by eliminating the need to provision and manage low-level infrastructure. However, as the Cloud Foundry examples with BOSH and Concourse demonstrate, the automation intrinsic to PaaS infrastructure also yields significant benefits for IT operations: reducing administrative overhead, maintaining consistency and speeding patch deployment.
PaaS and a CI/CD methodology don't address the vast trove of existing enterprise applications. However, as organizations redesign or retire applications over time, they should factor the benefits of a structured platform and infrastructure framework into their decision making. While many will rightly tout faster development cycles, the Meltdown-Spectre incident demonstrates that rapid, repeatable infrastructure updates are even more important in an era of escalating, ever more pervasive security threats.