Moving from AWS experimentation to scale
I won't get into the technical DevOps weeds in this piece, but it's worth noting that Logicworks is primarily a Puppet shop, though they can work with Chef and other tools based on client requirements. So when a company brings Logicworks in, what challenge are they facing? Short version: they want to get serious about AWS services. McKay:
Our clients come to us in a number of different stages. Often, it's when they've done some experimentation on Amazon. They may have some lower-level environments up there, and they're ready to make a commitment to move production workloads for an app, or a set of apps. They recognize they don't have the deep understanding of the AWS platform internally. They write the application; business and technical needs are aligned. But they want somebody to move these workloads onto Amazon and do so in a way that's already been time/value tested.
Addressing compliance through repeatable services
So what about companies that are pursuing cloud automation but are wary of regulatory hurdles? One key is making compliant services repeatable:
We'll do things like make configuration changes that are common across our client base. We'll connect our client to centralized authentication within their environment. A lot of that is being driven by our customers' requirements around compliance. We have a pretty strong play in compliant environments, particularly HIPAA and PCI. A lot of our automation is based on repeatably covering the aspects required by that compliance.
So what would be an example of a compliance-oriented service offering? McKay cited intrusion detection:
We have partnered with a company called Alert Logic, which has an intrusion detection capability and a series of agents on Amazon. We're automating the installation and configuration of those agents, which connect back to the appropriate appliance in a consistent way, so that human oversight doesn't lead to intrusion detection not running on an element in the deployment.
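The agent-coverage check McKay describes could be sketched as a simple reconciliation between deployed instances and agents reporting in. This is a hypothetical illustration, not Logicworks' actual tooling; the instance IDs and inventory format are assumptions.

```python
# Hypothetical sketch: verify every deployed instance has an intrusion
# detection agent reporting, so coverage doesn't depend on a human
# remembering to install it. Data shapes here are invented for illustration.

def find_unprotected(deployed_instances, reporting_agents):
    """Return instance IDs that are deployed but have no agent reporting."""
    return sorted(set(deployed_instances) - set(reporting_agents))

deployed = ["i-0a1", "i-0b2", "i-0c3"]
reporting = ["i-0a1", "i-0c3"]

missing = find_unprotected(deployed, reporting)
# Any IDs in `missing` would be flagged for automated agent installation.
```

In practice the two lists would come from the cloud inventory and the Alert Logic appliance, and remediation (re-running the installer) would be automated rather than manual.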
It’s about fusing DevOps practices with compliance know-how:
Even before we moved to Amazon, we were familiar with some of the key methods of solving compliance issues. Intrusion detection is one - now we have that as something that's kind of turnkey for clients to take advantage of. Now that it's folded into an automation system, it becomes much more scalable and repeatable.
Automation steps for AWS app delivery
McKay advises customers pursuing AWS app delivery to follow this type of framework:
1. Project assessment - decide on the app that will be built/deployed - if you're using an external provider, that means hammering out a statement of work and project scope.
2. Deploy a cloud automation framework that will support the app - avoid starting from scratch. In Logicworks' case, they will build out the templates:
Our teams have developed tooling to automatically create templates that match customer requirements. It's key that we be able to do that early. I like to say that we're less of an overlay on AWS, and more of an underlay for the client application.
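Template generation from requirements could look like the following minimal sketch, which emits a CloudFormation-style document from a small requirements dict. The requirement keys and resource shapes are assumptions for illustration, not Logicworks' actual tooling.

```python
import json

# Hedged sketch: turn customer requirements into a CloudFormation-style
# template. The `requirements` keys here are invented for this example.

def build_template(requirements):
    resources = {}
    for i in range(requirements["instance_count"]):
        resources[f"AppServer{i}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {"InstanceType": requirements["instance_type"]},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}

template = build_template({"instance_count": 2, "instance_type": "t3.medium"})
print(json.dumps(template, indent=2))
```

The point of generating rather than hand-writing templates is that the same requirements always produce the same environment, which is what makes the "underlay" repeatable.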
3. Build out the skeleton of the application, maximizing the use of automated services. A DevOps provider can save customer development time by building out their app foundation, or "the skeleton of the application." McKay:
At that point, everything is already plugged into our framework, meaning that we've deployed Puppet Master automatically for them.
4. Get the application code onto the boxes - that means deciding on how the code will be delivered and frequency of release. McKay says that customers need to make a decision about how much configuration and code is baked into an image, how much gets added at deploy time, and so on:
All of that information is going to lead our engineers to make a recommendation to them about how we design the deployment process.
5. Do destructive testing - yes, "destructive testing" is a good thing, even if it's not always viable to do it consistently like Netflix does:
This is a lesson heard loud and clear from Netflix: there's no way to know your automation and your self-healing really works - unless you actually try to break it. It's rare that we can convince a client to do destructive testing in an ongoing manner like Netflix; we hope to someday. But they're certainly willing to do it prior to going live as proof of methodology.
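The destructive test McKay describes can be sketched in miniature: kill a random instance, then verify the self-healing mechanism restores capacity. The `Pool` class below is a stand-in for real infrastructure (an auto-scaling group, for instance); a real test would terminate an actual instance and watch monitoring recover.

```python
import random

# Chaos-style destructive test sketch, in the spirit of the Netflix
# approach described above. Everything here is a simulation stand-in.

class Pool:
    def __init__(self, desired):
        self.desired = desired
        self.instances = {f"i-{n:03d}" for n in range(desired)}

    def kill_random(self):
        # The destructive step: remove a randomly chosen instance.
        victim = random.choice(sorted(self.instances))
        self.instances.discard(victim)
        return victim

    def heal(self):
        # Simulates an auto-scaling group replacing lost capacity.
        n = 0
        while len(self.instances) < self.desired:
            self.instances.add(f"i-new-{n}")
            n += 1

pool = Pool(desired=3)
pool.kill_random()
pool.heal()
assert len(pool.instances) == pool.desired  # self-healing restored capacity
```

The value of the exercise is the assertion at the end: until you break something on purpose, that assertion about your recovery automation is untested.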
6. Get ready for go-live by putting configuration management in place - that means fine-tuning your chosen tools before launch. McKay addressed Puppet:
We'll set it up so that anything that is controlled by Puppet, if it gets modified, Puppet will correct it. We're automatically pulling logs into CloudWatch Logs so we have access to that data, which can be very useful for the development team.
7. Make sure checks and balances are in place - automating cross-checks on deployments is another DevOps principle. Logicworks has built tooling for this, which McKay calls "misconfiguration scanners":
Those jobs run across our client base. They're basically looking for, and in some cases, correcting misconfiguration, or things that don't conform to best practices.
He brought up a real-world example:
Let's say someone is in the Amazon console launching a planned instance. They don't realize that somehow they've switched to the Singapore region. They deploy their instances there. Later on, they can't find it. They forget about it. But they've got copies of their code running in an environment in Singapore that nobody knows about. This misconfiguration scanner will run through and identify that.
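A region check like the one in McKay's story could be sketched as follows. The inventory format and approved-region list are assumptions; a real scanner would pull instance inventories from each AWS region via the API.

```python
# Hedged sketch of a "misconfiguration scanner" that flags instances
# running outside a client's approved regions, like the forgotten
# Singapore deployment described above. Data shapes are invented.

APPROVED_REGIONS = {"us-east-1", "us-west-2"}

def scan_regions(inventory):
    """inventory: list of (instance_id, region) pairs. Returns offenders."""
    return [(iid, region) for iid, region in inventory
            if region not in APPROVED_REGIONS]

inventory = [
    ("i-0a1", "us-east-1"),
    ("i-0b2", "ap-southeast-1"),  # the stray Singapore instance
]
findings = scan_regions(inventory)
# Findings would be reported or, per McKay, auto-corrected in some cases.
```

Running such jobs across the whole client base is what turns a one-off incident into a standing safeguard.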
The wrap - applying these principles to use cases
We ran through examples of how these principles can be applied to specific use cases. Keeping to our compliance theme, McKay shared a healthcare example: a services company building healthcare software. They had a cost issue with unused AWS instances:
There are certain environments that are not in use for certain times, sometimes for twelve hours a day, sometimes for three months. We set up a filter to automate shutdown of the entire environment, including unhooking it from all the monitoring elements so that they wouldn't get false alarms. It effectively cut costs in half.
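The shutdown filter could be sketched like this: decide whether an environment should be running right now, and silence monitoring before stopping it so no false alarms fire. The schedule window and hook names are assumptions for illustration.

```python
from datetime import time

# Illustrative sketch of an environment shutdown filter. The 12-hour
# window and the callback names are assumptions, not the real system.

def should_run(now, start=time(8, 0), stop=time(20, 0)):
    """True if `now` falls inside the working window."""
    return start <= now < stop

def enforce(env, now, silence_monitoring, stop_environment):
    """Shut down a running environment outside its working hours."""
    if env["running"] and not should_run(now):
        silence_monitoring(env)   # unhook from monitoring first
        stop_environment(env)     # then stop the whole environment
```

Ordering matters here: detaching monitoring before the shutdown is what prevents the flood of false alarms McKay mentions.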
Another healthcare customer analyzed CloudTrail logs to ensure there were no breaches:
Automatically turning on CloudTrail has saved one of our healthcare clients from having to go down a long and expensive forensics path. They determined internally that someone had emailed keys for accessing an app. Because we had automatically set up audit logging with CloudTrail, we were able to quickly look at all the accesses and see that nobody used those keys during that time. They avoided a forensic investigation.
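The forensic check itself reduces to a scan over logged events for any use of the leaked key. The sketch below mirrors the shape of CloudTrail events (which record the access key under `userIdentity.accessKeyId`), but the sample data and key IDs are invented.

```python
# Hedged sketch of the key-usage check: scan CloudTrail-style events
# for any access made with a specific (leaked) key. Sample data is
# invented; real events would come from the CloudTrail log archive.

def key_was_used(events, leaked_key_id):
    """True if any event was made with the leaked access key."""
    return any(
        e.get("userIdentity", {}).get("accessKeyId") == leaked_key_id
        for e in events
    )

events = [
    {"eventName": "GetObject",
     "userIdentity": {"accessKeyId": "AKIAEXAMPLESAFE"}},
]
assert not key_was_used(events, "AKIALEAKEDKEY")  # leaked key never used
```

The lesson is that the check is only possible because logging was switched on automatically, before anyone knew it would be needed.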
Reflecting on my talk with McKay, the benefits of these DevOps approaches go beyond efficiency and cost savings. I like the goal of freeing up resources for higher-value activities, reducing manual labor and making testing cycles more efficient. That doesn't mean such changes are easy, but it's a necessary conversation to have.