New York Times ‘leans in’ to create a cloud-native culture
Summary:
The New York Times had been running its own data centers, but has since shifted to AWS and Google Cloud, prompting a cloud-native approach.
Speaking at KubeCon + CloudNativeCon in Copenhagen this week, Deep Kapadia, executive director of engineering at the New York Times, explained how the cloud-native approach was prompted by a decision to shift its applications out of its own data centers and into AWS and Google Cloud.
A cloud-native architecture is defined as one that relies on containers, distributed management and orchestration, the use of microservices, and a serverless architecture, all of which are specifically designed for cloud environments. It’s the next level of abstraction up from virtualisation and allows for greater utilisation, lower costs and better portability.
The open source technologies that underpin such an architecture – largely driven by the Kubernetes-focused Cloud Native Computing Foundation (CNCF) – are growing hugely in popularity and have seen rapid adoption amongst buyers over the past three years.
Kapadia explained that scale and flexibility have become incredibly important to the New York Times’ digital assets over the years, as it needs to build and deploy apps quickly, and those apps can come under a great deal of pressure over short periods of time (e.g. the election and the Olympics).
This prompted the decision to move to the cloud. Kapadia said:
We’ve had a lot of organic growth. In 2016, we decided that we didn’t want to run things in our data center anymore, it didn’t make sense anymore. We don’t need to be managing data centers, we don’t need to be in the business of being super technical about that. So we migrated about 300+ apps to Amazon and Google Cloud. We just ended our migration in April of this year, last month. We completely shut down our data centers and in the process we shut down 140+ apps.
Kapadia said that there were “a lot of issues” with the way that the New York Times built apps in the past, and that the shift from on-premises to the cloud forced the organisation to rethink its approach to app development and its skill set. He said:
One of the things was, our internal IT infrastructure team, which was responsible for mending people’s desktops and computers, eventually morphed into an ops team that was managing our website, our DNS. That’s a very different skill, managing internal IT versus managing a website that needs to scale on demand. Also, there’s a change in consumption habits that required us to change the way we did things.
Driving change
Kapadia said that the organisation started to dabble with cloud-native architectures within AWS, but never got too far, as the New York Times still had a lot of expertise in running its own data centers. There was also little automation built into its provisioning and build processes, which limited how far those early efforts could go.
As a result, the New York Times decided to bring in some consultants to figure out how it could adopt continuous delivery across the organization. This worked in terms of new skills being learnt, but the consultants left individual teams to figure out what it meant for them - which caused problems. Kapadia said:
This resulted in the organization’s deployment stacks being completely fragmented. At one point we had five or six different ways to deploy applications to AWS. Different teams were moving at different paces, different teams had different levels of maturity. We couldn’t move fast enough.
There was also frustration amongst people who were familiar with new DevOps concepts and tools, such as containers, Kubernetes, etc.
As a result, the decision was revisited and the New York Times asked itself: what do we want to do? Kapadia decided to bring a new team together, formed of people from across the organization involved in these efforts, to figure this out. He explained:
We decided to work with the individual teams to help them improve their level of maturity, using common tooling, common ways of doing things. We formed what we now call the delivery engineering team.
The idea is we don’t manage people’s production code. We don’t take someone’s code and say, ‘here we are going to run it on this server or on this Kubernetes cluster’. We will create a pathway for them to get the code from their laptops, all the way to production. That includes common tooling, common deployment mechanisms.
We basically standardised on some ways of doing things. We basically said we want to have a Git-driven workflow. GitHub provides us with all sorts of bells and whistles to manage our workflow, so let’s use that. Let’s not push buttons in Jenkins to deploy manually. Secondly, we decided to move away from Jenkins. Everyone was managing Jenkins at the New York Times, and everyone was spending more time managing Jenkins than actual product building.
Instead of clicking buttons in AWS, we don’t want to do that, we wanted to be able to provision our infrastructure more quickly and easily, so we decided to use Terraform for that.
This allowed us to hire for a certain skill-set. Having those standards was really good, because it allowed us to gravitate towards one single set of tooling.
It’s about culture
Kapadia was joined by Tony Li, an engineer on the delivery and site engineering team at the New York Times, who explained that it wasn’t just tools that have allowed the media company to adopt continuous delivery, automation and a cloud-native approach across the organisation. It’s also about culture. Li said:
Because of the different landscape compared to traditional data centers, this meant changes in the culture and methodology of development must occur if we want to set ourselves up for a good position in the future. The multi-tenant nature of cloud means that network and security have to be reimagined. How do we take advantage of various isolation levels in project organization structure to provide developer freedom? And how do we do less to get more out of the cloud?
So, with this in mind, we used the migration as an opportunity to improve how we develop at the New York Times. It was about leaning in and shifting into this cloud-native mentality, and embracing this meant improved developer experience, velocity and productivity. It wasn’t smooth sailing all the way, but here are our main takeaways.
Li used the following examples of how the New York Times changed its culture and approach to ensure cloud native was a success. He said:
- “We’ve shifted the process of how we do things. Shifting service ownership meant that each team moved on their own schedule during this migration. Providing them with their own projects allowed them to experiment with freedom and feel ownership for the operation and reliability of their application.”
- “Creating self-service tools for common tools eliminated the back and forth communication of the ticket-based model that a lot of traditional infrastructure teams have. Documentation was critical to the success of these tools, because if people don’t know how to use them, you’re going to go back into a lot of back and forth communication.”
- “Educating developers on how to build modern apps made the migration process easier and kept the door open for leveraging new solutions in the future. If you don’t have a Twelve-Factor application, you’re not going to be able to connect to that new fancy service connection proxy, for example.”
- “Working in the open gave visibility to others who are likely doing similar things.”
- “In-person training sessions were probably the most efficient way to on-board a lot of people at once to a technology. This also helped satisfy many developers’ personal and professional training goals. Having a professional guide them through and answer any questions is a great way to boost their confidence and get them into it right away.”
- “We prefer to use open source tooling, like Kubernetes, Terraform, OpenAPI, as opposed to maintaining in-house developed tools - this way people can change teams internally without much on-boarding or friction, because they already know how to do things.”
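For readers unfamiliar with the Twelve-Factor idea Li mentions, here is a minimal sketch of one of its factors: configuration is read from the environment rather than hard-coded, so the same build can run anywhere. It is an illustration only, not the New York Times' code; the language (Python), the `Settings` class, the `load_settings` function and the variable names `DATABASE_URL`, `PORT` and `DEBUG` are all assumptions made for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Runtime configuration for a hypothetical service."""
    database_url: str
    port: int
    debug: bool


def load_settings(env=None):
    """Build Settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    # Required value: fail fast at startup if it is missing.
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    # Optional values fall back to defaults.
    port = int(env.get("PORT", "8080"))
    debug = env.get("DEBUG", "false").lower() in ("1", "true", "yes")
    return Settings(database_url=database_url, port=port, debug=debug)
```

Because nothing environment-specific lives in the code, the same artifact could in principle be pointed at a different database or port by changing only the variables set at deploy time.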