Making observability a standard practice for every engineer in 2021
- Intellyx analyst Jason English talks with New Relic CPO Bill Staples about how to achieve observability in support of today’s cloud-native applications
Just like a steaming fast pizza delivery, observability is heating up modern software delivery. If you were responsible for creating e-commerce applications at a firm like Domino's Pizza UK, you might look back on the last 20 years with sentimentality about the relative simplicity of testing and monitoring software. Sure, customers still expected a hot pizza delivered within 30 minutes, but there used to be more phone calls and human processes involved in processing and fulfilling orders atop a relatively static tech stack.
Today, when fast-moving events create high-volume spikes in demand, 80-90% of Domino's pizza orders happen through their website or an app. High performance pressures are driving the modernization of their applications onto elastic cloud infrastructure and microservices, while a pandemic forces DevOps teams out of the office and into remote collaboration.
The old methods of pre-production testing and post-release monitoring we knew from our three-tier architecture days could no longer keep up with such demands. Instrumenting a custom agent for every service or container created prohibitive labor, licensing and data costs, as well as slower releases and a production performance hit. As Bill Staples, chief product officer of New Relic, explains -
When I started talking with companies about observability, I immediately heard that the economic barriers of complexity were really getting in the way of them embracing it everywhere. They had to pick and choose what they instrumented, and what tools they used.
There are scale, cost and personnel barriers to making observability a standard practice for enterprise DevOps teams - but 2021 is shaping up to be the year they all fall away.
Moving toward open telemetry data
The first big hurdle to cloud-native observability is the telemetry data that feeds it - the metrics, logs and traces emanating from operational systems, running applications and ephemeral workloads across an extended Hybrid IT environment.
The instrumentation that generates all of this telemetry data is far from standard. It may be picked up from the operating system, gathered by various monitoring agents, or embedded in the application code itself. Staples says -
You need to be able to ingest and query any type of telemetry data from production. So the data could be collected from New Relic agents, or those of any other vendor, or system events, or an open source tool like Prometheus.
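To make the idea of ingesting and querying mixed telemetry concrete, here is a minimal sketch in Python. It is not New Relic's data model or any vendor's API; the `TelemetryRecord` class, its field names, the `query` helper and the sample values are all hypothetical, chosen only to show metrics, logs and traces from different sources sitting in one queryable collection.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TelemetryRecord:
    """Hypothetical common shape for any telemetry item (an assumption)."""
    kind: str          # "metric", "log" or "trace"
    source: str        # e.g. "vendor-agent", "prometheus", "system-event"
    timestamp: float   # seconds since the Unix epoch
    attributes: Dict[str, Any] = field(default_factory=dict)


def query(records: List[TelemetryRecord],
          kind: Optional[str] = None,
          source: Optional[str] = None) -> List[TelemetryRecord]:
    """Return the records that match the given kind and/or source."""
    return [
        r for r in records
        if (kind is None or r.kind == kind)
        and (source is None or r.source == source)
    ]


# Illustrative sample data only; none of these values come from the article.
records = [
    TelemetryRecord("metric", "prometheus", 1700000000.0,
                    {"name": "http_requests_total", "value": 42}),
    TelemetryRecord("log", "system-event", 1700000001.0,
                    {"message": "service restarted"}),
    TelemetryRecord("trace", "vendor-agent", 1700000002.0,
                    {"span": "checkout", "duration_ms": 120}),
]

metrics_only = query(records, kind="metric")
from_prometheus = query(records, source="prometheus")
```

The point of the single record shape is that the query does not care which agent or tool produced the data, which is the property Staples describes.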
There's a lot of enthusiasm across the entire cloud-native community for OpenTelemetry, which merged the OpenTracing and OpenCensus codebases last year. Besides New Relic, other major vendors like Microsoft, Splunk and AWS, as well as startups like Lightstep and Logz.io, have engineers contributing to the project, and many firms are standardizing on OpenTelemetry as a common data platform for collecting data from hundreds of sources. Staples comments -
It's great to see such cross-vendor support, as the future of our industry depends on it. Our own product roadmaps, issue tracking and builds are done in the open now, we're contributing to OpenTelemetry, and we can basically consider any running system as a source of telemetry data in our platform.
What's cool here is that as companies choose observability platforms that offer the best features for their purpose, the OpenTelemetry data can remain portable and useful regardless of the vendor selection. Better still, engineers find their skill investments in collecting and correlating telemetry data are equally transferable to the next role in their careers.
Flexible price and provisioning at cloud scale
Morningstar Inc., a leading provider of independent investment research, re-platformed its user-facing website to a serverless architecture using AWS Lambda, Amazon S3 and Amazon CloudFront, connected with APIs to existing applications and internal services - and thus required monitoring and interpretation of metrics, events, logs and traces across both new and existing infrastructure.
By incorporating full-stack observability into the transformation project by design, the firm was able to nearly eliminate downtime, while drastically reducing resolution time for site issues, even as usage volume increased.
However, such widespread observability may also come at a high cost for some companies, especially if itemized data charges are incurred in high-volume cloud-native computing environments. Ephemeral workloads, and their related instrumentation, may be provisioned and shut down across multiple cloud instances in moments. Staples says -
We were seeing different per-meter or per-server pricing models, and our goal was to make observability a standard practice for every developer.
So we dramatically simplified the pricing of our Telemetry Data Platform to 25 cents per gigabyte of ingest of all production data types, with low per-user costs, starting with a free individual developer tier that includes full access to the platform and 100GB of data a month.
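As a rough illustration of what a flat per-gigabyte rate means in practice, the following Python sketch estimates a monthly ingest charge from the two figures in the quote: 25 cents per gigabyte and 100GB of data a month in the free tier. It assumes the free allowance is simply subtracted from the month's ingest before billing, and it leaves out the per-user costs, since neither detail is spelled out here. The function name and the sample volumes are hypothetical.

```python
from decimal import Decimal

PRICE_PER_GB = Decimal("0.25")       # 25 cents per gigabyte, from the quote
FREE_GB_PER_MONTH = Decimal("100")   # free-tier allowance, from the quote


def estimated_ingest_cost(gb_ingested, free_gb=FREE_GB_PER_MONTH):
    """Estimate the monthly ingest charge in dollars.

    Assumption: the free allowance is deducted from the total before
    billing. Per-user costs are not included.
    """
    billable_gb = max(Decimal(gb_ingested) - free_gb, Decimal(0))
    return billable_gb * PRICE_PER_GB


# Hypothetical monthly volumes, for illustration only.
small_team = estimated_ingest_cost(80)     # within the free allowance
busy_month = estimated_ingest_cost(500)    # 400 billable GB
```

Under these assumptions, a month with 500GB of ingest has 400 billable gigabytes, while 80GB stays inside the free allowance.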
A newly inked New Relic partnership with AWS provides another example of how the vendor is easing adoption of this multi-tenant telemetry platform, and making it discoverable within the AWS Marketplace.
Applying AI for automation and responsiveness
Consuming petabytes of telemetry data from diverse sources can only be as valuable as the ability of an organization to meaningfully query and respond to it. Staples says -
Over the next year, we're going to see a lot more intelligent services, or AIOps capabilities, to query large volumes of data. It's not as interesting for small volumes of data, but once you reach a certain quantity and scale of data, it's just impossible for a human to find all of the relevant patterns and understand what's going on.
In this context, AIOps goes beyond correlating alerts and visualizing observability data, to allow application-like workflows for proactively responding to trends with specific changes that will eliminate failure risk and improve customer outcomes.
Domino's UK was able to consume its conventional application stack one bite at a time into a modern microservices architecture, and continually shave seconds off of order-to-delivery cycle times for customers by correlating collected observability data with desired performance goals.
The Intellyx take
Previous generations of application performance monitoring often required development teams to embed their own forms of reporting and telemetry into the application code itself before release, while operations teams needed to dig through esoteric system metrics and logs to keep things running on Day 2 and beyond.
That's not going to cut it anymore for today's Hybrid IT and cloud-based applications. Fortunately, modern observability practices and tooling are helping companies keep up with the changes. As Staples puts it -
I feel lucky to be here at this time, supporting customers and partners to create application use cases on top of observability data sets. We feel passionately that every developer should be able to take a data-driven approach, and it's really cool to see what companies are doing, once they can analyze, visualize and build atop this vast quantity of telemetry data.
Perhaps in hindsight, we'll see 2020 was the year observability was set up to take center stage as a leading competitive differentiator for 2021.