The 10 principles of observability for modern applications
- Tactical monitoring of enterprise technology is no longer enough. New Relic's Buddy Brewer and Alberto Gomez set out the 10 principles of full-stack observability
Observability is an increasingly important concept for enterprise technology teams. That's because new technologies and approaches (the cloud, DevOps, microservices, containers and container orchestration, serverless, and more) are helping to increase velocity and reduce friction in getting from code to production, even as they create complex new challenges.
Facing that kind of complexity, true observability, not merely tactical monitoring, is key to fully understanding what's actually happening in your company's software and systems. But what does that mean in the real world? What defines a modern observability platform? We think it's a good idea to start with these 10 core principles.
1) Curation vs. participation
A modern observability platform must excel at curation — cutting complexity down to size, and selecting and presenting relevant insights for its users. But a modern platform must also excel at supporting participation, making it easy for users to bring custom metrics and data sources into this process.
Curation and participation are equally important in a modern observability platform. Curation can give teams a critical productivity and efficiency edge: the smaller the haystack, the easier it becomes to find the needle. (New Relic customers might recognize our distributed tracing anomaly detection or Kubernetes cluster explorer as examples of how curation helps to achieve observability.)
Participation, on the other hand, puts a premium on versatility — capturing and massaging data in valuable ways even when the platform doesn't know how to shape or present that data. It also relies on programmability — giving users the tools, and especially the APIs, to help them help themselves.
2) Support power users
Power users are an important part of any product's user base. They're the ones most likely to use — and appreciate — the deeper capabilities that set a product apart from its competitors. And they're often a product's most respected and effective champions.
When it comes to application monitoring and observability, power users tend to have very tough and demanding jobs — many of them, for example, practically live in their integrated development environments (IDEs). These users want to automate everything, and benefit most from a programmable and extensible observability platform — for example, via APIs that allow them to consume data (creating custom metrics, say) in addition to injecting data for the New Relic platform to use.
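As a rough illustration of what injecting a custom metric through an API might involve, the sketch below builds a small JSON payload in Python. Everything in it is an assumption made for illustration: the field names, the `checkout.cart_size` metric, and the helper functions `build_custom_metric` and `to_payload` are hypothetical and do not describe any particular vendor's API.

```python
import json
import time


def build_custom_metric(name, value, attributes=None, timestamp=None):
    """Return one gauge-style metric as a plain dict (hypothetical schema)."""
    if not name:
        raise ValueError("metric name must be non-empty")
    return {
        "name": name,
        "type": "gauge",
        "value": float(value),
        # Fall back to the current time when no timestamp is supplied.
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "attributes": dict(attributes or {}),
    }


def to_payload(metrics):
    """Serialize a batch of metrics to JSON for a hypothetical ingest endpoint."""
    return json.dumps({"metrics": list(metrics)}, sort_keys=True)


# Example: a business-level metric the platform would not know about by default.
metric = build_custom_metric(
    "checkout.cart_size", 3, {"service": "checkout"}, timestamp=1700000000
)
payload = to_payload([metric])
```

The point of the sketch is the shape of the workflow: a power user defines the metric in code, attaches their own attributes, and hands the result to whatever ingest mechanism the platform exposes.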
3) Applications rule
What matters the most to customers is whether their application is healthy or not. And if it's having problems, they want to know where those problems are coming from, quickly.
The message is loud and clear: An observability platform is most valuable when it's focused on measuring application performance and on surfacing application-performance roadblocks.
4) Embracing change
The pace of change in the observability space is breathtaking, and observability solutions must make tough decisions about capabilities and priorities. The plans and features that made sense six months ago may no longer be relevant, so while product roadmaps remain important, observability solutions must be able to constantly adapt to the realities of technology innovation and a competitive marketplace.
5) Full transparency
Sometimes, observability requires a comprehensive, high-level view of application performance. Other times, it's all about the ability to drill down into very granular details with no surprises and full context.
A good observability platform delivers both of these capabilities, with a fully transparent path for moving between high-level and lower-level views — one that's predictable, consistent, and intuitive.
For example, let's say that you're looking at a summary view of performance in a time-series chart. You notice a spike in errors, and you want to know more about what's happening. You should be able to drill down from that summary view into the underlying data — to view unhandled exceptions, or even to view the stack frame or lines of code that introduced the error.
Just as important, that granular view should show the useful metrics you expect to see along with the context you need to understand what's really going on. This type of transparency is especially important in high-stress, high-urgency situations where dev and ops teams need to focus on fixing the problem — not on finding it.
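The drill-down described in this section can be sketched in a few lines of Python. The event data, file locations, and the helper names `summarize` and `drill_down` are all hypothetical. The sketch only aims to show that the summary view and the detailed view are derived from the same underlying events, so the numbers agree when you move between them.

```python
from collections import Counter

# Hypothetical error events: (minute bucket, exception type, source location).
EVENTS = [
    (0, "TimeoutError", "client.py:88"),
    (1, "KeyError", "cart.py:42"),
    (1, "KeyError", "cart.py:42"),
    (1, "ValueError", "pricing.py:17"),
    (2, "TimeoutError", "client.py:88"),
]


def summarize(events):
    """High-level view: error count per minute, as a time-series chart would show."""
    return Counter(minute for minute, _, _ in events)


def drill_down(events, minute):
    """Granular view: which exceptions, at which locations, make up one data point."""
    return Counter((exc, loc) for m, exc, loc in events if m == minute)


summary = summarize(EVENTS)
spike_minute = max(summary, key=summary.get)  # the bucket with the most errors
details = drill_down(EVENTS, spike_minute)
```

Because `details` is computed from the same events as `summary`, the counts in the granular view always add up to the spike seen in the chart, which is one concrete reading of "no surprises".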
6) Nobody knows everything
Observability is not like a Hollywood movie: The days of monolithic applications that a single person could fully understand — from soup to nuts — are gone. There's no hero genius riding in on a white horse to save the day when you have hundreds or thousands of things to observe. In complex modern environments, even the best on-call engineers may have a slice of understanding, but probably not a complete view of everything they need to track.
At New Relic, for example, we have more than 60 development teams, making it well-nigh impossible for anyone to have a truly up-to-date and complete understanding of what every team does and of how their projects are progressing — and the biggest enterprises may be orders of magnitude larger than that.
That's why a modern observability platform has to provide enough information for whoever is on call — not just some mythical support hero who knows all and sees all — to find and fix the problem.
7) Easy to start
Time to value is especially important in an observability platform — which teams rely on to solve their most urgent and expensive application problems. But quickly getting started out of the box isn't always easy, as observability platforms increasingly take on more sources of data and cover more use cases.
An observability platform should be constantly updated to make more elements, such as new user agents and new metrics, trackable right out of the box. And it should strive to make the out-of-box experience as intuitive as possible, knowing that many teams, for better or for worse, will first experience it while actually working to resolve an incident.
8) ‘Fast’ is a feature
For a modern observability platform, getting the right information to the people who need it quickly is supremely critical. It can be the difference between solving a problem before it affects customers and potentially losing thousands — or even millions — of dollars in revenue, not to mention the damage to a company's brand image and customer relationships.
But moving fast isn't just about going fast; it's also about precision, reliability, and responsiveness. Sure, it's essential to minimize ‘time to glass’: the critical gap between when an event happens and when a platform issues an alert. Within this process, however, there are a lot of moving parts, from detecting a problem, to alerting the right team members, to providing actionable information, all of which must come together and work immediately. It's especially important, and often quite challenging, for an observability platform to deliver relevant and targeted alerts, and for vendors to respond promptly when customers have questions or concerns about these critical capabilities.
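To make the idea of time to glass concrete, here is a minimal Python sketch that measures the gap between when an event happened and when an alert was issued, then checks how many alerts met a latency budget. The timestamps, the 10-second budget, and the function names `time_to_glass` and `share_within` are illustrative assumptions, not figures or APIs from any real platform.

```python
from datetime import datetime, timedelta


def time_to_glass(occurred_at, alerted_at):
    """Delay between an event happening and the alert being issued."""
    if alerted_at < occurred_at:
        raise ValueError("alert cannot precede the event")
    return alerted_at - occurred_at


def share_within(delays, budget):
    """Fraction of delays that meet a latency budget."""
    if not delays:
        raise ValueError("no delays to evaluate")
    return sum(d <= budget for d in delays) / len(delays)


# Hypothetical (event time, alert time) pairs.
t0 = datetime(2024, 1, 1, 12, 0, 0)
pairs = [
    (t0, t0 + timedelta(seconds=4)),
    (t0, t0 + timedelta(seconds=9)),
    (t0, t0 + timedelta(seconds=31)),
    (t0, t0 + timedelta(seconds=6)),
]

delays = [time_to_glass(event, alert) for event, alert in pairs]
worst = max(delays)
on_time = share_within(delays, timedelta(seconds=10))
```

Tracking both the worst case and the share of alerts within budget reflects the point that speed alone is not enough: a platform that is usually fast but occasionally very slow is not reliable.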
9) Open by design
Open systems and standards, such as the recently announced OpenTelemetry project, are becoming increasingly central as modern enterprises work to manage complexity, reduce friction, and avoid vendor lock-in. New Relic is fully invested in bringing OpenTracing, OpenCensus, and OpenTelemetry support to our customers, so that they can access and visualize all their correlated telemetry data, including custom metrics, through New Relic distributed tracing and the New Relic One platform.
The goal should be to allow customers to move more quickly and with greater agility, even as organizations learn more about customers' business needs and priorities. These are all worthwhile objectives for any modern observability platform.
10) It's all about the platform
Performance issues aren't always polite enough to stop where point solutions can find them. Many front-end problems, for example, originate deep in the application stack or even within infrastructure issues.
As applications and infrastructure continue to get more complex, a full-stack observability platform becomes even more important.