New Relic advances observability and AIOps - why it matters to business

By Phil Wainewright, March 26, 2021
Summary:
New Relic is expanding its observability and AIOps functionality, but what do these buzzwords actually mean for IT organizations and the businesses they support?

(© spacezerocom – Fotolia.com)

The tech industry has a compulsive habit of thinking up baffling new buzzwords to describe ideas that would otherwise be perfectly self-evident. Take 'observability', which essentially means having an informed overview of what's going on right now in your IT infrastructure. Or 'AIOps', which just means using intelligent automation to react faster when you see something that's not right. New Relic is as guilty as any other vendor of obscuring the practical impact of its offerings with techspeak. Look beyond the buzzwords and you discover products designed to make it easier for software developers and IT operations teams to do their jobs, supporting business outcomes.

Last week saw the addition of new AIOps functionality to the company's flagship New Relic One observability suite, available to all users including the free-of-charge, entry-level edition. Anomaly detection is now automatically available, while other features such as intelligent alert filtering and automated root cause analysis can be switched on with minimal configuration. Pattern detection in log data launched in beta, too, while there are new integrations with incident management tools such as PagerDuty.

These capabilities (integration apart) all fall under the heading of AIOps — the use of AI and machine learning to automate the routine analysis of data when monitoring IT applications and infrastructure. That's a key foundation of observability, which is all about collecting and analyzing data to be able to quickly identify and solve glitches when something's amiss.
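To make the idea of automated anomaly detection concrete, here is a minimal sketch in Python. It is not New Relic's algorithm, which the announcement does not describe. It assumes a simple rolling z-score over a single metric, and the function name `find_anomalies`, the window size, the threshold and the sample latencies are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return the indexes of samples that deviate sharply from the
    preceding `window` samples (a rolling z-score check).

    Illustrative only: real AIOps tooling uses far richer models.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat baseline: treat any change at all as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latencies_ms = [101, 99, 102, 100, 98, 101, 100, 450, 102, 99]
print(find_anomalies(latencies_ms))  # -> [7]
```

The point of the sketch is that nobody has to set a fixed alert threshold in advance: the baseline is learned from recent data, which is the kind of routine analysis AIOps aims to take off an operator's plate.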

In a world where business success increasingly depends on a fast-moving, always-on digital infrastructure, such capabilities have become mission-critical. The new features help organizations to assess what needs doing and allocate resources accordingly. As Eugene Kovshilovsky, SVP of software engineering at New Relic customer CarParts.com, says in a press statement, the new features make it possible to:

... quickly bubble up issues from across the stack to allow us to take a targeted approach to determine what needs to be optimized or fixed, and how many human hours will be required.

'Get rid of the toil in software development'

Separate silos of operation and data are the enemy of rapid analysis and response. A big part of the challenge that observability seeks to overcome is therefore the collection of monitoring data from across the tech stack, along with joined-up analysis to filter out noise and identify the cause of issues as quickly as possible.
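As a rough illustration of what filtering out noise can mean in practice, the Python sketch below collapses a burst of alerts into a handful of incidents. It is an assumption-laden example rather than a description of any vendor's product: the `group_alerts` function, the alert fields (`service`, `ts`, `msg`) and the five-minute window are all hypothetical.

```python
def group_alerts(alerts, window_seconds=300):
    """Group alerts for the same service that fire within
    `window_seconds` of the first alert in the group.

    Each alert is a dict with hypothetical keys 'service' and 'ts'
    (a timestamp in seconds). Returns a list of incident dicts.
    """
    incidents = []
    open_incident = {}  # service name -> most recent incident for it
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        service = alert["service"]
        incident = open_incident.get(service)
        if incident and alert["ts"] - incident["start"] <= window_seconds:
            # Same service, still inside the window: fold into the incident.
            incident["alerts"].append(alert)
        else:
            # New service or window expired: open a fresh incident.
            incident = {"service": service, "start": alert["ts"], "alerts": [alert]}
            incidents.append(incident)
            open_incident[service] = incident
    return incidents


# Hypothetical alerts gathered from different parts of the stack.
raw_alerts = [
    {"service": "checkout", "ts": 0, "msg": "latency high"},
    {"service": "checkout", "ts": 60, "msg": "error rate high"},
    {"service": "search", "ts": 90, "msg": "latency high"},
    {"service": "checkout", "ts": 120, "msg": "latency high"},
    {"service": "checkout", "ts": 1000, "msg": "latency high"},
]
incidents = group_alerts(raw_alerts)
print(len(raw_alerts), "alerts ->", len(incidents), "incidents")
```

Grouping only works when alerts from every layer land in one place, which is why breaking down data silos comes first.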

While the latest announcements are concerned with analyzing the data and supporting decision-making, New Relic has also been extending the breadth of data it collects. In December the company announced its acquisition of Pixie, a startup specializing in collecting telemetry data from Kubernetes clusters, deep inside the application code. The aim is to help developers build robust software right from the start, as Pixie co-founder Zain Asgar explains:

We really want to make sure that we target software engineers, because they're the ones who can eventually build software that is more resilient, and observable ...

Our goal would be to try to, essentially, get rid of all the toil involved in software development.

By building telemetry that runs automatically, deep inside the container kernel, Pixie spares DevOps teams from having to wait for log information to debug and fix an issue, or needing to rewrite code to add in instrumentation. Asgar continues:

Part of what we want to do with Pixie is give you the insights and allow you to actually do these things without changing your code. So you can actually debug your applications a lot faster. And the entire goal is just [to] help speed up the entire software development lifecycle.

Avoid 'reinventing the wheel'

Building on technology called the enhanced Berkeley Packet Filter (eBPF) to capture application data, Pixie installs in just a couple of minutes and automatically starts looking at the data. A lot of the analysis is done on the spot, right in the Kubernetes cluster, to minimize the overhead and narrow down what's anomalous and needs further investigation. The aim is to provide a level of visibility into what's happening inside the application without the developer having to explicitly write code for that instrumentation. Asgar explains:

Instrumenting your code heavily is actually an ongoing burden. As you change code, you have to go add more instrumentation ...

The goal is we always provide a safety net, where if things are failing, you'll always have some visibility into what's going on. That's the ongoing burden that's taken away from the development process.

Another goal is to build up a library of reusable scripts and workflows so that developers don't have to keep on 'reinventing the wheel' when they encounter issues. Asgar explains:

The goal is to help build up all these debug scripts and workflows, where over time, you run into an issue, you're like, 'My service talks to Kafka, what are the top types of problems that services that talk to Kafka will have?' And hopefully move away from having a lot of tribal knowledge within teams, to actually have a larger debugging playbook and database ...

I think there's still going to be a set of unique problems that are going to need people to go debug. But we'd rather have engineers spend their time debugging problems that they haven't figured out before, than debugging problems that someone's already figured out the answer for and you just don't know what it is.

My take

Why have observability and AIOps become such big new buzzwords in IT? This is not technology for technology's sake. The creation of new words is a signal that there are new needs which are not being met by existing technologies, and therefore new approaches and tools are needed. But it's still important to look behind the words and understand the underlying needs, otherwise those new tools won't be deployed effectively.

The context here is the ongoing digitalization of business in the wider move to Frictionless Enterprise. To remain competitive today, organizations need to be ever more responsive, adaptable and connected. That demands a sophisticated, rapidly evolving digital infrastructure in which many different components interact — increasingly a tierless IT infrastructure. The old batch mentality of building software, deploying it, and then poking around to see what went wrong doesn't cut it in this new world. Everything needs to happen continuously and as near as possible to real-time.

Observability is the response to this new context, harnessing technologies that are emerging to meet these new needs. Telemetry is increasingly built in to the IT stack and the volume of data that's collected is so great that it can only be handled with the help of intelligent automation, which is where AIOps makes its contribution. New buzzwords are never welcome, but in this case they represent a crucial shift in the technology world that provides a foundation for enabling business success in the real world.