The never-ending story - Dynatrace takes on data wrangling in an age of gen AI

Martin Banks, February 14, 2024
Summary:
Moving, managing and wrangling data in the new age of AI is now even more important, given the volumes that have to be moved, the speed at which AI solutions can work and the potential for disaster that follows if there is anything seriously wrong with the data being used. This was the focus of the latest Dynatrace Perform conference.

AI and connected business tech © Funtap via Canva.com

The more AI becomes an integral part of mainstream IT activity and business management, the greater the amount, scope and variety of data that needs to be consumed, analyzed and acted upon, and the greater the chances that users will make a mess of things. Such thoughts lie behind the latest developments to come out of Dynatrace, announced at the company’s recent Perform conference in Las Vegas.

Add to that backdrop the exponential factor underpinning the IT industry’s now-standard battle cry of transformation. Every business now really does need to be thinking about this if it is not to gently fade away. But the point that even those that have started down that road may have missed is that, once started, it becomes a never-ending story: it will go on and on, and get more complex with every iteration. The appearance of AI as the technology all businesses 'need' now, rather than a good idea for some time in the future, adds a whole new dimension to the complexity of ongoing transformation and the consequent scope for trouble to arise.

The fundamental issue the conference highlighted is that not only is AI becoming endemic, it is not a single entity. Most users will end up deploying dozens of different AI solutions for different tasks, each of which then has to be paired with the specific datasets and sources it requires. Managing this increasingly huge melange is going to create a new breed of developer-specialist: the equivalent of an artisan chef capable of taking that melange and creating an entirely new and original cuisine of dishes.

As Dynatrace CEO Rick McConnell pointed out in his keynote, the expectations of what will emerge are humongous. He talked through some of the predictions from market and technology analyst Gartner to highlight the enormity:

The numbers speak for themselves. You look at the cloud hyperscaler growth, and even just yesterday afternoon, Microsoft and Google reported their results, growing the cloud, respectively 28% and 26%. Those businesses continue to grow, and if you look at the three major hyperscalers you see that the industry has now eclipsed over $200 billion in annualized revenue, and over the last two years that's grown north of 50%. Incredible. Over this two-year span, we've seen an increase in growth of over $70 billion. And that's due to the engagement of each of us in driving on-premise workloads into cloud workloads.

There is little sign of any slackening of that trend, and indeed AI is only going to make it grow even faster. It follows that new specialist tools will be needed to manage the ensuing complexity, and Dynatrace is already well-established in pitching itself as a significant provider of such tools. The company’s co-founder and CTO Bernd Greifeneder shows no signs of reducing the development pressure. This year’s batch includes new observability tools for both AI operations and data movement, as well as a new centralised data pipeline technology designed to work with new AI tools aimed at allowing users to build and manage what he calls "hypermodal operations".

This is where multiple independent modes – for example, processes using different technologies or applications – need to run simultaneously, and often collaboratively, across a single business environment. As McConnell observed:

With generative AI, Artificial Intelligence seems to be becoming as ubiquitous as digital transformation. And it is becoming mandatory to evaluate how you can deliver this kind of productivity. I also like the notion that generative AI is enhanced by other AI techniques. It is not shocking to see that the complexity in our environments is driving greater risk. What's more interesting, perhaps, is the assumption that AI is becoming more and more crucial in driving threat protection.

That amalgam of various AI tools is likely to bring a complexity that may be hard to visualize right now, but it will be an environment where it is easy to make mistakes in setting it up and building the required co-operation between the elements. The ability to observe and remediate as you go is therefore now mandatory - a must-have-or-risk-everything situation.

Observing AI

Greifeneder’s starting position when it comes to observing and managing AI is that fundamental - and elusive - requirement of all businesses: a single source of truth across all departments. He sees AI as a collective source of increased power in businesses' ability to deliver it and achieve greater levels of collaboration. The Dynatrace route to delivering this is the use of three types of AI, he explains:

It is a unique combination of predictive AI, causal AI - which identifies causes, not just correlations - and then generative AI. All three add up to a hypermodal AI, because we use all the modalities of data sources to process. But this is also precise, and that is important for automation; at the end of the day you need to survive, and automation is the only way to do it. So to help you with this, we have created over 100 ready-made use cases to get going, along with over 700 integrations to get going quickly.
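To make that three-mode idea concrete, here is a minimal, purely illustrative sketch (not Dynatrace code; every function name and the toy topology are assumptions) of how a predictive check, a causal walk over a service dependency graph, and a generated summary might be chained:

```python
from statistics import mean, stdev

def predict_anomaly(history, latest, z_threshold=3.0):
    """Predictive step: flag a metric value far outside its history (z-score)."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(latest - mu) / sigma > z_threshold

def attribute_cause(service, dependencies, unhealthy):
    """Causal step: follow the dependency graph to the deepest unhealthy service."""
    for dep in dependencies.get(service, []):
        if dep in unhealthy:
            return attribute_cause(dep, dependencies, unhealthy)
    return service

def summarize(service, root_cause):
    """Generative step stand-in: render a human-readable explanation."""
    return f"Anomaly on '{service}': likely root cause is '{root_cause}'."

# Toy topology: frontend depends on api, api depends on db.
deps = {"frontend": ["api"], "api": ["db"], "db": []}
history = [100, 102, 98, 101, 99, 103, 97]   # latency samples (ms)
latest = 480                                  # sudden spike

if predict_anomaly(history, latest):
    root = attribute_cause("frontend", deps, unhealthy={"frontend", "api", "db"})
    print(summarize("frontend", root))        # points at 'db', not the symptom
```

The point of the combination is visible even at this toy scale: prediction alone would only say "frontend is slow"; the causal walk turns that into "the database is the problem", and the generative layer phrases it for a human.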

According to Chief Product Officer Alois Reitbauer, security plays an integral role in AI observability, helping users meet their specific regulatory and geographic needs such as HIPAA and ISO 27001. It plays a major role in the delivery of Davis Hypermodal AI, based on the company’s long-established Davis AI platform and its existing predictive and causal AI capabilities. This then works with last year’s introduction, Grail, the single point of truth storage system for all data about operations, applications and issues associated with running the business.

The AI Observability component aims to cover much, if not all, of the end-to-end AI stack a business will be using. This includes orchestration frameworks such as LangChain, as well as support for the major platforms used for building, training, and delivering AI models. At the other end of the scale it can observe infrastructure elements such as GPUs, foundation models such as GPT-4, plus semantic caches and vector databases. Reitbauer explains:

It gives you access to the Davis Hypermodal AI for all of your use cases. So now you can really ask anything that you care about. And every new question you ask comes with analysis, with prediction models, and the things we need to do.
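What observing an AI stack means in practice is capturing telemetry around every model call: latency, token volumes, which model answered. As a hedged, self-contained sketch (plain Python with a stubbed model and an in-memory list standing in for a real exporter such as OpenTelemetry; none of this is Dynatrace's API), the pattern looks like this:

```python
import time
from functools import wraps

TELEMETRY = []  # stand-in for a real telemetry exporter

def observe_llm(model_name):
    """Record latency and rough token counts for each model call (illustrative)."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(prompt, **kwargs):
            start = time.perf_counter()
            completion = fn(prompt, **kwargs)
            TELEMETRY.append({
                "model": model_name,
                "latency_s": time.perf_counter() - start,
                "prompt_tokens": len(prompt.split()),        # crude word-count proxy
                "completion_tokens": len(completion.split()),
            })
            return completion
        return wrapper
    return decorator

@observe_llm("stub-model")
def fake_completion(prompt):
    # A real system would call a foundation model here; this stub keeps it runnable.
    return "stubbed answer about " + prompt

fake_completion("observability for AI stacks")
print(TELEMETRY[0]["model"], TELEMETRY[0]["prompt_tokens"])
```

The decorator shape matters: instrumentation wraps the call site rather than living inside the model code, which is what lets one observability layer cover many different models and frameworks.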

Piping aboard

All of this, as Greifeneder pointed out, means more and more potentially important monitoring and working data is being generated and delivered from a growing range of different sources:

What do we do with it all? No one agent can handle all this, but customers want to send more from other data sources, so the challenge is: when did you last count all the data sources and tools that you have to collect data from? I mean, think of all the different 'Do It Yourself' tools. Think of all the ingest pipelines and the maintenance efforts to keep them, and the normalisation, intact.

The Dynatrace answer is called OpenPipeline. Announcing it at the conference, Greifeneder said the goal is to give users a high-performance, real-time stream-processing engine. When working with data from one agent, the normal process is to use OpenTelemetry or APIs to get data into Dynatrace, but this can miss out on securing the data in transfer, and that is what OpenPipeline specifically targets. He adds:

Forget those issues with certificates being out of date, or how to load-balance data. OpenPipeline takes care of this. And think of the scale: so far, in Dynatrace, you can ingest 250 terabytes per day. With OpenPipeline, we pump this up to 500 terabytes per day. And by the end of the year we will be at 1,000 terabytes per day. But it is more than just a data dump. OpenPipeline has a patented, high-performance rule engine that allows us to contextualise the data to a high quality.

This includes filtering, masking for security, privacy management and normalisation, as well as end-to-end user experience metrics, all conducted without a speed penalty. Users can transform the data so that they use only exactly what they need, available at what he claims is the highest quality and with context included. The end result should be data that is ready to deliver the best analytics results possible.
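Those three operations - filtering out noise, masking sensitive fields, normalising inconsistent keys - are the bread and butter of any ingest pipeline stage. A minimal sketch of the idea (generic Python, not OpenPipeline's actual rule engine; the field names and the DEBUG-dropping policy are assumptions for illustration):

```python
import re

MASK_FIELDS = {"email", "card_number"}  # hypothetical privacy policy

def pipeline_stage(records):
    """Filter noise, mask sensitive fields, and normalise keys in one pass."""
    out = []
    for rec in records:
        if rec.get("level") == "DEBUG":          # filtering: drop low-value noise
            continue
        clean = {}
        for key, value in rec.items():
            key = key.strip().lower()            # normalisation: consistent keys
            if key in MASK_FIELDS:               # masking: redact before storage
                value = re.sub(r".", "*", str(value))
            clean[key] = value
        out.append(clean)
    return out

events = [
    {"Level": "INFO", "Email": "jane@example.com", "msg": "login ok"},
    {"level": "DEBUG", "msg": "cache probe"},
]
result = pipeline_stage(events)
print(result)
```

Doing this in-stream, before the data lands in storage, is the design point the announcement stresses: once redaction and normalisation happen at ingest, every downstream consumer inherits clean, contextualised data for free.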

It comes as an addition to the existing Dynatrace environment and, once introduced, is added automatically, feeding the contextualised data directly into Grail and the use cases being built up for observability, security and business. This includes data concerning the variable number of non-functional requirements users now need to meet, such as Lock Data and Scalability.

Observe data

Having come up with a way to pipe in good data in vast quantities, Dynatrace then needs a way to observe it, which is provided by Davis.

This covers a number of key issues typically found when ingesting external data. These include freshness, where the age of the data, particularly in automated processes, can have a significant impact on the accuracy and effectiveness of the process. Another is data volume, particularly where large datasets may get interrupted and restarted. Schema and tracing are also important, for they help identify if, and how, a data source has changed.
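The three checks named above (freshness, volume, schema change) can be sketched as a single gate on each ingested batch. This is a generic illustration of the pattern, not Davis itself; the thresholds and field names are assumptions:

```python
from datetime import datetime, timedelta, timezone

def check_batch(batch, expected_schema, expected_rows, max_age):
    """Flag freshness, volume, and schema problems in one ingested batch."""
    issues = []
    now = datetime.now(timezone.utc)
    if now - batch["produced_at"] > max_age:
        issues.append("stale")                   # freshness: data too old to trust
    if len(batch["rows"]) < expected_rows * 0.5:
        issues.append("volume_drop")             # interrupted or partial load
    for row in batch["rows"]:
        if set(row) != expected_schema:
            issues.append("schema_drift")        # the source changed shape
            break
    return issues

batch = {
    "produced_at": datetime.now(timezone.utc) - timedelta(hours=5),
    "rows": [{"id": 1, "value": 10}, {"id": 2, "value": 12, "extra": True}],
}
flags = check_batch(batch, {"id", "value"}, expected_rows=2, max_age=timedelta(hours=1))
print(flags)
```

Running checks like these automatically at ingest is what turns "data observability" from a slogan into something actionable: a stale or drifting source gets flagged before it quietly corrupts the analytics built on top of it.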

Another important part of data observability is the user experience in bringing data out of the Grail storage solution and delivering information and insight of value to a business. The company has continued to develop and expand its data exploration tools with the introduction of more ready-made dashboards to its existing library of data visualisation tools. These are seen as a useful way to rapidly build and share insights by selecting a template, rapidly creating a dashboard and customising it, and then re-using it. 

My take

What Dynatrace introduced are new tools and services aimed at giving developers and CIOs real-time insight into what is happening with their AI systems and their data, as well as how multiple streams of data are shipped to where the AI needs them. This may require a new breed of ‘chef’ capable of creating an entirely new cuisine of processes and applications. In a follow-up article, I will look at what is likely to become a new professional specialism those chefs will need to acquire: security for AI systems.
