Early AI adopter Informatica is set up for the generative age, but not profiting yet

Chris Middleton, May 1, 2023
The Chief Product Officer at data management specialist Informatica is excited but philosophical about the generative AI era

(© TippaPatt - Shutterstock)

Informatica says it is in the business of delivering “trusted data for trusted outcomes”, according to Chief Product Officer Jitesh Ghai. That message seems increasingly relevant as many organizations rush to adopt artificial intelligence solutions that have been trained on data scraped from the Web.

The Redwood City, CA-headquartered data management company – market cap $4.36 billion at the time of writing – was an early adopter of enterprise AI, launching its Claire platform back in 2016. Through it, Informatica processes a claimed 53 trillion monthly transactions in the cloud for more than 5,000 active customers, including 85 of the Fortune 100.

Yet despite this, the company has struggled for profitability, losing $54 million on revenues of $1.5 billion in FY 2022. It reports new quarterly figures next week, ahead of Informatica World in Las Vegas on 8-11 May (diginomica will speak to the company again at that point).

For Informatica, it is not just data itself that is the critical factor, but also managing it well – which means managing metadata effectively. Ghai says:

Managing data is more relevant than ever before. There are more types of data, and more fragmentation – on premise, in the cloud, in multi-cloud hybrid architectures. So, the problem of managing data is not simply integration or engineering, it's not simply ELT or ingestion, but also data cataloguing and governance. It's also building single sources of truth with Master Data Management. And it’s democratizing data.

So, how does AI democratize data for the company? Some of today’s generative or large-language AIs would seem to be doing the opposite: scraping data that has been shared in good faith online and turning it into revenue streams for the likes of OpenAI. Ghai says:

Over the last five or so years, we've been an intelligent data management cloud provider. The Claire AI and ML engine is something we launched [in 2016]. And so, this is not something that happened because of large-language models or ChatGPT or GPT-3.5 or 4 or whatever. 

Many years ago, we recognized that data is going to be more important than it's ever been for the enterprise. There's going to be more of it, and organizations are going to want to leverage that data and use AI’s power to do that, to effectively get access to disparate data sources that are trusted – and to do that at scale. 

Our point of view is that data management needs AI. So, we are provisioning data for data science teams, to hydrate their data lakes, to curate datasets, to train their own ML models. Our AI/ML engine, Claire, is trained against our customers’ metadata, to deliver predictive data intelligence and insights about their data – and, through it, to automate data management. 

We pioneered no-code data pipelines, and we can auto-generate business logic. If you point us to a set of sources and to a target, we will auto-generate the mapping.

Still in its infancy

So, what was behind the early move into enterprise AI – one that seems increasingly prescient, if not yet profitable for Informatica? Ghai explains:

[In 2015-16] everybody was trying to build data lakes with Hadoop. And the promise was ‘Put all your data into this and you can democratize it and do whatever you want’. But it wasn't a lake, it was a swamp. A data swamp that needed metadata. It needed curated metadata. And to do that you needed to apply AI and ML to catalogue and index the lake to make it productive. 

So, AI and ML were the logical addition. We're the beneficiaries of being in this space and are exclusively focused on it.

Has Ghai been surprised, or even concerned, by the sudden rush towards generative and large-language AIs over the past few months? On the face of it, that popular enthusiasm would seem to have thrown good data management out of the window, at least for some users of these technologies out in the wilds of hype. Ghai says:

What's surprising to me as a technologist is the simplicity of the transformer model. You get the enormous potential and power of it – there's a sense of beauty in that, you know. That something so simple can be so transformative. In the AI space, that is interesting, surprising. 

With the application of transformer models – generative AI, text, images, video – that's hugely transformative and exciting, and I believe it has the potential to be an internet-like disruption. I believe we're on the cusp of something similar with AI-driven productivity.

No doubt. But the concept of trustworthy data – long Informatica’s focus – would seem to be undermined by the availability of popular tools, toys almost, that encourage users to play with unreliable data and yet trust the results. Does that worry him?

Meanwhile, a new generation of GPT-based enterprise tools is emerging, which allow users to query their own trusted data with natural-language prompts. Their CEOs are fond of saying, “Just throw your data at our tools, you don’t need to catalogue it or make it consistent”. That must grate with Informatica? Is it in danger of being like one of those houses that people build motorways around? Ghai says:

From my vantage point, I'm not concerned, because in my conversations with CIOs and CDOs we all acknowledge the immense excitement and innovation in AI over the last few months. But we're all being very thoughtful of how we plan to leverage this technology within our organizations. 

It's clear, everybody sees that there's something tremendous happening, but it's in its infancy. It's one thing to automate your calendar, or to curate your playlist, but it’s a whole other ballgame to work with enterprise data, with the mission-critical data that businesses run on. 

Vector databases are not going away, SQL engines and Spark query engines are not going away, and data management is more relevant now than it will ever be. Because it is needed to make these transformer models more productive. And to do that you need trusted enterprise data. 

So, enterprises are intrigued, but also thoughtful and deliberate. There are all sorts of lawsuits out there – around image generation, and around code generation where the generated code turned out to be the proprietary IP of some other organization. There's a lot of governance on the generative AI front that still needs to be sorted out. But there's nothing wrong with that. It is just part and parcel of a hugely transformative technology enabler.

He adds:

Everything our customers do persists as metadata, so we understand deeply their data nervous systems, because we deeply understand how they're managing data, how they're moving data, and where it's going. Because we are part of how they realize their multi-cloud strategies and hybrid cloud architectures, and deliver their digital experiences. 

We catalogue all of their enterprise data, index it, organize it, and where we're able to, give insights and automate the cataloguing and classification of datasets. That's the whole promise of AI-driven productivity: taking thousands of hours and making them minutes!

We’re excited about generative AI, because it allows us to take a technology that's hugely transformative and make it immediately productive, but without having to worry about the ethics and governance of it being trained on the Wild West of the public internet. We are taking it and applying it to a very specific problem of data management experiences.

My take

An intriguing taster for Informatica World later in May, when we will return to this conversation and see what new Claire tools are on offer.