Informatica wants to use Amazon-style ratings to create 'fertile' big data lakes

Derek du Preez, September 30, 2014
Informatica director of Technology Operations for EMEA, Greg Hanson, explains how he plans to help customers not only organise their data, but ensure it's accurate enough to derive value.

Enterprises are increasingly faced with a complex technology environment that requires the efficient integration of applications both on-premise and in the cloud, as well as a choice between 'sturdy' relational databases and more scalable big data technologies, such as NoSQL and Hadoop.

Whilst the IT department is grappling with the challenge of piecing all of these systems and technologies together, both inside and outside the firewall, business users are becoming frustrated with the lack of useful information being made available to them – despite being constantly told that data is the new currency.

This is the picture painted by Informatica's Director of Technology Operations for EMEA, Greg Hanson, who this week outlined the company's plans to tap into these ongoing market problems and create an intelligent data platform that manages all of a customer's data requirements.

Part of this includes the introduction of Amazon-style ratings on data quality, as well as peer-to-peer analysis, to allow business users to quickly see how useful their information is.

When Informatica launched onto the technology scene over twenty years ago, it carved out a niche for itself by specialising in Extract, Transform and Load (ETL) software. This developed into integration technologies, as well as broader master data management systems – whereby the company focused on measuring data quality, profiling data, measuring data completeness, as well as cleansing and enriching data.

But what really gave the company longevity, according to Hanson, was its decision to focus on metadata – the data describing what a company's data looks like, helping users understand how they can use that data in a purposeful way. He said:

Becoming a metadata company is really valuable to us now. When you build something in Informatica, it is logical, it doesn't have any physical connections until the point at which you run quality, profiling or an integration service. That's important now, because if there's one thing that's predictable in this space, it is more and more rapid change. What you need is organisations to be agile enough to cope with that change.

Hanson's point is that because metadata is a logical understanding of a company's core data, it is easier to shift and change tactics, integrating with new systems as and when required. This focus on easy integration has extended to the cloud, where Informatica offers tools to allow companies to integrate not only fringe cloud apps with on-premise apps, but also more core platforms that are now being ripped out and placed off-site. He said:
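Hanson's logical-versus-physical distinction can be illustrated with a small sketch (hypothetical code, not Informatica's actual API or data model): pipeline steps reference logical dataset names, and the physical connections are bound only when the pipeline runs, so swapping a source is a one-line change to the bindings rather than a rewrite of the pipeline.

```python
# Hypothetical sketch of the 'logical vs physical' idea: pipelines reference
# logical dataset names, and physical connections are bound only at run time.

PIPELINE = ["extract:customers", "cleanse", "load:warehouse"]  # logical steps

# Physical bindings live in one place and can be swapped without
# touching the pipeline definition above.
bindings = {
    "customers": "postgres://crm.internal/customers",
    "warehouse": "hdfs://lake/curated/customers",
}

def resolve(step: str, bindings: dict) -> str:
    """Bind a logical step to its physical connection, if it names one."""
    if ":" in step:
        op, logical_name = step.split(":", 1)
        return f"{op} -> {bindings[logical_name]}"
    return step

# Migrating the CRM to a cloud app is a one-line binding change;
# the pipeline itself never changes.
bindings["customers"] = "https://api.cloudcrm.example/v1/customers"
plan = [resolve(step, bindings) for step in PIPELINE]
```

The point of the indirection is agility: the logical design survives each physical migration, which is what Hanson means by coping with "more and more rapid change".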

All of a sudden you've got this new breed of applications that sit outside the firewall, as well as infrastructure and platforms in the cloud. One of the opportunities for Informatica is to simplify what is a very fragmented, heterogeneous landscape, where we can offer all the services through a metadata orientated infrastructure. Being able to acquire data from all those data sources, cleanse it, enrich it, match it, master it and then make that data usable and consumable for business users. They can then make that data valuable.


However, this is not the end. Hanson said that the next evolution for the company is moving from data integration to a new category of product – Informatica's intelligent data platform. The aim of this is to provide a layer of value on top of the 'managed' and organised data that is collected from a variety of sources. Basically, Informatica wants to make it possible for business users to quickly identify the most valuable data available to them, when it is required, before it is analysed.

As previously mentioned, Hanson plans to largely do this by providing business users with data ratings and reviews, in the style of an Amazon shop-front. He said:

The key ingredient for me when shopping on Amazon is the recommendations and the peer reviews of products, that's what drives my buying behaviour. In the enterprise, you need to be able to provide an easily searchable Amazon-like query – where are my top ten customers? From the hundreds of systems in the organisation, which might be in cloud, or on premise, we can identify from a live data map where that data exists for that customer.

Here's where all the data services are available to you using customer data - but we can also give you a trust rating for that data. So, this data is the most accurate piece of data you can use for your analytical purpose. Then we can allow peer review of data, give other business users the ability to comment on top of the metadata. Allow them to provide some commentary on data sources – were they fit for purpose? It becomes the Amazon shop of data.

The data quality scoring is based on the accuracy and completeness of data and how appropriate we believe this data is for the particular purpose we are using it for – those things combined give you the intelligence layer.
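Hanson's description of the scoring suggests something like the following sketch – a hypothetical illustration of combining automated quality dimensions (accuracy, completeness, fitness for purpose) with Amazon-style peer reviews into a single trust rating; the class, field names and weighting are assumptions for illustration, not Informatica's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DataSourceRating:
    """Hypothetical trust rating for a data source: automated quality
    metrics blended with peer reviews, Amazon-style."""
    name: str
    accuracy: float      # fraction of records passing validation (0-1)
    completeness: float  # fraction of required fields populated (0-1)
    fitness: float       # judged appropriateness for the task at hand (0-1)
    peer_stars: list = field(default_factory=list)  # 1-5 star peer reviews

    def quality_score(self) -> float:
        # Equal-weight combination of the three quality dimensions
        return (self.accuracy + self.completeness + self.fitness) / 3

    def trust_rating(self) -> float:
        # Blend the automated score (scaled to a 5-star range) with
        # the average peer review, when reviews exist
        auto = self.quality_score() * 5
        if not self.peer_stars:
            return round(auto, 2)
        peer = sum(self.peer_stars) / len(self.peer_stars)
        return round((auto + peer) / 2, 2)

# Example: a CRM customer table with three peer reviews
crm = DataSourceRating("crm_customers", accuracy=0.95,
                       completeness=0.80, fitness=0.90,
                       peer_stars=[5, 4, 4])
rating = crm.trust_rating()
```

A business user searching the "shop of data" would then see each candidate source ranked by a rating like this, alongside the peer commentary on whether the source was fit for purpose.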

Hanson said that the live data map, or the intelligence layer, is basically a reuse of a company's metadata. Or as he described it, it is about Informatica "eating its own dog food" and providing business users with valuable data in a consumable format. Hanson said that the ideal situation for a company to be in, from his viewpoint, would be to have Informatica preparing the data, provisioning the data, rating the data and then letting data scientists use the valuable data presented to them to start analysing new products and services.

One of the key areas where Informatica believes this will be useful is in combination with modern 'big data' technologies, such as Hadoop architectures, which are great for scalability but less reliable, needing expensive services wrapped around them to make them useful. Hanson's main concern is that the software and database industry hasn't learnt from the mistakes of years gone by. He said:

One of the things that is said about Hadoop, which is a relatively immature architecture, is that it doesn't have all the process and layers that a 20-year-old RDBMS will have. But the advantage is that it is hugely scalable and offers business users some really nice things, like natural language processing. My observation would be that we have failed to learn the lessons from 10 years ago.

What kind of data lake do you want to swim in? Do you want to swim in a lake that is fertile and supports life, and supports fertile analysis? Or do you want to swim in a lake that is fed by polluted streams of data, which then becomes a collection of pollution that can contaminate the rest of your business if you don't have the necessary process controls around quality, completeness and accuracy?

However, despite Hanson making a clear pitch for Informatica to push into this space, he does warn that deploying the technology alone is not enough for companies to succeed in building well-managed and valuable data systems. The biggest challenge Informatica faces is getting companies to recognise that governance and support from the business is hugely important to success. He said:

It's one thing to have the technology, but it is always about people, process and technology. Data governance is a key aspect of that. In order to do data governance correctly, it is not just about the best technology in the market, it's all around the system, the process and the business sponsorship to put governance in place. We can help with the technology, but you also need roles like data stewards and data analysts in place - and that takes an organisational commitment, as well as a technology commitment. That is our challenge.

My Take

Informatica is trying its best to make data integration, not always the most exciting of topics, more appealing and useful to the business user. And Hanson is right: integration and the accuracy of data are some of the biggest challenges for a company that wants to use the cloud and big data to gain a competitive advantage, so getting those foundations right matters. What's the use of analysing a bunch of stored data if you're not even sure it is accurate?

This is going to become even more pertinent as the internet-of-things becomes more popular – an area that Informatica is directly targeting. In fact, Hanson revealed that the company is already working with customers on connecting, integrating and managing devices, where he provided an example of a casino in Asia-Pacific that is putting RFID tech into all of its chips.

However, as appealing as an 'Amazon-shop of data' may sound, I would need to speak to a decent customer making use of it before endorsing the idea. It sounds like an interesting tool, but given the sprawl and diversity of data now available to business users, combined with the challenge of getting users to buy into the use of 'ratings', I'm not fully convinced.

Having said that, if Informatica can make it easy to use and can get business users to see the value of checking the quality of their data first, it could prove to be quite useful to its customers. This is a challenging topic, one that we often talk to end-users about. But as I said, I would need to see it in action first...
