IBM unveils its “essential” Data Product Hub as enterprises ponder how best to democratize data

Katy Ring, July 2, 2024
Summary:
A deep dive into DaaP.

All organizations have data, and all organizations have roles with an insatiable appetite for it. And yet, getting organizational data out of warehouses, lakes and lakehouses in an easily accessible form has proved hard to deliver. Users without SQL skills cannot pull the data directly, so they have to vie for the attention of expensive and scarce data analysts. Furthermore, once users have navigated the lengthy request cycle and received the data, it is often untrusted, out of date and possibly incomplete. The current solution to this problem is to manage “data as a product”, with the goal of serving multiple use cases and users much faster.

What is Data-as-a-Product?

Data products are many and varied, and they tend to be supported by data processing techniques that can cater for a broad audience. A business analytics dashboard and a chatbot are both examples of data products. These are not to be confused with Data-as-a-Product (DaaP), which is a service-oriented approach that applies product thinking to the creation of datasets. It is a slightly confusing distinction, but then data professionals usually have a different skillset to marketeers and they understand the difference, even if the rest of us are slightly nonplussed.

The point of Data-as-a-Product is to nudge a shift in thinking among data professionals so that, as Zhamak Dehghani exhorted in her paper in 2019: 

Domain data teams must apply product thinking […] to the datasets they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.

Product thinking means ensuring that datasets have a series of capabilities such as discoverability, security, understandability and trustworthiness.

Five years down the line, we are now talking about DaaP as a service available to many different roles within an organization. It has evolved into a capability that empowers employees to find and explore data for data-driven decision-making without waiting weeks or months for the data product to be provided, and without tying up too much of the data team’s time.

IBM’s Data Product Hub

For DaaP to be available within an organization (or extended ecosystem), the data product needs to be registered so that it is discoverable, and it needs to carry metadata about the data it contains. The data quality has to be checked to ensure trustworthiness, and contextual data quality information should be provided. Access to the data products must be governed by internal policies.
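To make those requirements concrete, here is a minimal Python sketch of a registry that enforces them. It is not IBM's API or schema: `DataProduct`, `Registry`, the field names and the 0.8 quality threshold are all hypothetical, chosen only to illustrate registration, metadata, a quality gate and role-based access.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataProduct:
    """Hypothetical descriptor; the fields are illustrative, not a real schema."""
    name: str
    domain: str
    description: str
    quality_score: float  # 0.0 to 1.0, assumed to come from an upstream check
    allowed_roles: Set[str] = field(default_factory=set)


class Registry:
    """Toy in-memory registry standing in for a catalog or marketplace."""

    def __init__(self, min_quality: float = 0.8) -> None:
        self.min_quality = min_quality  # assumed threshold for this sketch
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Refuse products with no description or a score below the threshold.
        if not product.description.strip():
            raise ValueError("a data product needs descriptive metadata")
        if product.quality_score < self.min_quality:
            raise ValueError("quality score below the registry threshold")
        self._products[product.name] = product

    def search(self, term: str) -> List[str]:
        # Case-insensitive match on name, domain or description.
        needle = term.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if needle in f"{p.name} {p.domain} {p.description}".lower()
        )

    def can_access(self, name: str, role: str) -> bool:
        # Governance check: only roles listed on the product may read it.
        product = self._products.get(name)
        return product is not None and role in product.allowed_roles
```

In this sketch a product that fails the quality gate never becomes searchable, which mirrors the idea that trust is established before discovery rather than after it.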

Most organizations will already have data catalogs, but how do you set about supporting data professionals to enable the front-end DaaP service for potential users? IBM’s answer lies with the recent launch of its Data Product Hub. This is an internal data marketplace for packaging data products so that they are easy for data consumers to discover. It can be used to create, share, discover and reuse curated data. It leverages existing analytics, tags data products with business domains, provides a data contract that explains the terms and conditions of use, establishes an access workflow, and lets the provider manage the data product throughout its lifecycle.

Subscribers to the marketplace can request a data product to be created by filling in a form, laying out the main elements of the request and time period required. The data provider reviews the requests, can ask for more information and then goes ahead and creates the product with data quality scores. The data provider can also add information about data privacy levels and restrict access, if, say, the data contains personal information. At this point, personas can be added to approve access.

The Data Product Hub sits on top of the existing catalog and once the data product is ready, the provider clicks publish and the data product can then be searched for and discovered, along with its access level, recommended usage and data contract.
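The request-to-publish flow described above can be pictured as a small state machine. The following Python sketch is only an illustration under assumed names: the stage labels, the `ALLOWED` transition table and the `ProductRequest` class are hypothetical and do not reflect how the Data Product Hub is implemented.

```python
from enum import Enum


class Stage(Enum):
    REQUESTED = "requested"      # subscriber has filled in the request form
    NEEDS_INFO = "needs_info"    # provider has asked for more information
    IN_PROGRESS = "in_progress"  # provider is building and scoring the product
    PUBLISHED = "published"      # product is searchable in the marketplace


# Assumed transitions for this sketch; a real workflow may differ.
ALLOWED = {
    Stage.REQUESTED: {Stage.NEEDS_INFO, Stage.IN_PROGRESS},
    Stage.NEEDS_INFO: {Stage.REQUESTED},
    Stage.IN_PROGRESS: {Stage.PUBLISHED},
    Stage.PUBLISHED: set(),
}


class ProductRequest:
    """Tracks one request from submission through to publication."""

    def __init__(self, title: str, period: str) -> None:
        self.title = title
        self.period = period
        self.stage = Stage.REQUESTED
        self.history = [Stage.REQUESTED]

    def advance(self, target: Stage) -> None:
        # Reject any move the transition table does not allow.
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {target.value}"
            )
        self.stage = target
        self.history.append(target)
```

Modelling the flow this way makes one property explicit: a product cannot reach the published stage without first passing through the provider's build step, which is where the quality scores and access restrictions are attached.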

IBM envisages several use cases for its Data Product Hub: it could form part of a data lakehouse offering with watsonx in the IBM cloud; it could be an on-premises play as a data catalog marketplace; or it could be used as a data warehouse enabled for data sharing using Snowflake and AWS S3.

IBM is taking two sales options to market for its Data Product Hub: a lightweight SaaS option for an annual subscription fee, targeted at sectors such as technology, retail and manufacturing; and a client-managed on-premises option for an annual licence, which is more suitable for financial services, healthcare and telecom organizations. The SaaS option is priced on a specified number of data shares. A data share is the act of transferring a data product from the provider to the consumer, through URLs or other delivery formats.

My take

Clearly, since any form of enterprise AI is only as useful as the data on which its models are trained is reliable, the focus of many organizations is back on the management and accessibility of their data. Enter the data as a product concept to offer an answer. IBM is by no means the only vendor to provide technology to enable the concept. Competitors include Informatica, Collibra and Snowflake, as well as start-ups such as the German company OneData.ai, and Tamr.com (founded by data industry veteran Michael Stonebraker). While the concept is unlikely to capture the market imagination as completely as AI has, DaaP is a capability whose time is nigh. Definitely an area to watch.