Still rife with ethical issues, data sharing has a bright side too

By Neil Raden, July 8, 2020
Summary:
Data brokers and personal data collection continue to cross ethical lines. But there are bright spots - including supply chain data sharing startup Aperity. I talked with their CEO about how their approach is different, and why AI and machine learning play a crucial data processing role.


In a previous article, Data brokers and the implications of data sharing - the good, bad and ugly, I wrote that Information Resources (IRI), Nielsen, and Catalina Marketing have been in the business of collecting and selling data and applications for decades.

Still, the explosion of computing power, giant network pipelines, cloud storage and, lately, AI is fertile ground for the creation of literally thousands of data brokers, mostly unregulated and presently a challenge to privacy and fairness:

Data brokers are currently required by federal law to maintain the privacy of a person's data that is used for credit, employment, insurance, or housing. Unfortunately, this is not scrupulously enforced, and beyond those four categories, there are no regulations (in the US). While medical privacy laws prohibit doctors from sharing patient information, medical information that data brokers get elsewhere, such as from purchases of over-the-counter drugs and other health care items, is fair game.

Rule #1: when data gathering and selling involves people - customers, patients, drivers, depositors, insureds, voters - ethical issues arise. Protections in the US are lagging behind other countries, but it is assumed that personal medical and even genomic information is protected. One glaring counterexample is the data-sharing policy of Michigan Medicine - University of Michigan:

The policy governs Michigan Medicine's approach to sharing of patient-level data and biospecimens with industry and non-academic and non-governmental entities. The policy's primary goal is to ensure that data and biospecimens shared with industry are collected under transparent informed consent that permits and promotes the maximal use and value of the data and biospecimens consistent with the donors. In addition, application of the policy ensures that the approach to sharing with industry is thoroughly documented and consistent across the organization.

This policy is disturbing and borders on unethical for several reasons. A patient in a clinical trial is there because existing therapies aren't working. Are they really in an emotional state to rationally consent to having their biospecimens (blood, tissue, tumor samples) loaned out to anonymous profit-making organizations, with no insight into where or how they are used?

The rationale is that data for research is siloed and balkanized, making it difficult for researchers to gather enough of it, with enough diversity, to model the population. But, it seems, Michigan Medicine is not just altruistically providing this material to industry; they are profiting from it to the tune of $5 million so far.

Rule #2: If you collect intimate information from people with informed consent, you must disclose how it will be used; and if you're selling it, the patient should be compensated, or at least have the right to decline. Hypothetical: my tissue samples are being sold to an agricultural chemical producer to concoct experiments demonstrating that their lethal chemicals are safe.

I was recently introduced to a new kind of data business, started by an old friend, John Madalon. He was the project manager on one of my consulting company's projects twenty years ago. He has come up with something intriguing and, so far as I can tell, free of any obvious ethical issues. It's called Aperity.

A new type of data exchange

It's time to recognize that after decades of technology for information management and analytics, it is still challenging for participants in a supply chain to understand not only the big picture, but their role in it. In this instance, the supply chain does not mean the inbound flow of materials required to manufacture products and keep the company running.

For example, Consumer Packaged Goods (CPG) producers have an outbound supply chain of distributors, retailers, and consumers (there are exceptions, such as Walmart, which assumes the roles of distributor and retailer). Agricultural supply manufacturers have a supply chain including distributors, retailers, and growers. Rather than brokering or aggregating data, we are beginning to see the emergence of a new form of data exchange in which all of the participants in the supply chain exchange information. There is mutual benefit to all of the participants.

By comparison, pulling data together from many sources and adding value to it has, in the past, required a significant investment in people and capital. The data broker business of collecting data from partner organizations is a mature one. Unfortunately, gathering data surreptitiously is also a mature business, and a problematic one. At the bottom are unregulated data brokers who collect very personal information and provide it to anyone willing to pay for it.

More legitimate models, like Nielsen and IRI, began by gathering CPG data to syndicate, providing integrated and derived data back to manufacturers. Both companies offer many more services than that now. Other data "businesses," like Data.gov or CDC.gov, provide value-added analysis as a public service. There are also collaborative industry alliance data services that manage a database of actionable data, best practices, and cost reduction strategies.

Historically, the classic model for a business gathering data from various sources has been to define the format of the datasets in advance and then work, sometimes tirelessly, to conform the data to that schema. In most data businesses, data comes into the service in a multitude of formats, with semantic dissonance about what the data means, and with timing issues: it is hard to put data together into a single time frame when the sources report on different time dimensions.

Over the past three years or so, software that handles much of the tedious work of integrating and blending data in the enterprise has grown into a dynamic and competitive field. The magic potion behind it is Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning for creating and maintaining Knowledge Graphs, an indispensable tool for managing and navigating massive data repositories and for facilitating unattended ingestion and cataloging of data.
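
The knowledge-graph machinery itself is beyond a short example, but the flavor of unattended ingestion is easy to illustrate. Here is a minimal, hypothetical sketch (not any vendor's actual tooling) of automated schema mapping: incoming feeds name the same fields differently, and the ingest step matches them to a canonical schema instead of forcing every provider to conform up front.

```python
from difflib import get_close_matches

# Canonical field names the service wants to populate (illustrative only)
CANONICAL_FIELDS = ["product_id", "retailer_id", "ship_date", "cases_shipped", "net_price"]

def map_columns(incoming_columns, cutoff=0.6):
    """Map a data provider's column names onto the canonical schema.

    Returns {incoming_name: canonical_name or None}; unmatched columns are
    flagged for human review rather than silently dropped."""
    mapping = {}
    for col in incoming_columns:
        # Normalize before matching: lowercase, underscores for spaces and dashes
        normalized = col.strip().lower().replace(" ", "_").replace("-", "_")
        match = get_close_matches(normalized, CANONICAL_FIELDS, n=1, cutoff=cutoff)
        mapping[col] = match[0] if match else None
    return mapping

# A provider's feed uses its own labels; the mapping absorbs the difference.
print(map_columns(["Product ID", "Retailer-ID", "Ship Date", "Cases Shipped", "Promo Flag"]))
# "Promo Flag" maps to None, i.e. a new field that needs cataloging.
```

A real pipeline would use learned models, a knowledge graph of field semantics, and historical mappings rather than simple string similarity, but the principle - absorb variation at ingest instead of dictating a schema - is the same.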

Because of elastic cloud services (with a lowercase "e"), rapidly expanding AI capabilities, and excellent modeling and presentation tools to leverage the data, it is possible for a nimble company with an excellent idea to provide a superior data service without an army of data handlers. Aperity bills itself as "an innovative provider of data management and analytic solutions for supply chain partners that developed a data exchange for the beverage industry."

John Madalon, Founder and CEO of Aperity, explains:

Data comes directly from what we call "Data Providers." Anyone can be a data provider - distributor, retailer, syndicator (such as Nielsen), or whomever. What we realized early on is that we can't control how/when that data appears, the contents of that data, or the quality of that data. So what we had to do was create a very dynamic process that can pivot very quickly to changes to anything about the data. Our ingest tool really doesn't care how it looks coming in. However, we have to be able to recognize these changes with as little human intervention as possible, be very precise about logging and lineage, and make sure any changes are transparent and dynamic, and automatically roll through.
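
Madalon's emphasis on logging and lineage can be made concrete with a small, hypothetical sketch (my construction, not Aperity's code): every inbound file is fingerprinted and logged before anything downstream touches it, so schema changes and reprocessing stay traceable.

```python
import hashlib
import json
from datetime import datetime, timezone

def record_lineage(provider, filename, raw_bytes, detected_columns, log_path="lineage_log.jsonl"):
    """Append one lineage record per inbound file: who sent it, when it arrived,
    a content hash, and the columns detected at ingest."""
    record = {
        "provider": provider,
        "file": filename,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "detected_columns": detected_columns,
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return record

# A distributor drops a shipment file with slightly different columns this week;
# the log preserves exactly what arrived, so the change can be traced rather than guessed at.
record_lineage("acme_distributing", "shipments_2020_07.csv",
               b"Product ID,Cases Shipped\n123,40\n",
               ["Product ID", "Cases Shipped"])
```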

And how is that feasible?

Because everyone can use their own definitions of products, stores, activities, surveys. We made sure we built our AI around harmonizing that data to common definitions, but making sure we made that transparent to everyone.
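
Harmonizing everyone's private definitions to common ones is essentially entity resolution. A minimal, hypothetical sketch (the catalog and aliases are mine, not Aperity's) of mapping varied product descriptions to one canonical SKU:

```python
import re

# Hypothetical canonical catalog keyed by a normalized description; a real
# exchange would rely on GTINs/UPCs plus curated brand and pack-size hierarchies.
CANONICAL_PRODUCTS = {
    "bud light 12 pack": "SKU-0001",
    "budweiser 24 pack": "SKU-0002",
}

# Common abbreviations seen in provider feeds (illustrative)
ALIASES = {"lt": "light", "12pk": "12 pack", "24pk": "24 pack"}

def harmonize(description):
    """Normalize a provider's free-form product description and look up the
    canonical SKU. Returns None when there is no confident match."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", description.lower()).split()
    expanded = " ".join(ALIASES.get(t, t) for t in tokens)
    return CANONICAL_PRODUCTS.get(expanded)

# Three providers describe the same item three different ways; all resolve to one SKU.
for raw in ["BUD LIGHT 12PK", "Bud Light 12-Pack", "bud light 12 pack"]:
    print(raw, "->", harmonize(raw))
```

The transparency Madalon mentions matters as much as the matching itself: members need to see which of their raw descriptions were mapped to which common definitions.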

How and what does a distributor send back to the supplier? What they sold to retailers? Other surrounding information - promotions, for example?

We started very simply - distributor-to-retailer shipments. That data includes volumes, value-added packaging, displays, pricing, discounts, deal levels, taxes, etc. Anything you would see on a paper invoice. We expanded that to include other essential information - in-call data and activity.
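
The invoice-level record he describes maps naturally onto a simple structure. A hypothetical sketch (field names are mine, not Aperity's schema):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ShipmentLine:
    """One line of a distributor-to-retailer shipment - roughly the fields
    you would see on a paper invoice (illustrative only)."""
    distributor_id: str
    retailer_id: str
    product_sku: str               # harmonized SKU from the exchange's catalog
    ship_date: date
    cases: float                   # volume shipped
    unit_price: float              # list price per case
    discount: float = 0.0          # deal/discount applied to this line
    taxes: float = 0.0
    display: Optional[str] = None  # value-added packaging or display type

    @property
    def net_value(self) -> float:
        return self.cases * self.unit_price - self.discount + self.taxes

line = ShipmentLine("acme_distributing", "store_417", "SKU-0001",
                    date(2020, 7, 1), cases=40, unit_price=18.50, discount=25.0)
print(line.net_value)  # 715.0
```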

Madalon went on to explain that they break things into two primary buckets:

  • Data Exchange  - what people transfer back and forth
  • Marketplace - how they do it (software & platforms)

Data being exchanged:

Because we have standardized/harmonized the data and the access to the key datasets (and growing), we can act as that super highway of data flow. Historically, we focused on the following inputs that flow through the Data Exchange. With the partner network rapidly growing, that is going to be extending very rapidly… For the most part, we have been moving data from the supply chain to the manufacturer. With our expanded partner network, the desire to move data out to the trade is an important part of our growth.

Aperity operates as a partner with the exchange participants and has created a way of doing business it refers to as the "Data Exchange Five Core Tenets" (a sketch of how such rules might look in code follows the list):

  • Data is owned by Data Exchange members
  • Members decide what data can be shared across the Exchange
  • Members have full data visibility and traceability of their own data
  • Members have flexibility to add new supply chain partners
  • Members deserve to receive accurate data in a timely manner
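
Tenets like member ownership and member-controlled sharing are, in practice, access-control rules. A minimal, hypothetical sketch of such a check (my illustration, not Aperity's platform):

```python
# Hypothetical sharing policy: each member lists which of its datasets
# which partners may see. The exchange enforces the policy on every request.
SHARING_POLICY = {
    "acme_distributing": {
        "retail_shipments": {"big_brewer_co"},  # shared with this supplier only
        "in_call_activity": set(),              # kept private
    },
}

def can_access(owner: str, dataset: str, requester: str) -> bool:
    """Return True only if the data owner has explicitly shared the dataset
    with the requesting member; the default is deny."""
    return requester in SHARING_POLICY.get(owner, {}).get(dataset, set())

print(can_access("acme_distributing", "retail_shipments", "big_brewer_co"))  # True
print(can_access("acme_distributing", "in_call_activity", "big_brewer_co"))  # False
```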

My take

Aperity does not collect data on individuals, and therefore does not face the ethical issues that list brokers, data brokers, and Michigan Medicine do.

Aperity provides participants with analytics to use in addition to the data exchange. If those models were designed to encourage teenagers to drink alcohol, that would be an ethical problem, but that is not their business model. The world of distributed, multi-channel data integration is flourishing with the use of AI, and Aperity is an example of that.