How to capitalize on data without harming its subjects?

Part two of an eForum discussion on the data economy proves to be a fascinating look at the challenges and risks associated with data platforms.

The UK wants to find new ways to unlock the value of data and energize an economy that's increasingly information-based yet underperforming. That's according to a recent Westminster policy eForum.

My previous report from the event looked at trust and the political dimensions of this challenge, comparing the government's aims with the views of consultants and the Information Commissioner's Office (ICO). Other speakers zeroed in on the nitty-gritty of an economy where there is a precarious balance between public and private interests, in all senses of those words.

Sarah Snelson, Director of microeconomics consultancy Frontier Economics, explored the potential role of data intermediaries, organizations that facilitate greater access to, or the sharing of, data, while sitting between sharer and end-user.

While the online economy promised to cut out the middlemen, that was always nonsense. Instead, the digital world has created reams of powerful new intermediaries, creaming off the profits that innovators were supposed to make from connecting with users - the music and retail industries being the most obvious examples. In theory everyone benefits from the network effect, but exposure alone doesn't put food on the table.

But in parts of the data economy there are good reasons for intermediaries, beyond Spotify supremo Daniel Ek's determination to profit from every song ever written. One example is allowing researchers to work with data safely, while protecting the data subject's privacy - in short, the opposite of Facebook's model of ‘you're our product and we'll sell you to anyone - even if they undermine democracy’.

One example is OpenSAFELY, a secure analytics platform for electronic health records, which allows researchers to analyze health data without seeing it directly. Another is Oblivious AI, which again uses privacy-enhancing technology to allow scientists to work in sensitive areas. A third is the Online Safety Data Initiative (OSDI), which aims to minimize online harms, such as to women and girls. All projects that make the data the asset, not the person it may be attached to.

Snelson wanted to know if there is real-world investment in such worthy initiatives. She said:

"We find evidence of a small number of companies receiving investment funding in this space, with equity funding to date of around £71 million. We've identified 25 companies, three of which are working primarily with health data.

So, there is a bit of activity in this space, but it's certainly by no means a large amount when you compare it to other emerging sectors. This prompts the question: Why do we not see more private investment happening, if we think there's an important role for data intermediaries?

I think there are three broad reasons why this might be the case. The first is that they may only have small potential markets, which makes them less attractive to venture capitalists. The second is they could be in reasonable potential markets, but face barriers to scaling, which again makes them unattractive to VCs.

What we're really talking about there is the lack of any framework for those companies to provide confidence in their ability to enter new markets. It could be that they have a good model for one sector, but then encounter different rules and regulations. For example, if you think about moving from health data to credit card or legal data, a different set of rules comes into play. Does a framework exist for any of that?

The third reason is they may have large potential markets, but there's something inhibiting venture capital investment. It could be perceptions of that market, or uncertainty, particularly when we don't have a critical mass of these investments and companies."

Excellent points, but with a troubling subtext: initiatives that unlock innovation while protecting citizens' privacy are much harder to fund than the big social or cloud platforms.

Snelson said:

It could be that we need to support the flow of information from specialized markets, where intermediaries are developing faster than other markets. If there are reasonable potential markets but barriers to scaling them, we're thinking about what the legal and regulatory frameworks may be that can help support the security and use of data. And we're thinking about the standardization of data and investment in enabling systems, such as privacy-enhancing technologies.

Meanwhile, accelerator programmes and matched public/private funding could help where there's a perception problem in potentially big markets, she added.

An agreed language for data sharing

John Nash, Fellow at thinktank Demos, looked at key challenges in enabling the flow of data between users, companies, and governments - for good or ill.

When we were logging onto mainframe computers with a username and a password, that was the foundation, if you like, of the internet, and there's general agreement that this isn't an ideal approach anymore. It enables all sorts of problems, from fraud to identity theft, to tracking to profiling. It also makes it difficult for people to identify themselves.

An extreme example of these problems is Twitter, where millions of fake accounts, bots, and trolls attach themselves to blue ticks in order to make celebrities think there's popular support for extreme ideas. As a result, extremes appear to become normalized, and we all take to Twitter to shout at each other. A Somme of warring personalities. Kerching!

The only real alternative at present, claimed Nash, is access delegation protocols - those ‘Connect with Facebook’ or ‘Sign in with Google’ buttons we are all familiar with. The issue here, of course, is that this solution creates a similar problem: Google and Facebook knowing everywhere you go and becoming the arbiters of your identity, even if it's a fake one.

This gives rise to more and more questions, he said - but at least ones that have implicit answers.

We really don't have an agreed language for asking for and providing information. We have a whole lot of sub-standards, or a whole lot of banking standards. But we really haven't written the dictionary on data. If one company says ‘I need X’, can that company over here provide X?

The argument I would make is that we need to understand the magnitude of this moment and we need a new dedicated standards body funded by industry. The job is just to develop that language so that everyone's on the same page. It could apply to personal data, to aggregate data, to open data, and it would be a big effort.

The second question is who can access data? Now currently, this is a decision that's made by big technology companies - Apple decides who can put an app in the App Store, Google decides if you can access their APIs, and so on. With that, they wield a great deal of power.

The good analogy here is the way that the highways were regulated. At some point the roads became busier so they became less safe, and government decided they were going to license people to drive cars, which made the roads safer and life easier for car rental companies.

The ideal role for the state is to license entities to make those requests, and we already do that in banking. We already do that in multiple sectors where we essentially say, ‘You are fit and proper to ask for this and you are fit and proper to provide this’. But because that burden currently falls on companies, it's costly, it takes a lot of time, and a lot of duplication goes on.

What's the solution?

The third question is ‘Where is the data?’, said Nash - a question that many organizations struggle to answer if they exist in a hybrid cloud/on-premises world, where a data centre might be anywhere on the planet. But Nash was talking in technical, rather than geopolitical terms.

This brings us to that discovery problem, a routing request. If you're a user and you have a relationship with, say, a bank or you have some government service that you're trying to connect with a company, there is a routing problem. This is what data intermediaries are attempting to solve, right, because the data is with the intermediary. But there's far more data than can be stored with one intermediary, so you end up back with a discovery problem again.

So, what's the solution? Nash said:

The sort of technical argument we've come up with is the idea that requests are routed by the OS provider, so the request is made, and the requesting organization is connected with the providing organization. When a user interacts with an organization - maybe it's a government department, maybe it's a company - they express their needs as standardized requests.

The OS provider then checks, or the regulator says, ‘All these organizations are licensed to provide us with information’. [The point is] no personal information is being displayed, there is nothing for the user to set up. It just works. It means that life is much simpler for companies, governments and users. If you want information, you can ask for it and get it from an organization without having to establish a relationship with them.
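To make the routing idea concrete, here is a minimal Python sketch. It is an illustration only, not Nash's design or any real operating system API: the `StandardRequest` and `Router` names, the `proof_of_age` data type, the example organizations, and the idea of holding the regulator's licences in simple dictionaries are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class StandardRequest:
    """A request expressed in an agreed vocabulary (hypothetical)."""
    requester: str   # organization asking for the data
    data_type: str   # standardized term, e.g. "proof_of_age"


@dataclass
class Router:
    """Stands in for the OS provider that routes requests (assumption)."""
    # Licences granted by a regulator: organization -> data types it may handle
    licensed_requesters: Dict[str, Set[str]] = field(default_factory=dict)
    licensed_providers: Dict[str, Set[str]] = field(default_factory=dict)
    # The user's existing relationships: data type -> provider that holds it
    user_providers: Dict[str, str] = field(default_factory=dict)

    def route(self, request: StandardRequest) -> str:
        """Return the provider that should answer, or raise if not permitted."""
        allowed = self.licensed_requesters.get(request.requester, set())
        if request.data_type not in allowed:
            raise PermissionError(
                f"{request.requester} is not licensed to request {request.data_type}"
            )
        provider = self.user_providers.get(request.data_type)
        if provider is None:
            raise LookupError(f"no known provider for {request.data_type}")
        if request.data_type not in self.licensed_providers.get(provider, set()):
            raise PermissionError(
                f"{provider} is not licensed to provide {request.data_type}"
            )
        return provider


router = Router(
    licensed_requesters={"ExampleRetailer": {"proof_of_age"}},
    licensed_providers={"ExampleBank": {"proof_of_age"}},
    user_providers={"proof_of_age": "ExampleBank"},
)
print(router.route(StandardRequest("ExampleRetailer", "proof_of_age")))  # prints ExampleBank
```

In this sketch the requester never learns anything about the user beyond which licensed provider will answer, and an unlicensed organization is refused before any lookup takes place.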

This would have interesting possibilities, he suggested.

DeepMind said recently that if they develop an AI that could assess your risk of heart disease based on payment and health data, that's a regulatory minefield, but also a valuable application. This provides a framework to do it. Work out the standardized requests, set the licensing threshold with the regulator, make the request, and then you can allow that very specific data to move in the right way, while offering protection for people and organizations.

In this way, the internet moves from a model of people entering or divulging personal information to lists rooted in unique identifiers, he explained.

No two organizations talk about you in the same way, right? If you're shopping with an online retailer, they can simply request a one-time payment token from your bank. And that can't be used by an attacker or sold on the Dark Web. A lot of this is already happening and falls under the privacy-enhancing technology umbrella.

Those are the three important questions. If we don't answer them in a meaningful way, then I don't think we're going to get anywhere.
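The one-time payment token Nash mentions can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real bank or payment network implements tokenization: the `Bank` class and its methods are hypothetical, and a production system would add expiry, authentication and audit logging.

```python
import secrets


class Bank:
    """Hypothetical bank that issues single-use payment tokens."""

    def __init__(self):
        # token -> (merchant, amount in pence) for tokens not yet redeemed
        self._pending = {}

    def issue_token(self, merchant, amount_pence):
        """Create a random token bound to one merchant and one amount."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = (merchant, amount_pence)
        return token

    def redeem(self, token, merchant, amount_pence):
        """Accept the token only once, and only for the bound merchant and amount."""
        if self._pending.get(token) != (merchant, amount_pence):
            return False
        del self._pending[token]  # single use: the token is now worthless
        return True


bank = Bank()
token = bank.issue_token("ExampleRetailer", 1999)
print(bank.redeem(token, "Attacker", 1999))         # prints False
print(bank.redeem(token, "ExampleRetailer", 1999))  # prints True
print(bank.redeem(token, "ExampleRetailer", 1999))  # prints False
```

The retailer only ever holds the token, never the card details, so a stolen token has no value once it has been redeemed or if it is presented by anyone else.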

My take

A fascinating discussion, and one that reveals that - whatever we do - we circle back to being in the hands of the same handful of tech behemoths.

An alternative approach might be the one suggested by Web prime mover Sir Tim Berners-Lee: virtual pods of data that are owned by the individual, and can be opened or closed on their terms alone. A clever idea, but not one backed by investors or the online community.

This is the core problem: big money attracts big money, however much we might wish for a more equitable world.