
How the UK water industry is collaborating to contextualize water data

By George Lawton, February 9, 2024
The UK water industry is collaborating on open water data to help contextualize challenges and opportunities for improvement. Their work has important implications for improving the environment and other sectors that could benefit from open data collaboration.


The UK water industry has been in the headlines recently for all the wrong reasons. Major UK newspapers have been reporting on a growing number of water discharges, pointing the finger at over-compensated executives and investors and at chronic under-investment, and suggesting collusion with regulators at posh clubs.

Although there may be some substance behind these narratives, they also tend to gloss over the wider context, which is constrained by limited reporting, and the most significant opportunities for improvement.

The industry hopes to change that by collaborating on the Stream program. They are starting to help standardize data formatting, metadata, licensing, and best practices for relatively straightforward data on drinking water quality. But in the long run, the work could inform a shared knowledge graph that could be combined with other data about the impact of climate change, new housing, land use changes, and watershed management practices on river health. 

I sat down with Melissa Tallack, Managing Director at C2Life, who previously led data and digital services at Anglian Water Services, and Andrew Myers, Lead Architect at Northumbrian Water. They described the progress, challenges, and opportunities of this work, along with some implications for other industries that could benefit from better data collaboration.

Starting with trusted processes

Aside from the data formatting challenges, it’s important to appreciate the risks of increasing data transparency. Tallack explains:

We have to never forget that we are part of critical national infrastructure. So, some of our data relates to the infrastructure that we own and maintain. And so risk assessment and understanding the risks associated with opening and sharing data has to be fully understood, which is why through Stream, we're building common processes so that we embed best practices into that risk assessment and mitigation, so that as an industry and as individual companies within that we remain responsible data stewards.

An essential first step in the Stream program was bringing in the lawyers, risk managers, and executives to build a trust framework. This also included developing a process for ‘triaging’ the data, which means figuring out how to share it safely in a way that still provides value. For example, rather than just sharing raw drinking water quality data by address, they aggregate it by neighborhood to protect customer privacy. They also need to protect sensitive data that may indicate the output of a manufacturing plant or changes at a military base. Myers says:

One of the things that we're trying to do with Stream is to centralize that and get an agreement across the sector rather than every company’s lawyers spending tons of time going through the pros and cons and the ins and outs of every little possible nuance. We want to try to get a common picture so that not every company has to commit tons of time for lawyers or tons of time for external consultants to help with that. So that was one of the key work streams we've got as part of the Stream project. They are not anti-open data; they fully get it, but we need to do it in the right way to make sure we are following the right steps. 

We know what we need to do to protect personal data, any data that may be commercially sensitive, and any critical national infrastructure. But equally, do not use it as just a barrier. We want to try and make sure we're aware that we've gone through that assessment process, but you then want to get as much published as we can.
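
To make the triaging idea concrete, here is a minimal Python sketch of aggregating address-level readings by neighborhood before publication. It is an illustration only, not the Stream program's actual process: the sample data, the `aggregate_by_neighborhood` function, and the `min_count` threshold for withholding small groups are all assumptions introduced for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical address-level samples: (address, neighborhood, reading).
# These values are invented for illustration and are not real water data.
SAMPLES = [
    ("1 Mill Lane", "Riverside", 1.2),
    ("2 Mill Lane", "Riverside", 0.8),
    ("3 Mill Lane", "Riverside", 1.0),
    ("9 High Street", "Hilltop", 2.5),
]


def aggregate_by_neighborhood(samples, min_count=3):
    """Average readings per neighborhood and drop the addresses.

    Neighborhoods with fewer than `min_count` samples are withheld,
    on the assumption that very small groups could still identify
    an individual customer.
    """
    groups = defaultdict(list)
    for _address, neighborhood, value in samples:
        groups[neighborhood].append(value)
    return {
        name: round(mean(values), 2)
        for name, values in groups.items()
        if len(values) >= min_count
    }


published = aggregate_by_neighborhood(SAMPLES)
print(published)
```

In this sketch, the single Hilltop reading is withheld while the three Riverside readings are published as one average, so no individual address appears in the output.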

Adding context

It’s important to note that the water industry is obliged to respond to requests made under environmental information regulations. Companies can meet the basic regulatory requirements in numerous ways, such as curating a PDF document or an Excel spreadsheet. However, reactively providing the bare minimum also means the data lacks broader context within or across water systems. Tallack says the low-hanging fruit lies in automating ways to share commonly requested data and adding relevant context:

If somebody comes and requests a particular data or environmental data set, then we are obliged under that regulation to provide that, so it makes sense to look at commonly requested types of information under that process and proactively publish that data. That's the low-hanging fruit. It's stuff that we can proactively publish instead of reacting to individual requests. The beauty of being proactive about that is that we can also publish contextual information to support people in using that data. If we're just responding transactionally to a request, we have to make that request and nothing else.

So, proactively publishing data along those lines enables us to provide supporting information and context that will help the person who requested it to utilize that data, avoid misinterpreting that data, and creating false insight.

For example, much of the popular press on this complex problem focuses on data indicating growth in the number of discharges, suggesting this might consist of concentrated and untreated toilet water and industrial effluents. But these narratives also gloss over the fact that the industry has only recently installed the sensors to record this data and that most of these discharges are highly diluted by stormwater surges. Tallack explains:

It's like if you just treat the illness with one measure, you're unlikely to solve the illness and make someone better. So, it's the same with river health. The river is a complex ecosystem with many pressures on it. It's not just one pressure. It's multiple pressures. And therefore, if you need to understand how to maintain the health of that river, you have to look at it as a whole system. And therefore, it comes back to you having to look at it through all different lenses. That means bringing all different types of data from different places together to understand that the picture is incredibly complex. 

It isn't as simple as one factor, and if you solve that, you've solved it all. It's multiple factors. So, you have to bring those multiple parties together and bring the data from those multiple parties together to really understand the problem and understand the best mix of interventions to improve the health of that river. It would be like looking into a room through a keyhole: you can't see the full picture, you're just looking at what you can see.

Improved data transparency will make it easier to study this problem in its entirety. For example, catchment management involves figuring out better ways to manage storm surges by investing in boosting natural ecosystems and discouraging paving over gardens with concrete. These kinds of efforts could provide a much more cost-effective solution for improving river health. But they also require coordination from many parties, including citizens, builders, regulators, and water companies. Myers says:

Some of the calls are to treat it just as a water sector kind of only problem. I think there are certainly areas that have been under-invested in, and there are certainly areas where the water industry really does need to think about making some serious improvements. But equally, we can only increase capacity by so much. You can put lots of storage in and pour billions of tons of concrete to put huge infrastructure projects in place in every part of the UK. But the time it's going to take to do that is going to be vast, the cost of doing that is going to be vast, and the carbon cost of doing that is going to be vast. 

We need to have that joint picture and say actually, ‘What is having the biggest impact on river health and where can we intervene?’ Then we can target that investment in the right places. We can intervene quite quickly, and that all comes down to having the right data to make appropriate decisions on the areas to invest.

These efforts could also improve collaboration on best practices, new research, and innovations in water remediation technologies. In the early days, this might be about making it easier to share data about the effectiveness of various approaches and their relative cost. Down the road, it could also support efforts to share designs of different facilities, connected to digital twins that capture their construction and operating costs.

Tallack says there is also work going on with other industry bodies, such as Spring Innovation, which is accelerating water sector transformation, and UK Water Industry Research, which is creating a collaborative research platform for the water sector. Tallack concludes:

We always say the selling point around what we've done in the water industry is we came together as an industry. We didn't try to solve the problem independently. We said we'll come together around open data, and we’ll do it in a way that ultimately facilitates the uses of that data because if we can facilitate the use of that data, then more value will be generated than if we operate independently and come up with our own ways. Open data is an enabler of open innovation. The more eyes and the more cognitive diversity we apply to a problem, the more chance we’ve got of solving it.

My take

The UK has an essential strength in building a culture of open data. While better technologies and data management services can play a role, it’s essential to build out a framework of trust that includes lawyers, risk managers, and executives. 

It’s also important to look at how industries, municipal authorities, and regulators can expand the context of their individual views and data sets. Solving problems like river health, air pollution, housing, economic growth, and climate change requires a systems-level approach informed by diverse data sets and expertise. 
