Shell builds an integrated data platform to accelerate energy transition

Gary Flood, August 5, 2022
Databricks forms the basis of multinational oil and gas company Shell’s new Unified Data Analytics Platform


Petrochemical giant Shell says it is increasing efficiency across its operations by making better use of data company-wide.

This is being made possible via its new platform, which the firm’s internal data team says is empowering hundreds of engineers, scientists, and analysts to innovate together, through the democratization of data analytics and AI.

This matters, as Dan Jeavons, VP Computational Science & Digital Innovation at Shell, explains: 

If you think about where we are right now, in the energy sector, we’ve got a major problem, which is that most energy resources still depend very heavily on hydrocarbon-based fuels.

To quote Al Gore, Jeavons jokes, that’s an inconvenient truth, but it's something companies like his know they must address quickly. The World Economic Forum has estimated, for example, that the overall carbon footprint could be reduced by as much as 20% through better deployment of digital technologies at scale.

However, it will take multi-year, major CapEx investments to change the infrastructure companies like Shell rely on. Jeavons says:

And that's just going to take time, of course - but one of the things we can do now is leverage digital technology, which is much quicker and easier to deploy at scale. That can then allow us to have a significant impact on our CO2 emissions.

Jeavons - who has just relocated to Bangalore to better oversee his 300-strong global data team - says that the company is prioritizing better use of data from the new platform to speed up the firm’s transition to greener ways of working.

That will happen, he predicts, through better use of digital technology to make Shell’s existing business more effective and efficient.

For instance, data can be used in a smarter way to enable design process changes through advanced simulation and computational science - combining things like AI and physics, he says - which can then result in greater efficiencies in the field.

Finally, data analytics will also be harnessed to optimize the management of the more complex, distributed, and diverse energy systems Shell expects to need to deploy in the future. Jeavons says:

Optimizing all that at scale is a challenge, so digital technology plays a key role to make sure that the investments we deploy run optimally.

Unifying vast amounts of sensor data

Jeavons says that in his sector, the obvious - and best - place to start is with data. Specifically, the company wants to deal more effectively with global time-series data at scale. He explains:

If you think about an enterprise like Shell, everything we do is physical - from our retail stations, through to the pipelines we operate, the wind turbines that we run, the solar parks we operate, our transforming refineries, and also, of course, into the upstream business where you have existing oil and gas assets.

There’s data all over this map, he says, in the form of huge amounts of individual measurements such as weather conditions, temperatures, pressures, rotation speeds, and so on.

That results in a “vast amount” of data from a “vast array” of sensors, but historically Shell found that the only way of dealing with that data was through highly localized, on-site data warehouses.

To get anything like a company-wide, central data warehouse that could run queries at the level and complexity it wanted, Shell had to create aggregation mechanisms, which often lost information along the way. Jeavons says:

The real reason you're localizing is you can't deal with the scale, and so you're effectively simplifying and stripping out information as you go through those different stacks. We did pretty well, but we still struggled to enable this at scale. And of course, when machine learning and AI came along, we wanted to be able to bring all that information in its rawest form to train the algorithms to help to optimize the overall energy system.

We're looking to bring all of that data into an integrated cloud environment, and then run both reporting mechanisms, standardized APIs, and also machine learning jobs, from the same integrated data architecture.
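Jeavons' point about aggregation "stripping out information" can be illustrated with a minimal Python sketch. The sensor values, the hourly averaging step, and the function names below are all invented for illustration; they are not Shell data and do not describe Shell's actual pipelines.

```python
from statistics import mean

# Hypothetical per-minute pressure readings (bar) from one sensor over an hour.
# The values are invented for illustration; they are not Shell data.
raw_readings = [50.0] * 60
raw_readings[37] = 95.0  # a one-minute spike at minute 37


def hourly_summary(readings):
    """Collapse an hour of readings into one average, as a local aggregation layer might."""
    return mean(readings)


def spikes(readings, threshold):
    """Return (minute, value) pairs in the raw data that exceed the threshold."""
    return [(minute, value) for minute, value in enumerate(readings) if value > threshold]


summary = hourly_summary(raw_readings)
print(f"hourly average: {summary:.2f} bar")
print(f"spikes visible in raw data: {spikes(raw_readings, 90.0)}")
print(f"spike visible in the average: {summary > 90.0}")
```

In this toy case the one-minute spike is obvious in the raw readings but disappears once the hour is reduced to a single average, which is the kind of detail a model trained only on aggregated data would never see.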

That has finally been achieved, he says, through “data lakehouse” technology from Databricks, with initial conversations with the vendor starting in 2019. This has helped, Jeavons explains, because all the data for reporting purposes - the warehouse - can finally be collated. A data lake has also been enabled, which can be used to train the new machine learning models Shell wants to exploit.

The result, says the company, is a scalable, fully managed platform that unifies Shell’s entire data analytics lifecycle. Contrasting the past with the present, Jeavons says:

Historically, to run a global query, you probably would have had to interact, at minimum, with north of 10 to 12 different systems. You’re now able to do that from a single platform at scale in a performant way. That allows us to answer all sorts of questions that we couldn't answer before and compare all sorts of parts of the business that were very difficult to compare historically. 

Now, there is just one data product that Shell's line-of-business people can interact with to develop their dashboards and power apps, without having to go shopping all over the place to integrate datasets from across the enterprise.

Equivalent of removing 28,000 US vehicles from the road

To put it in perspective, Shell says it now has at least 2.7 trillion rows of data on its analytics platform. One practical benefit of having all that data in one place, Jeavons says, is a radical improvement in proactive technical monitoring: engineers are getting much better at knowing when equipment might go wrong, so Shell can intervene before it does. Some 10,000 pieces of equipment are now being checked this way, with machine learning processes directly enabled by training data in the lake.

Another key target, as stated, was optimizing performance. The first benefits of unified data analytics across Shell include reducing the CO2 impact of its LNG (liquefied natural gas) trains. Jeavons says:

We’ve shown that we can increase LNG train production by about 1% to 2% of the liquefied natural gas trains, using the data that comes from this. 

We've demonstrated we can take out something like 130 kilotons of CO2 emissions associated with that process, typically, which is about the equivalent of removing 57,000 European vehicles or 28,000 US vehicles from the road.
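As a rough sanity check on that comparison, the quoted figures can be turned into an implied per-vehicle emissions number. The sketch below uses only the numbers in the quote; the function name is hypothetical, and the assumption that 1 kiloton equals 1,000 tonnes of CO2 (and that the vehicle figures are annual) is ours, not Shell's.

```python
def tonnes_per_vehicle(kilotons_co2, vehicles):
    """Implied tonnes of CO2 per vehicle, taking 1 kiloton = 1,000 tonnes."""
    return kilotons_co2 * 1000 / vehicles


CO2_SAVED_KT = 130  # kilotons of CO2, as quoted by Jeavons

us = tonnes_per_vehicle(CO2_SAVED_KT, 28_000)  # US vehicle count from the quote
eu = tonnes_per_vehicle(CO2_SAVED_KT, 57_000)  # European vehicle count from the quote

print(f"implied per US vehicle: {us:.2f} t CO2")
print(f"implied per European vehicle: {eu:.2f} t CO2")
```

On these figures, each US vehicle is implicitly counted at about 4.64 tonnes of CO2 and each European vehicle at about 2.28 tonnes, which is why the same saving equates to roughly twice as many European cars.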

Jeavons concludes that integrated data is the best foundation for Shell to accelerate the energy transition.
