French pharma leader Servier cuts drug research time with use of knowledge graph

By Gary Flood, January 17, 2024
Neo4j’s graph database underpins ‘Pegasus’ - an internal tool that Servier says could end up being mission-critical to its pharma research


The R&D arm of global pharmaceutical company Servier is using graph technologies to improve the success rate of drug candidates in the clinical phase.

Specifically, it has built a knowledge graph called Pegasus, based on Neo4j tooling.

This, its developers claim, allows it to better organize and probe both third-party and proprietary data.

The result, says the system’s primary developer, Data & Research Scientist Jeremy Grignard: bringing useful drugs to market that much quicker.

He says:

Servier’s mission is to discover new drugs to treat diseases, but the process of research and discovery of new drugs is very long, expensive, and very risky - many failures can occur at all stages of the research process and development phases. 

But if at the start of discovery we could identify what we call ‘disruptors,’ small chemical molecules for example which will later become a drug, then both we and people in our therapeutic areas, like cancer and cardiometabolism, benefit.

A commitment to innovation

Servier - which recorded revenues of just under 5 billion euros in 2021/22 - is the second-largest pharma brand in France.

Headquartered in Suresnes in western Paris, the organization develops medicines that treat patients in over 150 countries.

Controlled by a foundation, the firm - which also calls itself Les Laboratoires Servier - employs over 21,000 people.

Central to the Servier mission, says Thierry Dorval, Head of Data Sciences & Data Management, is a commitment to basic research.

Over 20% of its sales revenue is invested in R&D each year, he points out.

That means it is always open to exploring the potential of new technologies like deep learning algorithms, sequence design and mathematical modelling.

As a result of Grignard’s work, that list now also includes knowledge graphs.

This is because of its potential to make handling of all the highly heterogeneous data sources its scientists want to work with - from public biological or medical databases to experimental results - that much easier.

Key to doing that efficiently, adds Dorval, is being able to deep dive only into the most promising parts of the data.

He says:

That ‘sparseness’ is important, as what matters the most for us is the links between those data, not the absolute value of the data themselves. We are interested in knowing, for example, that two chemical compounds are similar, or that two proteins are quite ‘x;’ we are interested in relationships here more than values.

In practical terms, he says, that means that the team can now ask questions that it did not previously know were possible.

He says:

Pre-graph, we were much more narrow-minded in terms of the questions we asked the data. Now, we have many more ways of requesting information from the data we are interested in.

Grignard adds:

Because we have so much heterogeneous pharma-biological data, we had to work out how to capitalize and link this data to support better decision making for Servier’s projects.

And because it’s these relationships we want to find, a data approach like graphs that is all about tracing relationships is the perfect architecture for something like Pegasus.

The system also allows Servier scientists to combine a chemistry-based analysis with a biological one of whatever they are studying - which also speeds up their investigations, say the pair.

Millions of relationships allow novel questions to be posed

As stated, Pegasus is a knowledge graph built on Neo4j, with data pre-processing conducted in Python.

In its latest form, that translates to a system of approximately 50 million individual nodes and 300 million-plus links (relationships) between them.
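For readers who want to picture what nodes and relationships look like in practice, here is a minimal sketch in plain Python. It is not Servier’s code: the article does not describe the Pegasus schema, so the labels (Compound, Protein, Disease), the relationship types and every identifier below are invented for illustration, and a toy in-memory class stands in for the Neo4j database.

```python
from collections import defaultdict, deque


class TinyGraph:
    """A toy in-memory property graph; a stand-in for a real graph database."""

    def __init__(self):
        self.nodes = {}                    # node id -> {"label": ..., properties}
        self.outgoing = defaultdict(list)  # node id -> [(relationship type, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_rel(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.outgoing[source].append((rel_type, target))

    def shortest_path(self, start, goal):
        """Breadth-first search following relationships in their stored direction."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for _rel_type, nxt in self.outgoing[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # no chain of relationships connects the two nodes


def build_demo_graph():
    """Link records from different (hypothetical) sources into one graph."""
    g = TinyGraph()
    g.add_node("cmpd-A", "Compound", source="in-house assay")
    g.add_node("cmpd-B", "Compound", source="public database")
    g.add_node("prot-1", "Protein")
    g.add_node("dis-1", "Disease")
    g.add_rel("cmpd-A", "SIMILAR_TO", "cmpd-B")
    g.add_rel("cmpd-B", "BINDS", "prot-1")
    g.add_rel("prot-1", "ASSOCIATED_WITH", "dis-1")
    return g


if __name__ == "__main__":
    graph = build_demo_graph()
    # Which chain of relationships connects an in-house compound to a disease?
    print(graph.shortest_path("cmpd-A", "dis-1"))
```

The point of the sketch is that the answer comes from following links between records, not from any value stored on a single record. At the scale the article describes, that traversal would be left to the graph database.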

As this is a very specific domain, the usefulness of Pegasus can be hard for a non-pharma expert to grasp.

Grignard and Dorval very quickly end up talking deep biochemistry to explain what using graph now lets them do.

Grignard, for example, is very excited about progress he sees being made around ‘ASOs’ - which stands for ‘antisense oligonucleotides’.

An ASO is a short piece of synthetic RNA or DNA that regulates the synthesis of a protein a researcher is interested in.

To achieve success here, the sequence must be made highly specific to the protein in question.

Using Pegasus, he says, means the time taken to see if a proposed sequence is related to a desired target is being dramatically reduced.

That really helps, as there could be hundreds or even thousands of possible sequences to work through.

He says:

So, what makes the graph really efficient for us is both the way we can ask new questions but also the speed at which we can process questions and the size of the problem set we can give it.

This kind of search is set to improve further as it becomes easier for non-specialist Servier data scientists to interrogate the system directly, without IT’s involvement.

Specifically, Grignard is pairing his knowledge graph with Cypher, the query language of his graph vendor partner.

This will interface with an LLM (Large Language Model) he is building that will translate scientist questions expressed in natural language (e.g., French or English) into Cypher queries. Those queries will be run against the knowledge graph, and the answer will come back in the same natural language - not database code.
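The flow described here, question in, Cypher in the middle, plain-language answer out, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Grignard’s implementation: the three steps are passed in as functions, the stand-ins below return canned values in place of a real language model and database call, the Cypher query uses an invented schema, and the read-only check is an extra precaution that the article does not mention.

```python
import re

# Cypher clauses that modify a graph. This is a coarse, conservative keyword
# check added for illustration: it is not exhaustive, and it will also reject
# harmless queries that merely contain one of these words.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def is_read_only(cypher):
    """Return True if the query text contains none of the write keywords."""
    return WRITE_CLAUSES.search(cypher) is None


def answer_question(question, translate, run_query, render):
    """Translate a question to Cypher, run it, and phrase the result.

    translate, run_query and render are supplied by the caller. In a real
    system translate and render would call a language model, and run_query
    would call the graph database.
    """
    cypher = translate(question)
    if not is_read_only(cypher):
        raise ValueError("refusing to run a query that may modify the graph")
    rows = run_query(cypher)
    return render(question, rows)


# --- Hypothetical stand-ins so the sketch runs without any external service ---

def fake_translate(question):
    # What a language model might return; labels and properties are invented.
    return (
        "MATCH (c:Compound)-[:BINDS]->(p:Protein {name: 'P1'}) "
        "RETURN c.name AS compound"
    )


def fake_run_query(cypher):
    # Canned rows standing in for a database round trip.
    return [{"compound": "cmpd-A"}, {"compound": "cmpd-B"}]


def fake_render(question, rows):
    names = ", ".join(row["compound"] for row in rows)
    return f"{len(rows)} compounds match: {names}"


if __name__ == "__main__":
    print(answer_question(
        "Which compounds bind protein P1?",
        fake_translate, fake_run_query, fake_render,
    ))
```

Keeping the translation, query and phrasing steps separate means each can be swapped or tested on its own, and the generated Cypher can be inspected before it ever reaches the database.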

When asked about the ultimate application of such techniques and the use of knowledge graphs at their organization, Grignard and Dorval, as both good scientists and honest data engineers, are cautious about over-promising.

With no trace of irony, for example, Grignard says:

What we’re doing here is not rocket science; we are just trying to improve the drug discovery pipeline by focusing on business and scientific questions we want to solve. 

Yes, we are implementing new technologies, but only to help solve the questions that are intrinsic to our colleagues’ needs.

Dorval is a little more optimistic.

He concludes:

Our work here is really trying to help the business by bringing chemistry, biology, and pharmacology just a bit closer together.

Doing so gives us the opportunity to deploy and test cutting-edge data science approaches and explore using LLM directly for the business, which we think is very promising.
