While most commentators picked up on her headline claim that this could prevent 22,000 cancer deaths each year, it should be noted that cancer is just one of a set of target diseases the NHS would like more help in combating.
And while many are sceptical about the implementation of her strategy - including us - many researchers really do think we’ve reached a tipping point in applying sophisticated data analysis technologies to big problems.
Prominent among them is the German Centre for Diabetes Research, the DZD (Das Deutsche Zentrum für Diabetesforschung e.V.). Based in Munich, the federally funded body looks to combine promising leads from different research groups, clinical studies, university hospitals and basic research, seeking fresh tools in the battle against diabetes, a lifelong condition that causes a person's blood sugar level to become too high.
We asked the organisation’s head of Data and Knowledge Management, Alexander Jarasch, what he thought of Mrs May’s idea, receiving this pretty positive reaction:
Diabetes is a metabolic disease, and it has to be studied from different perspectives. We have patients to treat who either have a genetic dysfunction or are obese, so we have several different types of data, from both clinical and basic research, that we have to combine in order to eventually help patients prevent diabetes or live better with it.
Whether advanced technology could really help deliver better medical outcomes, as Mrs May believes, is the main question here in Germany too, and I guess everywhere people are looking at how best to deal with cancer, diabetes, cardiovascular diseases or Alzheimer’s.
They are all very complicated conditions that have to be studied in much more depth than we’ve so far been able to, and I agree 100% that new technologies are very important for gaining new insights.
Classical research techniques have hit the buffers
Jarasch knows this well: his responsibility at the DZD is data infrastructure - putting together the right databases and other IT systems to support his colleagues’ scientific and bioinformatics analyses. He is also increasingly convinced that classic research has gone as far as it can, so scientists need to combine and connect data better to find new answers.
Jarasch’s team is doing this in two ways. One, he is - perhaps as would be expected - looking into AI:
We will definitely use Machine Learning to identify unknown patterns - for example, to try to identify new subtypes of diabetes. Currently, it is not yet clear which parameters - we call them biomarkers - you need to do this, but we think using unsupervised Machine Learning could spot new things from our data and generate hypotheses, sets of biomarkers, that could be subsequently confirmed by experiment.
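The kind of unsupervised analysis Jarasch describes can be sketched in a few lines. The snippet below is a minimal, illustrative k-means clustering over hypothetical biomarker vectors (the parameter choices and patient values are invented for illustration, not DZD data); in practice a library such as scikit-learn and far richer biomarker sets would be used.

```python
from statistics import mean

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: group patient biomarker vectors into k clusters."""
    # Naive deterministic initialisation: the first k points become centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each patient to the nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(mean(d) for d in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

# Hypothetical biomarker vectors: (fasting glucose in mmol/L, BMI)
patients = [
    (5.2, 22.0), (5.4, 23.5), (5.1, 21.8),   # lean, normoglycaemic
    (9.8, 31.0), (10.2, 33.4), (9.5, 30.2),  # obese, hyperglycaemic
]
clusters = kmeans(patients, k=2)
```

The two groups the algorithm recovers here are deliberately obvious; the research interest lies in cases where the subtypes, and the biomarkers that define them, are not known in advance.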
DZD also wants to use natural language processing to help it analyse huge amounts of medical literature more rapidly, doing things like scanning hundreds of unfamiliar diabetes-related texts for commonly occurring keywords (e.g. gene names or protein names) linked with diabetes, or with other diabetes-related phenotypes.
But when he then says he’ll put that data straight into a graph database, some brows will furrow. Graph is a way of working with complex data (see some of our coverage here) that has gained traction in several commercial sectors, but has yet to be widely seen as an appropriate technique for medical research. Jarasch says it’s time that perception changed.
We turned to graph because the data is connected, and as a multi-disciplinary research centre the DZD by definition wants to combine different types of data from multiple applications, different parts of Germany, and different disciplines like bioinformatics, chemistry and protein engineering.
Historically, the vast bulk of this is stored - and still comes in - as a mixture of classic relational database form, spreadsheets, or large amounts of unstructured free text. A year or so back, he told diginomica/government, the organisation decided it needed a common way of working with this information and to make better sense of it.
Diabetes is a metabolic disease, but it's not enough to look into metabolic data only: you really also want to take into account genomics data, or proteomics data - they are all linked. A gene encodes a protein active in a metabolic pathway, and that protein metabolises a metabolite, and this metabolite in turn can regulate another gene, so that's really already a sort of network with thousands of components all connected with each other. But we have to connect them, and we have to have a new layer of analysis on this data to do useful things with these connections. So we developed a new system that brings the data together but which also exposes relationships in that data, so we can easily jump from one data point to another.
The way he has found to do this is a combination of the Neo4j graph database and its associated Cypher graph query language. Its first application: connecting up large amounts of metadata from DZD’s clinical studies and then trying to connect them on a genetic level, with one project already encoding millions of nodes representing different metabolites in these metabolic pathways.
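The gene-encodes-protein-metabolises-metabolite-regulates-gene cycle Jarasch describes can be sketched as a tiny graph in plain Python; the node names below are hypothetical placeholders, not the DZD schema. In Neo4j itself this "jump from one data point to another" would be a Cypher path query rather than a hand-written traversal.

```python
from collections import deque

# Toy knowledge graph (hypothetical entities): each node maps to a list of
# (relationship, neighbour) pairs, mirroring a property-graph model.
GRAPH = {
    "GENE_A":       [("ENCODES", "PROTEIN_A")],
    "PROTEIN_A":    [("METABOLISES", "METABOLITE_X")],
    "METABOLITE_X": [("REGULATES", "GENE_B")],
    "GENE_B":       [("ENCODES", "PROTEIN_B")],
    "PROTEIN_B":    [],
}

def path(graph, start, goal):
    """Breadth-first search returning the chain of relationships linking two nodes.

    In Cypher this is roughly:
        MATCH p = (:Gene {name: 'GENE_A'})-[*]->(:Gene {name: 'GENE_B'}) RETURN p
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops  # list of (source, relationship, target) triples
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + [(node, rel, nxt)]))
    return None  # no connection found
```

The appeal of a native graph database is that such multi-hop questions - which in a relational store mean chains of joins - are expressed directly as path patterns.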
Jarasch says it is still early days for graph at his institution, but developments like Bloom, Neo4j’s natural language-style interface to Cypher, will make it even easier for non-database specialists to start working with the kind of data they want to, he predicts.
Seems like Mrs May’s team may need to rewrite her speech and drop in a new potentially promising technology for her Healthcare Grand Challenge, then: graph.