
What if AI hype is a bust? How knowledge graphs can #acceleratetrust within and between organizations

George Lawton, April 10, 2024
Kaiasm Chief Scientist Liam McGee explains how knowledge graphs can create a Rosetta Stone to keep different groups on the same page to #acceleratetrust. This has important implications for all organizations seeking to build a more solid foundation for generative AI.


Enterprises are waking up to the fact that AI, particularly the trendy generative AI variants, tends to hallucinate. Increasingly, knowledge graphs are being proffered as a foundation for reducing this tendency.

Unfortunately, the term is used in many ways. Depending on who you ask, it could be a graph database, taxonomy, ontology, semantic web, digital thread, or a trendy new architecture like data fabric or data mesh. Suffice it to say that most people are looking at more precise ways of joining up data for various types of stakeholders and experts. 

I recently sat down with Kaiasm Chief Scientist and co-founder Liam McGee, whose company looks at how knowledge graphs can #acceleratetrust among the engineers, business execs, and lawyers involved in large building projects. At a high level, he sees knowledge graphs as a kind of Rosetta Stone that makes it easier for these different people to ask more precise questions and to have more confidence in the answers. To be clear, neither graph databases nor generative AI is required, although both could play a supporting role.

McGee explains:

You're trying to tie your representations of the real world to the real world. So, you are accepting the fact that you're not dealing with data, you're dealing with things. And you're just representing them with data so that you can manipulate the data, but you're doing this in order to do things with the things. That allows you to break down a lot of barriers within the organization so it escapes the silos. That's why it's organizationally really useful to do. Trying to build a translation between every silo is much less efficient than building one translation for each silo. And that's why it's quicker.
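McGee's efficiency argument is easy to make concrete with a little arithmetic. Assuming each translation is a single bidirectional mapping, connecting every pair of n silos directly takes n(n-1)/2 mappings, whereas mapping each silo once to a shared conceptual layer takes n. The minimal sketch below simply tabulates both counts; the function names are illustrative.

```python
def pairwise_translations(n_silos: int) -> int:
    """Mappings needed if every pair of silos gets its own bidirectional translation."""
    return n_silos * (n_silos - 1) // 2


def hub_translations(n_silos: int) -> int:
    """Mappings needed if each silo is translated once to a shared conceptual layer."""
    return n_silos


for n in (3, 5, 10, 20):
    print(f"{n:>2} silos: {pairwise_translations(n):>3} pairwise vs {hub_translations(n):>2} via a shared layer")
```

At three silos the two approaches cost the same, but by ten silos it is 45 mappings versus 10, and the gap keeps widening as silos are added.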

From SEO to infrastructure

For context, Kaiasm started off developing website optimization tools, first to make websites more accessible and later for search engine optimization. Under the surface, both require solving a knowledge graph problem. For example, the hardware store Screwfix internally referred to equipment for affixing nails as a ‘nailer,’ while a far larger audience of customers searched Google for ‘nail guns.’ Kaiasm designed tools that automatically translated the names of the thousands of products Screwfix sold, and structured their web pages, into the format most relevant to Google searchers. This gave the search engine something it could make sense of and dramatically improved sales.

However, during COVID-19, marketing budgets dried up as retailers shifted their focus to keeping their supply chains running rather than marketing goods they could not deliver. Kaiasm looked for other opportunities and realized that improving infrastructure data workflows was a natural fit for its core competency. McGee says:

We sort of re-tooled our toolset and our methodology to look at how do we deal with all the silos that there are in large and complicated organizations that are trying to represent the real world in a number of different ways for a number of different people, and how do we increase the semantic interoperability.

Bridging trust on a knowledge graph

These days, Kaiasm does a lot of work for companies building things like bridges and roads, whose employees all have different interests and ways of talking about projects. Business execs want to know how much a project will cost and which bin to assign the costs to. Lawyers want a more efficient way to talk about contract clauses, check that each clause is fulfilled, and identify the next steps when it has not been. Engineers want to be able to prioritize tasks, measure quality, and coordinate work across different teams. All of these teams talk about elements of a project in slightly different and sometimes overlapping ways.

To complicate matters even further, experts may use different terminology even within one discipline. Teams may also refer to different building standards to track projects, such as the National Highways Asset Data Management Manual, Design Manual for Roads and Bridges, and the Manual of Contract Documents for Highway Works. 

Bridging trust requires building semantic glue, assigning data ownership, and making the data machine-readable. Building the semantic glue starts with identifying the important questions people ask across various workflows and linking them across the various names and data structures in use. It also needs to be in a format that is intelligible to people like bridge engineers, who might not be data modeling specialists. McGee explains:

It shouldn't matter what you call it. You should be able to hold all the different things that people call it against the thing. There's a thing and a bunch of names, and it's fine to have loads of them. You can have a preferred one. But if you don't use a preferred one, that's still okay as long as we know the one you use.
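One way to picture “a thing and a bunch of names” is a concept record with one preferred label and any number of alternatives, indexed so that any name resolves to the same concept. This mirrors the preferred/alternative label pattern of the W3C SKOS vocabulary (`prefLabel`, `altLabel`), but the minimal sketch below is only an illustration, not Kaiasm's implementation. The class names and concept ID are hypothetical, and the nail-gun example is borrowed from the Screwfix story above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A real-world thing with one preferred name and any number of alternatives."""
    concept_id: str
    preferred_label: str
    alternative_labels: set[str] = field(default_factory=set)

    def all_labels(self) -> set[str]:
        return {self.preferred_label, *self.alternative_labels}


class Vocabulary:
    """Indexes every name, case-insensitively, against the concept it refers to."""

    def __init__(self) -> None:
        self._by_label: dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        for label in concept.all_labels():
            # A production system would flag labels already claimed by another concept
            self._by_label[label.casefold()] = concept

    def resolve(self, term: str) -> Concept | None:
        """Return the concept behind any known name, or None if the name is unknown."""
        return self._by_label.get(term.casefold())


vocab = Vocabulary()
vocab.add(Concept("product:nail-gun", preferred_label="nail gun", alternative_labels={"nailer"}))

print(vocab.resolve("Nailer").preferred_label)  # nail gun
print(vocab.resolve("NAIL GUN").concept_id)     # product:nail-gun
print(vocab.resolve("hammer"))                  # None
```

Whichever name a team uses, the lookup lands on the same concept, and the preferred label is always available when a single canonical name is needed.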

Next, you need to assign an owner role to each business-critical information asset to ensure that the data is accurate and can answer questions from different parties. This owner is a role inhabited by a human, and if that person moves on or goes on vacation, you have to put another human in the role. Clear ownership also makes it easier to figure out whom to ask when someone is unclear about a definition.
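The point that ownership belongs to a role rather than a person can be sketched as a small registry: assets map to roles, and roles map to whichever human currently holds them. This is a hypothetical structure, assuming ownership is looked up by asset name; the asset, role, and holder names are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OwnerRole:
    """An ownership role; the holder is whichever human currently fills it."""
    role_name: str
    holder: str | None = None


class OwnershipRegistry:
    """Maps each information asset to the role that owns it, not to a person."""

    def __init__(self) -> None:
        self._role_for_asset: dict[str, OwnerRole] = {}

    def assign(self, asset: str, role: OwnerRole) -> None:
        self._role_for_asset[asset] = role

    def who_to_ask(self, asset: str) -> str:
        role = self._role_for_asset[asset]
        if role.holder is None:
            raise LookupError(f"role '{role.role_name}' for '{asset}' is vacant")
        return role.holder


bridge_data_owner = OwnerRole("Bridge asset data owner", holder="A. Engineer")
registry = OwnershipRegistry()
registry.assign("bridge inventory", bridge_data_owner)
print(registry.who_to_ask("bridge inventory"))  # A. Engineer

# The holder moves on; the role, and every asset assigned to it, stays put
bridge_data_owner.holder = "B. Deputy"
print(registry.who_to_ask("bridge inventory"))  # B. Deputy
```

Because assets point at the role, handing over responsibility is a single change, and a vacant role surfaces as an explicit error rather than a silent gap.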

Automating the process

The third piece is translating unstructured data such as contracts, spreadsheets, and progress reports into a machine-readable format. This allows different parties to answer questions programmatically, in a clear way that everyone can agree on. When data is pulled from different sources, the answers become a matter of fact rather than a matter of opinion. Lawyers, for example, want to make it easier to discern clearly when contract clauses have been satisfied, in a way that is visible to and agreed by all parties.
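Once contract terms and progress data are machine-readable, checking whether a clause has been satisfied can become a function call rather than a debate. The sketch below assumes that fields have already been extracted from an unstructured progress report into a typed record; the fields, clause wording, and deadline are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class ProgressReport:
    """Hypothetical fields extracted from an unstructured progress report."""
    deck_poured: bool
    inspection_passed: bool
    completion_date: date | None


# Each clause becomes a named, checkable rule (illustrative wording and dates)
CLAUSES: list[tuple[str, Callable[[ProgressReport], bool]]] = [
    ("C1: deck poured", lambda r: r.deck_poured),
    ("C2: independent inspection passed", lambda r: r.inspection_passed),
    ("C3: completed by 2024-12-31",
     lambda r: r.completion_date is not None and r.completion_date <= date(2024, 12, 31)),
]


def unmet_clauses(report: ProgressReport) -> list[str]:
    """Return the names of clauses the report does not satisfy."""
    return [name for name, rule in CLAUSES if not rule(report)]


report = ProgressReport(deck_poured=True, inspection_passed=False, completion_date=date(2024, 11, 1))
print(unmet_clauses(report))  # ['C2: independent inspection passed']
```

Every party running the same rules over the same data gets the same list of outstanding clauses, which is what makes the next steps easy to agree on.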

In the case of bridges, Kaiasm’s tools help distill and translate the appropriate data into machine-readable linked JSON documents. McGee says JSON is nice because it’s easy to hang more bits of detail off it. Kaiasm doesn’t tend to use traditional ontology formats like the Web Ontology Language (OWL) because they are less familiar to programmers.
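The article does not specify the exact format, but JSON-LD is one widely used way to write linked JSON: each node carries an `@id`, and other nodes are referenced by that identifier. Assuming that convention, a bridge record might look like the sketch below, where the `example.org` vocabulary, identifiers, and values are all hypothetical.

```python
import json

# Hypothetical vocabulary and identifiers, following JSON-LD conventions
bridge = {
    "@context": {"@vocab": "https://example.org/infrastructure#"},
    "@id": "https://example.org/assets/bridge/B-0042",
    "@type": "Bridge",
    "name": "River crossing B-0042",
    # Link to another node by its @id rather than copying its data
    "hasBearing": [{"@id": "https://example.org/assets/bearing/BR-7"}],
}

# Hanging more detail off the document is just a matter of adding keys
bridge["lastInspection"] = {"date": "2023-09-14", "condition": "fair"}

print(json.dumps(bridge, indent=2))
```

To a programmer this is ordinary JSON, which illustrates the familiarity argument against formats such as OWL, while the `@id` links still let separate documents join up.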

Kaiasm’s tools help people visualize the structure, play with it, correct it, add bits, and merge it with other structures. This makes it easy to export into a variety of formats. For example, construction teams might work with Oracle Primavera project management files. McGee argues: 

It doesn’t really matter what format it's held in. It can be spat out as anything that conforms to a logical frame. We've concentrated on using the lessons we learned in the early days of making large amounts of information highly navigable and memorable. So, once you've found it once, it's still in the same place next time. That's what we use in the app to allow people to get around and to select parts of it for export.  
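The idea that information “can be spat out as anything that conforms to a logical frame” can be sketched by keeping records in one neutral structure and writing a small exporter per target format. Here CSV and JSON stand in for real targets; the sketch does not attempt Primavera's own import formats, and the field names and activities are hypothetical.

```python
import csv
import io
import json

# The logical frame: fields every export must carry (hypothetical)
FRAME = ["activity_id", "asset", "description", "planned_finish"]

activities = [
    {"activity_id": "A100", "asset": "B-0042", "description": "Replace bearings", "planned_finish": "2024-06-30"},
    {"activity_id": "A110", "asset": "B-0042", "description": "Resurface deck", "planned_finish": "2024-08-15"},
]


def to_frame(record: dict) -> dict:
    """Keep only the frame's fields, in the frame's order."""
    return {name: record[name] for name in FRAME}


def to_json(records: list[dict]) -> str:
    return json.dumps([to_frame(r) for r in records], indent=2)


def to_csv(records: list[dict]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FRAME)
    writer.writeheader()
    writer.writerows(to_frame(r) for r in records)
    return buffer.getvalue()


print(to_csv(activities))
print(to_json(activities))
```

Adding a new target means writing one more exporter against the frame; the underlying records never change.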

Cleaning the meaning

For example, a bridge engineer might want to pull together all the information the organization currently records about bridges across different applications. If they find a mismatch, they can work with the data owner to sort it out. Once this is cleaned up, it’s easier to specify data requirements for third-party information suppliers. McGee explains:

As with everything else, its value is really on the interpretability and cleanliness of the information held within it. The data lake will not be the solution to all your problems, and the graph database will not be the solution to all your problems. In fact, no technology is going to be the solution to all your problems because your problems aren't purely technological. They are a mixture of people and things and technology, and you need to deal with the socio-technical problem, of making sure that business people understand how to ask questions of technical people. 

And the technical people understand the consequence of not understanding the question the businesspeople just asked them. That's where you get sort of contract relationships in place between, say, commercial people who are buying the information and technical people who are providing information and making sure that you can have a contract in place that says, ‘Regardless of how you hold this information, this is the information we want, and there is an objective way of checking whether we've defined it in a way that you can prove whether you're providing it in the way that we've defined.’ That's when that sort of trust stuff comes in. And that's where it becomes really useful to have the information not in databases that you are mapping but in this kind of separate conceptual layer where it says, ‘What are the real things we're asking for information about, what information are we asking for and what format do we expect to get?’ You can define that in a nice, machine-readable form, but in the end, it's independent of the application itself.
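The “objective way of checking” McGee describes can be approximated by writing the data requirement down as data and testing each supplier delivery against it, regardless of how the supplier stores the information internally. Standards such as JSON Schema and W3C SHACL exist for this kind of validation; the plain-Python sketch below only illustrates the principle, with hypothetical field names and values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRequirement:
    name: str
    expected_type: type
    required: bool = True


# What the buyer asks for, independent of how the supplier stores it (hypothetical fields)
BRIDGE_REQUIREMENTS = [
    FieldRequirement("asset_id", str),
    FieldRequirement("span_length_m", float),
    FieldRequirement("last_inspection", str),
    FieldRequirement("notes", str, required=False),
]


def check_delivery(record: dict, requirements: list[FieldRequirement]) -> list[str]:
    """Return every way the record falls short of the requirement; empty means it conforms."""
    problems = []
    for req in requirements:
        if req.name not in record:
            if req.required:
                problems.append(f"missing required field '{req.name}'")
            continue
        value = record[req.name]
        if not isinstance(value, req.expected_type):
            problems.append(
                f"'{req.name}' should be {req.expected_type.__name__}, got {type(value).__name__}"
            )
    return problems


supplied = {"asset_id": "B-0042", "span_length_m": "38.5"}
for problem in check_delivery(supplied, BRIDGE_REQUIREMENTS):
    print(problem)
```

Both sides can run the same check, so “are you providing what we defined?” has a yes-or-no answer instead of a negotiation.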

Neurosymbolic foundations

At the moment, Kaiasm is not directly using AI. However, McGee believes that in the long run, knowledge graphs will help build a symbolic foundation for more capable neurosymbolic AI that hallucinates less. Symbolic representations are crucial for engineering and legal workflows that require exactitude and explainability.

For example, a large engineering firm may have over a million spreadsheets or loads of design drawings relating to a nuclear power station. Engineers might want to survey the projects they have done over a few decades to answer a particular question. In this case, a generative AI model could help them ask questions in natural language about a subset of this data, such as, “Which are the bits where we tend to go over budget, deliver late, or people get injured?” Without the symbolic groundwork, this can be challenging if every project has its own breakdown structure and encoding scheme. McGee explains:

Generally, LLMs are brilliant at understanding what people probably mean because they're essentially converting lumps of text into encodings that can be compared to other encodings. The idea that these can help me query this data and give me my answer back in plain English is a really obvious area for generative AI.
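The symbolic groundwork matters because, once each project's own breakdown codes are mapped onto shared concepts, a question like “where do we tend to go over budget?” reduces to a plain aggregation that a natural-language front end could call. The sketch assumes a hand-built code mapping and invented cost lines, and it leaves out the generative AI part entirely.

```python
from collections import defaultdict

# Each project codes its work differently; this mapping is the symbolic groundwork (hypothetical codes)
CODE_TO_ELEMENT = {
    "Project Alpha": {"ALP-03": "piling", "ALP-07": "deck"},
    "Project Beta": {"2.1.4": "piling", "2.3.0": "deck"},
}

# (project, project-specific code, budget, actual), invented figures
COST_LINES = [
    ("Project Alpha", "ALP-03", 1_000_000, 1_250_000),
    ("Project Alpha", "ALP-07", 2_000_000, 1_900_000),
    ("Project Beta", "2.1.4", 800_000, 1_000_000),
    ("Project Beta", "2.3.0", 1_500_000, 1_450_000),
]


def overrun_by_element(cost_lines, code_to_element):
    """Sum actual minus budget per shared element across all projects."""
    totals = defaultdict(int)
    for project, local_code, budget, actual in cost_lines:
        element = code_to_element[project][local_code]  # translate to the shared concept
        totals[element] += actual - budget
    return dict(totals)


print(overrun_by_element(COST_LINES, CODE_TO_ELEMENT))  # {'piling': 450000, 'deck': -150000}
```

Without the mapping, `ALP-03` and `2.1.4` would never be recognized as the same kind of work; with it, the answer (piling, in this invented data) falls out of a single loop.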

But McGee is not completely sold on the current AI hype panning out the way enterprises expect, owing to high training costs and data storage requirements. He quips:

What if the current AI hype is another bust that just doesn't work? They still need the clean data.

My take

The key takeaway is that building trust across organizations is more of a socio-technical problem than a new technology problem. Creating a knowledge graph needs to start with how different roles and parties view the world rather than a new tool. This kind of analysis can plant the seeds for how you implement the data infrastructure, create appropriate rules, and automatically enforce them. 

