Elsevier wades into generative AI - cautiously
Summary:
- Starting small, but tapping into ChatGPT to jumpstart its efforts.
Scientific giant Elsevier is cautiously adding generative AI capabilities to its Scopus abstract and citation database. And it jumpstarted the process by leveraging the ChatGPT engine.
This may sound like a recipe for disaster if you have been following the recent stories on AI hallucinations. So Elsevier is starting small, with an alpha release of the new AI capabilities, and taking advantage of its existing citation search engine, knowledge graph, and custom ontology to ground ChatGPT’s results in a chain of trust. This builds on the firm’s previous work on Small Language Models and graph data, which we covered in March.
It is starting with the standard ChatGPT Large Language Model (LLM) rather than investing the resources to develop its own. Maxim Khan, SVP of Analytics Products and Data Platform at Elsevier, believes this approach could strike the right balance between development time, cost, and product quality. If all goes as intended, it could set a model for publishers and enterprises to safely take advantage of the power of these new tools. For now, Elsevier has 15,000 researchers kicking the tires and hopes to go live for all users in 2024.
A better search
The Scopus search tool helps lawyers, banks, governments, researchers, and healthcare professionals to get the lay of the land for multi-disciplinary research. Professionals in one discipline can quickly discern the terminology, top researchers, and institutions in adjacent disciplines and domains.
For example, a finance expert might want to assess the impact of electric vehicles on supply chains. They could find out how scientists working on new battery technology are characterizing the latest advances and which approaches seem to be leading the field. Scopus AI would identify the top papers across disciplines and then distill them into a summary. The finance expert could dive deeper into the more interesting leads for further insight.
A core part of Elsevier's strategy has been the research and development of tools and infrastructure for extracting and linking the concepts buried in respected technical journals. Khan says:
Over the last twenty-plus years, we have invested quite a bit into the linking of data. We do a lot of entity resolution on the data that we have. So obviously, publication data, but also things like preprints, and broader data, like funding data, to pull out topics and organizations from journals and link these things. And that creates a very rich graph.
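To make the idea of that graph concrete, here is a minimal sketch of how such linked entities might be represented. Everything in it is invented for illustration; these identifiers and relation names are not Elsevier's actual schema:

```python
# Hypothetical sketch of a citation knowledge graph as subject-relation-object triples.
# All identifiers and relation names are invented for illustration.
triples = [
    ("article:123", "authored_by", "author:jane_doe"),
    ("author:jane_doe", "affiliated_with", "org:example_university"),
    ("article:123", "funded_by", "funder:example_agency"),
    ("article:123", "about_topic", "topic:solid_state_batteries"),
    ("preprint:789", "published_as", "article:123"),
]

def neighbors(entity: str) -> list[tuple[str, str]]:
    """Return the (relation, target) pairs linked from one entity."""
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]

# Everything the graph links directly from article:123.
for relation, target in neighbors("article:123"):
    print(relation, "->", target)
```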
On top of this rich graph, they have built out search analytics to make sense of complex topics. The new AI capability enhances the existing service. Khan explains:
The big difference is we're now starting to, in a focused way, try to apply the same NLP techniques that we use mostly for kind of extractive parts of the data and the thinking on the user side, which allows users to leverage the solutions and the content and the data that we provide.
Mitigating hallucinations
Khan acknowledges there are risks of hallucination. One mitigation is grounding the LLM’s work in vetted Scopus content through retrieval-augmented generation (RAG). He says:
Using the query that the user types in, we're firing that into a semantic search engine and getting back the list of results. And we're using that, in addition to the query, to prompt the LLM to give essentially a summary. So we're essentially using the LLM as almost the natural language interface.
So when you get the results back, you actually get the references from Scopus that support all of the summary statements that come up in the summary. So that obviously reduces the risk of us making up references because it's very hard to make them up when you've essentially returned them from a search engine. And it also grounds the interpretation, the summary in the results returned by the semantic search. So this technique, we feel, gives us a way of exploring the benefit that LLM could bring for our users, but also in a way that manages the risks.
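Pieced together from that description, the flow looks roughly like the sketch below. Everything in it is illustrative rather than Elsevier's code: `semantic_search` is a toy stand-in for the Scopus search engine, and `llm` is whatever completion model sits behind the service.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ScopusHit:
    doc_id: str
    title: str
    abstract: str

def semantic_search(query: str, corpus: list[ScopusHit], top_k: int = 5) -> list[ScopusHit]:
    """Toy stand-in for a semantic search engine: rank documents by word overlap."""
    terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(terms & set(d.abstract.lower().split())), reverse=True)
    return ranked[:top_k]

def answer(query: str, corpus: list[ScopusHit], llm: Callable[[str], str]) -> tuple[str, list[ScopusHit]]:
    """Retrieve first, then ask the model to summarize only what was retrieved."""
    hits = semantic_search(query, corpus)
    context = "\n".join(f"[{i + 1}] {h.title}: {h.abstract}" for i, h in enumerate(hits))
    summary = llm(f"Query: {query}\n\nSummarize these results, citing sources as [n]:\n{context}")
    # The references shown to the user are the retrieved records themselves,
    # not anything the model generated, which is what keeps the citations real.
    return summary, hits
```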
Getting a jumpstart
Elsevier decided to take advantage of an existing LLM rather than start from scratch, to create the most value for users at minimal cost. The traditional LLM training process starts with internet-scale pre-training, which takes months and can cost millions of dollars. Enterprises sometimes fine-tune these models and apply reward modeling and reinforcement learning after the base model is released. Elsevier decided to start later in the process, explains Khan:
I think most people could probably stick to taking models that are out of the box and then optimizing those through things like prompt engineering, which is kind of what we've done. We are using ChatGPT within Azure within the Microsoft environment. It’s on a private instance, so no information gets out of the box, and we want to maintain privacy and confidentiality. Now, we are essentially LLM independent.
If we believe that other models are more suitable to meet this need, as part of the overall stack of the solution, we will look at those options. But we've designed this to essentially separate the different layers of our solution, so we've got options around these models as they continue to evolve.
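One way to read "LLM independent" is as a thin abstraction layer between the application and whichever model backs it. The sketch below is a guess at that shape based on Khan's comments, not Elsevier's actual architecture; every class and method name is invented.

```python
from typing import Protocol

class TextGenerator(Protocol):
    """The only surface the rest of the stack sees, whatever model sits behind it."""
    def complete(self, prompt: str) -> str: ...

class PrivateAzureChatModel:
    """Stand-in for a privately hosted chat model; the real call would go
    through the provider's SDK, which is omitted here."""
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("wire up the hosted model's SDK here")

class LocalStubModel:
    """A trivial backend for tests, or a placeholder for some other model."""
    def complete(self, prompt: str) -> str:
        return f"[stub completion for: {prompt[:40]}...]"

def summarize(query: str, context: str, model: TextGenerator) -> str:
    # Application code is written against the interface, so the model
    # underneath can be swapped without touching this layer.
    return model.complete(f"Summarize in answer to '{query}':\n{context}")

print(summarize("battery supply chains", "retrieved abstracts go here", LocalStubModel()))
```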
Two techniques for customizing the big models after the fact are fine-tuning and prompt engineering. Although fine-tuning may provide value in some specific domains, Khan does not believe it makes sense for now:
The people using those models out of the box will do prompt engineering, and some will do fine-tuning. Although with the benefit versus effort of fine-tuning, it’s not clear that makes sense because you can get very far with prompt engineering. So what we are doing is augmenting the prompt, the user query, with the results that came back from the semantic search. And then, we are asking the model to essentially summarize the results in answer to the query.
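In that framing, prompt engineering largely comes down to how the retrieved material and the instructions are packaged for the model. A hypothetical version of such an augmented prompt (the wording is invented, not Elsevier's actual prompt) might be assembled like this:

```python
def build_augmented_prompt(query: str, abstracts: list[str]) -> str:
    """Hypothetical template: the user's query plus numbered search results."""
    numbered = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(abstracts))
    return (
        "Answer the research question using only the abstracts below. "
        "Write a short summary and cite each claim with the bracketed "
        "number of the abstract that supports it.\n\n"
        f"Query: {query}\n\nAbstracts:\n{numbered}"
    )

# Example with made-up abstracts:
print(build_augmented_prompt(
    "How do solid-state electrolytes affect battery supply chains?",
    ["Solid-state electrolytes reduce reliance on ...", "Manufacturing at scale requires ..."],
))
```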
Building feedback loops
Scopus has developed several sources of feedback. User research and testing provide in-depth qualitative feedback ahead of the launch. The team also surveys users as they use the tool and measures engagement on the platform to capture ongoing metrics. Insights from early adopters will help decide whether to invest more resources into improving prompt engineering, semantic search, or additional features. Khan explains:
We are releasing it in stages to a subset of the overall usage of Scopus because we want to actually figure out various things about the solution and how much value and benefit it is giving before we scale.
An essential aspect of this endeavor has been the investment in developing a rich knowledge graph of Scopus’ underlying content. They are not just feeding raw data into the LLM. It’s been curated, structured, and linked in such a way that it is easier for both humans and bots to process, analyze, and understand. Khan says:
As an information provider to the world of research, we want to explore more broadly how to leverage the graphs that we have to better meet existing needs or meet adjacent needs within the community. The ability to give the community better ways to leverage those knowledge graphs through search or analytics is top of mind. We're really starting with this specific use case as a starting point. But over time, we want to leverage our entire knowledge graph, or allow the community to leverage our entire knowledge graph to support the decisions that they're making and hopefully reach better outcomes.
Tim Berners-Lee’s long-term vision of the World Wide Web evolved towards a semantic web with explicit links connecting entities. Although he started describing this vision a few decades ago, it has been slow to catch on. One big challenge has been the limited tooling and the complexity of connecting all of the people, places, and things across pages.
Khan thinks several things have changed since those early days. Standards like JSON-LD have become more widely adopted and accepted by the community and refined through practice. And there have been improvements in linked data infrastructure. But building a semantic graph still requires effort, he concludes:
Several parts of this ecosystem have matured significantly. But you still do need investment and expertise to play in this space. And in order to serve the communities, you can't just put stuff into a box and expect it to work. There is still work to do around applying these technologies, applying these standards, building these end-to-end systems, and investing in the curation and the data.
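JSON-LD, one of the standards Khan points to, layers a shared vocabulary over ordinary JSON so that entities and their links are machine-readable. A small illustrative example, written here as a Python dict using schema.org terms (the article, author, and funder are fictional):

```python
import json

# Illustrative JSON-LD for a journal article using schema.org vocabulary.
# Only the structure matters; the entities are made up.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "Advances in Solid-State Battery Electrolytes",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "affiliation": {"@type": "Organization", "name": "Example University"},
    },
    "funder": {"@type": "Organization", "name": "Example Research Agency"},
    "isPartOf": {"@type": "Periodical", "name": "Journal of Example Studies"},
    "about": "solid-state batteries",
}

print(json.dumps(article, indent=2))
```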
My take
The generative AI hype tends to gloss over the data used to train and run new services. Curating quality data is one aspect. But investing the time to discover the meaning of the data, and the links within it, will make it more intelligible for humans and bots.
This is not a new concept. About 2,400 years ago, someone asked Confucius what he would do to restore harmony if he were made governor. He said he would “rectify the names” to make words correspond to reality:
If language is incorrect, then what is said does not concord with what was meant; and if what is said does not concord with what was meant, what is to be done cannot be effected. If what is to be done cannot be effected, then rites and music will not flourish. If rites and music do not flourish, then mutilations and lesser punishments will go astray. And if mutilations and lesser punishments go astray, then the people have nowhere to put hand or foot. Therefore the gentleman uses only such language as is proper for speech, and only speaks of what it would be proper to carry into effect. The gentleman, in what he says, leaves nothing to mere chance.
Maybe today, a similar approach will make AI outputs correspond to reality as well.