National Archives democratizes access to court data with NoSQL and metadata
The UK’s official custodian of public records, the National Archives, says its use of the document-oriented database MarkLogic Server is critical for meeting accessible digital justice aims
A multi-model NoSQL document database is powering a new public-facing court judgement portal, designed to offer free access to important legal decisions for the public.
The new system, Find Case Law - currently in alpha, but operational since April 2022 - is a cloud service running on AWS, provided by The National Archives, the official UK preserver of government records and records of national importance.
Based at its main headquarters in Kew, London, the Archives looks after 11 million important historical documents and treasures.
Many of these valuable items are secured deep underground in the British countryside, as well as in its 200 kilometers of physical shelving.
The recent rise in importance of digital records also means the body must secure documentation in digital form - a trove which is already over 2 petabytes, says the organization’s Digital Director, John Sheridan.
But a directive from right at the top of the British legal system in 2020, in the wake of The Digital Justice Report, forced Sheridan and his team to look for new ways of making some of the key information coming into the archive from British courts easier for third parties to work with.
Before Find Case Law, there was no route for judgments to come to the archive straight away - we might eventually get them decades down the line. You could access this information if you were prepared to pay for it through things like the British and Irish Legal Information Institute (BAILII) - but even then, it wasn’t easy.
This was a big hurdle for several groups of users, says Sheridan: citizens interested in the latest findings from the High or Supreme Court, non-UK legal systems that base possible changes in their laws on UK legal precedent, and the providers of specialist third-party commercial legal support databases.
If you were an innovator wanting to obtain access to this information, there was no publicly funded service providing access to the material.
Plus, while from an archive perspective the decisions that a court makes are just a record, from a legal perspective all courts develop the law through making decisions in particular cases and setting precedents, and a judgement is where you find that out.
Also, important information doesn't keep itself and the technologies we use for keeping digital information are here today and gone tomorrow - so if you want the information to survive, someone needs to do some work. And that's what we’re for.
Therefore, Sheridan needed to share judgements not just with the public and legal professionals, but also with users of the data who need it for their own products or services, and with AI researchers who want to build predictive models of how a court makes decisions.
Sheridan’s colleague, Nicki Welch, Service Owner for Access to Digital Records at the institution, explains that the output needed to be easy to handle both for website and mobile visitors and for this large body of data users.
The judgments we deal with can be hundreds of pages long, and often at least 20 to 30 pages. We have to be aware of the size of the document, and not just how it's going to look on the web to users, but how people are going to navigate through them.
So, our documents have a bit of a complicated shape, but you also want to pass on a lot of quite rich metadata to do stuff with.
That meant that a standard business relational database approach for all this semi-structured documentation would not be ideal as a basis for a new system, she says.
Semi-structured, with rich metadata
The National Archives was already using the same database software to publish legislation before it applied it to these problems, says Sheridan.
We built a transformation tool ourselves that converts the Word files we get straight from places like the Privy Council, Supreme Court and High Courts into a particular standard of XML called Legal Document Markup Language.
That's available on the website, should you want it via our API, and then we transform it into HTML5, which is what we feel the public should expect from a modern digital service. And then we also provide the PDF as well, just in case users want to be able to download or print out the judgement.
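As a rough illustration of the XML-to-HTML5 step Sheridan describes, the Python sketch below renders a tiny judgment document as a web page. It is not the Archives' transformation tool: the element names (`judgment`, `meta`, `court`, `cite`, `paragraph`), the sample citation and the function `judgment_to_html` are all invented for this example, and real Legal Document Markup Language files are namespaced and far richer.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified judgment XML. The element names and the
# citation are made up for illustration; this is not the real schema.
SAMPLE_XML = """
<judgment>
  <meta>
    <court>High Court</court>
    <cite>[2022] EXAMPLE 1</cite>
  </meta>
  <body>
    <paragraph num="1">This is the first paragraph of the ruling.</paragraph>
    <paragraph num="2">Costs are reserved &amp; directions follow.</paragraph>
  </body>
</judgment>
"""


def judgment_to_html(xml_text: str) -> str:
    """Render the simplified judgment XML above as a small HTML5 page."""
    root = ET.fromstring(xml_text)
    court = root.findtext("meta/court", default="")
    cite = root.findtext("meta/cite", default="")

    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        "<head>",
        '<meta charset="utf-8">',
        # The viewport tag lets the page scale sensibly on mobile screens.
        '<meta name="viewport" content="width=device-width, initial-scale=1">',
        f"<title>{escape(cite)}</title>",
        "</head>",
        "<body>",
        f"<h1>{escape(cite)}</h1>",
        f"<p>{escape(court)}</p>",
        "<ol>",
    ]
    for para in root.iterfind("body/paragraph"):
        num = escape(para.get("num", ""), quote=True)
        text = escape("".join(para.itertext()).strip())
        # Each paragraph gets an anchor so readers can link to or jump to it.
        parts.append(f'<li id="para-{num}">{text}</li>')
    parts += ["</ol>", "</body>", "</html>"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(judgment_to_html(SAMPLE_XML))
```

Giving every paragraph its own anchor is one simple way to help readers navigate a document that runs to hundreds of pages, while the metadata stays available in the XML for anyone reusing the data.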
Sheridan points out that the service is believed to be the first live service based on the current legal document markup language standard.
Welch says that the system has quickly become popular, with over 3,000 judgments published since launch. Access is also strong on mobile, with 42% of users visiting the service from devices other than desktops. She says:
We don't publish every judgement from every court; we take judgments from the Courts of Record, but we’re up to at least 15 to 20 or so judgments a day when those highest courts are in session.
Summing up the impact of Find Case Law, Sheridan says:
We had three main targets for the service: one, accessibility - and that's not just access, but true accessibility, so making this content easier to read for everyone; a judgement looks great on your mobile phone, which no one had really done before. Two, to fulfil our obligation on long term preservation, as our job is to preserve what’s important; and three, enable reuse - that you can take this data and do things with it, as it’s not just a bag of Word documents, but well-modelled data.
Improving the quality of British justice system data
Once the service is fully bedded in, the aim is to meet The Digital Justice Report’s call to improve the quality and accessibility of justice system data across the British legal system.
Eventually, all Judicial Review rulings, European case law, commercial judgments and many more cases of legal significance from the High Court, Upper Tier Tribunal, and the Court of Appeal will be made available through the service, say the team.
Beyond that, says Sheridan, intriguing potential next steps for the NoSQL document database and metadata at the National Archives include the first moves towards a knowledge graph. He says:
One of the nice things about the software we’re using is that it allows us to store documents in a standard document format but also useful information in a knowledge graph alongside that document. So where one judgement refers to another or where it refers to a piece of legislation, we’d like to turn all those references and citations into a graph alongside other information, like which court this was from, and who the parties were.
A graph like that will be very interesting because it might tell you what the most influential judgments have been, or what the most litigated pieces of legislation are, and so on.
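To make the idea concrete, the Python sketch below counts incoming citations in a toy graph, which is one simple way (not necessarily the one the Archives would choose) to rank "most influential" judgments or "most litigated" legislation. The identifiers, the edge list and the function `most_cited` are all invented for illustration and do not come from the service's data.

```python
from collections import Counter

# Hypothetical citation edges as (citing document, cited document).
# All identifiers are invented for this example.
CITATIONS = [
    ("judgment-A", "judgment-C"),
    ("judgment-B", "judgment-C"),
    ("judgment-B", "act-X"),
    ("judgment-D", "act-X"),
    ("judgment-D", "judgment-C"),
    ("judgment-A", "act-Y"),
]


def most_cited(edges, prefix, top=3):
    """Rank documents whose ID starts with `prefix` by incoming citations.

    Duplicate edges are dropped first, so a judgment that cites the same
    document several times only counts once.
    """
    counts = Counter(
        target for _, target in set(edges) if target.startswith(prefix)
    )
    return counts.most_common(top)


if __name__ == "__main__":
    print("Most cited judgments:", most_cited(CITATIONS, "judgment-"))
    print("Most cited legislation:", most_cited(CITATIONS, "act-"))
```

A production knowledge graph would also carry the other facts Sheridan mentions, such as the court and the parties, and could use more sophisticated measures of influence than a raw citation count.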
In the shorter term, adds Welch, her team is already marking up the XML to link references to other cases.