The British Library looks to the future as it reveals the incalculable damage of its ransomware attack

Chris Middleton, March 15, 2024
The Rhysida gang did not just steal data, but also crippled IT infrastructure for six months following its ransomware attack. Even so, the British Library is grabbing the opportunity to modernize.

In 2024, it is easy to imagine that all human knowledge is online forever – and can now be explained by an AI system; but this ignores the fact that an estimated 82% of websites are dormant, while countless others have vanished from the internet since the 1990s, along with their archives of data. (If the internet is truly the world’s memory, then much of it has been wiped, or can only be glimpsed through whatever caches remain.)

It also ignores the fact that a vast amount of data, knowledge, insight, memories, and expertise is found in libraries. One estimate is that Google has scanned just 40 million books out of an estimated 158 million – roughly a quarter. The first (2023) figure comes from Google itself, and the second from the ISBN database. However, the ISBN figure only counts titles registered since 1967, when the international book numbering system was introduced, so the true total is higher still.

Some 2.2 million new titles are published every year, according to that organization, so it would seem that a colossal amount of useful data still exists in libraries that cannot be found online – though we must acknowledge that many libraries partner with Google via the Library Project to digitize texts.

Arguably, libraries remain the world’s key knowledge repository – deep knowledge that is generally free of the noise, marcomms, and misinformation of the Web. Despite that, library visits are dwindling in the UK: down to 100 million annually in 2021-22 from 299 million ten years earlier. That is partly because nearly 800 libraries have closed since 2010, due to austerity. Even so, those attendance figures still outnumber those for cinema and football combined, so don’t write off libraries just yet – though Statista data suggests that the most common age group for users is 45-64.

The British Library, the UK’s national repository, alone holds 170 million items in what it describes as a “living collection that gets bigger every day” as new titles are added. That includes: 13.5 million printed books and e-books; 310,000 manuscripts; 60 million patents; 60 million newspapers; more than four million maps; over 260,000 journals; seven million sound recordings; and over 500 terabytes of preserved data in the UK Web Archive (currently offline).

Thousands of its books and manuscripts, the oldest dating back to 1,500 years before Christ, have been digitized; but at the time of writing, it is impossible to say how many.

That is because the British Library is still recovering from the malicious attack in autumn 2023 by the Rhysida ransomware gang – which is also why the UK Web Archive is offline. It is thought that the gang exfiltrated 440GB of data on 28 October alone, after it had tested the Library’s internal systems for vulnerabilities a few days earlier.

In total, the gang is believed to have stolen 600GB of data, equating to half a million documents – much of it privileged, personal information about the Library’s staff, users, authors, and more, and some of it texts from the Library’s collections. In November, some of that data was put up for auction on the dark web – a common tactic among attackers, who demand a ransom from victims while selling their data to the highest bidder.

However, while the volume of exfiltrated data can be measured, the Library admits that – six months on – it is still unsure what some of it is.

That a national library and data repository of world renown should be a target for a malicious Russia- or CIS-based gang is no surprise. What is surprising, however, is just how little coverage the attack received at the time in the national or specialist press, and how little attention has been paid to the British Library’s slow recovery in the months since.

What this tells us about how much Britain cares about its library is anyone’s guess.

The details

So, what happened? Why did such an important institution fall victim to a large-scale attack that crippled its IT infrastructure – lifting entire databases, destroying servers, and locking up and encrypting others? The answers are revealed in a new report by the Library itself – which should be commended for being transparent about its failings so that others can avoid the same fate.

The report explains:

First, a targeted attack copied records belonging to our Finance, Technology, and People teams on a ‘wholesale’ basis, resulting in the copying of entire sections of our network drives. These files represent around 60% of the content copied in the attack.

Second, a keyword attack scanned our network for any file or folder that used certain sensitive keywords in its naming convention, such as ‘passport’ or ‘confidential’, and copied files not just from our corporate networks, but also from drives used by staff for personal purposes as permitted under the Library’s Acceptable Use of IT Policy.

That policy will be reviewed in light of the attack, notes the report. It continues:

Third, the attackers hijacked native utilities (e.g., IT tools used to administer the network) and used them to forcibly create backup copies of 22 of our databases, which were then exfiltrated from our network. 

We believe that several of these databases contain some contact details of external users and customers, although we will be unable to analyze exactly what data was copied in this way until some of our database infrastructure capabilities are restored.

Despite this, sensitive details such as customer bank details are not thought to have been included, says the report – though the Library cannot be certain.

It adds:

Work is now under way by our Corporate Management Information Unit to conduct a detailed review of the exfiltrated data to confirm our assumptions about the nature of its contents and identify any specific sensitive material. 

Where sensitive material is detected in the course of this review, the individuals affected (whether staff or external) are being contacted and provided with appropriate advice or support, and the ICO [data protection watchdog the Information Commissioner’s Office] is being kept informed. 

We believe that the unedited Electoral Roll database held as part of the collection was not compromised, as all indications are that the enhanced levels of encryption in place on that particular database functioned as intended and protected it from the attack method described above.

Similarly, our PCI DSS controls have ensured that no credit card data was compromised; the storage of customer card data is not permitted anywhere on our network and is regularly scanned for and eliminated where present.
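The keyword attack described in the report, which looked for files and folders with terms such as ‘passport’ or ‘confidential’ in their names, can be turned around and used defensively. The sketch below is one hypothetical way an organization might audit its own shared drives for the same giveaway names before an attacker does. The function name, the keyword list beyond the two terms quoted in the report, and the idea of running it as an audit are all assumptions made for illustration; nothing here comes from the Library’s own tooling.

```python
import os

# Illustrative keyword list: the report quotes only these two terms.
SENSITIVE_KEYWORDS = ("passport", "confidential")


def find_sensitive_names(root, keywords=SENSITIVE_KEYWORDS):
    """Return sorted paths (relative to root) whose file or folder name
    contains one of the keywords, ignoring case."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check folder names as well as file names, as the attackers did.
        for name in dirnames + filenames:
            if any(word in name.lower() for word in keywords):
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, root))
    return sorted(hits)
```

Calling `find_sensitive_names("/path/to/share")` on a hypothetical share would list every matching name. A match says nothing about a file’s contents, only that its name advertises something worth protecting, moving, or deleting.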

So, what of the Library’s store of digitized manuscripts – a collection of thousands of priceless historic documents? The report says:

We believe that secure copies exist, both of our born-digital and digitized content, and of the metadata which describes it. [But] each dataset will need to be validated to ensure its integrity before being restored on the Library’s new infrastructure.

Later it clarifies the point, saying:

Digital collections are all accounted for through back-ups and/or third-party copies, though final full validation will only be possible once each dataset is checked and brought back on stream on the new infrastructure.
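The report does not say how each dataset will be validated. One common approach, sketched here purely as an assumption, is to compare every restored file against a manifest of checksums recorded before the incident. The manifest format (a mapping of relative path to SHA-256 digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_restore(restore_dir, manifest):
    """Compare restored files with a manifest of {relative_path: sha256}.

    Returns a dict listing which paths are 'ok', 'missing', or 'corrupt'.
    """
    report = {"ok": [], "missing": [], "corrupt": []}
    for relative_path, expected in sorted(manifest.items()):
        candidate = Path(restore_dir) / relative_path
        if not candidate.is_file():
            report["missing"].append(relative_path)
        elif sha256_of(candidate) != expected:
            report["corrupt"].append(relative_path)
        else:
            report["ok"].append(relative_path)
    return report
```

A check of this kind only works if the manifest itself was stored somewhere the attackers could not reach, which is one reason why third-party copies matter.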

A reliance on legacy

The British Library says it has neither paid a ransom to the Rhysida gang, nor engaged with it – in line with National Cyber Security Centre (NCSC) policy for publicly funded institutions. 

Even so, the attack has caused incalculable damage to the Library’s systems, reputation, and IT infrastructure, and has prevented many staff from doing their jobs for half a year. Moreover, it has hampered critical research among users who are reliant on the Library’s unique content, and created a huge backlog of work.

For a gang thought to be linked to a hostile state, those are results in themselves. Indeed, the damage extends far beyond the theft of privileged data, the Library explains:

The attack methodology of Rhysida and its affiliates involves several different elements, including defence evasion and anti-forensics (e.g. they ‘clean up after themselves’ and delete log files etc., in order to make it hard to trace their activities), exfiltration of data for ransom, encryption for impact, and destruction of servers to inhibit system recovery (and as a further anti-forensic measure). 

It is this last attack type that has had the most damaging impact on the Library. Whilst we believe that we will eventually be able to restore all of our data, we are hampered temporarily by the lack of viable infrastructure on which to restore it. This infrastructure is in the process of being rebuilt or renewed, with work due to complete by mid-April, prior to the phased restoration of systems and data.

The report adds:

The Library’s vulnerability to this particular kind of attack has been exacerbated by our reliance on a significant number of ageing legacy applications which are now, in most cases, unable to be restored, due to a combination of factors, including technical obsolescence, lack of vendor support, or the inability of the system to operate in a modern, secure environment.

So, what does this mean in practice? 

The report explains that the “historically complex network topology” that has grown over decades allowed attackers wider access to the Library’s network than would have been possible with a more modern network design.

Some of our older applications rely substantially on manual extract, transform and load (ETL) processes to pass data from one system to another. This substantially increases the volume of customer and staff data in transit on the network, which in a modern data management and reporting infrastructure would be encapsulated in secure, automated end-to-end workflows.

This reliance on legacy infrastructure and processes is the primary contributor to the length of time that the Library needs to recover, says the report. So – if nothing else – the attack has forced the adoption of more modern servers and security controls.

The British Library explains:

There is a clear lesson in ensuring the attack vector is reduced as much as possible by keeping infrastructure and applications current, with increased levels of lifecycle investment in technology infrastructure and security. 

The Library responded as quickly as it could in the circumstances, and followed the necessary steps to limit the attack, but still suffered very significant damage.

A focus on rebuilding

So, where are we today? Just before Christmas, the British Library Board approved a proposal to establish a Rebuild & Renew programme to plan, coordinate, and deliver a longer-term recovery. 

The report explains:

The programme will also review the Library’s corporate approach to Business Continuity, as opposed to within individual departments, incorporating lessons learned to ensure its future ability to respond to incidents of a similar scale in a consistent and structured way. 

Formal testing and exercise regimes will also be reviewed to enhance the Library’s overall preparedness for major incidents.

But there is a silver lining, of sorts, acknowledges the report:

Following the October 2023 attack, the Library has an opportunity to transform its use and management of technology across the organization, to wholly adopt and embed best-practice security mandates, and to implement fit-for-purpose policies and processes that will enable us to fully realise the benefits of our technology.

It adds:

The substantial disruption of the attack creates an opportunity to implement a significant number of changes to policy, processes, and technology that will address structural issues in ways that would previously have been too disruptive to countenance.

Among numerous changes, these will include:

  • A best-practice network design, implementing proper segmentation with a ‘defence in depth’ approach.

  • A hybrid computing landscape that maximizes the benefits of cloud for development, application, and virtualization.

  • And a best-practice, role-based access control setup for domain and storage services, enshrining the principle of least privilege across the organization.
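For readers unfamiliar with the last of those ideas, role-based access control with least privilege means that a user gets only the permissions their role explicitly grants, and everything else is refused. The sketch below illustrates the principle only; the role names, permission strings, and function are invented for the example and do not describe the Library’s planned setup.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "reading_room_staff": {"catalogue:read"},
    "finance_officer": {"catalogue:read", "finance_share:read", "finance_share:write"},
    "domain_admin": {"domain:manage"},
}


def is_allowed(user_roles, permission, role_permissions=ROLE_PERMISSIONS):
    """Deny by default: allow only if one of the user's roles
    explicitly grants the requested permission."""
    return any(
        permission in role_permissions.get(role, set())
        for role in user_roles
    )
```

Note that in this sketch even the hypothetical `domain_admin` role cannot read the finance share, because nothing grants it that permission. Under least privilege, a single compromised account opens only the doors its role was given.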

My take

In a world of books, learning, and stories, this is a cautionary tale indeed. One that reflects both the reality of how organizations and infrastructures evolve over time, and the lessons of how difficult it is for publicly funded organizations to stay at the cutting edge. 

Even so, the British Library should be commended for its transparency and openness, and for the public-spirited nature of its report, which has been published in full.

We would expect nothing else from such an important public institution.
