TfL builds digital archive to preserve London history
- 2012 Olympics got transport organization off the starting blocks and it is now working with a company called Preservica to store its digital records.
OK, so the latter might not have the grand scale of the other aftereffects listed above, but it’s an interesting example of how wide-ranging the benefits of the London Games have been, as Tamara Thornhill, corporate archivist at TfL, explains.
In the archive world, dealing with digital records and particularly the preservation of digital records has been an issue the profession has been looking at for a little while now. For a long time, a lot of organizations have been continually waiting for someone else to make the first move.
It was the 2012 Olympics that gave TfL the incentive needed to make that first move. The Olympics project team pulled together all of the transport-related digital material around the delivery and support of the London Games, and offered the whole lot to TfL back in October 2012 to preserve in its archives. Thornhill says that the organization agreed, aware that this was a great opportunity to make sure that the information was not lost. And that was the prompt that kick-started the digital archiving project for all TfL records. Thornhill adds:
TfL is an enormous organization. There’s a plethora of different electronic systems in use across the organization. That makes even just knowing what digital records are out there and being created very difficult.
We’d been a little bit frightened, it’s fair to say.
To give some context to Thornhill’s version of ‘enormous’, TfL currently employs around 29,000 staff, operates across more than 300 different sites, and is sitting on over 150,000 boxes’ worth of physical paper records. Current records are held at offsite storage facilities, some in London but most outside it, while the paper archives are all stored in salt mines in Cheshire.
When TfL first began considering its digital archive project, paper records far outnumbered the electronic ones, leading to a concerted effort by Thornhill and her team to broaden the range of digital material being collected.
During the first year, we started looking at how we could start acquiring more digital material, what were our easy wins, as well as starting to think about particular business areas that we could target as pilot programs.
The ones we chose for quick wins, we were looking at press releases, the Metro newspaper - TfL has a page in the paper. It was those types of things that are distributed by email so it’s very easy for us to get added to the distribution list.
The Secretariat division, which is responsible for board papers, minutes and declarations of interest, also started working with the archive team early on. By 2014, the organization had a good body of work with which to begin a digital archive project in earnest. The next step was to identify a suitable storage repository and address the preservation aspect. Thornhill explains:
We had nothing in place for preservation before, there were options available for storage and management of digital records, but none for preservation.
The team initially considered an in-house solution, but this was dismissed early on as it would have been “phenomenally expensive” and didn’t really work for the requirements.
Although TfL looked at other potential providers, digital preservation firm Preservica won the contract due to its support network and the flexibility it offered around the data and the setup model for getting material into the archiving system. Material can be ingested by TfL staff or by a designated person in, for example, the Secretariat area, either over a web link or via a specific desktop tool; staff working from home can also access the system.
The fact that Preservica is AWS-based was a mark in the firm’s favour, as TfL had already started using AWS to host its geographic information system (GIS) material and its website.
TfL’s number one priority when looking for an archiving system was file format migration: ensuring that all files would remain supported, to satisfy the preservation element.
Cloud storage was high up the agenda, as was the ability to have tiered storage. Thornhill explains that archive storage is slightly strange in that you do not necessarily know what people will want to look at, and how frequently they will want to look at it; however, there are some things that will always be go-to items, like annual reports.
TfL wanted a preservation system that allowed it to store infrequently accessed records in a lower tier of storage, but retain elements like annual accounts in a higher tier so they can be obtained as quickly as possible.
Storage capacity also needed to be able to grow without a fixed limit. TfL started with 2.5TB of data, and that figure is already increasing: the organization has managed to collect 181,000 digital files within the first five years.
Another element on the list of requirements was supporting different permissions, as access to TfL data and documents is constrained by various commercial confidentiality, data protection and security issues.
The organization also needed to be confident of the support network, Thornhill notes:
With archive software, whilst a lot of the time its architecture is probably not that complicated, for some reason a lot of IT professionals struggle to get their heads around it at times. I’ve experienced this throughout my professional career as an archivist. I don’t know if it’s almost too simple for them.
We wanted to make sure whatever system we came up with, there was a very strong support network for it.
Looking back, Thornhill regrets not getting the TfL IT department more involved in the project early on. When Preservica and other software vendors came in to do demos of their software, important stakeholders from legal, Freedom of Information and TfL Online were invited to attend.
I would have pulled some of our IT people into those demos, to have that better understanding of what it is we’re using it for.
IT pros are beginning to realise that digital archiving does not actually mean what they’ve traditionally thought it to mean, which is just bunging something into deeper, darker storage. But there’s still a lot of work to be done on that, so I would have pulled them in to do that little piece of education work. I think that may have helped us with some of the sequencing issues we had.
Midway through 2015, TfL had started experimenting with the system, and had what Thornhill dubs ‘Preservica Lite’ available. However, it had to wait until the start of 2017 for the full capabilities, due to security challenges and the aforementioned in-house sequencing issues. Thornhill adds:
We haven’t switched the public or internal capability on. We’re still doing quite a lot of work on our metadata and our XML schemas. We basically want to make sure that when we do open the floodgates, all the data is in as good a condition as it can be, that everything is described as well as it can be, consistently and in a way that is actually useful for people.
The four-person archiving project team is currently busy looking at the traditional metadata fields captured for physical records, and working out what they need to capture and display for digital records.
Depending on what it’s required for, that determines at which point you present it. Maybe it’s only seen by the archivists, as it’s boring technical stuff. Maybe some of that data is only seen by Preservica as it’s boring technical stuff that we don’t need to know.
It’s all about making things as clean as possible. We don’t want people to go into the system, find a record and be presented with a mass of information that’s confusing or putting them off.
Measuring the benefits
The archive Thornhill and her team are hoping to make widely accessible holds over 150 years of London history. It includes a wealth of information on our past, including staff details from as far back as 1863, images of WWII tube shelters, Olympic transport-related material for both the 1948 and 2012 Games, and 3D modelling plans for the recent Tottenham Court Road station upgrade.
It’s virtually impossible to put a cost benefit on the project. What we tend to argue in these circumstances is that while the records are there for public access, with the restrictions I’ve already spoken of, we are ultimately here to serve the business. What we’ve been finding and are able to prove in the paper world, is that the business is using those paper records more and more to prove rights and responsibilities, to prove patent trails.
If we don’t take the same steps to preserve our digital records, then in 10-20 years’ time, the business is not going to be able to do that parallel work in a digital environment.
While certain records might not merit a place in the permanent collection, there are those that still require a long lifespan, such as escalator maintenance manuals and specifications that only exist in a digital format. Thornhill says:
Unless you equip us and our records management colleagues with tools to make sure that record can be opened in 50-60 years’ time, you’re not going to know how to repair that escalator. That helped prove the immediacy of the problem.
Thornhill hopes that the TfL project will encourage other public sector bodies to focus on their own digital archives. She explains:
There is a bit of a split now between the private sector and the public sector. The private sector has embraced the change more quickly; particularly I’m thinking of HSBC’s archives and Unilever, they’ve really taken on the challenge of digital preservation.
In the government world, I’d imagine we’re the biggest. We do get lots of people coming to talk to us and coming to see what we’re doing, and that’s great. Anything we can do to help.