What if I told you, as an IT leader, that you could not only side-step using older hardware and storage software, but also get to a place where you needed half as much storage space - and, in the long run, could be confident of saving perhaps as much as 25% in data storage costs?
And even better, you’d no longer have to worry about a single point of failure in your backup infrastructure, so you’d always be free to refresh key content servers when needed - including taking advantage of technology advances - with no disruption?
Chances are, you’d be pretty pleased. And we’re delighted to be able to say we know of just such a happy individual: Mark Penny, who works at the University of Leicester as a Systems Specialist (Infrastructure).
Some context for all this: the University of Leicester has three data centres, managed by Penny and his team, whose responsibilities include providing backup for all home directories, corporate systems and research data (including High Performance Computing data) in a mixed environment encompassing Windows, VMware and Lustre.
Now, Penny and his fellow in-house tech team say they’ve worked with technology from vendor Cloudian, using its HyperStore object storage system, as the basis for a completely new backup platform.
Penny told diginomica/government how this had all come about:
Back in 2010, we implemented a complete refresh of our backup infrastructure, and we selected Commvault as part of a solution provided by a reseller. Because we had a good fibre channel SAN infrastructure, that was the logical thing to use for the back-end disc; we were using fibre channel discs on our existing fibre channel infrastructure because it provided the throughput that we needed, whereas our network infrastructure wasn't in such a good state.
In 2014, however, there was another major hardware refresh, in which Leicester replaced all the discs and all the server hardware but stuck with fibre channel, so it was effectively like-for-like, just with newer hardware, Penny went on - that’s to say, still SAN-based with backup software on top.
But as usage grew, by the beginning of 2018, Penny says, the University was up to about two Petabytes of usable back-end fibre channel disc, served by 10 media agents.
This was the big thing we were trying to solve. If a media agent server went down for whatever reason, then - because fibre channel is a one-to-one relationship between the disc and the media agent, or server - once that relationship is broken, you've lost access to all that data, which can impact both backup and restore. We wouldn’t be able to continue backing up data, access backups or conduct restores until we replaced the hardware and spun up the system again, a process which could have taken weeks.
The system was vulnerable to being completely unavailable through the failure of a single server - but by using an S3 object store, we could make the media agents stateless, giving us protection against the failure of the entire backup system because a single server went down.
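The difference Penny describes can be sketched in a few lines. This is purely an illustrative model - not Commvault code - with made-up agent and shard names, contrasting the one-to-one fibre channel mapping with a shared object store behind stateless media agents:

```python
# Illustrative model (not vendor code): data reachability after a media-agent
# failure under the two topologies described above.

def reachable_data(topology, agents, failed):
    """Return the set of data shards still accessible after `failed` agents go down."""
    alive = [a for a in agents if a not in failed]
    if topology == "fibre-channel":
        # One-to-one mapping: each agent owns its own disc, so the data behind
        # a failed agent is stranded until that hardware is rebuilt.
        return {f"shard-{a}" for a in alive}
    if topology == "s3-object-store":
        # Stateless agents share one object store: any surviving agent can
        # reach every shard.
        return {f"shard-{a}" for a in agents} if alive else set()
    raise ValueError(f"unknown topology: {topology}")

agents = [f"ma{i}" for i in range(10)]   # the 10 media agents mentioned above

fc = reachable_data("fibre-channel", agents, failed={"ma3"})
s3 = reachable_data("s3-object-store", agents, failed={"ma3"})
print(len(fc), len(s3))  # 9 10 -> with the object store, one failure strands nothing
```

The model is deliberately crude, but it captures why losing one fibre channel media agent took its data offline, while stateless agents in front of a shared S3 store do not.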
Fortunately, Leicester never faced such a crisis. But Penny said he still knew this wasn’t the ideal way to secure all this information, and that he needed to start looking at software-defined storage as an alternative.
As we said, this eventually took shape as object storage software on 12 HPE servers with 3.4 Petabytes of raw storage. For data protection, erasure coding was enabled in a 9+3 configuration, resulting in 2.5 Petabytes of usable capacity. In the way Leicester has set it up, up to three servers can fail simultaneously without impacting data availability - in other words, a robust, fault-tolerant, shared storage environment that can serve as the backup target.
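The quoted figures hang together on simple arithmetic. In k+m erasure coding, each object is split into k data fragments plus m parity fragments, and any m fragments - here, whole servers - can be lost. A back-of-the-envelope check, using only the numbers in the article:

```python
# Back-of-the-envelope check of the 9+3 erasure-coding figures quoted above.

def usable_capacity(raw_pb, k, m):
    """Usable capacity when raw space is split into k data + m parity fragments."""
    return raw_pb * k / (k + m)

raw = 3.4      # Petabytes of raw storage across the 12 HPE servers
k, m = 9, 3    # Leicester's 9+3 erasure-coding configuration

print(round(usable_capacity(raw, k, m), 2))  # -> 2.55, matching the ~2.5 PB usable
print(m)                                     # up to 3 simultaneous server failures tolerated
```

Note how the 12 fragments map one per server, which is exactly why three whole servers can fail at once without data loss.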
We compared going back to doing like-for-like [again], but duplicating the storage and duplicating the media agents was very inefficient and very expensive in comparison to a single-site way of doing it.
Object storage is, as is well known, not for everyone, but it seems Penny wasn’t fazed by this:
A lot of people have no understanding of how object storage works, but we already understood it and were comfortable with the erasure coding. So we built a proof of concept that resulted in a recommendation to the senior leadership team that we should use an S3 object store, because it a) gave us resilience against a single media agent server failure and b) gave us protection if we lost a complete data centre. Looking forward, it gave us a lot of flexibility, too.
Another benefit, he adds, is the space efficiency an object storage system delivers:
With our old SAN system, we needed 48U of rack space for 2.5 Petabytes of usable storage. Now, we have the same capacity in just 24U - a 50% saving. The system when we bought it was around about 3.2 Petabytes, and we're already up to 3.8 Petabytes, as we’ve bought three more new nodes.
This has all also helped us in the work that was being planned to relocate our data centres, because it meant that we could relocate parts of the backup system with no interruption to service.
We mentioned longer-term cost savings. The basis for this is Penny’s conviction that his overall storage costs will fall by 25% once he deploys object storage in his two other data centres:
One of the virtues of object store is that you can replace hardware simply by adding new nodes and retiring old nodes, so hopefully, big lift-and-shift migrations are a thing of the past.
We're not expecting to have to completely replace the whole solution from scratch, but simply rotate new servers in and take old servers out.