How Splunk helps UCAS with more than its annual two-week headline grab

By Martin Banks, October 16, 2017
Every mid-August UCAS is headline news, but there is a great deal going on the rest of the year, all of which requires dealing with vast amounts of personal data demanding the best possible security

UCAS is an organisation which grabs the headlines for two weeks every August when the UK’s A-Level results are announced and students start the task of matching their educational results to courses in the country’s universities. It brings that mixture of joy (getting into a better Uni than hoped for) and frustration (with the service being clogged) that every August brings as students make the big change from school to further education.

By September, however, UCAS has, like a dragonfly, disappeared from view again. But as the organisation’s Enterprise Security Architect, Andy Gibbs, acknowledges, there is no time to put his feet up for a well-earned eleven-month rest. Not only does the work go on but the load is getting bigger.

And as UCAS is, so far as he knows, the only organisation of its kind in the world, he is aware that its influence, and its business activities, are starting to spread beyond the shores of the UK.

In addition, it is continually dealing not just with the personal details of a huge number of students – or, as the organisation prefers to call them, 'learners' – but is equally involved in the work, and information resources, of the many different teaching establishments those learners end up working with.

Having guided the organisation through its triennial recertification of ISO 27001, following major changes that included migration to an AWS cloud-based environment, Gibbs is well-acquainted with the need to provide really tight security. He is also aware of the need to prove that security is working, even if that means an individual security tool might fail because it meets a new, as yet unknown, threat.

That means being able to provide triage services when security problems occur, based on real-time auditing of what is happening across the IT environment. Further out, his goal is also to use augmented intelligence and machine learning systems to teach the IT systems what best practice is and how to use it. His tool of choice for these roles is Splunk, not least because its analytics capabilities with machine log data drive the other capabilities the organisation requires.

Six objectives

UCAS works to meet six strategic objectives. The first, and most obvious, one is helping learners find appropriate further education places when they come out of A-levels. That could include a university degree, a specialist vocational college, apprenticeships, teacher training courses and, for those who want to do arts and fine arts courses, conservatoires.

There is also a postgraduate scheme, plus a scheme called UCAS Progress which targets learners at an earlier age and helps them answer the inevitable ‘Do I even want to consider going through A-levels and university?’ question.

It also works with the educational providers themselves, providing the services needed to attract, recruit and admit students to their university places. Gibbs says:

This is like a wedding broker between learners and the education providers. But a big chunk of what we do is also advising the advisors, so the schools, the parents and the rest, giving them guidance on how to better direct the learners into their ultimate educational destination.

It also provides data and analytics for educational establishments, and interested universities can access a premium service guiding them on how to better target their courses to a broader market place, including into Europe or further afield.

On a more commercial side, it can help with advertising and marketing for some learner services, such as accommodation providers looking to target learners as they move to forthcoming courses, while the final strategy is to present this package as an example for other public sector organisations and educational institutions abroad. Gibbs explains:

What sits at the nub of that is some highly confidential data. We are taking on 800,000 new applicants each year. Most of those are learners in the age group of 19 to 23, they are usually telling us about their most detailed circumstances. They’re sending us their personal statements, we owe them a duty of care to look after that properly. If we don’t then the Information Commissioner’s officers will remind us, usually with punitive fines if we don’t do that job properly. I’m pleased to say we’ve never fallen foul of that in any material way so far.

This is so much the case that Gibbs is not overly concerned about the coming of GDPR next year, as UCAS has already established good governance around its organisation.

One of the most important factors UCAS has to manage – and ‘has to’ are key words here – is the alarming ramp the scaling process goes through in those two obvious weeks in August. Gibbs notes:

We have a peak in demand for our services that verges towards two orders of magnitude greater than the levels of processing we’re doing throughout the year. In other words, it increases by 40x, and the ramp up on that is enormous. When the confirmation clearing process starts it ramps up with a peak that happens in a matter of minutes, about half an hour to be conservative. It’s of that order of magnitude. What that means for us is that we have to have an infrastructure that has to be able to cope with that level of flexibility, and this is where the choice of Splunk has been quite useful.
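To put those figures in perspective, here is a back-of-the-envelope sketch of the arithmetic. It assumes a simple linear ramp over the half hour Gibbs mentions; the baseline of 1.0 is a placeholder unit rather than a UCAS figure, and the function names are illustrative only.

```python
import math


def orders_of_magnitude(multiplier: float) -> float:
    """Return how many powers of ten a multiplier represents."""
    return math.log10(multiplier)


def load_during_ramp(baseline: float, peak_multiplier: float,
                     ramp_minutes: float, minute: float) -> float:
    """Load at a given minute, ASSUMING a linear ramp (a simplification)."""
    # Clamp progress to the range 0..1 so times outside the ramp are safe.
    fraction = min(max(minute / ramp_minutes, 0.0), 1.0)
    return baseline * (1.0 + (peak_multiplier - 1.0) * fraction)


PEAK_MULTIPLIER = 40  # the "40x" quoted above
RAMP_MINUTES = 30     # "about half an hour to be conservative"

print(f"{orders_of_magnitude(PEAK_MULTIPLIER):.2f} orders of magnitude")
for minute in (0, 10, 20, 30):
    load = load_during_ramp(1.0, PEAK_MULTIPLIER, RAMP_MINUTES, minute)
    print(f"minute {minute:2d}: {load:.1f}x baseline")
```

On these assumptions a 40x peak works out at roughly 1.6 orders of magnitude, which is consistent with a figure that "verges towards" two, and the infrastructure would need to add more than one baseline's worth of capacity every minute while the ramp lasts.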

He sees it as the nervous system of the infrastructure, handling all the system messages so that they can be properly triaged, none is forgotten and alerts are acted upon. This can require some dozen screens to manage at peak times, and also includes monitoring social media to see what users are saying about the service:

That’s quite useful because sometimes you’ll hear them saying ‘this is not very responsive’ or whatever, and they’ll say that to each other rather than going on a helpdesk and saying ‘I’ve got troubles on this’. We can be proactive, and we are.

Technological promiscuity

The system also provides forensics and analytics when things do go wrong, allowing them to go back and understand what happened, when, who did what, and who pressed which button, says Gibbs:

That’s particularly important to us because if there is ever an investigation, in the worst case by the Information Commissioner's office, then we have a real time audit to categorically prove that we were not complicit in some breach of data.

Gibbs also sees Splunk helping to provide the resilience to protect the UCAS systems from two particular threats that it faces – the learners themselves and the universities.

He has a particularly descriptive term for the learners, whom he calls "technologically promiscuous students": their personal systems can become well loaded with viruses and malware, which can then try to invade once connected to UCAS.

Gibbs is keen to point out, however, that many of them have already developed good levels of technological hygiene. This is particularly the case when compared with the universities:

You must remember that we’re dealing with 400 universities that are hotbeds of the greatest computing expertise that we’re developing in the country, and that’s great while they’re still developing it and they’re positive and forward-minded. But we’ve also got to bear in mind that some individuals there want to be a bit playful, and we’ve got some of the best research minds out there as well, and that can sometimes go adrift. We have to be mindful that we’ve got a very technologically switched on and sometimes very promiscuous user base.

The next step Gibbs sees for UCAS is to use Splunk to start exploiting augmented intelligence and machine learning tools as a way of automating the application of best practices. UCAS is about a year into this implementation and is looking to apply some of the advanced analytics to detect abnormal behaviour. He is also interested in the potential of Splunk partner Insight Engines’ natural language querying system and the analytics tools that lie behind it:

That’s something we’re very much looking at. As a small to medium sized enterprise we can’t afford a huge army of deep panellists to be telling us about everything that could and couldn’t happen, and then a big army of triage nurses to work out which of those we should be concerned about or not. So the more we can get the analytics to model what it discovers to be normal behaviour, the better. I can absolutely see the day when you can say ‘Alexa, tell me who’s breaking into my network and where’s it coming from’.

It’s a trite example, but I think all this natural human interaction that’s now being tried with some of this next generation technology is absolutely going to be placed in front of what we’re using at the moment. It’ll be much better when you’ve got an analytical front end that you can talk to and ask the equivalent of Alexa what you need.
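The idea of getting analytics to "model what it discovers to be normal behaviour" can be illustrated with a minimal sketch. This is a generic z-score baseline, not Splunk's or UCAS's actual method; the function name, the threshold of three standard deviations and the sample counts of failed logins per minute are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Flag observations that sit far outside a learned 'normal' range.

    baseline  -- historical values taken to represent normal behaviour
    observed  -- new values to check against that baseline
    threshold -- how many standard deviations count as abnormal (assumed)
    Returns a list of (index, value) pairs for the flagged observations.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: anything different is abnormal.
        return [(i, x) for i, x in enumerate(observed) if x != mu]
    return [(i, x) for i, x in enumerate(observed)
            if abs(x - mu) / sigma > threshold]


# Hypothetical failed-login counts per minute, invented for this example.
normal_minutes = [4, 5, 6, 5, 4, 6, 5, 5]
latest_minutes = [5, 6, 48, 4]

for index, value in find_anomalies(normal_minutes, latest_minutes):
    print(f"minute {index}: {value} failed logins looks abnormal")
```

The appeal for a small security team is that only the flagged minute needs human triage; the trade-off is that a fixed baseline like this one would need refreshing as normal behaviour shifts, for instance during the August peak.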

My take

Here is an organisation facing a wide range of different environments, including having to cope with a massive and almost instant hike in operational scale by users that can be suffering the slings and arrows of outrageous technological promiscuity. And all the while it has to maintain the highest levels of scrupulous security, all with the help of Splunk and its machine log-based analytics.

The goal of adding machine learning and AI to ensure a regime where best practices are also maintained as the operational baseline is a logical next move.