Alan Turing Institute’s Big Data ambitions boosted by cloud

SUMMARY:

How does the UK’s new national institute for data science plan to spend a gift of $5 million in Azure cloud-computing credits in 2017?

Andrew Blake

If 2016 was a big year for the Alan Turing Institute, then 2017 is the year in which the UK’s new national institute for data science must start to deliver on its grand promise to: 

change the world for the better.

So far, all the signs are promising. Established in 2015 by five key UK universities – Cambridge, Edinburgh, Oxford, UCL and Warwick – along with government research council the EPSRC [Engineering and Physical Sciences Research Council], the Institute has made some significant announcements in the last year or so from its headquarters at the British Library in London.

As Andrew Blake, its director, explains:

Yes, we’re a very new organisation, but I feel we’ve come a very long way in a very short space of time.

For example, the Institute has signed long-term partnerships with UK surveillance agency GCHQ, financial services company HSBC, engineering-related education charity Lloyd’s Register Foundation and microprocessor giant Intel.

It also embarked on a major recruitment drive during 2016, building a 150-strong workforce of mathematicians, statisticians and computer, data and social scientists, along with support staff, who will work together on the mission of advancing data science research and identifying real-world challenges to which its findings might be applied.

Many of these staff are drawn from the five founding universities and are able to share their time between their own faculties and the Institute, an arrangement that has done much to smooth the fast ramp-up, explains Blake. After a year focused on getting the Institute up and running, and conducting some pilot projects, its main business of research began in earnest in October 2016, he says.

Credits

Also in October 2016 came the announcement that software giant Microsoft was to donate $5 million in Azure cloud computing credits to the Institute. As Blake, a Microsoft Distinguished Scientist and former director of Microsoft Research in Cambridge, said at the time:

Azure cloud services will provide our data scientists with an easily accessible platform where they can prototype ideas with a fast turnaround of results, complementing local computing facilities available in the Institute’s five founding universities and national resources such as the supercomputer Archer supported by EPSRC.

Wanting to know more, I spoke to Blake to ask how the Institute was planning to use these credits and how that might impact on its work in 2017. He told me:

The world is changing. An organization like this, set up five years ago, would have probably needed to set up a substantial data centre. But we set out to be an institute in the cloud from the beginning. Our administrative systems, we immediately put on the cloud. A great deal of our scientific systems will be there, too, as we won’t be running a data centre of our own. People come here to our headquarters and look a bit astonished about that – they want to know where our data centre is – but pretty much all we have here is a kind of ‘switch room’ for the Internet.

Beyond the gift of those Azure cloud computing credits, Microsoft is also donating time and expertise as part of its collaboration with the Alan Turing Institute. A particular benefit for staff will be training on cloud-based approaches to data science, Blake reckons, so that they start to see their projects in a new context – one in which they have access to near-infinite data processing and storage capabilities:

So many of the kinds of studies we’ll be able to do now would have needed to be conducted on a far smaller scale in the past, due to limitations in computing resources. Now, we’ll be doing things on a far grander scale – and, of course, one of the things that’s great about the cloud is that it can sustain a burst of activity, so if all of our people need to access cloud computing at once, that’s fine, because it’s sort of averaged across a vast pool of resource.

One example here would be in machine learning, which is one of our major interests. Today, getting machines to learn using deep neural networks is a very resource-greedy process. And if researchers are all facing deadlines, then there’s likely to be a huge call on resources. If we were using our own hardware, it would quickly get maxed out. Now we have the luxury of not having to worry too much about the timetabling of resources.

Face-to-face collaboration

While a cloud set-up does allow staff to access resources remotely, Blake agrees, he is still keen to stress the importance of face-to-face collaboration in the Institute’s highly multi-disciplinary approach:

There’s a certain sort of magic that happens when people sit down and drink coffee together and discuss their work – particularly when we’re trying to encourage encounters between different disciplines.

Some of our partners from academic environments may not have experienced those encounters so much in the past – the statisticians’ faculties may be located some distance from those of their colleagues in computer science, for example. They may not bump into each other and swap ideas on a regular basis, but we want to do things differently here.

For example, we’re working on some ideas for the early stages of data science – what people sometimes call ‘data wrangling’, getting data into decent shape for analysis. It’s a very time- and labour-intensive process that people tend to complain about a lot, but we’re already exploring how we can apply machine learning to data wrangling and that really encourages a real mixing of disciplines, some real cross-discipline collaboration.

Against a backdrop of such big plans, I wondered how long that $5 million-worth of cloud credits might be expected to last the Institute, even with the private clouds of its five founding universities and further private-cloud services provided by Intel at its disposal. It’s a good question, Blake agrees, saying:

I really don’t know. We’re just going to see what kind of weight of usage it [Azure] gets and see how long the credits last us. But the nice thing about the partnership with Microsoft is that it’s not a ‘fire and forget’ thing. Microsoft wants to come along with us on various projects, so I see this as a long-term collaboration. We’ll just have to take that one as it comes, I suppose.

Image credit - Alan Turing Institute