We’ve all heard of a big-brand company suffering a major security breach and then claiming that it was a ‘one-off’ or that there was ‘nothing that could be done’.
However, this attitude doesn’t sit well with John Lewis IT security manager Geordie Stewart, who believes that single security events that cause far-reaching damage rarely come out of the blue.
There tend to be warning signals, or precursors, that companies just aren’t picking up.
Speaking at a recent Splunk conference in London, Stewart explained to delegates that the retail giant found itself in a similar situation not many years ago: it was receiving so many low-value alerts that it was hard to see the wood for the trees.
However, John Lewis is now using Splunk as a single view into the company’s security operations, which gives it a better understanding of what is working and what isn’t. It has also set up best-practice responses as a result of having the system in place. Stewart said:
I’ve seen a lot, I haven’t seen it all, but I’ve seen a lot. I’ve seen a lot of things that companies don’t talk about in public, a lot of things that tend not to be discussed openly.
[John Lewis] is really built on one thing, which is customer service. It’s very important to us, the customer experience. So we know as well that security is very important to a lot of our customers, which is why it is critical to us.
Stewart said that, as one would expect, John Lewis has a number of large e-commerce websites, on which it has to accept payments from customers. A lot of these payments are made by credit card, and Stewart said that part of that “devil’s bargain” is that the banks get to tell John Lewis what to do.
For example, in 2011 the banks warned John Lewis that it needed to do something with its logs, in that it wasn’t enough to save them and store them on tape somewhere. Instead the retailer needed to be actively looking at them.
To do this, John Lewis decided to implement the RSA enVision platform, which Stewart describes as “awful”. He said:
It was impossible to find what you were looking for. As a result it was underused. In 2014 it ran out of support, we looked around to see where we go from here. For us it was a bit of a no brainer because Splunk was being used on our e-commerce platform, JohnLewis.com, and it was very successful there.
We started with PCI and just trying to meet the banks’ requirements. But for us it has become a lot more than that.
Stewart said that the key challenge as an information security manager is trying to figure out “what the heck is going on in an organisation”, a problem he describes as universal. Stewart added that information security tends to be dedicated to a head office function, where security officers sit in their “ivory towers” but actually have little idea of what’s going on downstairs, let alone at other sites. He said:
The battle really is to understand what is going on, what can’t we see? We know that we have known unknowns, we know that we have unknown unknowns as well.
There’s always a huge amount going on in the company that you’re not aware of, but potentially that’s where the danger is.
Stewart explained that whilst John Lewis was running RSA enVision, the company never really mastered the search and investigation capabilities. He said that it was really difficult to find what you were looking for, and even if you did find it, the process was complicated to navigate.
However, Stewart believes that these processes are important, as big security events don’t just come out of the blue. He believes that there is always a lead up and companies should be monitoring for these valuable triggers. Stewart said:
The context for this is that bad things don’t just happen. In all of the major incidents I’ve reviewed, at all of the different companies I’ve worked for, there have always been precursors.
Single, bad, black swan events don’t just happen out of the blue. If you look into the details of all the near misses and the unsafe acts behind that, there has always been a pedigree. The signs have been there, it’s just that nobody had been counting, nobody saw, nobody knew what they were looking at, nobody joined them together.
However, Splunk has offered John Lewis the capability to do this. Stewart added:
Splunk gives you the opportunity to understand and recognise whether those precursors for the thing you really care about are happening, how often they’re happening, what’s the pattern, if you were to intervene how would your intervention be most successful?
Traditional security people had a bad rap for sitting in their head office and banning Facebook, banning USBs, banning Tuesdays. This gives you the opportunity to be a bit more nuanced in your approach. Instead of going with a best practice, everyone else is doing it, you’ve got the opportunity here to understand your pain points. What are the things that are happening in your organisation that correlate with the bad things that you’re trying to stop happen?
A single view
Stewart said that the big answer for John Lewis in solving that problem has been a “single pane of glass”. He said that two years ago, stringing together a view of the relationship between a major incident and unsafe practices would have involved looking at, and correlating across, a number of different systems. This is now more easily done with Splunk. Stewart said:
Splunk for us gives us the opportunity to do it in one view. We can correlate across application layers, network layers, operating system layers, all in one go, without those model tools.
The opportunity with Splunk is to work out what is ‘normal’ for your application. We looked at our Google logon authentication and we worked out: what does normal look like? If we had 100 failed logons on a Monday, is that normal? Should we do something?
It also enables us to ask the right questions. For example, what’s that jump between Tuesday and Wednesday in the morning? Was there something bad happening? Was there some testing going on? It allows us to ask the right questions to get to those unknowns.
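The “what does normal look like?” question can be illustrated with a small sketch. This is not John Lewis’s method or a Splunk query; it is a minimal Python example that assumes a simple z-score test against the same weekday’s history. The function name `is_unusual`, the cutoff of three standard deviations and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def is_unusual(history, today, z_cutoff=3.0):
    """Return True if today's count sits far outside the historical baseline.

    history  -- failed-logon counts for the same weekday in earlier weeks
    today    -- the count being checked
    z_cutoff -- assumed cutoff, in standard deviations (hypothetical value)
    """
    if len(history) < 2:
        return False  # not enough data to say what "normal" is
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today != baseline
    return abs(today - baseline) / spread > z_cutoff


# Hypothetical failed-logon counts for previous Mondays.
previous_mondays = [82, 95, 101, 88, 97, 92]

monday_with_100 = is_unusual(previous_mondays, 100)  # close to the baseline
monday_with_400 = is_unusual(previous_mondays, 400)  # far above the baseline
print(monday_with_100, monday_with_400)
```

On this made-up history, 100 failed logons on a Monday is unremarkable, while 400 would prompt the kind of question Stewart describes: was something bad happening, or was it just testing?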
Stewart added that Splunk, in turn, allows John Lewis to understand what an effective alert looks like. In other words, John Lewis is able to put in a threshold for alerts and, for example, say that it should never receive more than 100 failed logons per hour for a system. If the number of failed logons goes above that threshold, it will generate an alert.
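The threshold rule can be sketched in a few lines. This is an illustration only, written in plain Python rather than as a Splunk search, and it assumes failed logons arrive as `(system, timestamp)` pairs. The function name, the event format and the sample data are hypothetical; only the figure of 100 per hour comes from Stewart’s example.

```python
from collections import Counter
from datetime import datetime

FAILED_LOGON_THRESHOLD = 100  # per system per hour, the figure quoted above


def hours_over_threshold(events, threshold=FAILED_LOGON_THRESHOLD):
    """Return (system, hour_start, count) for every system-hour above threshold.

    events -- iterable of (system_name, datetime) pairs, one per failed logon
    """
    counts = Counter()
    for system, timestamp in events:
        # Truncate each timestamp to the start of its hour to form the bucket.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(system, hour_start)] += 1
    return sorted(
        (system, hour_start, count)
        for (system, hour_start), count in counts.items()
        if count > threshold
    )


# Hypothetical data: 150 failures on one system and 40 on another, same hour.
events = [("web-01", datetime(2016, 5, 9, 9, i % 60)) for i in range(150)]
events += [("web-02", datetime(2016, 5, 9, 9, i % 60)) for i in range(40)]

alerts = hours_over_threshold(events)
for system, hour_start, count in alerts:
    print(f"{system}: {count} failed logons in the hour from {hour_start}")
```

Only the first system generates an alert here; the second stays silent, which is the point of setting a threshold rather than alerting on every failed logon.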
He added that the key problem with the RSA platform had been that the system would sometimes produce millions of alerts a day for things that were highly unlikely to be of value. By contrast, with an understanding in Splunk of what the application should look like, combined with the thresholds, the alerts are now of much higher importance.
Stewart added that Splunk has given John Lewis the opportunity to put in place some best-practice ‘next steps’ if one of these high-value alerts does come through. He said:
What we’ve done in working out what normal looks like for applications, is work through these layers. And we have emails that go to people that can take action and we follow rules. We have a plain English explanation of what’s going on. Not some kind of error code. If possible, we hyperlink to operational procedures.
One of the other key things is that you really need, for each of your layers, to work out in advance what you’re expecting people to do. If you can’t figure out the response to an alert at 2pm on a Tuesday with everybody around the table, there is no way an operator is going to know what to do any better than you do at 5am on a Sunday. That’s a key takeaway. Every alert needs to have a planned set of actions, even if it’s only a skeleton.
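One way to picture that takeaway is an alert definition that cannot go live until it has a plain English explanation, someone to send it to and at least a skeleton list of actions. The sketch below is an assumption about how that might look, not a description of John Lewis’s setup: the `AlertDefinition` class, its field names, the recipient address and the procedure URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertDefinition:
    name: str
    explanation: str                 # plain English, not an error code
    recipients: list                 # people who can actually take action
    actions: list = field(default_factory=list)  # planned steps, even a skeleton
    procedure_url: str = ""          # optional link to an operational procedure

    def is_ready(self):
        """An alert is ready only if it explains itself and has a planned response."""
        return bool(self.explanation.strip()) and bool(self.recipients) and bool(self.actions)


def render_email(alert):
    """Build the body of the notification email for an alert."""
    lines = [f"Alert: {alert.name}", "", alert.explanation, "", "What to do:"]
    lines += [f"{number}. {step}" for number, step in enumerate(alert.actions, 1)]
    if alert.procedure_url:
        lines += ["", f"Procedure: {alert.procedure_url}"]
    return "\n".join(lines)


# Hypothetical example alert.
failed_logons = AlertDefinition(
    name="Failed logons above threshold",
    explanation="More than 100 failed logons were recorded for one system in the last hour.",
    recipients=["service-desk@example.invalid"],
    actions=[
        "Check whether testing was scheduled for this system.",
        "If not, escalate to the on-call security analyst.",
    ],
    procedure_url="https://intranet.example.invalid/procedures/failed-logons",
)

no_plan = AlertDefinition(name="Mystery alert", explanation="Something happened.", recipients=["ops@example.invalid"])

email_body = render_email(failed_logons)
print(email_body)
```

Checking `is_ready()` before an alert is switched on is the 2pm-on-a-Tuesday exercise in miniature: `no_plan` fails the check because nobody has yet written down what the operator should do.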