Digital leadership - outages and risk management require a cultural focus

Mark Chillingworth, February 12, 2024
Summary:
Digital leaders and CIOs identify the biggest risk that is not on the official register

In 2023, enterprise technology hit the mainstream headlines for all the wrong reasons. During the summer, the UK's National Air Traffic Services (NATS) suffered an outage that grounded flights across Europe at a peak time for travellers. In October, Barclays bank customers were unable to access their accounts as the app went offline, and then fellow bank HSBC had a similar issue on Black Friday, preventing customers from buying Christmas gifts.

Though clearly technological issues, digital leaders believe major outages often speak of a cultural problem in organizations. That cultural problem has led to online services not being resilient and prone to failures at times of high demand. 

Culture problem 1 - Risk registers

Organizations have a regulatory demand to keep a register of risks, including those posed by technology. Discussing the 2023 outages, digital leaders pondered whether the technical issues that caused these damaging incidents were on the risk register. The reason for the question - all too often, CIOs find that risks from legacy technology are put on the register, but no action is taken. Tiffany Willcox, CTO of medicine firm Datapharm, says: 

It is not acceptable to say it is on the risk register. It is our responsibility to make sure people understand the risk and to up the ante if needed.

Gabe Barratt, CIO for US headquartered business process outsourcing firm TSI, says: 

Risk management all too often has nothing to do with managing the risk and everything to do with the reporting on a risk.

Femi Bamisaiye, until recently CIO of Aviva General Insurance, says that, once again, it is about the culture of the organization towards the risk register and acting upon it: 

Operational resiliency is regulated and mandated, and that means that it is there with the top of the organization. I have been reporting on operational resilience at least once every three months, and they want to know what we are doing about risk.

Bamisaiye points out that in April 2023 the former CIO of bank TSB was fined by the Prudential Regulation Authority (PRA) in the UK because, in its words, he failed to take reasonable steps to ensure that TSB adequately managed and supervised appropriately its outsourcing arrangement in relation to its 2018 IT migration programme. At the time this was headline news across the UK. 

Clearly, Bamisaiye has witnessed a good culture towards risk registers, but for other organizations, that culture is yet to arise, as Gerard McGovern, Director, Digital Strategy at the Tony Blair Institute for Global Change, says: 

The biggest problem with risk registers is that it becomes that bucket that we all have in our jobs called stuff, and we put it to one side, and it is only gone through every quarter, so there is only scrutiny four times a year.

Culture problem 2 - Misunderstood technology

Our digital leaders worry that organizations do not understand how devastating technology outages are to the end consumer. That is hard to fathom when you see images of airports jammed with passengers unable to fly, or when another financial services provider wins business from you. But the digital leaders have a point: in many sectors, though not all, business lost to technology failure still seems to be treated differently from business lost for other reasons. As a result, technology outages don't carry the same level of culpability. McGovern asks: 

Are there repercussions on the CEO when they have a risk notified to them, and they do nothing about it?

He asks based on his own experience in the health and charity sector, where the WannaCry attack on the UK NHS led to major outages, missed operations and distress to patients. He says: 

If there are no repercussions, then there is no motivation to be better.

Kevin Gohil, a former CIO and now AI, data and digital specialist, says many of the problems stem from that old favourite, poor alignment and communications between the CIO and the rest of the organization: 

There is a degree of IT being kept at a distance because IT is really hard. But there is a responsibility on us as technology leaders to keep conveying the importance of what technology means to the business.

Looking back at his technology leadership career, Gohil says he tackled this problem by sitting with the CFO and placing a monetary value on every issue of technology risk. 

Culture problem 3 - Poor transparency

Not being able to use online banking services and credit cards due to an "internal system issue," as HSBC called it on Black Friday 2023, is a nasty surprise for customers, but digital leaders state that the causes of these issues are often known within an organization. In some cases, and we are not saying this was the case at HSBC, there can be a culture of fear that prevents transparency and stops technical issues from being discussed openly. Barratt at TSI agrees: 

If we don't have a culture that allows people to speak up and say 'I think something may go wrong' and then doing something about what they said, means people will stop speaking up.

Software engineer turned entrepreneur Sylvain Hellegouarch agrees and says: 

Organizations tend to go back to a steady state, and if you interrogate that, you open up a Pandora's box, and people don't like that. But digital resilience is about consciously opening Pandora's box and saying, 'We can wait for an incident and learn from that, but that is not very cost-effective'.

Having witnessed this behaviour, Hellegouarch founded Reliably, a platform to enable activities that lead to digital sustainability. An example he cites is that when an organization needs to make cuts, it can experiment with 5% and then 10% cuts to see what strain they put on the teams and technology. This prevents the cuts from creating service-level gaps but still enables cost savings to be delivered.

Digitally resilient

Organizations demand that their customers adopt digital methods; in the case of banks in the UK, there is a widely publicised round of bank branch closures taking place, with online banking replacing the branch. This is fine until it's not, as was the case for Barclays and HSBC customers, who felt let down. Digital leaders and their bosses around the executive leadership table, therefore, have to ensure that the digital business lines are robust. 

Felipe Peñacoba Martinez, CIO for Revolut Bank in the EU, says this is, once again, about culture: 

At Revolut, we have the concept of you build it, you run it. By having an operating model where the teams that built the solutions know that they are going to run them 24/7, then for their own sake, they know they have to consider resiliency and scalability from the outset and not as an afterthought.

Ravi Nar, who has been CTO with a number of start-ups and large organizations, says enterprises should consider reorganizing technology into four team types. Doing so allows digital leaders and their staff to reduce excessive workloads (team cognitive load), which can lead to a better flow of value and reduced risk: 

It is about knowing when you need to break up large solution and software teams. The staff in these teams are struggling to support and improve the software they have been given because of factors such as complexity and size. The effect of this burden being long lead times for resolution of issues and reputational damage, to name a few. Ultimately, this approach is about evolving the structure of your organisation around value delivering teams.

Nar’s recommendation and experience follow the lines of Team Topologies, an approach that optimizes an organization's value delivery by working on organizational design alongside creating the right culture. One of the benefits of this approach is the reduction of team cognitive overload and, therefore, of errors and risks.

My take

It is easy to consider the outages of 2023 as technology failures, but as forward-thinking business technology leaders express, it is often a technical failure brought about by a problem in the culture of an organization. Whether the banks will reveal the causes of their outages remains to be seen (NATS has), but any analysis and remedy plan must look at the culture with equal weight to the technology risks. 
