The cost of IT outages: from the buy-side
- IT outages are a fact of life and they will be costly - both financially and in terms of brand and reputational management. In this special three-part series, we look at the impact of outages from different angles, beginning with the buy-side perspective from two organisations with a strong dependency on technology.
IT outages are a fact of life and they will be costly - both financially and in terms of brand and reputational management.
I recently chaired a roundtable on behalf of law firm Pinsent Masons on the subject of outages, which aired views from both the buy side and the sell side, as well as some legal advice.
In this special three-part series, we'll air some of the highlights from the discussion, beginning with the buy-side perspective from two organisations with a strong dependency on technology.
The buy side
Tullow Oil is one of the largest independent oil and gas exploration and production companies in Europe as well as Africa's leading independent oil company with basin-opening discoveries in Ghana, Uganda and Kenya. Founded in 1985, it also has interests in South America and French Guiana.
For a major oil and gas company, IT outages can have not only a financial and operational impact, but could potentially lead to loss of life, notes CIO Andrew Marks. He states:
"The costs caused by an IT outage can be both short and long-term, both of which need to be accounted and planned for. For example, in the manufacturing industry, an outage can lead to a day’s delay in production, which has a knock-on effect to suppliers down the chain as payments can be disrupted and staff pay impacted.
"After an outage, a business will be playing catch up in order to get back on top of orders and production, and as more companies use Just In Time supply chain models to keep costs down, the more disruptive an IT outage can be. In manufacturing, it may cost money to close a production line down, but it costs even more money to get it running again, play catch-up, and ensure the problem doesn’t strike again."
For Marks, governance - or rather lack of adherence to governance - is a major cause of outages:
"Assuming a company has viable governance policies in place, outages tend to occur when either people choose not to follow the rules, or they aren’t aware of them. Introducing a clear ‘fix’ to a black swan event – which outlines the exact processes all parties involved need to follow – should mitigate the impact of an outage.
"It’s essential that businesses put in place fallback mechanisms to pre-empt an outage. This disaster recovery planning can take the form of integrated supply chains – outlining alternative means to receive orders – or could also see replicated systems across various geographies and platforms (e.g., the cloud).
"In addition, putting the right processes in place, and knowing how quickly you can respond to a black swan event, will dictate how much damage an outage can have on a business.
"It's about actually complying with your governance. Obviously that assumes your governance is appropriate to start with - and that's a harder nut to crack!"
Indeed, some organisations choose to take greater risks than others, he notes:
"Some businesses actively choose not to invest in planning for failure, as they believe it will be cheaper to ride out the storm, face the shutdown, and pay for business interruption insurance. Self-protection comes at a price, but have all the costs been truly considered?
"Judgements tend to be made around how much we are willing to invest. The other view is, we've made a risk-based judgement and we made a mistake. We thought we only needed to do a certain level of maintenance, but we actually needed to do more and therefore something failed. That could be maintaining the data centre environment so that the servers are fine, but the air conditioning units failed and as a result that led to a domino effect.
"Businesses need to look at their business critical services, and plan around them accordingly. What is critical to the business’ running, what will ruin a business if it fails? Preparing appropriately can mitigate the problem, and minimise the ripple effect of an outage."
Who takes the blame?
Marks adds that 'blame' for outages rests with whoever is accountable for governance:
"Ultimately it's the CIO and if he/she is a board member then it sits with the board. If, like me, they report to an executive director, it sits with him. I employ a specialist who looks after governance and I work with my supply chain team to prepare contracts appropriately.
"I take an interest myself. We are active in contract management and supplier relationship management. It's not right or appropriate just to bring in lawyers to act on our behalf and sign a contract saying 'the lawyers did it, it must be fine'. There's a whole chain of command in between."
The relationship with the supplier is critical, suggests Marks, particularly when it comes to managing service level agreements:
"I would get the contract out on a regular basis in a positive way, to talk about what we have been achieving and what we have been struggling with. Do we still need the supplier to be held to this? What if we move to a different level, how would the supplier feel about that? It's about having a genuine relationship."
Neil Bayles, head of IT, governance and performance management at QBE Insurance Group, also emphasises the role of good supplier relationship management in establishing appropriate service level expectations:
"We're trying to separate off SLAs from core IT and calling it more a performance management function. The devil is in the detail. A lot of effort goes into detailed contractual frameworks that are actually very high level in reality. One thing that we've been pretty good at is not relying so much on contractual SLA and focusing more on building relationships with the service providers."
QBE Europe is one of the world's leading international insurers and reinsurers, headquartered in Sydney, Australia, operating out of 48 countries around the globe and with a presence in all key insurance markets:
"We are a heavily outsourced organisation. When you break down the barriers the business performance gets better. In those areas where we have lower level good relationships across teams, we have much better supplier management."
Appetite for risk
Beyond that, Bayles argues that it's essential for organisations to be honest and open about their appetite for risk as this will directly impact on their infrastructure investment:
"Most organisations suffer from lack of investment in infrastructure somewhere. Primarily it is about prioritisation of investment to stop things being substandard. You are always going to have priority calls to make. You're always going to have some areas that are out of date.
"If the business truly understands its technological priorities, it can plan accordingly. Investment is an ongoing struggle, and no CFO will sign off a bill for every failsafe under the sun. As such, businesses need to engage the IT team, identify the core systems, and protect them accordingly.
"What businesses can understand is their overall risk appetite. Most of the large outages are actually a series of single failures rather than one massive one. You need to understand the appetite of your organisation for risk. It may choose not to plan any mitigation but so long as they are aware of the risk.
"Ultimately the CIO is accountable. In reality, the difficulty to overcome as a CIO is to ensure that the business becomes accountable for its own issues. It comes back to transparency and the ability to articulate that. The business needs to understand what risks they are taking."
Bayles accepts, however, that ultimately the IT team will be held accountable for technology failures, but argues that the business side of the organisation needs to step up as well:
"The IT team should take responsibility in the event of an IT outage. However, in order to manage an outage as efficiently as possible, supply chains, contracts (both internal and external) and lines of responsibility need to be transparent.
"Tracing the line of responsibility in order to find a cause is easiest if clear contractual boundaries are in place. All too often, when there are gaps in understanding about who needs to do what, a blame culture takes over – only exacerbating the issue.
"Identifying and managing operational risk is the key to limiting the repercussions of an IT outage. Getting the right processes in place should be a priority, but in order to do this a business needs to be more integrated with its IT department."
In part two of this series: the sell-side perspective from the IT vendor community.