Too big to fail as well? Systemic risks of cloud dependency and an existential enterprise threat

By Kurt Marko, July 29, 2020
Economic modelling research into systemic risk suggests that financial services companies aren’t the only ones that might be “too big to fail.”

(Pixabay)

The financial crisis of 2007-2008 was a teachable moment about the obscure, under-appreciated risks of highly interconnected and interdependent systems. Prior to the almost complete freeze of the world’s financial system, few outside of the rarefied environs of bond traders and financial speculators had heard of loan tranches, CDSs and synthetic CDOs.

However, once home prices began to fall, they toppled hedge funds that had made highly leveraged bets on the booming market in mortgage-backed securities. Once the house of cards started falling, the damage cascaded to huge brokerages and insurers like Bear Stearns, Lehman Brothers and AIG, which threatened the entire financial system and gave everyone a lesson in how interconnected it truly was.

The tech world has provided similar lessons in systemic risks when highly interconnected systems like DNS, cloud infrastructure or online marketplaces fail. For example, the 2016 botnet attack on Dyn DNS blocked or hampered name resolutions for dozens of websites including mega sites like Amazon, Netflix, PayPal and Twitter. While some of these restored service by switching to backup providers, the scope and ramifications of the attack were, in the words of one security researcher, “really ominous.”

Last year, I wrote about another incident with a wide blast radius when a Google Cloud outage disrupted its Gmail, G Suite and YouTube products, along with services from Apple iCloud, Snapchat and others. I noted that many of these disruptions could have been averted via designs that used available cloud redundancy features. However, what if the outage had been more extensive? What if a system-wide outage at one cloud or network provider set off a chain reaction of failures like the 2007 financial crisis? Is such a systemic shock possible or are such services so interconnected that we don’t understand the dependencies and linkages? Earlier this year, two researchers at RAND decided to find out via some innovative research into systemic risks from inter-firm networks and supplier-customer ties.

The Gordian knot of cloud services

Economists often use the term “too big to fail” when describing financial firms whose failure would have such catastrophic implications for the broader economy that it would be irresponsible to allow them to become insolvent. The term came into common use during the 2007 financial crisis to justify huge loans and grants to firms like AIG, Citibank and Fannie Mae. RAND researchers Jonathan Welburn and Aaron Strong used the financial crisis as a cautionary example when summarizing their research findings in a recent column that questions whether some technology firms have become “Too interconnected to fail.” 

The parallels between today’s online economy and the 2007 financial crisis run deeper than most realize, since that crisis extended to manufacturing firms. For example, automobile manufacturers received bailouts for fear the collapse of one could take out an entire network of parts suppliers. As Welburn and Strong note in their report (emphasis added),

“If one looks more closely at the 2008 crisis, the broader economy has already been a driver of systemic risk. In an effort to prevent a deeper crisis, Chrysler, Ford, and General Motors— the so-called “Big Three” American automakers—each received emergency loans to abate a larger crisis (Goolsbee and Krueger, 2015). Although the need for rescuing Ford and General Motors was apparent—both were under pressure from sharply decreased auto demand and had the potential to drive significant job losses and aggregate losses if they failed—the need for rescuing Chrysler was, above all, about systemic risk. Leading up to the crisis, it had been estimated that of Chrysler’s suppliers, 54% and 66% were also suppliers to Ford and General Motors, respectively (Goolsbee and Krueger, 2015). As a result, the risk of a Chrysler failure was the risk that it could pull down Ford, General Motors, or both, by first toppling shared suppliers.”

Unlike 2008, no one today is worried about Amazon, Apple or Google going out of business. Instead, the risk in the online, cloud-based economy is that sustained, wide-scale outages at one could quickly disturb business throughout the economy. As Welburn and Strong put it in their column (emphasis added):

Just like CDOs, however, the cascading network effects present a much larger risk to the whole economy. A single disruption to AWS, perhaps due to a large-scale cyberattack, would instantly be a cross-sector problem, potentially shutting down swaths of the economy. And private enterprises wouldn’t be the only ones affected: GovCloud, a tailor-made version of AWS, provides cloud services for the Defense and Justice departments and the Internal Revenue Service.

Measuring enterprise connectedness

The RAND paper describes a mathematical model to estimate the connectedness of enterprise production networks, aka supply chains, using both publicly available data and statistical inference. As the authors note, “Production networks provide a channel for economic contagion,” adding that (emphasis added):

This mix of traditional economics and data science let us see how firms are connected within a network across sectors—and thus which ones represent central hubs of the economy. The most-connected companies, if hit with a seemingly isolated revenue shock, could cause outsize losses to the whole US economy.

I won’t bore you with the details, but know that the paper has the requisite number of Greek-infused equations. As they describe it, the report’s methodology extends inter-firm risk analysis from:

The general study of systemic risk and aggregate shocks and into the study of specific events. For example, firm-level analysis could heighten the understanding of the potential aggregate impact of localized events, such as natural disasters. The estimation of interfirm production networks in this report are a first step to true firm-level analysis.
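To make the idea of contagion through a production network concrete, here is a minimal Python sketch. It is not the RAND model, which relies on estimated firm-level data and statistical inference. It is a toy linear propagation rule assumed purely for illustration, and the firm names, revenues and dependence weights are all hypothetical. Each customer is assumed to lose a fixed fraction of whatever share of output its supplier loses.

```python
def propagate_shock(revenue, dependence, shocked_firm, shock=0.01, rounds=20):
    """Return the revenue lost by each firm after one firm's output is shocked.

    revenue:    {firm: annual revenue} (hypothetical figures)
    dependence: {(supplier, customer): weight}, where weight is the assumed
                fraction of the supplier's fractional loss passed to the customer
    shock:      fraction of the shocked firm's output that is lost
    """
    direct = {firm: 0.0 for firm in revenue}
    direct[shocked_firm] = shock

    # Repeatedly push losses from suppliers to customers; for a network
    # without cycles this settles after as many rounds as the longest chain.
    loss = dict(direct)
    for _ in range(rounds):
        updated = dict(direct)
        for (supplier, customer), weight in dependence.items():
            updated[customer] += weight * loss[supplier]
        loss = updated

    return {firm: revenue[firm] * loss[firm] for firm in revenue}


# Hypothetical three-firm network: one cloud provider and two retailers.
revenue = {"CloudCo": 100.0, "ShopA": 400.0, "ShopB": 500.0}
dependence = {
    ("CloudCo", "ShopA"): 0.5,
    ("CloudCo", "ShopB"): 0.2,
    ("ShopA", "ShopB"): 0.1,
}

losses = propagate_shock(revenue, dependence, "CloudCo")
total_loss = sum(losses.values())
direct_loss = losses["CloudCo"]
print(f"total loss {total_loss:.2f}, direct loss {direct_loss:.2f}")
```

In this made-up network the provider's own loss is only a fraction of the total, because its customers lose revenue too, which is the amplification effect the report sets out to measure with real data.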

What’s the potential damage?

The result of all the statistical inference and graph theory is a model that estimates the distribution of total losses for a one percent shock to an individual firm’s output (revenue). Thus, the report estimates that if Amazon were offline for one percent of the year, or about 4 days, the total lost revenue to both it and its customers would be $77 billion, or 54% of Amazon’s total revenue. In contrast, a one percent disruption at GoDaddy, a domain registrar and Web hosting company, would create disproportionately large aggregate losses of 18 times its total revenue.
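The arithmetic behind these figures is worth spelling out. The sketch below assumes, as the figures above imply, that the loss ratio is total losses divided by the firm's annual revenue, so dividing it by the size of the shock gives the multiplier on the direct hit. The helper name and constants are illustrative and not taken from the report.

```python
SHOCK = 0.01  # one percent of annual output, the shock size used in the report


def loss_multiplier(loss_ratio, shock=SHOCK):
    """Total losses per dollar of direct loss (hypothetical helper).

    loss_ratio is total losses divided by the firm's annual revenue,
    e.g. 0.54 for Amazon and 18 for GoDaddy in the figures quoted above.
    """
    return loss_ratio / shock


days_offline = SHOCK * 365                 # roughly 3.65 days, i.e. "about 4 days"
amazon_multiplier = loss_multiplier(0.54)  # roughly 54x the direct loss
godaddy_multiplier = loss_multiplier(18)   # roughly 1,800x the direct loss

print(f"{days_offline:.2f} days, {amazon_multiplier:.0f}x, {godaddy_multiplier:.0f}x")
```

Seen this way, a loss ratio well below one can still mean that total losses dwarf the direct hit to the firm that suffered the outage.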

Charting the data for hundreds of firms shows a highly skewed distribution with a very long tail demonstrating, in the words of the report, “many firms are relatively unimportant when considering systemic outages, but there are a small number of firms of critical importance.”

(Per source)

The following table summarizes the companies posing the largest absolute and relative systemic risk based on a one percent disruption of their operations. Note that even a loss ratio less than one indicates a sizeable economic multiplier: Amazon’s ratio of 54% means total losses of 54 times the direct one percent hit.

(Per source)

Counter to perceptions honed during the financial crisis, the most interconnected companies are often in retail, communications, electronics and insurance, while those with the largest loss multiplier typically provide business, engineering or production services. As the authors conclude (emphasis added):

Firms posing systemic risk have more heterogeneity than the focus on financial firms has led many to believe. Instead, our estimations demonstrate that many of the most central firms—and thereby firms posing the risk of largest aggregate losses following an idiosyncratic shock—are of varying sizes and in varying industries. Furthermore, focusing on those aggregate losses as a ratio of firm revenue revealed how some firms have a disproportionate impact on the economy through a multiplier effect borne out of network ties. Of those heavily interconnected firms, we observed that many represent top financial firms (e.g., Bank of America, J.P. Morgan) while others represent top technology (e.g., Alphabet, Amazon, Apple, Cisco), telecommunications (e.g., AT&T), and health care (e.g., UnitedHealth Group, CVS Health) firms.

My take

Cloud and communications services have proven indispensable for enterprises and employees seeking to maintain a semblance of normalcy and functioning business operations during the coronacrisis. Unfortunately, as the RAND research and various outage incidents demonstrate, tying one’s business to the fortunes of another creates new revenue and operational risks that are beyond one’s control. The RAND work marks the first effort at exposing sources of systemic risk outside the financial sector and quantifying those risks across the entire economy.

If the financial engineering before the 2007 crisis taught us anything, it’s that any highly interconnected system designed to eliminate risks contains obscure, unperceived threats that only manifest themselves after the damage is done. Online marketplaces, app stores, cloud, communication and application services could be this decade’s version of CDOs and CDSs, but with ramifications across a broader swath of the economy. The RAND authors provide a valuable early warning of potential disruption and the need for long-term mitigation planning when they write (emphasis added):

The highly networked nature of the economy has the potential to amplify known sources of systemic risks and add new ones. … Advanced economic modeling can locate the central nodes in the network—those that, if disrupted, will lead to significant economic damage. After the Covid-19 pandemic, which is accelerating the transition to a virtual economy, policy makers need to broaden their definition of systemic risk.

Instead of waiting for a government commission to conduct a post mortem on some future cloud-based economic contagion after the damage is done, organizations must build risk mitigation and redundancy measures into all future deployments of cloud, application and communications services.