
The CrowdStrike outage and global software's single-point failure problem

Digital billboards in Times Square in New York City, United States, on July 19, 2024, during the global communications outage caused by CrowdStrike, which provides cybersecurity services to U.S. technology company Microsoft. Some billboards displayed a blue screen and others went completely black.
Selcuk Acar | Anadolu | Getty Images
  • The CrowdStrike software bug that took down global IT infrastructure exposed a single-point-of-failure risk unrelated to malicious cyberattack.
  • National and cybersecurity experts say the risk of this kind of technical outage is increasing alongside the risk of hacks, and the market will need to adopt better competitive practices.
  • Government is also likely to look at new regulations related to software updates and patches.  

The frequency of large-scale attacks on corporate enterprise IT is increasing. That's not unusual or unexpected as companies spend heavily on cyber defense in an asymmetric war against hackers who can string together a few lines of code and wreak havoc.

But the largest IT outage ever, on Friday, resulted from a CrowdStrike software bug uploaded to Microsoft operating systems, rather than any malicious attack. The bug came from an increasingly common tech threat that gets less attention than malicious attacks: the single-point failure, an error in one part of a system that sets off a massive domino effect and creates a technical disaster across industries, functions, and interconnected communications networks.

Earlier this year, AT&T had a nationwide outage attributed to a technical update. Last year, the FAA had an outage that occurred after a single individual replaced a critical file in a route update (now, the FAA has a backup system to prevent that from happening again).

"It's more frequent even when it's just routine patching and updates," Chad Sweet, The Chertoff Group co-founder and CEO and former Chief of Staff at the Department of Homeland Security, told CNBC on Friday.

Single-point failure risk management is an issue that companies need to plan for and protect against. No software in the world gets released without later needing to be patched or updated, and security best practices exist that cover ongoing software maintenance well after a production release, Sweet said.

Companies that the Chertoff Group works with are closely reviewing software development and update standards in the wake of the CrowdStrike outage. Sweet pointed to a set of protocols the government already provides, the SSDF (Secure Software Development Framework), that may give the market an idea of what to expect as Congress starts inspecting the issue more closely. That scrutiny is likely after the recent string of incidents, from AT&T to the FAA and CrowdStrike, as the single-point technical failure has now clearly impacted citizens' lives and the operations of critical infrastructure on a widespread basis.

"Get ready on the corporate side," Sweet said.

Aneesh Chopra, Arcadia chief strategy officer and former White House chief technology officer, told CNBC on Friday that critical sectors including energy, banking, health care and airlines have separate regulations overseeing risk, and measures may be unique in the most regulated sectors. But for any business leader the question now is, "Assuming systems go down, what is plan B? We will see lots more scenario planning and if this is not Job No. 1, it is Job No. 2 or 3 to have those scenarios outlined," he said. 

Unlike many issues in D.C., Chopra noted there is a bipartisan commitment to issues of critical infrastructure and systemic risk, and technical standards are a "hallmark" of the U.S. system. Chopra predicted efforts aimed at understanding interdependent digital systems and preventing single points of failure.

Chopra believed "improving competition" could also strengthen accountability in the IT space. The business-to-business software space is highly concentrated and reliant on single providers, like CrowdStrike.

"If there is a mechanism to update in a more open and competitive way there might be pressure to make sure that that is done in a manner that has i's and t's dotted and crossed," Chopra said.

Sweet said that will inevitably lead to business world concerns about the risk of overregulation. While there is no way to know whether there was a way for CrowdStrike to operate using a more open process to detect the single-point failure, he said it is a legitimate question to ask.

The best method to avoid overregulation, according to Sweet, is to look to market-reinforcing mechanisms, like those in the insurance industry. "The short answer is, let the free market do it, through things like the insurance industry, which will reward good actors with lower premiums," he said.

Sweet also said more companies should embrace, as he does with his clients, the idea of "anti-fragile" organizations, a term coined by risk analyst Nassim Nicholas Taleb. "Not just an organization that is resilient after a disruption, but ones that thrive and innovate and outpace competitors," he said.

In Sweet's view, any single piece of legislation or regulation would be hard pressed to keep up with the rise of both malicious attacks and technical updates that are pushed through with unintended consequences.

"It's a wakeup call for sure," Chopra said.

Copyright CNBC