The CrowdStrike outage and global software's single-point failure problem

The CrowdStrike software bug that took down global IT infrastructure exposed a single-point-of-failure risk unrelated to malicious cyberattack.
National security and cybersecurity experts say the risk of this kind of technical outage is increasing alongside the risk of hacks, and the market will need to adopt better competitive practices.
Government is also likely to look at new regulations related to software updates and patches.

The frequency of large-scale attacks on corporate enterprise IT is increasing. That's not unusual or unexpected as companies spend heavily on cyber defense in an asymmetric war against hackers who can string together a few lines of code and wreak havoc.

But Friday's IT outage, the largest ever, resulted from a CrowdStrike software bug uploaded to Microsoft operating systems, rather than any malicious attack. The bug came from an increasingly common tech threat that gets less attention than malicious attacks: the single-point failure, an error in one part of a system that sets off a massive domino effect and creates a technical disaster across industries, functions, and interconnected communications networks.

Earlier this year, AT&T had a nationwide outage attributed to a technical update. Last year, the FAA had an outage that occurred after a single individual replaced a critical file in a route update (now, the FAA has a backup system to prevent that from happening again).

"It's more frequent even when it's just routine patching and updates," Chad Sweet, The Chertoff Group co-founder and CEO and former Chief of Staff at the Department of Homeland Security, told CNBC on Friday.

Single-point failure risk management is an issue that companies need to plan for and protect against. There's no software in the world that gets released and doesn't later need to be patched or updated, and there are established security best practices covering ongoing software maintenance well after a production release, Sweet said.

Companies that the Chertoff Group works with are closely reviewing software development and update standards in the wake of the CrowdStrike outage. Sweet pointed to a set of protocols the government already provides, the SSDF (Secure Software Development Framework), that may give the market an idea of what to expect as Congress starts inspecting the issue more closely. That scrutiny is likely after the recent string of incidents, from AT&T to the FAA and CrowdStrike, as the single-point technical failure has now clearly affected citizens' lives and the operations of critical infrastructure on a widespread basis.

"Get ready on the corporate side," Sweet said.

Aneesh Chopra, Arcadia chief strategy officer and former White House chief technology officer, told CNBC on Friday that critical sectors including energy, banking, health care and airlines have separate regulations overseeing risk, and measures may be unique in the most regulated sectors. But for any business leader the question now is, "Assuming systems go down, what is plan B? We will see lots more scenario planning and if this is not Job No. 1, it is Job No. 2 or 3 to have those scenarios outlined," he said.

Unlike many issues in D.C., Chopra noted there is a bipartisan commitment to issues of critical infrastructure and systemic risk, and technical standards are a "hallmark" of the U.S. system. Chopra predicted efforts aimed at understanding interdependent digital systems and preventing single points of failure.

Chopra believed "improving competition" could also strengthen accountability in the IT space. The business-to-business software space is highly concentrated and reliant on single providers, like CrowdStrike.

"If there is a mechanism to update in a more open and competitive way there might be pressure to make sure that that is done in a manner that has i's and t's dotted and crossed," Chopra said.

Sweet said that will inevitably lead to business world concerns about the risk of overregulation. While there is no way to know whether a more open process would have allowed CrowdStrike to detect the single-point failure, he said it is a legitimate question to ask.

The best method to avoid overregulation, according to Sweet, is to look to market-reinforcing mechanisms, like those in the insurance industry. "The short answer is, let the free market do it, through things like the insurance industry, which will reward good actors with lower premiums," he said.

Sweet also said more companies should embrace, as he does with his clients, the idea of "anti-fragile" organizations, a term coined by risk analyst Nassim Nicholas Taleb. "Not just an organization that is resilient after a disruption, but ones that thrive and innovate and outpace competitors," he said.

In Sweet's view, any single law or regulation would be hard-pressed to keep up with the rise in both malicious attacks and technical updates that are pushed through with unintended consequences.

"It's a wakeup call for sure," Chopra said.