An estimated 8.5 million Windows systems worldwide failed at the end of last week due to a faulty update from CrowdStrike. Experts are calling this incident the biggest IT glitch of all time. The US cybersecurity company has now outlined how this happened in an updated statement on its website.
The glitch was reportedly triggered by a configuration update for the CrowdStrike Falcon sensor, which the manufacturer distributed using a process called Rapid Response Content. This process is designed to address new cyber threats as quickly as possible. According to CrowdStrike, new configuration data is provided in the form of proprietary binary files.
On February 28, CrowdStrike introduced a new template type called Inter Process Communication (IPC) with version 7.11 of the Falcon Sensor. Its task is to detect new attack techniques that exploit named pipes. New detection data is provided via so-called IPC template instances. In March and April, CrowdStrike says it distributed several of these template instances without any issues.
Content Validator Did Not Detect Errors
However, things turned out differently on July 19. On that day, CrowdStrike distributed two more IPC template instances. According to the manufacturer, a so-called Content Validator checked their contents, but one of the two instances passed validation even though it contained incorrect data.
Based on previous testing and the lack of issues with the deployment of previous IPC template instances, CrowdStrike relied on the checks performed by the Content Validator and released the two instances for use on July 19. They were distributed to customer systems between 6:09 a.m. and 7:27 a.m. German time.
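According to CrowdStrike, when the sensor loaded the faulty instance into its Content Interpreter, the incorrect data caused an out-of-bounds memory read, which triggered an exception. This exception could not be handled properly and occurred on all Windows PCs with the CrowdStrike Falcon sensor installed that were online during the time window mentioned. The result was blue screens (BSOD) and system failures.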
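Why a gap in validation can have such consequences can be illustrated with a small Python sketch. It is not CrowdStrike's code, and every name in it (TemplateInstance, validate, interpret) is hypothetical. It assumes, purely for illustration, that the incorrect data is a field index pointing outside the inputs actually available. Python would raise an IndexError in that case instead of reading stray memory, but the defensive pattern is the same: check the content before using it, and skip it instead of crashing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TemplateInstance:
    """Hypothetical stand-in for one piece of detection content."""
    name: str
    field_indexes: list[int]  # input fields the content wants to inspect


def validate(instance: TemplateInstance, available_fields: int) -> list[str]:
    """Return a list of problems; an empty list means the instance passed."""
    problems = []
    for idx in instance.field_indexes:
        if not 0 <= idx < available_fields:
            problems.append(
                f"{instance.name}: field index {idx} is outside "
                f"0..{available_fields - 1}"
            )
    return problems


def interpret(instance: TemplateInstance, inputs: list[str]) -> list[str]:
    """Read the referenced fields, ignoring bad content instead of crashing."""
    if validate(instance, len(inputs)):
        # Defensive path (an assumption of this sketch): skip the content.
        return []
    return [inputs[idx] for idx in instance.field_indexes]
```

In this sketch the interpreter repeats the validator's check at load time, so content that slipped through an earlier validation step is ignored rather than executed.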
CrowdStrike Promises Improvement
In its announcement, CrowdStrike promises to test updates delivered via Rapid Response Content more extensively in the future to avoid such incidents. Tests will be carried out at various levels. The manufacturer mentions local developer tests, stress tests, fuzzing, fault injection, stability tests, interface tests, and additional tests for content updates and rollbacks. Additionally, error handling on the content interpreter side will be improved, and CrowdStrike plans to expand the Content Validator’s tests.
In the future, updates will be distributed in stages via Rapid Response Content. Customers will be able to choose how quickly they receive updates. To help customers make informed decisions, CrowdStrike will provide details of individual updates via release notes. By monitoring sensor and system performance, the manufacturer also aims to detect faulty updates early and respond quickly.
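The general idea of a staged rollout with health monitoring can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not CrowdStrike's deployment system: the stage sizes, the failure threshold, and the deploy and is_healthy callbacks are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stages: cumulative percentage of hosts updated after each stage.
STAGE_PERCENTS = [1, 10, 50, 100]


def staged_rollout(
    hosts: list[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.02,
) -> tuple[bool, list[str]]:
    """Deploy stage by stage and stop as soon as a stage looks unhealthy.

    Returns (completed, hosts that received the update).
    """
    updated: list[str] = []
    for percent in STAGE_PERCENTS:
        target = max(1, len(hosts) * percent // 100)
        batch = hosts[len(updated):target]
        for host in batch:
            deploy(host)
        updated.extend(batch)
        failures = sum(1 for host in batch if not is_healthy(host))
        if batch and failures / len(batch) > max_failure_rate:
            # Halt: hosts in later stages never receive the faulty update.
            return False, updated
    return True, updated
```

With 100 hosts and an update that breaks every machine, this sketch stops after the first stage, so only one host is affected instead of all of them.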
CrowdStrike also promises to publish a full root cause analysis once the company completes its investigation into the incident. The US House of Representatives’ Homeland Security Committee is demanding details on the global IT outage and has invited CrowdStrike’s CEO to a hearing.