CrowdStrike has blamed faulty testing software for a botched update that crashed 8.5 million Windows machines around the world, it said in a post-incident review (PIR). “Due to a bug in the Content Validator, one of the two [updates] passed validation despite containing problematic data,” the company said. It promised to take a number of new measures to prevent the problem from recurring.
The massive BSOD (blue screen of death) outage affected many companies around the world, including airlines, broadcasters, the London Stock Exchange and many others. The problem forced Windows machines into a boot loop, requiring technicians to have local access to the machines to recover them (Apple and Linux machines were not affected). Many companies, like Delta Airlines, are still recovering.
To prevent DDoS and other types of attacks, CrowdStrike has a tool called Falcon Sensor. It ships with kernel-level content (called Sensor Content) that uses “Template Types” to define how it defends against threats. When something new comes along, the company ships “Rapid Response Content” in the form of “Template Instances.”
The Template Type for a new sensor was released on March 5, 2024 and performed as expected. However, two new Template Instances were released on July 19, and one (just 40KB in size) passed validation despite containing “problematic data,” CrowdStrike said. “When received by the sensor and loaded into the Content Interpreter, [this] resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD).”
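To illustrate the class of failure CrowdStrike describes, here is a minimal Python sketch of a validation check paired with a bounds-checked read. It is not CrowdStrike's code: the `TemplateInstance` shape, the `validate` and `read_field` helpers and the idea of a fixed expected field count are all assumptions made for the example.

```python
from dataclasses import dataclass


class ContentValidationError(ValueError):
    """Raised when a content entry does not match what the reader expects."""


@dataclass(frozen=True)
class TemplateInstance:
    # Hypothetical shape: a name plus an ordered list of field values.
    name: str
    fields: list[str]


def validate(instance: TemplateInstance, expected_fields: int) -> None:
    """Reject content whose field count differs from what the reader expects."""
    if len(instance.fields) != expected_fields:
        raise ContentValidationError(
            f"{instance.name}: expected {expected_fields} fields, "
            f"got {len(instance.fields)}"
        )


def read_field(instance: TemplateInstance, index: int, default: str = "") -> str:
    """Bounds-checked read: return a default instead of reading past the end."""
    if 0 <= index < len(instance.fields):
        return instance.fields[index]
    return default
```

The point of the sketch is that the two layers are independent: even if a validator bug lets a short entry through, a reader that checks its index before every access degrades gracefully instead of faulting.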
CrowdStrike has promised to take a number of measures to prevent a repeat of the incident. First, it will test Rapid Response Content more thoroughly, including local developer testing, content update and rollback testing, stress testing, stability testing and more. It is also adding validation checks and enhancing error handling.
In addition, the company will begin using a staggered deployment strategy for Rapid Response Content to avoid a repeat of the global outage. It will also provide customers with greater control over the delivery of such content and provide release notes for updates.
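A staggered deployment of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CrowdStrike's process: the `staggered_rollout` function, the stage fractions, the `deploy` callback and the failure threshold are all hypothetical.

```python
from typing import Callable, Sequence


def staggered_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> list[str]:
    """Push an update to progressively larger slices of the fleet.

    `deploy` returns True if the host is healthy after the update.
    The rollout halts after the first stage whose failure rate exceeds
    `max_failure_rate`. Returns the hosts that received the update.
    """
    updated: list[str] = []
    done = 0
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated after this stage.
        target = min(len(hosts), round(len(hosts) * fraction))
        batch = hosts[done:target]
        if not batch:
            continue
        failures = sum(0 if deploy(host) else 1 for host in batch)
        updated.extend(batch)
        done = target
        if failures / len(batch) > max_failure_rate:
            break  # Stop here: later stages never receive the update.
    return updated
```

With stages such as 1 percent, 10 percent and 100 percent of hosts, an update that crashes every machine it touches would stop after the first small slice rather than reaching the whole fleet at once.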
However, some analysts and engineers think the company should have implemented such measures from the start. “CrowdStrike should be aware that these updates are interpreted by drivers and may cause problems,” engineer Florian Roth posted on X. “They should have implemented a tiered deployment strategy for Rapid Response Content from the beginning.”