“Latent Bug” Was The Cause Of Cloudflare Global Outage, Says CTO Dane Knecht

Low Boon Shen

If you were browsing X/Twitter or asking ChatGPT a question in the late evening of 18 November 2025, you likely ran into errors, or got an outright Error 500 page on some websites stating that Cloudflare was down. Indeed, the company that runs the infrastructure behind much of the Internet was down for several hours, though not because of a cyberattack: Cloudflare CTO Dane Knecht attributed the outage to a “latent bug” in a bot management feature.

The Reason Behind Cloudflare’s 6-Hour Outage

A timeline of the error counts detected during the incident. Image: Cloudflare

In a detailed technical blog post, Cloudflare explained that a “feature file” used by its Bot Management system (a feature designed to manage web crawlers and other incoming traffic) doubled in size because of a faulty change to database system permissions, and the oversized file then propagated to machines across the company’s entire network. The software on these machines has a defined limit on how large a feature file can be, and because the doubled file exceeded that limit, it “caused the software to fail.”
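To make that failure mode concrete, here is a minimal Python sketch of a loader with a fixed capacity for features. It is purely illustrative and not Cloudflare’s code: the names `load_features` and `FeatureFileTooLarge` and the limit of 200 are hypothetical assumptions. The point is that a hard size check which raises an error, with nothing upstream handling it, turns an oversized input file into a failure of the component that depends on it.

```python
# Hypothetical sketch, not Cloudflare's implementation.
# MAX_FEATURES is an illustrative preallocated limit.
MAX_FEATURES = 200


class FeatureFileTooLarge(Exception):
    """Raised when a feature file has more entries than the allowed limit."""


def load_features(lines: list[str]) -> list[str]:
    """Parse one feature name per non-empty line, enforcing a hard limit."""
    features = [line.strip() for line in lines if line.strip()]
    if len(features) > MAX_FEATURES:
        # If no caller handles this error, whatever needed the features fails too.
        raise FeatureFileTooLarge(
            f"{len(features)} features exceeds limit of {MAX_FEATURES}"
        )
    return features


normal = [f"feature_{i}" for i in range(150)]
doubled = normal * 2  # the same file "doubled in size": 300 entries

print(len(load_features(normal)))  # 150
try:
    load_features(doubled)
except FeatureFileTooLarge as err:
    print("load failed:", err)  # load failed: 300 features exceeds limit of 200
```

The file that fit comfortably before the change no longer fits once every entry appears twice, even though nothing about the limit itself changed.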

This rare failure was initially thought to be a cyberattack, as it coincided with the recent record-breaking Aisuru botnet attack; further investigation then determined that the faulty feature file was what broke the Internet. Once the problem was found, systems were recovered by rolling back to a previous known-good feature file. The outage lasted around 6 hours from 11:20 UTC (7:20PM Malaysia Time), with most of it resolved by 14:30 UTC (10:30PM MYT) and service fully restored by 17:06 UTC (1:06AM MYT on 19 November).

Cloudflare said this was its worst outage since 2019, and promised changes to its systems to prevent similar incidents in the future. These include hardening the ingestion of its configuration files, enabling more global kill switches for features, preventing core dumps and other error reports from overwhelming system resources, and reviewing failure modes for error conditions across all core proxy modules.
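Two of these measures, hardened ingestion and kill switches, can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not Cloudflare’s design: the `FeatureStore` class, its `enabled` flag and the limit of 200 are invented for the example. An internally generated file is validated like untrusted input, and if it fails validation the last known-good version keeps serving, which is the same idea as the rollback Cloudflare used to recover.

```python
# Hypothetical sketch of hardened config ingestion; all names are illustrative.
from dataclasses import dataclass, field

MAX_FEATURES = 200  # illustrative limit, not Cloudflare's actual value


@dataclass
class FeatureStore:
    active: list[str] = field(default_factory=list)  # last known-good features
    enabled: bool = True  # hypothetical global kill switch for the feature
    rejected: int = 0  # number of updates refused by validation

    def apply_update(self, lines: list[str]) -> bool:
        """Validate a new feature file; keep the current one if it is invalid."""
        features = [line.strip() for line in lines if line.strip()]
        if not 0 < len(features) <= MAX_FEATURES:
            self.rejected += 1
            return False  # keep serving self.active instead of failing
        self.active = features
        return True

    def features_for_request(self) -> list[str]:
        # With the kill switch off, the feature is bypassed entirely
        # rather than affecting the rest of the request path.
        return self.active if self.enabled else []


store = FeatureStore()
good = [f"feature_{i}" for i in range(150)]
print(store.apply_update(good))  # True
print(store.apply_update(good * 2))  # False: too large, previous file kept
print(len(store.features_for_request()))  # 150
store.enabled = False
print(store.features_for_request())  # []
```

Rejecting a bad update costs only a stale feature list, whereas failing hard on it costs every request that passes through the component.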

Pokdepinion: The resiliency of the Internet certainly isn’t something that can be taken for granted.
