We had opened a book over whether Apple’s programming genii would cock up over New Year, when a leap second was added to the clock. Apple has a history of failing to cope with changes to summer time and, sometimes, with moves between time zones. It looks like Apple pulled its socks up and its devices didn’t break.
The same could not be said for Cloudflare, which saw some of its servers fail. For those who came in late, Cloudflare protects a site’s web traffic by routing it through its own network, which comes with enhanced security. The outfit, which claims “we make the Internet work the way it should”, offers CDN, DNS, DDoS protection and security, but found that some of its servers failed to handle the added leap second.
The result was that users received an error message saying that servers could not be reached, instead of seeing the page that they wanted to visit.
Cloudflare said that it patched the worst-affected machines within 90 minutes and explained what went wrong: “At midnight UTC on New Year’s Day, deep inside Cloudflare’s custom RRDNS software, a number went negative when it should always have been, at worst, zero.
“A little later this negative value caused RRDNS to panic. This panic was caught using the recover feature of the Go language. The net effect was that some DNS resolutions to some Cloudflare managed web properties failed.”
The problem is said to have affected only a small percentage of the requests its servers processed during the glitch. That is not a great fall over but, given the size of Cloudflare, it was still quite a few people for 90 minutes.
Analysis of the problem revealed that a mismatch between the timestamps Cloudflare’s servers were expecting and the ones they actually got caused the system to ‘panic’.
Cloudflare said: “This problem was quickly identified. The most affected machines were patched in 90 minutes and the fix was rolled out worldwide by 0645 UTC. We are sorry that our customers were affected, but we thought it was worth writing up the root cause for others to understand.”