Computers are so comfortingly binary. Software engineers work in a digital world, where most things in the end are in one of two states: 1 or 0, yes or no, true or false. It’s good to be digital.
Except when it’s not.
In the last week, we’ve seen two major Internet outages in which analog events disrupted the digital world. First, there were electrical storms in Northern Virginia. And this weekend, the earth officially started spinning a bit more slowly. So what happened?
The first thing to know is that a great deal of the Internet traffic in the United States goes through two hubs: Mae East, in the Washington, D.C. area, and Mae West, in California. Many of the major ISPs put their servers at Mae East and/or Mae West, and many of the large cloud providers have followed suit. So when electrical storms and the resulting power outages hit Northern Virginia, well, it wasn’t pretty. Talk about analog intruding on digital! Amazon’s Web services took a big hit, along with services like Netflix, Pinterest, Instagram, and Verizon’s FiOS.
A day later, the leap second happened. This one we could have seen coming. The International Earth Rotation and Reference Systems Service (IERS) had announced in advance that it would add a leap second to Coordinated Universal Time (UTC), the time scale that atomic clocks keep for the rest of us. Basically, the last minute of June 30th got an extra second, so the official clock read 23:59:60 before rolling over to midnight. This happens occasionally, to keep clock time in step with the earth’s slightly irregular rotation. And it shouldn’t be a big deal, except that it was. It turns out that the extra second triggered a bug in Linux, and Java and other software running on affected servers misbehaved along with it. So in accounting for the very analog rotation of the earth, the digital world stumbled. Reddit, Mozilla, Gawker, FourSquare, Yelp, LinkedIn and others all reported trouble.
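To get a feel for why one extra second is awkward for software, here is a minimal Python sketch. It is only an illustration, not the code behind any of the outages: the helper name `can_represent` is made up for this example, and it assumes nothing beyond Python's standard `datetime` module. The leap second was labeled 23:59:60 UTC, a clock reading that `datetime` refuses to build, and the module counts a single second across a boundary where two actually ticked by.

```python
from datetime import datetime, timezone


def can_represent(year, month, day, hour, minute, second):
    """Hypothetical helper: True if datetime accepts this clock reading."""
    try:
        datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59
        return False


# The ordinary last second of June 30, 2012 is fine...
last_normal_ok = can_represent(2012, 6, 30, 23, 59, 59)

# ...but the leap second itself, 23:59:60 UTC, cannot be expressed.
leap_second_ok = can_represent(2012, 6, 30, 23, 59, 60)

# datetime arithmetic assumes every minute has exactly 60 seconds,
# so it reports 1 second here even though 2 real seconds elapsed.
before = datetime(2012, 6, 30, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2012, 7, 1, 0, 0, 0, tzinfo=timezone.utc)
elapsed_by_datetime = (after - before).total_seconds()

print(last_normal_ok, leap_second_ok, elapsed_by_datetime)
```

Most everyday software makes the same simplifying assumption, which is exactly why the rare minute with 61 seconds is a place for bugs to hide.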
Most sites seem to have dealt with the problem using a time-honored debugging technique: rebooting the servers.
No matter how digital things seem in software, stay aware of the analog world around us. We can’t say when or how, but one day it will intrude again!