Some tools help users store and analyze data. More than a few help diagnose problems with IT infrastructure.
And others unleash pure anarchy.
Netflix, in order to better stream bad 1980s action movies in high definition, built a tool known as Chaos Monkey, which deliberately causes failures within applications running on Amazon Web Services (AWS). However, Chaos Monkey’s ultimate purpose isn’t to take Netflix down and thus spare the world the horror of Chuck Norris streaming anywhere, at any time. Instead, the tool is meant to help engineers make their cloud-based services more resilient.
“Chaos monkey is a service which runs in the Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group,” read a July 30 posting on The Netflix Tech Blog. “Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond.”
In other words, it’s like a caffeinated chimpanzee loosed in a company’s IT department, armed with a pair of bolt cutters and a USB stick loaded with a particularly nasty virus.
“Within an ASG, Chaos Monkey will select an instance at random and terminate it,” the blog continued. “The ASG should detect the instance termination and automatically bring up a new, identically configured instance.” For those not using Auto Scaling Groups, “that should be the first step to making your application handle these isolated instance failure scenarios.”
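To make that loop concrete, here is a toy simulation of what the blog describes: pick a random instance in each group, terminate it, and let the group launch an identical replacement. It is a sketch under stated assumptions, not Netflix’s code or the AWS API. `ToyASG` and `chaos_monkey_step` are hypothetical names, and the 9:00 to 15:00 run window is an assumed stand-in for the “limited set of hours” the blog mentions.

```python
import random
from dataclasses import dataclass, field
from itertools import count


# Hypothetical toy model of an Auto Scaling Group. It only tracks instance IDs
# and a desired capacity; it does not talk to AWS.
@dataclass
class ToyASG:
    name: str
    desired_capacity: int
    instances: list[str] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def __post_init__(self) -> None:
        self.reconcile()

    def reconcile(self) -> None:
        """Launch identically configured replacements until capacity is restored."""
        while len(self.instances) < self.desired_capacity:
            self.instances.append(f"{self.name}-i{next(self._ids)}")


def chaos_monkey_step(groups, hour, rng, window=(9, 15)):
    """Terminate one random instance per group, but only inside the
    (assumed) working-hours window so engineers are around to respond."""
    if not (window[0] <= hour < window[1]):
        return []
    killed = []
    for asg in groups:
        if asg.instances:
            victim = rng.choice(asg.instances)
            asg.instances.remove(victim)
            killed.append(victim)
    return killed


# Usage: one victim per group during working hours, then each group heals itself.
rng = random.Random(0)
groups = [ToyASG("web", 3), ToyASG("api", 2)]
print(chaos_monkey_step(groups, hour=10, rng=rng))
for g in groups:
    g.reconcile()
print([len(g.instances) for g in groups])
```

The point of the model is the division of labor: the monkey only destroys, and recovery is entirely the group’s job, which is why the blog treats adopting Auto Scaling Groups as the first step.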
Chaos Monkey can run in either opt-in or opt-out mode; under opt-out, any group whose owners do nothing is fair game, and Chaos Monkey will rampage. The IT administrator setting up the tool can also tune the probability that it terminates an instance, and thus how often it runs wild at inopportune moments. “With a 20 percent probability, Chaos Monkey would terminate one instance a week on average,” the blog added. “In practice, it might be 2 days in a row followed by 2 weeks of no terminations, but given a large enough sample it will terminate weekly on average.”
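The arithmetic behind that quote can be checked with a short simulation. This sketch assumes the 20 percent is an independent per-day chance of a termination, applied on each of five weekdays, so the expected count is 5 × 0.2 = 1 per week while individual weeks still vary. The `eligible` function is a hypothetical illustration of the opt-in/opt-out distinction, not Chaos Monkey’s actual configuration logic.

```python
import random


def eligible(mode: str, opted_in: bool, opted_out: bool) -> bool:
    """Hypothetical rule: in opt-in mode only groups that opted in are targets;
    in opt-out mode every group is a target unless it explicitly opted out."""
    if mode == "opt-in":
        return opted_in
    if mode == "opt-out":
        return not opted_out
    raise ValueError(f"unknown mode: {mode!r}")


def terminations_per_week(p: float, weeks: int, rng: random.Random,
                          days_per_week: int = 5) -> list[int]:
    """Count terminations per week, treating each weekday as an independent
    trial that succeeds with probability p (an assumption, see lead-in)."""
    return [sum(rng.random() < p for _ in range(days_per_week))
            for _ in range(weeks)]


rng = random.Random(7)
counts = terminations_per_week(0.2, 10_000, rng)
print(sum(counts) / len(counts))  # roughly 1.0, i.e. 5 * 0.2
print(counts[:8])  # individual weeks vary, some with none, some with several
```

The long-run mean settles near one termination per week, but roughly a third of weeks (0.8 to the fifth power, about 0.33) see none at all, which matches the blog’s point about bursty behavior.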
Netflix is adapting its Chaos Monkey dashboard for open-source use. The company also provides a documentation wiki for the tool.
For IT vendors offering data-intensive applications over the cloud, including analytics and B.I. platforms, Chaos Monkey could help them build services that keep running when individual instances fail.
Image: Richard Susanto/Shutterstock.com