Netflix Open-Sources ‘Janitor Monkey’ AWS Cleanup Tool

Netflix has released “Janitor Monkey,” an open-source tool for cleaning up unused Amazon Web Services (AWS) resources. The tool began life as an in-house product.

While those hosting a private data center will have little use for this scrubbin’ simian, enterprises running workloads in a public cloud can add Janitor Monkey to their administrative bag of tricks. The premise behind the tool is a simple one: AWS makes experimentation easy (and cheap), but even the most diligent IT pro can rack up unnecessary costs by forgetting to shut off a particular instance.

“It is pretty easy to lose track of the cloud resources that are no longer needed or used,” Michael Fu and Cory Bennett of Netflix’s Engineering Tools team wrote in a recent corporate blog post. “Perhaps you forgot to delete the cluster with the previous version of your application, or forgot to destroy the volume when you no longer needed the extra disk.”

Netflix’s Asgard tool (open-sourced in June, because this is how the company rolls) allows administrators to delete unused resources. Janitor Monkey takes things one step further by finding those resources automatically so that they can be cleaned up. Over the past year, Janitor Monkey has deleted more than 5,000 resources running in the Netflix production and test environments, the company said.

Janitor Monkey detects AWS instances, EBS volumes, EBS volume snapshots, and auto-scaling groups. It determines whether a resource should be a cleanup candidate by applying a set of rules to it, and each resource type has its own rules for marking unused resources. For example, an EBS volume is marked as a cleanup candidate if it has not been attached to any instance for 30 days. If any of the rules determines that the resource is a cleanup candidate, Janitor Monkey marks the resource and schedules a time to clean it up.
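The rule-based marking described above can be illustrated with a minimal Python sketch. This is not Janitor Monkey's actual code: the `Resource` class, the function names, and the `grace` period between marking and cleanup are hypothetical names and values chosen for illustration. Only the 30-day threshold for unattached EBS volumes and the "any rule marks the resource" behavior come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a cloud resource record."""
    resource_id: str
    resource_type: str                   # e.g. "EBS_VOLUME"
    detached_since: Optional[datetime]   # None while attached to an instance
    cleanup_at: Optional[datetime] = None


# A rule takes a resource and the current time and says whether
# the resource looks unused.
Rule = Callable[[Resource, datetime], bool]


def unattached_ebs_rule(resource: Resource, now: datetime,
                        max_days: int = 30) -> bool:
    """Flag an EBS volume that has been unattached for max_days or more."""
    if resource.resource_type != "EBS_VOLUME":
        return False
    if resource.detached_since is None:
        return False
    return now - resource.detached_since >= timedelta(days=max_days)


def mark_if_candidate(resource: Resource, rules: List[Rule], now: datetime,
                      grace: timedelta = timedelta(days=2)) -> bool:
    """Mark the resource and schedule cleanup if ANY rule flags it.

    The grace period is an assumed value, not a documented default.
    """
    if any(rule(resource, now) for rule in rules):
        resource.cleanup_at = now + grace
        return True
    return False
```

Keeping each rule as a plain function makes it straightforward to give every resource type its own list of rules, which mirrors the per-type rules the article mentions.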

The tool will also notify the admin (or any other email address attached to an instance or resource) ahead of time, in case the resource should persist. Typically, the alert is sent out two days before cleanup; a simple REST interface can be used to either approve or exclude the resource due for deletion. Janitor Monkey is scheduled to run on non-holiday weekdays at 11 AM, but can be configured for other times.
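The notify-then-clean-up flow can be sketched in a few lines of Python. This is an illustrative model only, not the tool's implementation: `MarkedResource`, `next_action`, and the action names are hypothetical, and the choice never to delete a resource whose owner has not yet been notified is an assumption of this sketch. The two-day lead time and the ability to exclude a resource come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the article: the alert typically goes out two days before cleanup.
NOTIFY_LEAD = timedelta(days=2)


@dataclass
class MarkedResource:
    """Hypothetical record for a resource already marked for cleanup."""
    resource_id: str
    owner_email: str
    cleanup_at: datetime
    notified: bool = False
    excluded: bool = False   # set when the owner opts the resource out


def next_action(resource: MarkedResource, now: datetime) -> str:
    """Return "skip", "notify", "delete", or "wait" for this run."""
    if resource.excluded:
        return "skip"
    if not resource.notified:
        # Assumption: the owner is always notified before any deletion.
        if now >= resource.cleanup_at - NOTIFY_LEAD:
            return "notify"
        return "wait"
    if now >= resource.cleanup_at:
        return "delete"
    return "wait"
```

A scheduled run would call `next_action` for each marked resource and act on the result; an owner who excludes a resource through the REST interface would, in this model, simply flip `excluded` to `True`.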

Janitor Monkey events are logged in an Amazon SimpleDB table by default, Fu and Bennett wrote, which should be small enough to fall inside Amazon’s free pricing tier.

Janitor Monkey is part of the so-called Simian Army of at least eight internal management tools, including Latency Monkey, which introduces artificial delays into the system, and Chaos Monkey. Some (but not all) of these tools have been open-sourced.

