Genie features a complete set of RESTful services for job management within the Hadoop ecosystem. It is built atop a variety of Netflix OSS components, including Archaius (billed as "dynamic property management in the cloud") and Karyon (diagnostics, runtime insights, and more). To get started, an administrator or IT pro must register a Hadoop cluster with Genie: spin up a cluster, upload the appropriate Hadoop and Hive configurations, use the Genie client to discover a Genie instance, and make a RESTful call to register the cluster. (Administrators can deploy Genie into a servlet container such as Apache Tomcat after cloning the source from GitHub and completing setup.)
Once the cluster has been registered, Genie can handle any number of Hadoop, Hive and Pig jobs; its unit of execution is a single job. Unlike some other custom systems, Genie is not a task scheduler, nor is it an end-to-end resource management tool capable of launching or provisioning clusters.
“We think of Genie as a resource match-maker, since it matches a job to an appropriate cluster based on the job parameters and cluster properties,” read a June 21 note on The Netflix Tech Blog. “If there are multiple clusters that are candidates to run a job, Genie will currently choose a cluster at random. It is possible to plug in a custom load balancer to choose a cluster more optimally—however, such a load balancer is currently not available.”
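To make the "match-maker" idea concrete, here is a toy Python sketch of the behavior the blog describes: filter clusters whose properties satisfy a job's parameters, then pick one at random unless a custom load balancer is plugged in. This is not Genie's code or API. The names Cluster, Job, match_cluster and random_balancer, and the choice to model parameters and properties as sets of tags, are hypothetical assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Cluster:
    name: str
    # Hypothetical: cluster properties modeled as a set of tags, e.g. {"prod", "hive"}.
    tags: FrozenSet[str]


@dataclass(frozen=True)
class Job:
    name: str
    # Hypothetical: job parameters modeled as tags the target cluster must carry.
    required_tags: FrozenSet[str]


# A load balancer chooses one cluster from a non-empty list of candidates.
LoadBalancer = Callable[[Sequence[Cluster]], Cluster]


def random_balancer(candidates: Sequence[Cluster]) -> Cluster:
    # Default policy, mirroring the blog's description: pick a candidate at random.
    return random.choice(candidates)


def match_cluster(
    job: Job,
    clusters: Sequence[Cluster],
    balancer: LoadBalancer = random_balancer,
) -> Optional[Cluster]:
    # A cluster is a candidate if it has every tag the job asks for.
    candidates = [c for c in clusters if job.required_tags <= c.tags]
    if not candidates:
        return None  # no registered cluster can run this job
    return balancer(candidates)


if __name__ == "__main__":
    prod = Cluster("prod", frozenset({"prod", "hive", "pig"}))
    adhoc = Cluster("adhoc", frozenset({"adhoc", "hive"}))
    clusters = [prod, adhoc]

    # Only "prod" supports Pig, so the match is deterministic here.
    print(match_cluster(Job("etl", frozenset({"pig"})), clusters).name)
    # Both clusters support Hive, so the default balancer picks one at random.
    print(match_cluster(Job("report", frozenset({"hive"})), clusters).name)
```

Passing a different `balancer` (for example, one that prefers the least-loaded cluster) swaps the selection policy without touching the matching step, which is roughly the kind of plug-in point the blog says is possible but not yet provided.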
Netflix uses Genie, in conjunction with its Asgard cloud-management and deployment tool, to run hundreds of concurrent Hadoop jobs that together process hundreds of terabytes of data. (Netflix stores its data on Amazon's Simple Storage Service, also known as S3, and relies on other Amazon Web Services offerings to support its backend IT infrastructure.) Although Genie has been released to the open-source community, Netflix very much considers it a work in progress. "We think of the initial release as version 0," the blog posting added.