Project Savanna Brings Together OpenStack and Apache Hadoop
When it comes to open-source technologies, it’s hard to deny that OpenStack and Apache Hadoop have a lot of buzz behind them. Now, an ambitious effort named Project Savanna is attempting to bring the two together. Whether that proves the data-infrastructure equivalent of apple pie and vanilla ice cream (two great tastes that go well together) or an unworkable match along the lines of, oh, pancakes and pickles is something that only developers can determine.

OpenStack is an open-source Infrastructure-as-a-Service (IaaS) platform originally developed by Rackspace as part of a joint effort with NASA. The technology has gained increasing favor among companies looking to build out cloud services and architecture; for example, IBM announced in March that it would use OpenStack to build out its cloud portfolio, while Dell has indicated that its OpenStack-based public cloud would go live in the fourth quarter.

Apache Hadoop, meanwhile, has become increasingly popular as a way for companies to crunch massive amounts of data stored on large hardware clusters. In the first few months of 2013, firms ranging from EMC and Intel to Hortonworks and Cloudera have all issued Hadoop-related products of some sort. At the end of March, MapR Technologies and Canonical announced a partnership to bring Hadoop to Ubuntu, the popular Linux-based operating system.

Hortonworks, Red Hat, and Mirantis are among the groups contributing to Project Savanna, which Mirantis originally began as an OpenStack project. “Savanna aims to provide users with simple means to provision a Hadoop cluster by specifying several parameters like Hadoop version, cluster topology, nodes hardware details and a few more,” read an introductory note on OpenStack.org.
“After user fills in all the parameters, Savanna deploys the cluster in a few minutes.” In addition, the platform “provides means to scale already provisioned cluster by adding/removing worker nodes on demand.”

In theory, that will serve developers and companies that want to provision Hadoop clusters quickly, which can help with massive analytic workloads that need to run at speed.

Designed as an OpenStack component, Savanna can be managed via a REST API from the OpenStack dashboard, and will communicate with other OpenStack components including Horizon, Keystone, Nova, Glance, and Swift. “Savanna will provide two [levels] of abstraction for API and UI based on the address use cases: cluster provisioning and analytics as a service,” OpenStack’s note continued, before breaking down both of those processes under a section entitled “General Workflow.”

Virtualizing Hadoop for faster and easier deployment on cloud-based infrastructure is clearly something the community wants. Project Savanna will face off against VMware’s Serengeti, which allows users to deploy Hadoop on virtualized infrastructure.

Image: OpenStack.org