VMware’s Serengeti Brings Hadoop to Virtual, Cloud Environments

by Nick Kolakowski Jun 13, 2012 3 min read

[caption id="attachment_1653" align="aligncenter" width="618" caption="VMware is positioning the open-source Serengeti as a 'one click' deployment toolkit."]


[/caption]

VMware is heading for the Serengeti. Not the literal plains of Africa, of course: “Serengeti” is the company’s new open-source project for deploying Apache Hadoop in virtual and cloud environments.

Hadoop is a framework for reliably running applications on large hardware clusters. Many large enterprises (such as Facebook and IBM) have come to rely on it as a vital part of their respective data-crunching infrastructures. Research firm IDC recently predicted that worldwide revenues from Hadoop and MapReduce, another framework for processing problems across huge datasets, could hit $812.8 million in 2016, a significant uptick from $77 million in revenues last year.

Serengeti 0.5 is available as a free download under the Apache 2.0 license. It has been designed as distro-neutral, with support for Apache Hadoop 1.0, CDH3, Hortonworks 1.0 and Greenplum HD 1.0. Future releases will support common Hadoop components such as HBase, Sqoop, and Flume. Those interested in tinkering with the source code can visit GitHub.

Architecturally speaking, Serengeti relies on SpringSource’s enterprise Java framework for its upper layer; the lower layer is written in Ruby. According to a FAQ on the download site, “Serengeti extends fog to run on vSphere cloud, via RbVmomi, a Ruby interface to the vSphere API.” Other architectural elements include Spring Framework as the foundation for the Serengeti Web Service, Tomcat for the Web server, RabbitMQ for the message queue, Hibernate for data access, and PostgreSQL for data persistence.

VMware has positioned Serengeti as a “one click” deployment toolkit that, when used in conjunction with its vSphere platform, can deploy an enterprise-level Hadoop cluster in a matter of minutes. The company claims that vSphere’s virtualization capabilities will boost the “availability and manageability” of Hadoop clusters.

“Apache Hadoop has the potential to transform business by allowing enterprises to harness very large amounts of data for competitive advantage,” Jerry Chen, VMware’s vice president of Cloud and Application Services, wrote in a June 13 statement. “It represents one dimension of a sweeping change that is taking place in applications, and enterprises are looking for ways to incorporate these new technologies into their portfolios.”

Of course, VMware isn’t the only company seeking to leverage the increased interest in Hadoop. Hortonworks, for example, recently unveiled Hortonworks Data Platform (HDP) 1.0, an open-source platform built on Apache Hadoop 1.0 that includes data management, monitoring, metadata and data-integration features. In a June 12 statement, Hortonworks CEO Rob Bearden claimed: “Unlike alternative Hadoop offerings, HDP is 100 percent open source with no proprietary code, eliminating vendor lock-in and expensive proprietary add-ons.”

Earlier in June, vendors such as Datameer and Karmasphere also announced data-analytics platforms leveraging Hadoop. In addition to those midsize IT firms, tech giants such as Dell and IBM have integrated the framework into their own offerings.

Hadoop seems only to be getting bigger, and every software producer out there wants a slice of the pie.

Image: VMware
