Apache Hadoop is an open-source framework for crunching gargantuan amounts of data stored on large hardware clusters. It’s popular among firms large and small; research firm IDC suggested last year that the Hadoop market could hit $812.8 million by 2016.
Under the terms of a new agreement, MapR Technologies (which offers its own Hadoop distribution) and Canonical will offer the MapR M3 Edition for Apache Hadoop with Ubuntu. MapR has also made the source code for the component packages of the MapR Distribution for Apache Hadoop available via GitHub.
Canonical will bundle MapR M3 with Ubuntu 12.04 LTS and 12.10 through the Ubuntu Partner Archive. The MapR M5 and M7 editions for Apache Hadoop, each with its own feature set, will also be certified for Ubuntu at some as-yet-undefined point.
MapR and Canonical are also working on a Juju Charm that will facilitate the deployment of MapR M3 on private and public clouds standardized on OpenStack, an open-source Infrastructure-as-a-Service (IaaS) platform developed as part of a joint effort by Rackspace and NASA back in 2010.
This isn’t the first time MapR has partnered with another company on a Hadoop project. In August 2012, it joined forces with Nimbula to produce a turnkey solution for deploying a Hadoop cluster on the Nimbula Director private-cloud platform. Nimbula claimed the result could deploy a Hadoop cluster in less than two minutes, launch those clusters into the cloud without the need to requisition hardware, and share infrastructure between Hadoop and non-Hadoop workloads.
The Rise of a Framework
Over the past several quarters, a sizable percentage of companies with a lot of data to analyze have embraced Hadoop as the answer to many of their problems. In a recent survey conducted by Dimensional Research and sponsored by RainStor, 24 percent of respondents indicated they had a Hadoop project in production, while 19 percent indicated they managed more than 500 TB of data with Hadoop.
But Hadoop also presents some significant challenges. In that same survey, around 37 percent of respondents complained that Hadoop wasn’t “real time,” while another 26 percent were concerned about the time needed to put their Hadoop platform into production. Nearly as many (25 percent) cited manual coding as a challenge, while 18 percent considered the cost of training and services a significant hurdle.
Hadoop’s popularity has led a number of enterprise IT companies, from giants like EMC to smaller startups, to build up their own Hadoop offerings. That’s the main reason Hadoop-related revenues are expected to increase over the next few years; at the same time, the framework’s open-source roots may slow that growth, as some client firms opt for “free software” solutions over proprietary ones.
There’s also the possibility, however remote, of a “Hadoop bubble,” in which proprietary Hadoop solutions oversaturate the market, leading to the inevitable pullback and the collapse of more than a few startups.