Hewlett-Packard and data-analytics company Cloudera will release a platform in the fourth quarter of 2012 that bundles Cloudera’s enterprise software with the HP AppSystem for Apache Hadoop. HP has also agreed to resell the Cloudera Enterprise platform, along with future Cloudera products, to its clients.
Hadoop, an open-source framework for reliably running distributed applications on large hardware clusters, has become a go-to technology for many companies, from tiny startups to large enterprises such as IBM and Facebook. Research firm IDC recently estimated that worldwide revenue for Hadoop-MapReduce ecosystem software will rise from $77 million in 2011 to $812.8 million in 2016.
“HP and Cloudera share the common goal of simplifying big data processing and analytics on Hadoop for businesses,” Tim Stevens, vice president of Business and Corporate Development for Cloudera, wrote in a July 26 statement. “With HP reselling the Cloudera enterprise platform, together we provide end users with a comprehensive big data analytics solution for data integration, analysis and visualization, built natively on open source Apache Hadoop.”
Cloudera recently unveiled the fourth iteration of its Hadoop distribution and management platform. Cloudera Enterprise 4 features two components: Cloudera Distribution for Hadoop V4 (CDH4) and Cloudera Manager 4.
CDH4 is an Apache-based open-source Hadoop stack that combines the Hadoop Distributed File System (HDFS), the MapReduce programming model for processing large data sets, and the HBase database, along with nine other Apache-based tools.
Cloudera Manager 4 combines a variety of deployment tools for streamlining Hadoop cluster rollouts, as well as management tools for monitoring and reporting on failures in a Hadoop cluster.
Cloudera Enterprise 4 also offers a high-availability option for the NameNode, the “master” controller in a Hadoop cluster, which eliminates the need for special RAID or HA hardware in the NameNode system. In addition, it includes HBase extensions that allow applications to run in real time even as data is being fetched, instead of waiting for tables to load completely before processing.