Amazon Web Services has updated its Elastic MapReduce (EMR) with access to not only Apache Hadoop 2.2.0, but also new versions of Hive, Pig, HBase, and Mahout. That expands the e-retailing giant’s analytics, data-warehousing, and Infrastructure-as-a-Service (IaaS) options.
Hive, a data-warehousing system for Hadoop, was recently upgraded to version 0.11.0.1, which adds features such as support for the Optimized Row Columnar (ORC) file format. The updated version of the Pig analytics platform features new operators, functions, and types. HBase is a NoSQL database for Hadoop, while Mahout is a machine-learning library that runs on the same framework.
Hadoop 2.2.0, the latest version of the popular open-source framework for crunching enormous amounts of data, comes with a variety of improvements over Hadoop 1.x. These include YARN (“Yet Another Resource Negotiator”), a general-purpose resource management system that makes it easier to work with MapReduce and other frameworks; support for Windows; boosted integration with other open-source projects; binary compatibility for MapReduce applications built on Hadoop 1.x; and high availability for the Hadoop Distributed File System (HDFS).
Hadoop 2.2.0 also features support for NFSv3 access to data stored in HDFS, as well as Federation and Snapshots. The Apache Software Foundation, which hosts the software, is asking anyone who uses Hadoop to upgrade to 2.2.0 as soon as possible, as it is significantly more stable while remaining compatible with existing APIs and protocols.
Given Hadoop’s popularity, it’s no surprise that Amazon would update its systems to support this latest version as soon as possible. In addition to giant firms such as Intel, a variety of startups such as Hortonworks and Pentaho are pushing their own Hadoop distributions or supporting software. The biggest question now is whether these IT firms can adapt their Hadoop-related products so that a broader subset of workers can actually work with the software. New applications such as Splunk’s Hunk suggest the industry is indeed moving in that general direction.