Hortonworks Data Platform 2.0 Plays with YARN

Hortonworks has released a beta of Hortonworks Data Platform 2.0, which it bills as the first Hadoop platform to include the final beta version of Apache Hadoop YARN, alongside other Apache analytics software such as Hive and HBase.

Apache Hadoop, an open-source framework for running data applications on large hardware clusters, has become a favorite of companies and developers wrestling with enormous amounts of data. Hadoop’s MapReduce implementation is the backbone of many a company’s data infrastructure, although firms with skyrocketing information requirements, such as Facebook, have in some cases gone as far as building their own customized frameworks.

With YARN, the Apache Software Foundation aimed to overhaul MapReduce. MapReduce 2.0 (MRv2) splits the two main functions of the central JobTracker, resource management and job scheduling/monitoring, into separate daemons. “The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM),” reads the foundation’s note on the new version. “An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.” The ResourceManager and the NodeManager (the per-node slave) form the actual data-computation framework, with the ResourceManager arbitrating resources among all the applications in the system.

The ResourceManager has two main components, the Scheduler and the ApplicationsManager. The Scheduler allocates resources to running applications, subject to constraints such as queues and capacity limits, while the ApplicationsManager accepts job submissions. The Apache Software Foundation’s documentation includes a helpful diagram of how the MapReduce 2.0 components interact.
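
The division of labor among these components can be illustrated with a toy model. The sketch below is not Hadoop's API: every class, method, and field in it (`allocate`, `submit`, `request_containers`, `free_containers`, the queue-limit dictionary) is a hypothetical stand-in chosen for illustration, and the placement logic is a deliberately naive first-fit loop rather than any of YARN's real schedulers. It assumes only what the description above states: a Scheduler that hands out resources within queue constraints, an ApplicationsManager that accepts submissions, and one ApplicationMaster per application.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Per-node agent; here it only tracks free container slots (hypothetical)."""
    name: str
    free_containers: int


class Scheduler:
    """Allocates containers to applications, subject to per-queue limits."""

    def __init__(self, nodes, queue_limits):
        self.nodes = nodes
        self.queue_limits = dict(queue_limits)  # queue name -> max containers
        self.queue_usage = {queue: 0 for queue in queue_limits}

    def allocate(self, queue, wanted):
        """Grant up to `wanted` containers; return the names of the nodes used."""
        granted = []
        for node in self.nodes:
            # Naive first-fit: fill each node in turn until a limit is hit.
            while (node.free_containers > 0
                   and len(granted) < wanted
                   and self.queue_usage[queue] < self.queue_limits[queue]):
                node.free_containers -= 1
                self.queue_usage[queue] += 1
                granted.append(node.name)
        return granted


class ApplicationMaster:
    """Per-application coordinator; asks the Scheduler for containers."""

    def __init__(self, app_id, queue, scheduler):
        self.app_id = app_id
        self.queue = queue
        self.scheduler = scheduler
        self.containers = []

    def request_containers(self, count):
        granted = self.scheduler.allocate(self.queue, count)
        self.containers.extend(granted)
        return granted


class ApplicationsManager:
    """Accepts job submissions and starts one ApplicationMaster per application."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.next_id = 1

    def submit(self, queue):
        # The first container granted to an application hosts its ApplicationMaster.
        first = self.scheduler.allocate(queue, 1)
        if not first:
            raise RuntimeError("no capacity to start an ApplicationMaster")
        master = ApplicationMaster(f"app_{self.next_id}", queue, self.scheduler)
        self.next_id += 1
        master.containers.extend(first)
        return master


class ResourceManager:
    """Global authority that bundles the Scheduler and the ApplicationsManager."""

    def __init__(self, nodes, queue_limits):
        self.scheduler = Scheduler(nodes, queue_limits)
        self.applications_manager = ApplicationsManager(self.scheduler)


if __name__ == "__main__":
    nodes = [NodeManager("node1", 2), NodeManager("node2", 2)]
    rm = ResourceManager(nodes, {"default": 3})
    master = rm.applications_manager.submit("default")
    master.request_containers(5)  # asks for more than the queue limit allows
    print(master.app_id, master.containers)
```

In this toy run the cluster has four free slots, but the application ends up holding only three containers because the hypothetical `default` queue is capped at three. That is the kind of constraint the Scheduler is meant to enforce, independently of which application is asking.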

In theory, by separating resource management from MapReduce itself, YARN lets firms with data-analytics needs run search, graph processing, and other intensive workloads on the same cluster.

Hortonworks argues that the latest version of its Hortonworks Data Platform combines YARN with other Apache products into a more seamless, integrated whole. The company says the platform can scale into the petabyte range, in keeping with the improvements outlined in the Stinger Initiative, the Hortonworks-led effort to speed up Apache Hive. YARN, Hive, and other components can be installed and configured automatically.

Promising as that sounds, Hortonworks faces significant competition in the analytics space from nearly every enterprise tech vendor with an interest in data. Over the past several months, the sheer number of Hadoop-related products hitting the market has made it increasingly difficult for new ones to stand out.


Images: Shebeko/Shutterstock.com, Apache Software Foundation