Apache Hadoop has proven useful for many companies seeking to wrangle massive amounts of data, but most deployments offer little beyond the framework itself. Businesses still need to audit, manage, and, if disaster strikes, recover all that data and those insights, typically with a hodgepodge of tools from various IT vendors.
That’s an issue Cloudera is trying to solve, to its profit: the company’s Cloudera Enterprise platform now features the Cloudera Navigator and Cloudera Enterprise BDR (Backup and Disaster Recovery) tools, along with an updated version of its Cloudera Manager management interface, all designed to help clients meet their compliance and management requirements for data.
“Although Hadoop is a strong platform, its promise is inhibited by its sparse and immature tooling,” Philip Russom, research director for data management at The Data Warehousing Institute (TDWI), wrote in a statement released by Cloudera. “Its tooling must progress substantially to enable the core functionality enterprise buyers require in information platforms, namely, functions for data security, access control, usage auditing, backup, disaster recovery, notification of job failures, software updates and upgrades, and much more.”
Cloudera Navigator 1.0 (a complementary application to Cloudera Manager) includes tools for indexing and storing a full history of HDFS, Hive, and HBase data access, essential to administrators and others who need to audit processes and data. Cloudera Enterprise BDR offers centralized backup and disaster-recovery capabilities, allowing businesses to meet service-level agreements (SLAs) and recovery time objectives (RTOs).
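To give a sense of the raw material such audit tooling works with: Hadoop’s NameNode can already emit an audit log of file-system access in a key=value format (the conventional hdfs-audit.log layout). The sketch below is a minimal, hypothetical parser for that style of entry; it is an illustration of the kind of record an audit tool indexes, not Cloudera Navigator’s actual implementation, and real log lines may use different separators or extra fields.

```python
import re

# Matches key=value pairs such as allowed=true, cmd=open, src=/data/file.
# Simplified: assumes values contain no whitespace, which real ugi fields
# (e.g. "alice (auth:SIMPLE)") can violate.
AUDIT_FIELD = re.compile(r"(\w+)=(\S+)")

def parse_audit_line(line: str) -> dict:
    """Extract the key=value fields from one HDFS-style audit-log entry."""
    return dict(AUDIT_FIELD.findall(line))

sample = ("2013-02-01 12:00:00,000 INFO FSNamesystem.audit: "
          "allowed=true ugi=alice ip=/10.0.0.1 cmd=open "
          "src=/data/sales.csv dst=null perm=null")

record = parse_audit_line(sample)
# An auditor cares about who did what to which path:
print(record["ugi"], record["cmd"], record["src"])
```

A tool like Navigator effectively indexes millions of such records across HDFS, Hive, and HBase so that “who accessed what, when” becomes a query rather than a grep.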
The version 4.5 update to Cloudera Manager extends the platform’s management tools: administrators can perform rolling upgrades, better manage heterogeneous clusters, and integrate with other IT management tools.
Research firms have attributed much of the rising interest in Hadoop to the cost savings and flexibility inherent in open-source software. In turn, Hadoop has birthed platforms like HBase, a non-relational distributed database modeled on Google’s BigTable and running atop HDFS. As Hadoop and its related systems evolve, the time needed to process large amounts of data continues to shrink. But as datasets grow, and as pressure mounts on analysts and executives to deliver actionable insights from that data, the need for efficient systems to manage and audit all those platforms becomes all too apparent.