Splunk and the Challenge of Easy-to-Use Hadoop

That Apache Hadoop is enjoying something of a heyday is beyond question at this point: from Intel down to the tiniest startups, it seems as if most IT firms have either a Hadoop distribution or supporting software in the works.

But Hadoop, despite the improvements in its latest version (2.2.0 just reached its general-release milestone), faces criticism that it’s still too complex, and that most workers simply don’t need that sort of firepower to crunch whatever unstructured data exists in their daily lives. After all, the framework was built for running data applications on large and dispersed hardware clusters, something beyond the purview of most businesses.

Along with a handful of other firms, Splunk is working to make Hadoop more accessible to “regular people” (whatever that means): its new platform, Hunk, allows users to analyze and visualize historical data in Hadoop, sparing an organization from having to learn new analytics and programming skillsets (at least in theory). The focus is also on speed, including the ability to preview results while jobs are still in progress, and a degree of query flexibility that allows workers to begin a deep dive into a dataset without knowing exactly which data they’ll need.

The Issue with Hadoop

Tools that speed up and visualize Hadoop-related analytics are good things, but are people too focused on Hadoop as a “must have” tool, while ignoring the limits of its capabilities? Some industry people think so. “The problem is that Hadoop is a technology, and big data isn’t about technology. Big data is about business needs,” Facebook analytics executive Ken Rudin told the audience at the recent Strata + Hadoop World conference in New York City, according to PC World. “In reality, big data should include Hadoop and relational [databases] and any other technology that is suitable for the task at hand.”

In a bid to more effectively handle its mountains of user data, Facebook has built custom tools such as Corona, a scheduling framework with a built-in cluster manager for tracking nodes in clusters and the free resources in play across the network. But when it comes to generating solid insights from all that data, Facebook still relies on old-fashioned people power: not only does it hire PhDs with extensive background in analytics (and good business savvy), it also runs “camps” where employees are taught the intricacies of data analysis.

While more companies are embracing analytics as a way to improve their business processes, many haven’t yet committed the resources to training their employees in data tools and processes. That could change as analytics tools become more widespread and easier to use (as Splunk is attempting); until that point, though, a gap may still exist between what IT firms promise their shiny software will do and what businesses actually manage to accomplish with it. But that likely won’t dampen any of the hype over “Big Data.”


Image: Splunk