Hadoop Analysis Needs More Skilled Workers: Survey

A new survey by analytics platform vendor Karmasphere suggests a pressing need for workers skilled in analyzing data. That echoes previous findings from analysts and consulting firms such as McKinsey & Company, which have predicted that U.S. demand for analytical talent will rapidly outpace supply.

The Karmasphere survey, conducted May 5-15, involved a fairly limited pool of 376 respondents, all of them categorized as “North American data professionals.” Around 60 percent of those respondents felt the data analysts in their respective organizations lacked the technical skills to analyze data on Hadoop, an open-source framework for reliably running distributed applications on large hardware clusters. Hadoop has become something of a standard for smaller companies and larger enterprises alike, many of which rely on it as a vital part of their data-crunching operations.

Around 70 percent of respondents either “strongly agreed” or “agreed” with this survey query: “We need a self-service way to access Hadoop; grab raw, unstructured, detailed data; and then create ad hoc queries and find insights.” Another 17 percent remained neutral on that statement, 5 percent disagreed, 2 percent strongly disagreed, and 6 percent were “N/A.”

It must be noted that Karmasphere markets an eponymous Big Data platform that emphasizes self-service analysis and data distribution. Karmasphere 2.0 features include a Web-based social interface, which in theory allows workers to more easily collaborate over data. The platform is also compatible with any Hadoop distribution, and leverages Apache Hive, the Hadoop SQL standard.

A variety of other IT vendors, ranging from midsize ones such as Datameer to behemoths such as IBM, also offer tools that rely on Hadoop for data analysis. That’s a reflection of Hadoop’s rising popularity, as well as the growing recognition on the part of many businesses that their mountains of in-house data can be reliably mined for all manner of insight.

Other Karmasphere survey results included:

  • Some 52 percent of those surveyed “either have Hadoop in production or have a Hadoop cluster running.”
  • Around 22 percent suggested that marketing is the business department benefiting most from Big Data analysis on Hadoop, followed by engineering at 19 percent, and product management and operations at 14 percent each.
  • Around 67 percent “are being asked to incorporate new data types, such as Web logs and click streams into their analysis.” The top four types of data being analyzed with Hadoop were, in descending order, Web logs (51 percent), click streams (35 percent), product usage data (33 percent) and transactional data (33 percent).

Faced with all that data in need of analysis, analyst firms have begun advocating the development of tools more easily used by workers with a modicum of training. According to Forrester analyst Boris Evelson, for example, business workers should carry out roughly 80 percent of all B.I. requirements, while IT pros handle the other 20 percent.

“Forrester by no means advocates that firms transfer complex, mission-critical, enterprise-wide B.I. applications—especially those that carry external exposure or other operational risk—into the hands of non-IT professionals,” he wrote in a June 12 corporate blog posting. However, he added, placing a significant amount of app control in the hands of the average worker can help keep data analysis running on schedule; most IT pros already have more than enough to do.


Image: Karmasphere