Splunk and Cloudera Alliance Hints at New Big Data Landscape

The Splunk Enterprise platform digests machine data from a variety of sources, including mobile devices and Websites.

Splunk and Cloudera are entering into a self-described “strategic alliance” that will see their respective enterprise platforms linked via the Splunk Hadoop Connect tool.

The Splunk Enterprise platform digests machine data—gathered from Websites, mobile devices, servers, and lots of other infrastructure—and spits out insights into how those systems are performing. In theory, users can rely on the platform to monitor end-to-end infrastructure, or even customer behavior. It features the aforementioned Splunk Hadoop Connect, billed as a way to “easily and reliably move data” between Splunk Enterprise and the Apache Hadoop framework.

Apache Hadoop has grown in popularity as a way for companies to crunch massive amounts of unstructured data stored on large hardware clusters. Right now, it seems as if every IT vendor in the industry wants to profit from the framework in some way: over the past few months, companies ranging from EMC and Intel to Hortonworks and Cloudera have all issued Hadoop-related offerings.

Under the terms of the alliance, Splunk Hadoop Connect will link Splunk Enterprise to Cloudera Enterprise, Cloudera’s Hadoop distribution (and associated projects). That is certainly good news for Cloudera: any number of companies have released Hadoop distributions over the past couple of months, crowding the marketplace and making it all the more vital for individual firms to sign “alliances” and other deals for their respective offerings.

Such alliances could also prove vital for smaller firms seeking to hold the line, as it were, against IT giants such as IBM and SAP. Those giants, of course, have untold millions of dollars and vast other resources to deploy in the search for data-analytics customers; faced with that sort of competition, startups and midsize companies need to consider how partnerships can amplify the reach of their analytics products.

Cloudera recently announced that Impala, its open-source SQL query engine for data stored in HDFS (Hadoop Distributed File System) and HBase, had reached general availability. Cloudera released Impala as a public beta in October 2012 and spent the following months refining it for better real-world performance; it then packaged the software as the engine of Cloudera Enterprise RTQ. Cloudera is not alone in trying to speed up analytics: over the next several quarters, expect other companies large and small to debut software that crunches structured and unstructured data more quickly.


Image: Splunk