IBM Doubling Down on Big Data with Hadoop, In-Memory Offerings

IBM is betting it can leverage its name to become a big player in the Apache Hadoop market.

Big Blue claims its PureData System for Hadoop can shorten the time needed to deploy the open-source framework, which has become increasingly popular over the past several quarters as a way for companies to crunch massive amounts of data stored on large hardware clusters. IBM has integrated its InfoSphere BigInsights software into the platform, which adds a variety of tools for monitoring and integration, as well as analytics and visualization. PureData System for Hadoop will ship in the second half of the year.

“Big data is about using all data in context at the point of impact,” Bob Picciano, general manager of IBM Information Management, wrote in a statement. “With the innovations we are delivering, now every organization can realize value quickly by leveraging existing skills as well as adopt new capabilities for speed and exploration to improve business outcomes.”

IBM is also doubling down on Big Data with a new platform called BLU Acceleration, which it claims will speed up in-memory analysis of datasets. The software can skip over data that doesn’t need to be analyzed (including duplicate data) and analyze the rest in parallel across multiple processors. That is likely to make SAP, which is betting heavily on its HANA in-memory technology as a competitive differentiator, a little nervous.

In fact, IBM is big enough to make all its competitors a little nervous whenever it starts investing heavily in a particular space. There are only so many clients to go around, increasing the likelihood that at least one major IT vendor will fail in its efforts to make a dent in the Big Data market.

Of all the Big Data offshoots, Hadoop is one of the most popular, at least based on the number of IT vendors that have decided to embrace it. At the end of March, MapR Technologies and Canonical announced a partnership to bring Hadoop to Ubuntu, the popular Linux-based operating system. A few days later, data-security firm Dataguise announced DG for Hadoop 4.3, a system for masking and encrypting sensitive data within major Hadoop distributions. In the first few months of 2013, firms ranging from EMC and Intel to Hortonworks and Cloudera have all issued Hadoop-related products of some sort, raising the question of whether a “Hadoop bubble” is starting to inflate.

All these companies are rushing into the space, of course, because there’s money to be made. Analyst-firm estimates for the Hadoop market range from a few hundred million to somewhere in the billions of dollars over the next few years, with one crucial caveat. Because Apache Hadoop is fundamentally open-source, companies don’t necessarily have to opt for proprietary products when using the framework, and that could drag down revenues for many of the companies offering such products.

“The Hadoop and MapReduce market will likely develop along the lines established by the development of the Linux ecosystem,” Dan Vesset, vice president of Business Analytics Solutions for IDC, wrote in a statement last May. “Over the next decade, much of the revenue will be accrued by hardware, applications, and application development and deployment software vendors.”

IBM and other big IT vendors are betting that their longstanding presence in the industry, along with their reputation for support, will drive customers into their camp when it comes to Hadoop, rather than leaving them to plunge into the thickets of unstructured-data analysis on their own.

 

Image: Sergey Nivens/Shutterstock.com