Is Hadoop THE Answer for Big Data?

Are companies and Big Data professionals who embrace Hadoop putting themselves at risk by being at the mercy of Google?

In March 2013, after about four years, the U.S. Patent and Trademark Office granted Google 10 foundational patents related to MapReduce, a programming model for processing large datasets with a parallel, distributed algorithm on a cluster of computers. MapReduce is also the basis for the Hadoop framework.
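To make the programming model concrete, here is a minimal, single-process Python sketch of the classic word-count example. It only illustrates the map, shuffle, and reduce phases that MapReduce is built around. It is not Google's or Hadoop's implementation, the function names are hypothetical, and everything runs sequentially on one machine, where a real cluster would spread the map and reduce work across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all the values collected for a single key.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Sequential stand-in for a cluster: each document plays the role of one input split.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats ideas"]))
```

Because each map call only sees its own split and each reduce call only sees one key's values, both phases can in principle run on separate machines; that independence is what lets the model scale out across a cluster.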

Rather than transfer, assign, or license these foundational patents to the Apache Software Foundation (the organization that licenses the Hadoop software), Google decided to offer what it calls an Open Patent Non-Assertion (OPN) Pledge.

What does this mean? Google is saying that it will not take legal action against users or developers of software that uses these 10 MapReduce patents. But Google attaches several caveats, and a pledge by Google is not exactly binding in law.

The Impact

What does that mean for those who use and develop Hadoop for Big Data? One result is that many people are starting to look at alternatives where the licensing is clearer and they don't have to rely on a non-binding pledge from Google that it won't sue them.

Are they being alarmist? Should we rely on Google's good faith? The pledge carries about the same weight as other pledges Google has made over the years. In other words, Google will continue to do what it thinks is in its own self-interest, without necessarily considering the interests of the open source community.

At any time, Google could flex its muscles and take legal action against any company that uses Hadoop and the MapReduce patents. And a company that comes to rely on Hadoop to process Big Data could be putting its business at risk.

Currently, Hadoop is the most popular framework for Big Data processing, and the one generating most of today's buzz. But it's not the only solution out there. Several proprietary and open source solutions process Big Data without relying on MapReduce. Here are a few examples that are gaining popularity:

  • SAP HANA: An in-memory data platform for performing real-time analytics, and developing and deploying real-time applications.
  • Storm: A distributed, fault-tolerant real-time computation system. It provides a set of general primitives for real-time computation, much as Hadoop provides a set of general primitives for batch processing.
  • Spark: From UC Berkeley's AMPLab, Spark is an in-memory parallel processing framework comparable to Hadoop MapReduce, which its developers report can run some workloads up to 100 times faster.

While these alternatives are at different levels of software maturity (and cost), none of them carries the risk that Google might change its approach to its MapReduce patents. As a result, many companies that are serious about making Big Data a core part of their business are starting to evaluate them.

So, is Hadoop THE answer for Big Data processing? Apparently not.

5 Responses to “Is Hadoop THE Answer for Big Data?”

  1. Joe Rounceville

    I'm not sure I agree with the author's assessment. If Google has made a pledge publicly like this, your company DEPENDS on that pledge, and then Google changes its mind and starts to sue, I'm pretty sure you'd have a good case before a judge. In effect, Google has created a "contract of adhesion," much like an insurance policy, and disputes over contracts of adhesion (as opposed to negotiated contracts) are generally weighted toward the party that depends on the language of the contract.

    In any event, I wouldn’t avoid Hadoop based on this article — I’d check with an IP lawyer before freaking out. Hadoop/MapReduce are important and groundbreaking technologies.

  2. kevin cain

    The author appears to be confused about Hadoop and Google MapReduce. MapReduce is not the basis for Hadoop; MapReduce is Google's implementation of Hadoop. Hadoop is open source, and if a company is building its data processing schemes using Hadoop, it has nothing to do with Google's patents as related to its proprietary implementation, which is MapReduce.