MongoDB, Hadoop-Hive Top Jaspersoft Big Data Index

MongoDB topped the list of most popular data sources for storing, analyzing and visualizing Big Data in software vendor Jaspersoft’s Big Data Index, which tracks native connector downloads from JasperForge, the company’s open-source community Website.

MongoDB, a document-oriented NoSQL database system, has remained at the top of Jaspersoft’s downloads list since November 2011 (it was downloaded 725 times in May 2012). Hadoop-Hive, an SQL interface to Hadoop MapReduce, currently stands at a distant second (downloaded 229 times in May 2012), followed by Cassandra, a high-availability NoSQL database system (183 times in May), and then Hadoop-HBase, a distributed real-time Hadoop database (8 times).

Other popular Big Data connectors—including CouchDB, Neo4j, Redis and VoltDB—generally experienced far fewer downloads in May 2012 than MongoDB, Hadoop-Hive or Cassandra.

“Several NoSQL environments have become exceptionally popular for reporting and analyzing Big Data,” Brian Gentile, Jaspersoft’s CEO, wrote in a June 13 statement. “Companies now have the opportunity to learn more about their operations and performance by analyzing data streams that were too big or too complex to process even a couple years ago.”

NoSQL and Hadoop

Exact numbers of downloads aside, the Jaspersoft data underscores some trends at work with regard to NoSQL and Hadoop.

Some Internet companies are migrating away from the relational approach of SQL database systems in favor of non-relational NoSQL databases. Two common ways of organizing NoSQL databases are document-based (which lets the user store entire data structures, often modeled as JavaScript Object Notation, or JSON) and key-value based (wherein each record consists of a key and a value).
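To make the distinction concrete, here is a minimal sketch in plain Python rather than any particular database's API. The order record, its field names and the "order:1001" key are hypothetical, chosen only for illustration; real document and key-value stores add persistence, indexing and networking on top of these basic shapes.

```python
import json

# Document-based model: one record holds an entire nested structure.
# All field names and values here are hypothetical illustration data.
order_document = {
    "order_id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def total_quantity(document):
    """Read inside the document: sum the quantities of its line items."""
    return sum(item["qty"] for item in document["items"])


# Key-value model: each record is just a key and a value. The store does
# not look inside the value, so here it is kept as a serialized JSON string.
key_value_store = {}
key_value_store["order:1001"] = json.dumps(order_document)

# Retrieval is by key only; the application decodes the value itself.
restored = json.loads(key_value_store["order:1001"])

print(total_quantity(order_document))
print(restored["customer"]["city"])
```

In the document model, the nested fields can be read directly, as total_quantity does. In the key-value model, the record is fetched whole by its key and interpreted by the application.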

Meanwhile, increased interest in Hadoop, a framework for reliably running applications on large hardware clusters, has driven a range of IT vendors, from tiny startups to tech giants such as IBM, to release products that build on the framework. In June alone, Datameer, Karmasphere and Hortonworks all unveiled platforms that rely on Hadoop to manage and analyze data, and VMware announced “Serengeti,” an open-source project for deploying Apache Hadoop in virtual and cloud environments.

Companies prize Hadoop for its ability to scale from a relatively small server cluster to thousands of machines. Research firm IDC recently predicted that worldwide revenues from Hadoop and MapReduce, the data-crunching programming model that Hadoop implements, would rise to $812.8 million in 2016, as companies seek tools and platforms to wrestle with a rising tide of data from Web-based applications and social networks.

Whether that rise in revenue stems from companies leaping on a Hadoop bandwagon or from the framework proving a viable long-term solution to the data-analytics issues confronting businesses remains to be seen. In any case, when it comes to NoSQL and Hadoop deployments, IT pros have a lot of tools to choose from.


Image: Jaspersoft