Managing huge amounts of data can prove an expensive proposition, particularly with regard to storage and infrastructure costs. Moreover, analysts (or whoever else needs to crunch the numbers) often have to transfer the data from dedicated storage to whatever platform is performing the computations, which can create bottlenecks depending on network configuration.
Cleversafe thinks it can solve some of those issues: its newly announced Dispersed Compute Storage, once built, will merge Apache Hadoop (an open-source framework for reliably running distributed applications on large hardware clusters) with the company's Dispersed Storage Network (dsNet) system on a single platform, effectively combining storage with computation.
As part of the project, Cleversafe will replace the Hadoop Distributed File System (HDFS), which by default protects data by keeping three copies of it, with its Information Dispersal Algorithms, potentially reducing storage overhead and management costs for particularly large datasets. The company's dsNet system slices data up and disperses it across multiple nodes, eliminating single points of failure along with potential bottlenecks.
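The article doesn't specify which dispersal scheme dsNet uses, so what follows is only a toy illustration of the general idea behind an information dispersal algorithm, not Cleversafe's implementation. Each group of k bytes is treated as the values of a polynomial at x = 1..k over the prime field GF(257); the polynomial is then evaluated at n points to produce n slices, and any k surviving slices are enough to rebuild the data by Lagrange interpolation. The function names and the k = 4, n = 6 parameters are hypothetical choices for the sketch.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate, at x, the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem).
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total


def disperse(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Split data into n slices (keyed by x = 1..n); any k of them can rebuild it."""
    if not 1 <= k <= n <= P - 1:
        raise ValueError("need 1 <= k <= n <= 256")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices: dict[int, list[int]] = {x: [] for x in range(1, n + 1)}
    for off in range(0, len(padded), k):
        # The chunk's k bytes are the polynomial's values at x = 1..k.
        pts = list(zip(range(1, k + 1), padded[off:off + k]))
        for x in range(1, n + 1):
            slices[x].append(_lagrange_eval(pts, x))
    return slices


def reconstruct(slices: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k surviving slices."""
    chosen = list(slices.items())[:k]
    if len(chosen) < k:
        raise ValueError(f"need at least {k} slices, got {len(chosen)}")
    out = bytearray()
    for i in range(len(chosen[0][1])):
        pts = [(x, vals[i]) for x, vals in chosen]
        # Re-evaluating at x = 1..k yields the original chunk bytes.
        out.extend(_lagrange_eval(pts, x) for x in range(1, k + 1))
    return bytes(out[:length])  # drop the padding


if __name__ == "__main__":
    data = b"Dispersed Compute Storage"
    slices = disperse(data, k=4, n=6)
    del slices[1], slices[5]  # simulate losing two nodes
    assert reconstruct(slices, k=4, length=len(data)) == data
```

With these hypothetical parameters, the six slices together hold about n/k = 1.5 times the original data, versus 3 times for triple replication, and either approach survives the loss of any two storage locations. One caveat of the toy: slice symbols range from 0 to 256 and so need slightly more than a byte each; production dispersal and erasure-coding systems typically work over GF(2^8) to avoid that.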
Lockheed Martin and Cleversafe are developing a federal version of Dispersed Compute Storage for government agencies. “The Federal community has been out in front of Big Data, well ahead of many other market segments, and needs technology solutions today that are well suited for Exabyte scale storage as well as massive computation,” Tom Gordon, CTO and vice president of Engineering for Lockheed Martin’s Information Systems and Global Solutions-National, wrote in a July 10 statement tied to the release.
If it works as intended, Cleversafe's solution could bring the best of both worlds to Hadoop users: retaining Hadoop's fault tolerance, scalability and distribution capabilities (via Cleversafe's own technology) while reducing some of the storage overhead and bulk of HDFS.
Because of its scalability, Hadoop has emerged as a go-to technology for many companies, from tiny startups to large enterprises such as IBM and Microsoft. Research firm IDC recently estimated that worldwide revenues for Hadoop-MapReduce ecosystem software will rise from $77 million in 2011 to $812.8 million in 2016.
Hadoop’s growing popularity has led all manner of IT vendors to jump into the proverbial fray with proprietary data-analytics platforms that leverage the framework.