Analysts insist that the amount of data stored by businesses will only increase over the next few years. Indeed, CIOs and IT administrators around the world are wrestling with a rising tide of emails, log and transaction records, video and audio files, and application data.
Much of this content is unstructured and thus largely unsearchable, leading pundits to term it “dark data.” Melodramatic name aside, dark data really does present a challenge to businesses, which could theoretically mine it for additional insights.
Lucid Imagination’s LucidWorks Big Data, just released in beta, is an integrated development stack meant to help tackle the unstructured-data issue by giving customers the ability to implement projects that drill down into massive amounts of data in more comprehensive ways. Those projects can be executed in the cloud, on-premises, or in a hybrid environment.
Lucid Imagination as a company relies heavily on Lucene/Solr, an open-source search technology, which it claims gives its product an advantage over rivals in the space. With open source, “corporations can modify and enhance the underlying technology to meet their exact specifications, unlike Vivisimo and other proprietary software companies,” Grant Ingersoll, chief scientist for Lucid Imagination, wrote in an email. “Traditional business intelligence companies will always have an inherent disadvantage when it comes to analysis of unstructured data due to the limitations of SQL.”
In addition, the company uses Apache Hadoop to search massive amounts of content quickly and thoroughly. “Hadoop also backstops HBase, which we use in a variety of ways, including accessing the results of our analytic calculations over time,” Ingersoll added. “Finally, we incorporate tools like Apache Mahout and Pig, which both use Hadoop, to provide large-scale machine learning (clustering, classification, recommendations, etc.) and to run analysis over the content.”
Apache Hadoop is a framework for reliably running applications on large hardware clusters, prized for its ability to scale from relatively few servers to thousands. Companies that rely on it include Facebook, eBay, Hulu, IBM, Microsoft and Twitter. A recent report by research firm IDC suggested that worldwide revenues for Hadoop-MapReduce ecosystem software would rise from $77 million in 2011 to $812.8 million in 2016.
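For a sense of scale, the IDC projection can be turned into an implied annual growth rate. The short Python sketch below assumes steady compound growth over the five years from 2011 to 2016; the function name `implied_cagr` is purely illustrative, and the calculation is back-of-the-envelope arithmetic on the two figures quoted above, not IDC's own methodology.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate implied by two endpoints.

    Assumes growth compounds evenly each year (a simplifying assumption).
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted from the IDC report, in millions of US dollars.
revenue_2011 = 77.0
revenue_2016 = 812.8

rate = implied_cagr(revenue_2011, revenue_2016, years=2016 - 2011)
print(f"Implied compound annual growth: {rate:.1%}")
```

Under that assumption, the projection works out to growth of roughly 60 percent a year.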
Analytics applications and infrastructure as a whole are also primed to grow as a percentage of business spending. “IT continues to spend and earmark money to B.I., despite constrained budgetary environments,” Dan Sommer, principal analyst at research firm Gartner, wrote in an April 2 note. “Gartner’s CIO survey showed that analytics and B.I. is the No. 1 technology priority for CIOs in 2012.”
If that means an increased need for mining unstructured data, expect more firms to launch endeavors similar to Lucid Imagination’s in the months and years ahead.
Image: Lucid Imagination