How do you create real value from Big Data? First, get some high-quality Big Data that is held in an appropriate data store. Then, add (your) deep domain knowledge and, lastly, apply the right reporting and analysis tool. It is the proper application of all three elements that creates value. Conversely, getting any one of them wrong impairs your ability to unlock value from Big Data. Let’s look at each of these three value creators.
Start with the Right Big Data Store
I’ve written before that looking beyond the Big Data myths is necessary to choose the right data store that suits your business and application needs. Too many projects begin with the wrong technology, which delays or prevents success. So, matching your business problem (or opportunity) with the right technology is an important first step.
Big Data stores fit into one of several categories: Hadoop (a software framework that includes a Big Table clone called HBase), NoSQL (which subdivides into several more categories, including Key Value Stores, Document Databases, Graph Databases, Big Table Structures, and Caching Data Stores), and Analytical Databases (e.g. Infobright, VectorWise, Vertica, Netezza, etc.). Each of these technologies has particular strengths and weaknesses, and any project leader should understand them, at least in summary, before launch.
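To make the differing access patterns concrete, here is a toy sketch in Python. It does not use any of the named products; the dictionary, document list, and SQLite queries are stand-ins for the key-value, document, and analytical styles, respectively.

```python
import sqlite3

# Key-value style (NoSQL): opaque values fetched by exact key -- very fast
# lookups, no ad hoc querying.
kv_store = {}
kv_store["user:42:last_login"] = "2012-08-01T09:15:00Z"
last_login = kv_store["user:42:last_login"]

# Document style (NoSQL): semi-structured records, filtered by field.
documents = [
    {"id": 1, "type": "order", "customer": "acme", "total": 120.0},
    {"id": 2, "type": "order", "customer": "globex", "total": 75.5},
]
acme_orders = [d for d in documents if d["customer"] == "acme"]

# Analytical-database style: SQL aggregation over many rows, here simulated
# with an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("acme", 120.0), ("globex", 75.5), ("acme", 30.0)])
totals = dict(conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer").fetchall())
```

The point of the sketch is that each store shape answers a different kind of question cheaply, which is why matching the workload to the category matters.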
Add Deep Domain Knowledge
Domain knowledge is the human intelligence that accumulates within a certain practice or process. A “domain” in this sense could be a functional application area (like CRM or Supply-Chain), a vertical industry (like financial services, pharmaceuticals, or energy/utilities), or a specific process (like after-sale support). Domain expertise is necessary to genuinely know which data, from all the possible sources, are valuable and which are not. Without the right domain knowledge, much time and effort is wasted trying to identify and apply the right data in the right manner to solve the most important problem(s). Domain knowledge is the primary reason the Big Data opportunity requires business unit personnel, more than ever before, to lead rather than follow.
My favorite example of Big Data and domain knowledge at work comes from Eastern Europe, the site of a small (12-person) financial data services firm. Last year, I spoke with the director of this firm, who talked about how the company was applying its deep financial services knowledge, mathematical skills, and tech-savviness to create big value from Big Data. By paying minimally for access to huge amounts of intra-day trading information (equity exchanges, commodity exchanges, etc.) and applying sophisticated, low-latency algorithms built with an open source business intelligence tool running in the cloud, this small firm generates and packages new insight about market movements and performance that it then sells back to major investment firms in New York, London, Frankfurt, etc. The company is brilliantly turning raw data into very rich information—creating big value out of Big Data with incredibly low operating costs.
Apply the Right Reporting & Analysis Tool
Many modern tools exist to build reports and analyses that uncover new Big Data insight. As with much Big Data software, some of the available reporting and analysis tools are open source, which makes a solution more affordable (and invites more experimentation). In any case, choosing the right reporting and analysis tool—one that enables the right overall Big Data approach (or architecture)—is perhaps the most important step.
Fundamentally, a capable Big Data reporting and analysis tool should deliver a modern, powerful experience across three key attributes:
a. Intelligent access to all necessary data sources, Big and traditional
Real insight and value will come from combining Big Data and traditional data types, an intersection where, more and more commonly, new discoveries will be born. So a reporting and analysis tool that works adeptly (better still, identically) with both is preferred.
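A minimal sketch of that intersection: joining high-volume event data (the Big Data side) with a traditional customer table (the relational side) so both can be reported on together. All names and records here are invented for illustration.

```python
# Hypothetical event records, e.g. extracted from a Big Data store.
events = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 2, "page": "/docs"},
    {"customer_id": 1, "page": "/signup"},
]

# Hypothetical customer master data, e.g. from a traditional CRM database.
customers = {
    1: {"name": "Acme Corp", "segment": "enterprise"},
    2: {"name": "Globex", "segment": "smb"},
}

# Enrich each event with the customer's traditional attributes, so a report
# can slice Big Data activity by CRM segment.
enriched = [
    {**event, **customers[event["customer_id"]]}
    for event in events
    if event["customer_id"] in customers
]
```

The insight (“which segments visit the pricing page?”) only exists once the two data types are combined, which is why a tool that accesses both matters.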
b. Agile, modern architecture to scale out when necessary
Big Data projects often start modestly, with the variety and velocity of the data more important to project success than the volume. In this regard, “Big Data” is often a misnomer. In fact, according to our recent Big Data survey, which garnered more than 600 responses from Jaspersoft’s worldwide open source community, 63 percent of respondents said the estimated daily volume of their projects was gigabytes, not terabytes or petabytes. That said, with project success comes an appetite for growing data volumes and more uses. Therefore, it becomes more important at each step to choose an overall architecture—including a reporting and analysis tool—that can scale out, ideally using cloud computing or cloud-like architectural techniques that deliver continuously efficient and affordable resources to store, process, and analyze data. In the same Jaspersoft Big Data survey, respondents reported using the Cloud as their deployment environment 40 percent of the time, demonstrating the early value of this efficient computing infrastructure.
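One common scale-out technique is hash partitioning: records are routed deterministically across worker nodes, so growing capacity means adding nodes rather than buying a bigger single server. The sketch below is a simplified illustration (real systems typically use consistent hashing to limit data movement when nodes change); the node names are invented.

```python
import hashlib

# Hypothetical cluster members.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str) -> str:
    """Route a record key to a node by hashing it (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every record with the same key always lands on the same node, so both
# writes and later queries know exactly where to go.
assignment = {key: node_for(key) for key in ["user:1", "user:2", "user:3"]}
```

Because routing is a pure function of the key, no central lookup table is needed—any client can compute where the data lives, which is what lets these architectures scale out horizontally.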
c. The right latency-based approach to unlock the value in Big Data
Harnessing the new business opportunity presented by Big Data is most often about understanding how quickly it must be used (to improve the business) before it becomes stale. In this sense, faster (or low-latency) approaches are NOT always better; it completely depends on the business need. In many cases, a higher-latency, batch-oriented approach will deliver the right time-to-insight and be easier to manage. Toward the goal of implementing the best Big Data architecture, I’ve previously written about matching the Big Data job to the Big Data solution. Like any data analysis project, by understanding the business use for the data, the key requirements of the users, and the latency acceptable with the necessary data, an ideal solution can be created.
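The two latency profiles can be sketched side by side. This is an illustrative toy, not any particular product’s API: the batch function recomputes totals over the full data set on a schedule, while the streaming class updates a running total the moment each event arrives.

```python
from collections import defaultdict

def batch_totals(events):
    """Higher-latency approach: recompute totals over the whole data set,
    e.g. nightly. Simple to manage; insight arrives on the batch schedule."""
    totals = defaultdict(float)
    for region, amount in events:
        totals[region] += amount
    return dict(totals)

class StreamingTotals:
    """Lower-latency approach: fold each event into a running total as it
    arrives, so insight is available immediately (at higher operational cost)."""
    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, region, amount):
        self.totals[region] += amount
        return self.totals[region]

events = [("west", 10.0), ("east", 5.0), ("west", 2.5)]
batch = batch_totals(events)

live = StreamingTotals()
for region, amount in events:
    live.on_event(region, amount)
```

Both paths produce the same totals; the business question is only how soon those totals must be available, which is exactly the latency decision described above.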
Big Data Everywhere
It’s not just the typical industries and companies that are creating high volume-velocity-variety data: it’s nearly everyone. Again citing Jaspersoft’s Big Data survey, two-thirds of the respondents came from seven different industries, from Computers & Software to Government and Healthcare. And even if a company doesn’t create Big Data itself, putting some of it to work can be the best next step to build the business. An example helps illustrate the breadth of Big Data’s power.
Recently, Wired published a story about Climate Corporation, a 6-year-old crop insurance company based in San Francisco that is using its access to Big Data, its deep domain knowledge, and some simple analyses to bring a 21st-century approach to its business. Climate Corp has built a vast network of sensors, enabling it to analyze and predict temperature, precipitation, soil moisture, and yields for 20 million U.S. farm fields.
Its expertise enables it to make accurate predictions with hyperlocal precision. The company has every incentive to get those local predictions right: Payouts are automatic based on whether key factors (temperature and soil moisture, for example) reach a crop-harming threshold on a day-to-day, farm-to-farm basis.
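The mechanics of a threshold-triggered payout are simple enough to sketch. Everything below—the threshold, the payout amount, and the temperature readings—is invented for illustration and is not Climate Corporation’s actual model.

```python
# Hypothetical crop-harming heat threshold and per-day payout.
HEAT_THRESHOLD_F = 95.0
DAILY_PAYOUT = 40.0  # dollars per field per qualifying day

def payout(daily_high_temps_f):
    """Pay automatically for each day a field's high temperature reaches
    the crop-harming threshold -- no claims process required."""
    harmful_days = sum(1 for t in daily_high_temps_f if t >= HEAT_THRESHOLD_F)
    return harmful_days * DAILY_PAYOUT

# One field's (made-up) daily highs for a week: four days cross the threshold.
field_week = [91.0, 96.5, 99.2, 88.0, 95.0, 93.1, 101.4]
week_payout = payout(field_week)
```

The value of the sensor network is in making the inputs to a rule like this accurate at hyperlocal, day-by-day granularity—the rule itself is the easy part.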
Climate Corporation offers compelling testimony to the powerful combination of Big Data, deep domain expertise, and some straightforward analyses built on that domain knowledge. Deriving new value from this model then becomes an ingrained part of its business plan.
Today, we look at Big Data as just that—data. Tomorrow, we should see it as Big Value. There are many lessons to learn because our use of very large data sets is still so new. Hopefully this simple, three-step approach will help you determine if you have at least the foundation to start creating Big Value from Big Data. Your thoughts and comments are appreciated.
Brian Gentile is the CEO of Jaspersoft, a provider of business intelligence software. Brian joined Jaspersoft as its first independent Board member in 2005 and then as CEO in 2007. Prior to Jaspersoft, Brian was Executive Vice President and Chief Marketing Officer at Informatica Corporation. He will be speaking at JasperWorld September 24 to 26th on “The New Factors of Production and the Rise of Data-Driven Applications.”