The mix of “new world” and “old world” data has left us with a strange feeling of allure and apprehension about approaching Big Data. Each world comes with its own set of challenges and diverse sets of analytic tools designed to mine insight from their depths. There is no single platform built to handle all of the different kinds of workloads across structured and unstructured, streaming and static data.
The Big Data challenge is more than a data integration challenge; it’s an analytic integration challenge. Divergent platforms tend to exist in silos, each one doing heavy lifting in a specific niche, mostly unaware of analytics outside of a walled-in data ecosystem.
Companies that want to gain the most from Big Data and diverse data need to be able to combine analytic results in a simple, quick, and dynamic way.
One Size Does Not Fit All
In spite of recent claims by both traditional database vendors and emerging Big Data platform vendors that they will respectively rule the world someday, neither is capable of handling complex analytics without significant effort. Row-oriented databases require the creation of indexes, materialized views, and extreme workload management just to handle simple analytics. Big Data platforms require a major programming effort to support complex analytics.
When you bring in the element of change, the administrative overhead makes it even more difficult to use these platforms in dynamic analytic environments. Long "mean time to change" and "mean time to results" prevent organizations from responding quickly in time-sensitive matters such as risk management and target marketing.
When mortgage-backed securities imploded in 2008, most investment banks were using older technology and couldn't move quickly enough to assess the risk. Their legacy data warehouses weren't agile enough and couldn't hold enough data to allow risk managers to drill down into their mortgage portfolios and see individual risk profiles. Most of the affected banks didn't even know what hit them until after the fact. An analytic platform alongside their data warehouse would have given them access to more detailed portfolio risk data. In addition, a Big Data platform might have captured sentiment that could have warned of the collapse.
Enter the new analytic platforms, designed from the ground up to run complex analytics quickly on massive amounts of data at any scale. Is the analytic platform the answer to Big Data? With buzzwords like “columnar,” “compression,” and “massively parallel processing,” you’d think they were here to save the day.
And they do save the day when it comes to complex analytics, but most require add-ons or alterations to support operational reporting, text analytics, transformations, and Big Data filtering.
Works Well Together
Architecting systems to collaborate on the different kinds of analytics required in a Big Data environment is a much better approach. Traditional data warehouse technology tends to be very good at handling operational reporting and supporting dashboards. Big Data platforms excel at filtering, transformation, text analytics, and batch analytics. Analytic platforms speed through complex analytics and support the kind of rapid iteration required by data scientists and business analysts. Technology leaders are already making a move toward collaborative computing, where workloads gravitate to the best-suited platform.
Collaboration extends the capability of analytic platforms with both new world and old world analytic engines across all kinds of data. The newest analytic platforms support both data integration and analytic integration. On-demand data integration gives users the freedom to bring in structured, semi-structured, unstructured, and streaming data. Analytic integration enables them to call third-party analytic engines to run analytics and return the results.
For example, an analyst can bring in the most recent data from an operational data store, call Hadoop to run extensive sentiment analysis, and then use an analytic platform to recalculate segmentations and serve up the next best offer for a top customer or prospect. When this kind of collaboration is possible without additional administration, it yields immediate response times and speeds time to value for enterprise analytics.
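The workflow above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration only: the function names, the sample data, and the scoring rule are all invented stand-ins for the three platforms (operational data store, Hadoop sentiment job, analytic platform), not any vendor's actual API.

```python
def fetch_recent_orders():
    """Stand-in for pulling fresh rows from an operational data store."""
    return [
        {"customer": "acme", "spend": 1200.0},
        {"customer": "zenith", "spend": 300.0},
    ]

def run_sentiment_job(customers):
    """Stand-in for a Hadoop batch job scoring social-media sentiment
    per customer on a -1.0 .. 1.0 scale (hypothetical values)."""
    return {"acme": 0.8, "zenith": -0.4}

def next_best_offer(orders, sentiment):
    """Stand-in for the analytic platform: blend spend and sentiment
    into a score, then pick an offer for the top-scoring customer."""
    scored = [
        (o["spend"] * (1.0 + sentiment.get(o["customer"], 0.0)), o["customer"])
        for o in orders
    ]
    score, top = max(scored)
    return top, ("premium upgrade" if score > 1000 else "retention discount")

# Collaborative flow: ODS -> Hadoop -> analytic platform.
orders = fetch_recent_orders()
sentiment = run_sentiment_job([o["customer"] for o in orders])
print(next_best_offer(orders, sentiment))  # -> ('acme', 'premium upgrade')
```

The point of the sketch is the shape of the collaboration, each platform does the work it is best at, and only small result sets cross the boundaries between them.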
The real win for collaborative computing environments comes in the realm of value creation. Each of the different platforms creates a different kind of value for organizations. The data warehouse continues to provide operational value. The analytic platform provides optimization value. And the Big Data platform provides informational value. In addition, the combination of analytics across the platforms yields even more value than any one by itself.
The impact of bringing analytics together in a collaborative environment has the potential to transform companies into leaders in their industry. Financial services companies are already using multifaceted analysis to manage enterprise risk and conduct context-aware customer interactions. Retailers are increasing their merchandising accuracy and their ability to deliver just-in-time products to market. But perhaps the most fascinating explosion of analytics is taking place in the world of digital media, where companies are using “next best action” engines to generate ads and offers or automating decisions with complex algorithms.
John Santaferraro is the Vice President of Solutions at ParAccel. With 16 years of experience in business intelligence and analytics, John has co-founded a data warehouse startup company and held executive business intelligence marketing positions in top tech companies like Tandem, Compaq, and HP.