The Multi-Billion-Dollar Data Management Challenge

Back in the day, data only came on paper, and companies could place that paper in convenient files. Those days are long over.

Organizations collect enormous volumes of data, but most of it lacks real context. That makes the data difficult and costly to manage, and it renders a healthy percentage of it redundant.

That’s one of the reasons why Hewlett-Packard, IBM, Oracle, and other IT vendors are spending billions of dollars to acquire companies with technologies that tackle that very issue to one degree or another. Solving it would potentially earn any of those companies the loyalty of hundreds of organizations seeking to wrangle data into more manageable form.

In what’s considered one of the most controversial of those acquisitions, HP spent $10.3 billion last year to acquire Autonomy, whose products included an Intelligent Data Operating Layer (IDOL) that leverages enterprise search technology—creating what amounts to a metadata repository for the enterprise.

“One of the goals of the IDOL engine is to unify data management,” said Paul Miller, vice president of converged application systems for the HP Enterprise Group. “That’s why there are already over 400 IDOL connectors to different data sources.”

IBM, meanwhile, responded in part by acquiring Vivisimo, a provider of federated data discovery and navigation tools, for an undisclosed amount of money. Both those moves came on the heels of Oracle’s decision last fall to acquire Endeca, a provider of management tools for unstructured data, for just over $1 billion.

Chasing the Next Big Thing

HP, IBM, Oracle and others are chasing data that has become fundamentally more distributed across relational, columnar and object-oriented databases and the emerging open source Hadoop framework. Rather than relying on a single SQL data warehouse built on relational databases, IT organizations are trying to correlate the relevance of data that is now scattered across every nook and cranny of the enterprise.

“If you take the long view of IT, organizations are going to need to create some type of semantic capability across fluid sources of data,” said Robin Bloor, founder of the IT consulting firm The Bloor Group. “Right now there is no metadata coherency to do that.”

Bloor also notes that, while vendors are investing billions in the metadata problem, organizations are still going to have to create ontologies to define different types of data entities and the relationships between those sets of data.
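Bloor's point can be sketched concretely: at a minimum, an ontology is a catalog of entity types plus the named relationships between them, which gives otherwise disconnected data sources a shared vocabulary. The sketch below is a purely hypothetical illustration; the entity and relationship names are invented, not drawn from any vendor's product.

```python
# Minimal sketch of an ontology: entity types and the named
# relationships that connect them. All names ("Customer",
# "Invoice", "placed", etc.) are hypothetical examples.

entity_types = {
    "Customer": {"attributes": ["name", "region"]},
    "Invoice":  {"attributes": ["amount", "date"]},
    "Product":  {"attributes": ["sku", "category"]},
}

# Each relationship is a (subject type, predicate, object type) triple.
relationships = [
    ("Customer", "placed",   "Invoice"),
    ("Invoice",  "contains", "Product"),
]

def related_types(entity_type):
    """Return entity types reachable from entity_type in one hop."""
    return sorted({obj for subj, _, obj in relationships
                   if subj == entity_type})

print(related_types("Customer"))  # -> ['Invoice']
```

Even a toy model like this shows why the work falls to the organization rather than the vendor: only the business knows which entity types matter and how they relate.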

But even as HP, IBM and Oracle (along with companies such as Queplix, SAS Institute and ASG Software Solutions) move to more aggressively apply virtualization concepts to metadata management, not everyone is sure that such a goal is even attainable.

Howard Dresner, chief research officer at Dresner Advisory Services, believes that from a pragmatic standpoint, different stovepipes of data are going to require dedicated management tools for the foreseeable future. In the near term, IT organizations are going to have to continue to focus on tactical approaches to managing data in different formats, he said.

Dresner added that, beyond obviously raising the total cost of managing data, the expansion in the types of systems needed to store data is going to create any number of data quality and governance issues that won't be easily solved any time soon.

On top of that, while there is much interest these days in the role data scientists might play in the organization, there has been no corresponding focus on data governance. “Data governance is suffering,” he said. “IT organizations today have less budget and fewer resources available to address it.”

Vendors are clearly betting tens of billions of dollars in order to ultimately address what they see as a major data management opportunity. Yet despite all those investments, it doesn’t seem as if any of these issues are going to be completely resolved any time soon.
