Business Data Lake and the Dangers of Vendor Lock-In

Business Data Lake: just like this, only filled with bytes.

Consulting firm Capgemini and software company Pivotal are pushing something they call a “Business Data Lake,” which will apparently combine legacy data with incoming datasets in a single “pool” for easier analysis.

In theory, this pool will bring together structured and unstructured data in a way that facilitates analytics. “This new offer represents our belief that the future of information insight within enterprises requires a new operating model,” Pivotal CEO Paul Maritz offered in a standard-issue PR statement, “as both data volumes increase and real-time intelligent-response becomes a necessity of doing business.”

Capgemini and Pivotal will also collaborate on a Pivotal Center of Excellence within a Capgemini Business Information Management (BIM) center in India, with plans to fill it with 500 Pivotal “product experts” by 2015. That’s in addition to executives from both firms working together at Pivotal’s headquarters in Palo Alto, CA.

The Business Data Lake is meant to address companies’ frequent complaints about data ending up in “silos,” unknown and therefore unusable beyond the borders of a particular division or business unit. The Lake will use Pivotal’s storage and analytics tools to “store” (however virtually) all of a company’s data behind a single interface; from there, mining that data for insight is supposed to be a relatively efficient process. The platform will support Hadoop query interfaces such as Hive and SQL, but the emphasis is clearly on Pivotal tools. (For a fairly in-depth breakdown of the technology involved, Capgemini offers a handy PDF.)

Uniting all corporate data on a single platform carries some significant risks, of course, including the ever-present specter of vendor lock-in. For many firms, blending analytics into the daily workflow could boil down to whether executives (and the IT pros and data analysts who report to them) are comfortable handing over their infrastructure to one tech firm, or whether they’d prefer to jury-rig a solution from multiple sources, including free software. Security is another issue: sometimes companies deliberately silo some data, the better to preserve secrets.

In any case, Pivotal wants to become that one data solution for firms, especially in the still-nascent Platform-as-a-Service (PaaS) segment. The company started life as an offshoot of EMC, an aggregation of EMC’s Greenplum and Pivotal Labs organizations along with VMware’s vFabric, Cloud Foundry and Cetas units. (VMware remains an EMC subsidiary.) In its inaugural year, Pivotal has touted platforms such as Pivotal One, a set of application and data services running atop its enterprise version of Cloud Foundry, the open-source PaaS offering. Despite its pedigree and some very smart people on board, however, it might take more than a little luck for the company to carve out a significant space in the increasingly crowded analytics market.

Image: 4Max/Shutterstock.com
