Is Crowdsourcing the Future of Data Analytics?

Organizations face a shortage of data analysts. That has been the finding of several surveys and studies over the past few quarters, including a recent Gartner research note which concluded that filling such jobs could become a "challenge" in coming years.

But what if organizations didn't need to hire an in-house data scientist or analyst? What if there were a system similar to an IT help desk, capable of providing all sorts of analytics assistance to anyone stumped by a particularly vexing data problem?

Earlier this week, Greenplum (a subsidiary of EMC) announced a system that could help organizations in exactly that sort of Big Data predicament: an integration of Greenplum’s Chorus platform with startup Kaggle’s community of 55,000 data scientists.

In theory, the collaboration benefits both parties. Kaggle opens its community of scientists to organizations in need of their skills, which could prove quite lucrative. Meanwhile, Greenplum enhances the Chorus platform, which offers everything from federated searches across data assets to workspaces and sandboxes for analytics projects. Chorus also includes visualization tools such as histograms, heat maps, time series, and box plot charts.

Kaggle bills itself as a website that makes "data science a sport." It lets organizations post data problems to a large community of data scientists, who compete to deliver the most effective solution in exchange for a prize. The company offers several services: Kaggle Prospect lets those scientists propose ideas for the best uses of a particular dataset; Kaggle competitions put the aforementioned problems up for solving, with fabulous prizes for the winners; and Kaggle Engine (also known as hosted solutions) lets organizations integrate the winning models and algorithms from those competitions into their own systems.

“Those who are part of Kaggle’s community can choose to opt-in to doing contract work through Chorus,” read Greenplum’s Oct. 23 release on the integration. “From within the Chorus interface, Chorus users wishing to engage the Kaggle community will search, browse, and drill into profiles of Kaggle community members who are interested in collaborating together.”

While crowdsourcing has proven an effective solution for many organizational problems, the biggest hurdles for the Chorus-Kaggle integration could be privacy and data security, two issues Greenplum rushed to address in its initial materials: "Through secure integration of Chorus and Kaggle APIs, users can expose relevant information from Chorus Workspaces and send secure messages. Kaggle certifies Chorus as the source of these messages and forwards messages to the appropriate recipients."

Whether that persuades organizations to expose potentially sensitive information remains to be seen. But if it works, this sort of crowdsourcing could provide a model for other data-analytics platforms and help ease a pressing shortage of data analysts.


