Google Flu Trends Suggests Limits of Crowdsourcing

These are some hard buggers to track down.

Crowdsourcing is the way of the future, argue some pundits. Why rely on trained analysts or a handful of elected officials, their logic goes, when you can obtain similar—if not better—results by opening a process to input from the masses?

That idea has powered well-known Web properties such as Wikipedia and Kickstarter, and it’s even gaining attention as a governing tool. But there are also cases in which crowds may not be so wise: a new article in Nature suggests Google Flu Trends, a Website that uses search-engine data to track the spread of seasonal flu, may have “drastically overestimated” this year’s peak infection levels.

To map the spread of flu, Google isolates and analyzes millions of flu-related search queries (e.g., “How do I know I have the flu?”), then visualizes that data on a map. This past season, for example, the system displayed “intense” or “high” levels of flu activity across the entire United States. In a bid to keep search information as private as possible, Google also strips all individual identifiers from the millions of search queries before aggregation and analysis.
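As a rough illustration of the aggregation described above, the sketch below counts what share of queries in each region mention flu, after discarding the user identifier. It is not Google's method: the keyword list, the record layout, and the function name `flu_query_share` are all assumptions made for the example, and the sample data is invented.

```python
from collections import Counter

# Hypothetical keyword list; the article does not say which terms Google matches.
FLU_TERMS = ("flu", "influenza")


def flu_query_share(records):
    """Return the share of flu-related queries per region.

    Each record is an assumed (user_id, region, query_text) tuple. The
    user_id is ignored, so only region-level counts are ever kept.
    """
    total = Counter()
    flu = Counter()
    for _user_id, region, text in records:
        total[region] += 1
        # Simple substring match on lowercase text; a real system would
        # need something far more careful than this.
        if any(term in text.lower() for term in FLU_TERMS):
            flu[region] += 1
    return {region: flu[region] / total[region] for region in total}


# Invented sample data, for illustration only.
sample = [
    ("u1", "NY", "How do I know I have the flu?"),
    ("u2", "NY", "pizza near me"),
    ("u3", "CA", "influenza symptoms"),
    ("u4", "CA", "flu shot locations"),
]
shares = flu_query_share(sample)
```

A share computed this way rises whenever more people search for flu, whether or not they are sick, so a model built on it has no way of telling illness from curiosity.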

“We have found a close relationship between how many people search for flu-related topics and how many people actually have flu symptoms,” read an explanatory note on Google.org. “Of course, not every person who searches for ‘flu’ is actually sick, but a pattern emerges when all the flu-related search queries are added together.”

Google has argued that its ability to rapidly update data based on search queries, and to trace the spread of flu worldwide, makes it a significant complement to more “traditional” flu trackers such as the Centers for Disease Control and Prevention (CDC). In turn, many of those flu trackers and researchers have perceived Google’s work as accurate.

But this year, Google’s “estimate for the Christmas national peak of flu is almost double the CDC’s (see ‘Fever peaks’), and some of its state data show even larger discrepancies,” read Nature’s article. “Several researchers suggest that the problems may be due to widespread media coverage of this year’s severe US flu season, including the declaration of a public-health emergency by New York state last month.”

That media hype may have sent more people—healthy and sick alike—scrambling for their laptops and phones to input flu-related search queries, which could have skewed the Google Flu model. Google may have to adjust its algorithms going forward, in order to take media bias and crowd panic into account.

In other words, while crowdsourcing does hold promise for many different fields, the supporting model (and math) must be constantly monitored and refined. Maybe experts are still needed, after all.

 

Image: Sebastian Kaulitzki/Shutterstock.com
