Data Analytics’ Next Big Feat: Sarcasm Detection

Thousands of data scientists are spending millions of dollars to ensure that analytics platforms someday get this joke.

French tech firm Spotter has apparently devised an analytics platform capable of identifying sarcastic comments, according to the BBC.

Spotter’s platform scans social media and other sources to create reputation reports for clients such as the EU Commission. As with most analytics packages that determine popular sentiment, the software parses semantics, heuristics and linguistics. However, automated data-analytics systems often have a difficult time with some of the more nuanced elements of human speech, such as sarcasm and irony—an issue that Spotter has apparently overcome to some degree, although company executives admit that their solution isn’t perfect.

“One of our clients is Air France. If someone has a delayed flight, they will tweet, ‘Thanks Air France for getting us into London two hours late’—obviously they are not actually thanking them,” Spotter executive Richard May told the BBC. “We also have to be very specific to specific industries.”
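
To see why a tweet like that trips up simple tools, consider a minimal sketch of the problem. It is not Spotter's method, which the company has not detailed here. The word lists and the function names (`naive_sentiment`, `looks_sarcastic`) are hypothetical, made up for this illustration. A plain word-counting scorer reads the Air France tweet as positive, while a second check against airline-specific complaint words flags the mismatch:

```python
import re

# Hypothetical toy lexicons, invented for this illustration only.
SENTIMENT_WORDS = {"thanks": 1, "great": 1, "love": 1, "awful": -1, "terrible": -1}
# Assumed industry-specific complaint cues for an airline client.
AIRLINE_COMPLAINT_CUES = {"late", "delayed", "cancelled", "lost"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_sentiment(text):
    """Sum the lexicon scores of every token; unknown words count as 0."""
    return sum(SENTIMENT_WORDS.get(token, 0) for token in tokenize(text))


def looks_sarcastic(text):
    """Flag text that scores positive yet contains a complaint cue."""
    has_complaint = bool(set(tokenize(text)) & AIRLINE_COMPLAINT_CUES)
    return naive_sentiment(text) > 0 and has_complaint


tweet = "Thanks Air France for getting us into London two hours late"
print(naive_sentiment(tweet))  # 1: the word counter sees only "thanks"
print(looks_sarcastic(tweet))  # True: positive wording plus "late"
```

A rule this crude would misfire constantly on real data, which is part of why the cue list has to be tuned per industry, as May suggests.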

Spotter faces significant competition in the sentiment-analysis arena. IBM, for example, has created a Social Sentiment Index that judges public opinion based on data from social media; Big Blue claims its software is sophisticated enough to determine whether a source is being sarcastic. Oracle and Salesforce offer Big Data platforms with sentiment analysis, and Salesforce’s software was even used by Obama’s re-election campaign to gauge the moods of core voters. SAP’s HANA in-memory technology also judges sentiment based on social networks, websites, blogs, wikis and CRMs.

More nuanced sentiment analytics could help companies better respond to customer complaints or more accurately judge how a product or service is performing on the open market. However, Big Data is also a business of uncertainty: despite the precise mathematics of its underlying algorithms, and the clear business goals those algorithms are designed to support, the human experience being fed into those platforms is a far murkier creature, full of ambiguities. It may be some time—if ever—before Big Data becomes powerful enough to really absorb and accurately handle those equivocations.

