How Facebook Built Natural Language into Graph Search

It takes a lot of work to build a system capable of showing which of your friends have truly insane hobbies.

Facebook’s Graph Search is an ambitious project: give users the ability to search through the social network’s vast webs of data via natural-language queries.

But that’s much easier said—so to speak—than done. Although human beings think nothing of speaking in “natural” language, a machine must not only learn all the grammatical building-blocks we take for granted—it needs to compensate for the quirks and errors that inevitably pop up in the course of speech.

The Facebook team tasked with building Graph Search also knew that the alternative, keyword-based search, wasn’t viable. “Keywords, which usually consist of nouns or proper nouns, can be nebulous in their intent,” Facebook engineering manager Xiao Li wrote in an April 29 posting on Facebook’s blog. “For example, ‘friends Facebook’ can mean ‘friends on Facebook,’ ‘friends who work at Facebook Inc,’ or ‘friends who like Facebook the page.’” To put it another way, keywords aren’t very accurate when it comes to figuring out the connections between objects, largely because they force the user to be very precise about phrasing and intent.

That left the team to build a natural-language interface based on three components: entity recognition and resolution (“finding possible entries and their categories in an input query and resolving them to database entries”), lexical analysis (“analyzing the morphological, syntactical and semantic information of the words/phrases in the input query”), and semantic parsing (“finding the top N interpretations of an input query given a grammar expressing what one can potentially search for using Graph Search”).

The engineers used a weighted context-free grammar (WCFG) to represent Graph Search’s query language. Think of a tree, with the root or base as the “Start” of a particular query. Facebook calls this the “parse tree,” and the various “limbs” branching from the root include verbs, objects, etc. The “leaves” at the top are the terminal symbols, or entities such as users, cities, employers, groups, and the phrases that link those entities together.
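To make the tree idea concrete, here is a minimal Python sketch of a weighted context-free grammar and a parser that finds the cheapest derivation of a query. The rule names, costs, and the tiny grammar itself are assumptions made up for illustration; Facebook's actual grammar and parser are far larger and are not shown in the blog post.

```python
from functools import lru_cache

# Toy WCFG: each nonterminal maps to (cost, right-hand side) productions.
# Symbols that are keys of GRAMMAR are nonterminals; all others are terminal words.
# Every rule name and cost here is a hypothetical example.
GRAMMAR = {
    "START": [(0.0, ("USER_HEAD", "FILTER"))],
    "USER_HEAD": [(0.0, ("my", "friends")), (1.0, ("friends",))],
    "FILTER": [(0.0, ("who", "VERB_PHRASE")), (0.5, ("VERB_PHRASE",))],
    "VERB_PHRASE": [(0.0, ("live", "in", "CITY"))],
    "CITY": [(0.0, ("{city}",))],
}


@lru_cache(maxsize=None)
def best_parse(symbol, tokens):
    """Return (cost, tree) for the cheapest derivation of tokens from symbol, or None."""
    if symbol not in GRAMMAR:
        # Terminal: it must match exactly one token.
        return (0.0, symbol) if tokens == (symbol,) else None
    best = None
    for rule_cost, rhs in GRAMMAR[symbol]:
        result = match_sequence(rhs, tokens)
        if result is not None:
            total = rule_cost + result[0]
            if best is None or total < best[0]:
                best = (total, (symbol,) + result[1])
    return best


@lru_cache(maxsize=None)
def match_sequence(rhs, tokens):
    """Cheapest way to split tokens across the symbols in rhs, or None."""
    if not rhs:
        return (0.0, ()) if not tokens else None
    best = None
    for i in range(len(tokens) + 1):
        head = best_parse(rhs[0], tokens[:i])
        if head is None:
            continue
        tail = match_sequence(rhs[1:], tokens[i:])
        if tail is None:
            continue
        total = head[0] + tail[0]
        if best is None or total < best[0]:
            best = (total, (head[1],) + tail[1])
    return best


full = best_parse("START", tuple("my friends who live in {city}".split()))
terse = best_parse("START", tuple("friends live in {city}".split()))
print(full)
print(terse)
```

In this sketch the fully spelled-out query parses at zero cost, while the terser one is still accepted but pays a penalty for each shortcut rule it uses. The returned tree is a nested tuple whose first element is the rule name, with "START" at the root and the query's words as the leaves.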

There’s also a “semantic tree” that mirrors the “parse tree,” since every production rule broken out in the latter also has a semantic function of some sort: “For example, the parse tree that generates ‘My friends who live in {city}’ has a semantic intersect(friends(me), residents(12345)). Such semantics can be transformed to the Unicorn language and then executed against search indexes.”
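A rough Python sketch of that idea follows. It represents the semantic tree as nested tuples, prints it in the function-call notation quoted above, and evaluates it against a toy in-memory data set. The names, IDs, and data are invented for illustration, and the direct evaluation stands in for the real step of translating to Unicorn and querying search indexes, which is not reproduced here.

```python
# Toy data standing in for the social graph; all names and IDs are made up.
FRIENDS = {"me": {"alice", "bob", "carol"}}
RESIDENTS = {12345: {"bob", "carol", "dave"}}


def friends(user):
    return FRIENDS.get(user, set())


def residents(city_id):
    return RESIDENTS.get(city_id, set())


def intersect(*sets):
    return set.intersection(*sets) if sets else set()


# Hypothetical registry of semantic functions that the grammar rules can emit.
OPS = {"friends": friends, "residents": residents, "intersect": intersect}

# Semantic tree for "My friends who live in {city}", with {city} resolved to ID 12345.
semantic = ("intersect", ("friends", "me"), ("residents", 12345))


def to_string(node):
    """Render a semantic tree in function-call notation."""
    if not isinstance(node, tuple):
        return str(node)
    name, *args = node
    return f"{name}({', '.join(to_string(a) for a in args)})"


def evaluate(node):
    """Evaluate a semantic tree bottom-up against the toy data."""
    if not isinstance(node, tuple):
        return node
    name, *args = node
    return OPS[name](*(evaluate(a) for a in args))


print(to_string(semantic))
print(sorted(evaluate(semantic)))
```

Keeping the semantics as a plain data structure, separate from the code that executes it, is what makes it possible to translate the same tree into another query language rather than run it directly.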

Facebook also introduced something called “parameterization” (say that three times fast), giving the grammar a “cost structure” in order to rank the parse trees; it also built in entity detection and resolution, which can “identify query segments that are likely to be entities and classify those segments” into one of the 20-plus entity categories (which include city, group, application, and so on). That’s in addition to lexical analysis, a system capable of recognizing synonyms (e.g., “where to eat sushi” is equivalent to “sushi restaurants”), inflections, and much more. The blog entry is worth checking out, if only as an example of how much detail-oriented thinking goes into massive platforms such as Graph Search.
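As a loose illustration of two of those pieces, the Python sketch below normalizes a query through a synonym table and then ranks candidate interpretations by cost, returning the top N. The synonym table, the candidate list, and every cost value are hypothetical; the blog post does not publish Facebook's actual parameters.

```python
import heapq

# Hypothetical synonym table mapping a surface phrase to a canonical phrase.
SYNONYMS = {"where to eat sushi": "sushi restaurants"}

# Hypothetical (interpretation, cost) candidates per normalized query.
# Lower cost means a more plausible reading; the numbers are invented.
CANDIDATES = {
    "friends facebook": [
        ("friends who work at Facebook Inc", 1.2),
        ("friends on Facebook", 0.8),
        ("friends who like Facebook the page", 2.0),
    ],
}


def normalize(query):
    """Lowercase, collapse whitespace, and map known synonyms to a canonical form."""
    cleaned = " ".join(query.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def top_n(query, n=2):
    """Return the n cheapest interpretations of a query, cheapest first."""
    candidates = CANDIDATES.get(normalize(query), [])
    cheapest = heapq.nsmallest(n, candidates, key=lambda pair: pair[1])
    return [text for text, _cost in cheapest]


print(normalize("Where to eat  sushi"))
print(top_n("friends Facebook", n=2))
```

In a real system the candidates would presumably come out of the parser rather than a lookup table, with each parse tree's cost summed from the rules it uses.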

And Graph Search is only in its beginning stages. “Making Graph Search available to mobile and international users will give all users equal opportunities to enjoy the power of Graph Search,” Li wrote. “The grammar coverage can be expanded drastically if we inject semantic knowledge from the outside of the Facebook graph, and connect it with the Facebook world.”

Facebook has offered insights into Graph Search before: back in January, for example, it offered a peek into how it adjusted its hardware infrastructure to deal with any spikes in traffic from the system (the solution centered on the Disaggregated Rack, which breaks up hardware resources and scales them independently of one another). In March, the company posted a follow-up detailing some of the software underlying Graph Search, including Unicorn, a system for building the platform’s underlying index.


Image: Facebook