Editor’s Note: Updated with a response from Facebook.
Over the past couple of weeks, Facebook has made a lot of hoopla about its new “Graph Search.” Regardless of one’s personal feelings about Facebook itself, the social network is deeply baked into the fabric of the Web—so whenever they introduce anything new, it has the potential to greatly impact Web developers.
Privacy advocates are also concerned about Facebook’s new search tool. What does it really do? How does it compare to sites like Google with regard to tracking personal information?
Let’s get under Graph Search’s hood and see whether those privacy concerns are justified.
A Powerful Search Engine
On the simplest level, Graph Search is a search engine that traverses the Facebook data, accessible via a box at the top of Facebook. (A solid explanation of the backend infrastructure required to make it work is available on SlashDataCenter.) The system allows users to make lengthy natural-language queries in search of Facebook-based information about photos, friends, and other content. For example, you could input “Friends of friends who like trail running” and receive a list of people who meet that description—provided their information is public, and they indicated to Facebook that they “Like” trail running.
Should you input “Friends of friends who like trail running,” you’ll also see a related search: “People who like trail running.” This is interesting, because it goes outside your list of friends, traversing further into Facebook’s enormous data tree. From there, you can refine the search still further, via a list of dropdown boxes on the right side of the page. Want to know which of those “People who like trail running” actually live near you? Simply click on the appropriate box.
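One way to picture those two queries is as set operations over friend lists and Likes. The following is only a minimal sketch with invented names and data; it says nothing about how Facebook actually stores or indexes anything.

```python
# Hypothetical toy data: every name and interest here is invented.
friends = {
    "me": {"ann", "bob"},
    "ann": {"me", "cara"},
    "bob": {"me", "dev"},
    "cara": {"ann"},
    "dev": {"bob"},
    "eve": set(),
}
likes = {
    "cara": {"trail running"},
    "dev": {"chess"},
    "eve": {"trail running"},
}

def friends_of_friends(user):
    """People exactly two friendship hops away from `user`."""
    direct = friends.get(user, set())
    second = set()
    for friend in direct:
        second |= friends.get(friend, set())
    # Drop the user and anyone who is already a direct friend.
    return second - direct - {user}

def friends_of_friends_who_like(user, interest):
    """The narrower query: two hops away and Likes `interest`."""
    return sorted(p for p in friends_of_friends(user)
                  if interest in likes.get(p, set()))

def people_who_like(interest):
    """The broader related search: ignores the friend list entirely."""
    return sorted(p for p, liked in likes.items() if interest in liked)

print(friends_of_friends_who_like("me", "trail running"))
print(people_who_like("trail running"))
```

The second function never looks at the friend list, which is why the related search reaches people you have no connection to.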
When it comes to finding very specific people, how deep does this thing go?
A Brief Digression into Graph Theory
As soon as Graph Search was released, I heard a lot of people, especially those with marketing backgrounds, saying it’s a silly or stupid name. But there’s logic behind it: in its most technical form, a graph is a set of nodes that may (or may not) be connected by what are known as edges. Nodes often represent data. (I was fortunate enough to do graduate work 20 years ago under some of the foremost modern graph theorists, but I never expected that “graph” would become such a commonplace term.)
In its purest form, graph theory doesn’t concern itself with what the nodes on the graph necessarily represent. But as soon as you incorporate an application, it becomes quite useful. For example, it can be used to determine the minimum number of colors needed to color a map so that no two adjacent countries share the same color.
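To make the map example concrete, here is a small sketch in which regions are nodes and shared borders are edges. It uses a simple greedy strategy, which always produces a valid coloring but is not guaranteed to use the fewest possible colors. The four-region map is invented.

```python
def greedy_coloring(borders):
    """Give each region the smallest color index not used by a neighbor."""
    colors = {}
    for region in sorted(borders):
        taken = {colors[n] for n in borders[region] if n in colors}
        color = 0
        while color in taken:
            color += 1
        colors[region] = color
    return colors

# Hypothetical map: A, B and C all touch each other; D touches only C.
borders = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

coloring = greedy_coloring(borders)
for region, color in sorted(coloring.items()):
    print(region, color)
```

Notice that the code never needs to know what the regions are; it works only with the nodes and edges.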
I suspect people think “Graph Search” is silly because they assume “graph” means plotting nodes on a two-dimensional plane. For Facebook, though, the term is much more; an early version of Facebook showed how you were connected to users who weren’t on your friend list, clearly relying on graph theory to determine the connections. In 2007 the social network rolled out Facebook Platform, which included a mention of the “social graph.”
In Facebook’s initial conception of the graph, the nodes were just the people inside the social network, and the edges were the friend connections. Facebook later expanded the concept to represent generic “things” on the broader Web that Facebook members could interact with via apps; this concept was deemed the Open Graph.
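With people as nodes and friendships as edges, showing how you are connected to a stranger becomes a shortest-path problem. A breadth-first search is one standard way to solve it. This sketch uses invented names, and I am not claiming it is what Facebook runs internally.

```python
from collections import deque

def connection_path(friends, start, goal):
    """Return the shortest chain of friendships from start to goal, or None."""
    previous = {start: None}      # person -> who we reached them through
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        for friend in sorted(friends.get(person, ())):
            if friend not in previous:
                previous[friend] = person
                queue.append(friend)
    return None

# Hypothetical friendship graph.
friends = {
    "me": {"ann"},
    "ann": {"me", "cara"},
    "cara": {"ann", "dan"},
    "dan": {"cara"},
    "eve": set(),
}

print(connection_path(friends, "me", "dan"))
print(connection_path(friends, "me", "eve"))
```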
In Open Graph’s documentation, the example used is a food recipe: individual recipes become nodes in the entire graph, and the Facebook user can connect to those nodes by clicking the Facebook “Like” on a particular recipe’s Webpage, or interacting with it in some other way. These connections are inevitably recorded in the database underlying the social graph.
The edges of the graph are now refined into types: since the nodes are nouns (such as a recipe), the types of edges are verbs (such as cooking). If I’m signed into Facebook, and I go to a recipe Website that makes use of the Facebook API through a Facebook app, and I click a button that I cooked the recipe, then a new edge is added connecting my profile to that recipe object with the action of being “cooked.” At that point, the recipe app might post to my wall an announcement that I cooked the recipe, and the edge connection is stored somewhere deep inside the Facebook database servers.
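A typed edge can be thought of as a (subject, verb, object) triple. The sketch below stores such triples in a plain list; the class, the identifiers, and the recipe URL are all hypothetical and only illustrate the idea of verbs as edge types.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["subject", "verb", "obj"])

class TypedGraph:
    """A toy store of noun-verb-noun connections."""

    def __init__(self):
        self.edges = []

    def add_edge(self, subject, verb, obj):
        edge = Edge(subject, verb, obj)
        if edge not in self.edges:   # recording the same action twice is a no-op
            self.edges.append(edge)
        return edge

    def objects(self, subject, verb):
        """Everything `subject` has done `verb` to."""
        return [e.obj for e in self.edges
                if e.subject == subject and e.verb == verb]

    def subjects(self, verb, obj):
        """Everyone who has done `verb` to `obj`."""
        return [e.subject for e in self.edges
                if e.verb == verb and e.obj == obj]

recipe = "recipe:http://example.com/recipes/chili"   # invented URL
graph = TypedGraph()
graph.add_edge("user:me", "cook", recipe)
graph.add_edge("user:me", "like", "page:trail-running")
graph.add_edge("user:ann", "cook", recipe)

print(graph.objects("user:me", "cook"))
print(graph.subjects("cook", recipe))
```

The second lookup is the interesting one from a privacy standpoint: once the edge exists, the recipe can be used to find the people just as easily as the other way around.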
The more Facebook promotes its platform, and the more web developers use the platform, the more Webpages become a part of Facebook’s enormous graph. If you haven’t studied the platform at all, you might be unaware of the pervasiveness of Facebook’s graph: the articles on CNN are part of the graph, for example. You can tell because the HTML source contains meta tags that include a property attribute starting with og:, such as this:
<meta content="Monster blizzard could slam Northeast" itemprop="headline" property="og:title" />
Even Slashdot is a part of this; look at the meta tags and you’ll see the og: attributes. Although a site can use its own Facebook application (denoted by an AppID) to take full advantage of the platform, many don’t; for example, an AppID isn’t present on Slashdot pages. Here’s CNN’s AppID, discoverable in the meta tags:
<meta content="80401312489" property="fb:app_id"/>
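If you would rather not read page source by eye, a few lines of Python can pull these tags out. This sketch uses the standard library’s html.parser on a small hand-written page that reuses the two tags shown above; the surrounding HTML is made up.

```python
from html.parser import HTMLParser

class MetaPropertyParser(HTMLParser):
    """Collect <meta> tags whose property attribute starts with og: or fb:."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith(("og:", "fb:")):
            self.properties[prop] = attributes.get("content")

# Hand-written sample page; only the two graph-related tags come from the article.
page = """
<html><head>
<meta content="Monster blizzard could slam Northeast" itemprop="headline" property="og:title" />
<meta content="80401312489" property="fb:app_id"/>
<meta name="description" content="not part of the graph markup" />
</head><body></body></html>
"""

parser = MetaPropertyParser()
parser.feed(page)
for name, value in parser.properties.items():
    print(name, "=", value)
```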
These applications can either define their own actions and nouns (like the “cooking a recipe” example earlier), or use any of a set of predefined actions created by Facebook, such as “reading an article.” The noun or node is the article; the action or edge is reading. When the action takes place, a REST POST is used. Here’s the actual example from the Facebook docs:
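The snippet below is not that documentation example. It is only a rough sketch of what such a POST might look like, and everything specific in it is an assumption: the base URL, the /me/<action> path shape, the action name news.reads, the article parameter, and the placeholder token. It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

GRAPH_BASE = "https://graph.facebook.com"   # assumed base URL

def build_action_request(action, object_type, object_url, access_token):
    """Build (but do not send) a POST recording `action` on an object URL."""
    body = urlencode({
        object_type: object_url,        # the noun (node), e.g. an article
        "access_token": access_token,   # placeholder; a real app needs a real token
    }).encode("ascii")
    # The action (edge) is assumed to be part of the path.
    return Request(f"{GRAPH_BASE}/me/{action}", data=body, method="POST")

request = build_action_request(
    action="news.reads",                            # assumed built-in action name
    object_type="article",
    object_url="http://example.com/articles/blizzard",   # invented URL
    access_token="PLACEHOLDER_TOKEN",
)

print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```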
Sometimes these actions take place automatically, such as the annoying junk you see flowing on your friends’ Facebook walls about their latest horoscope or the virtual tractor they just bought. These happen because the user installed an app and granted that app permission to post to their walls.
CNN currently has a list of buttons for sharing an article, one of which is for Facebook. When you click that button, a post is sent to Facebook to make the connection between you and the article. The action is “read,” while the object is the article you read (or, more precisely, the URL for the article). CNN changes their layout quite often; as I write this, the button for sharing is called “Recommend.”
When I clicked that, a popup window appeared, and I recommended the article about the blizzard approaching the Northeast United States.
The article gets shared on Facebook.
And nobody tells you about the connection that was just made: I’m now connected to that article in Facebook’s database, presumably forever.
Got that? When you recommend an article, it doesn’t just show up on your wall. It gets saved in the central database Facebook has built, and, as far as anyone knows, it does not go away. Do you ever stay up late surfing the web and share something inappropriate on your Facebook wall, only to wake up the next morning, horrified at your imprudent action, and quickly remove it from the wall? Delete all you want, but that connection has quite possibly been saved in the Open Graph database.
Before we tackle the privacy implications of what Facebook’s doing, let’s take a moment to look at Google’s graph.
Sure, Google doesn’t explicitly promote the concept of a graph, but the search-engine giant has one too. In its case, the connections are a bit more basic, reflecting the literal meaning of “Web” in “World Wide Web.” Web pages are interconnected through links, which are traversed by Google’s robots, which save the connections between the pages. That allows you to search for all the pages that link to a particular page.
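Finding every page that links to a given page amounts to inverting the link graph. Here is a minimal sketch over a made-up three-page crawl; it illustrates the idea only and makes no claim about how Google stores its index.

```python
from collections import defaultdict

def build_backlinks(outgoing_links):
    """Invert page -> linked pages into page -> pages that link to it."""
    backlinks = defaultdict(set)
    for page, targets in outgoing_links.items():
        for target in targets:
            backlinks[target].add(page)
    return backlinks

# Hypothetical crawl results: each page mapped to the pages it links to.
crawled = {
    "a.example/home": {"b.example/post", "c.example/about"},
    "b.example/post": {"c.example/about"},
    "c.example/about": set(),
}

backlinks = build_backlinks(crawled)
print(sorted(backlinks["c.example/about"]))
```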
Google has been criticized because they store your search history if you’re logged into a Google account. The mechanism they use is a bit simpler than Facebook’s app system; the links inside the search results start with a link back to Google, which then redirects you to the destination page. (But the status bar at the bottom or top of the browser only shows the final URL, somewhat obscuring the pass through Google.)
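The redirect trick is easy to demonstrate. In this sketch the tracking host search.example and the q parameter are invented stand-ins, not Google’s actual URL format; the point is simply that the real destination rides along as a query parameter while the click passes through the search engine first.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def wrap_in_redirect(destination, tracker="https://search.example/url", param="q"):
    """Build a link that hits the tracker first and carries the real target."""
    return f"{tracker}?{urlencode({param: destination})}"

def final_destination(link, param="q"):
    """Recover the real target from a wrapped link (or return the link as-is)."""
    values = parse_qs(urlparse(link).query).get(param)
    return values[0] if values else link

wrapped = wrap_in_redirect("https://news.example/story?id=7")
print(wrapped)
print(urlparse(wrapped).netloc)       # the host that actually receives the click
print(final_destination(wrapped))     # where the user ends up
```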
Although Google has tracked this information for several years, its tracking wasn’t broadly known until last year, when it sparked a lot of anger across the Web. Google provided a way to supposedly disable the tracking of this information. The Electronic Frontier Foundation has a page on its Website showing how to disable the saving of your history; at the end of the document, it includes a caution that Google could still be saving that data and using it internally, or handing it over to law enforcement if necessary.
What Facebook Records
Now consider this: Facebook doesn’t just record Web searches. Just because I click on a link offered by Google doesn’t mean I actually read the page that pops up. But with Facebook, the connections go much deeper. Suppose a man in his 50s is accused of being a child predator, and the court requests records from Facebook. They’ll dig up everything: Facebook Pages he Liked, or temporarily Liked; Facebook groups to which he belonged, or used to belong; outside articles visited or shared; his friends and their friends, along with all their activities. While courts can’t convict you for associating with people of questionable character, a jury could certainly be swayed to feel that, if you associate with such people, you may be of that character. And it’s all stored in Facebook’s servers.
It can and does happen. I know a man who is serving a life sentence for murder. I haven’t talked to him since he went to jail about seven or eight years ago, but one of the key pieces of evidence in the trial was that he had done a lot of Google searches on ways to kill somebody; the person who died did so by one of the methods found in this man’s searches. I imagine if the guy I knew committed the crime today, the investigators would have used all the Facebook data they could find as well.
And finally, we’re back to Graph Search. Why did I wait until so late in the article to finally get here? Because it’s nothing particularly groundbreaking, even though Facebook is heavily promoting it: Graph Search is a limited interface into Facebook’s broader social graph, which (as outlined above) has existed for several years.
Facebook has offered an API for the graph for quite some time (documentation for searching it can be found here). Using a REST-based search, I could quickly find which of my friends checked into a particular restaurant I like, and I was able to see the comments they posted on it. While the results were JSON—in other words, not nearly as pretty as the “regular” Graph Search—that didn’t bother me much as a programmer.
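For readers who have never poked at a REST search, here is roughly what the round trip looks like. Treat every specific as an assumption: the /search path, the q, type, and access_token parameters, and the shape of the sample JSON are illustrative, and the restaurant data is invented. Nothing here contacts Facebook.

```python
import json
from urllib.parse import urlencode

def build_search_url(query, object_type, access_token,
                     base="https://graph.facebook.com/search"):
    """Assemble a search URL; path and parameter names are assumptions."""
    params = {"q": query, "type": object_type, "access_token": access_token}
    return f"{base}?{urlencode(params)}"

def names_in(response_text):
    """Pull the name of each result out of a JSON response body."""
    return [item["name"] for item in json.loads(response_text).get("data", [])]

# Invented response body standing in for what a search might return.
sample_response = """
{"data": [
  {"id": "1001", "name": "Example Trattoria", "category": "Restaurant"},
  {"id": "1002", "name": "Example Coffee", "category": "Cafe"}
]}
"""

print(build_search_url("trattoria", "place", "PLACEHOLDER_TOKEN"))
print(names_in(sample_response))
```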
But this API doesn’t let me dig as deep into the graph data as I can with Graph Search. Having watched the Facebook API for a few years now, my guess is they’ll update the API to return as much as they do with the new Graph Search. Certainly the information is there, and it would be easy for them to expand the REST interface; in fact, I wouldn’t be surprised if it’s already available, just undocumented.
So what conclusions can we draw? How does this thing compare to Google’s search, especially in the realm of privacy?
When a billion people are listing everything they like (or “Like”) on Facebook, from travel and games to photos and people, all that information is stored in one gigantic graph. That’s somewhat more ominous than Google’s tracking.
The kicker is that, for years, people had no idea that Facebook saved this information. Some of the news articles I’ve read talk about how Graph Search will start small and slowly grow as it accumulates more information. This is wrong—Graph Search has been accumulating information since the day Facebook opened and the first connections were made in the internal graph structure. I did a search of people who like trail running and have ever visited my hometown, and the system produced several dozen people. The information is already there. (And these people weren’t on my Friends list, and the few I checked didn’t have any mutual friends with me.)
For users of Facebook looking to meet more friends, Graph Search might prove interesting and useful. And for law enforcement and other “Big Brother” analyses, it could be a gold mine. People were nervous about Google storing their history, but it pales in comparison to the information Facebook already has on you, me, and roughly a billion other people.
A Response from Facebook
A few days after this article was published, a Facebook executive sent an email. “Your article implies that you can’t delete the information on Facebook,” wrote Frederic Wolens, the social network’s Public Policy Associate Manager. “This is not true. Not only do [we] hard delete content whenever a user deletes content, but also Graph Search works by indexing available information[,] not crawling Facebook.”
He added: “Graph Search makes finding things easier, but you can only see what you could already view elsewhere on Facebook. You control who you share your interests and likes with on Facebook. Each category of interests and likes has its own privacy setting.”
Wolens also took issue with the idea that Facebook potentially saves user information to a central database, where it exists in perpetuity: “This is also not true, when you view a social plugin on another site, but none of your information is shared, and no information about your actions is provided to advertisers.” He claims that Facebook does not “receive information about the visit” because of the way “the Internet works,” following that up with a link to a Facebook page.