Big Sites Covertly Track Browsers by Their Fingerprints

One hundred forty-five of the top 10,000 websites use systems to covertly “fingerprint” the devices of users who visit them, collecting data that could be useful to both marketers and hackers.

A 2013 analysis of 1 million of the busiest sites on the Web for the presence of device-fingerprinting software found the practice is much more common than previous studies estimated, according to researchers at KU Leuven (a university in Belgium) and the Flemish research institute iMinds.

Most of those sites did not disclose in their privacy statements that they used fingerprinting to track users, nor did they disclose the types of data they gathered. They often disregarded do-not-track (DNT) settings and could bypass most of the privacy-protecting technologies users relied on, the report found.

Device or browser fingerprinting allows Web sites to identify repeat visitors by collecting data from HTTP requests and other interactions that reveal characteristics such as screen size, the versions of the browser and other installed applications, the list of installed plugins, and the list of installed fonts.
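
As a rough illustration of how such characteristics can be combined into a single identifier, the Python sketch below hashes a set of reported values into a stable digest. The attribute names, the sample values and the `fingerprint` function are assumptions made for this example; they are not taken from the KU Leuven/iMinds report or from any real fingerprinting product.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a stable SHA-256 hex digest for a dict of browser attributes."""
    # Sort list values so the order in which plugins or fonts are reported
    # does not change the result (a simplification made for this sketch).
    normalized = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in attributes.items()
    }
    # sort_keys yields the same serialization regardless of key order.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values; none of these come from the study.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen": "1920x1080x24",
    "plugins": ["Flash 11.8", "Java 7u25"],
    "fonts": ["Arial", "DejaVu Sans", "Times New Roman"],
}

# The same configuration, reported in a different order on a later visit.
returning_visitor = {
    "fonts": ["Times New Roman", "Arial", "DejaVu Sans"],
    "plugins": ["Java 7u25", "Flash 11.8"],
    "screen": "1920x1080x24",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
}

# A different visitor whose configuration lacks one font.
other_visitor = dict(visitor, fonts=["Arial", "Times New Roman"])

print(fingerprint(visitor) == fingerprint(returning_visitor))  # True
print(fingerprint(visitor) == fingerprint(other_visitor))      # False
```

No cookie or other stored state is involved: the identifier is recomputed from whatever the browser reveals on each visit, which is why clearing cookies does not change it.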

Of the top 10,000 sites, about 1.5 percent fingerprint users based on the configuration of their Flash installations; of the top million sites, 404 do the same using Java. (The EFF offers an online test that shows how unique your browser is and what data can be collected from it.)

Profiles based on specific browser configurations make end users far more individually identifiable than cookies, IP-address tracking, or other methods that rely on data the end user can easily clear or change, according to the study. By examining the Flash and Java plugins present in most browsers, as well as the specific combination of the thousands of fonts that may be available to a standard browser, it is possible to uniquely identify as many as 96 percent of the browsers visiting a site, the researchers found.

Mathematically, it takes only about 33 bits of information to uniquely identify one individual among 7 billion, according to a 2010 study from the Electronic Frontier Foundation (EFF), which showed that 84 percent of Web users can be identified from the information provided in HTTP headers and User Agent identifications. Browsers with Java or Flash plugins installed were identifiable 94 percent of the time.
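
The 33-bit figure follows from simple arithmetic: each bit of information halves the set of candidates, so the number of bits needed is the base-2 logarithm of the population. A minimal Python check, in which the function name `bits_to_identify` is just a label chosen for this sketch:

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)


world_population = 7_000_000_000  # the round figure used in the article
bits = bits_to_identify(world_population)

print(round(bits, 2))   # 32.7
print(math.ceil(bits))  # 33
```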

User agents in Web browsers provide, on average, between five and 15 bits of identifying information when they make a request to a Web site – but that does not include geolocation data (ZIP Codes), lists of installed plugins, IP addresses and other data that are routinely available in any HTTP request, the EFF found.
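
To see how a handful of attributes can approach that threshold, note that a value shared by one browser in N carries log2(N) bits. The sketch below adds up the bits for a few attributes; the shares are hypothetical numbers picked to keep the arithmetic simple, not figures from the EFF study, and the sum is only valid on the assumption that the attributes are statistically independent.

```python
import math


def surprisal_bits(share_of_browsers: float) -> float:
    """Bits of identifying information in a value seen in this share of browsers."""
    return -math.log2(share_of_browsers)


# Hypothetical shares, chosen only to make the arithmetic easy to follow.
observed = {
    "user_agent": 1 / 1024,  # about 10 bits
    "screen_size": 1 / 32,   # about 5 bits
    "time_zone": 1 / 8,      # about 3 bits
}

# Adding bits assumes independence; correlated attributes
# contribute fewer bits than this sum suggests.
total_bits = sum(surprisal_bits(share) for share in observed.values())

# Number of people worldwide expected to share all three values.
remaining = 7_000_000_000 / 2 ** total_bits

print(f"{total_bits:.1f} bits, roughly {remaining:,.0f} people still match")
```

With these assumed shares, three attributes already account for about 18 of the roughly 33 bits required, leaving tens of thousands of candidates rather than billions.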

Browser or device fingerprinting is common in security applications designed to block fraud, phishing or other activity based on stolen or false identities.

“Fraudsters will try to get around any detection system,” according to a June 2012 article from security software developer ThreatMetrix. “However, typically good customers present themselves in predictable ways which causes fraudsters trying to cover their tracks to stand out like a sore thumb.”

Depending on the number of identifying attributes available, fingerprinting can be accurate enough to be used as the initial identifying step in automated risk-analysis processes that use activity data from many sites to judge the likely intentions of a particular user, according to instructional material on IBM’s Federated Identity Manager Business Gateway.

The immediate problem with fingerprinting is not its accuracy or frequency, but its covert nature and the difficulty even savvy users have in avoiding it, according to researchers at KU Leuven/iMinds.

Tracking end users using cookies provides more certain identification and more accurate information about the user’s activities, they wrote. Unlike cookies, whose presence or absence can change the behavior of a site or the information a user sees, fingerprinting has no obvious impact on the user’s activity and usually provides no indication that the profile has been stored.

Fingerprinting gives users little, if any, opportunity to opt out and bypasses the private-browsing mode of most browsers. It is often camouflaged further by sites that remove the fingerprinting script once it has run, or that run the scripts from third-party widgets, which can conceal which site is collecting the data even from users who realize it is happening.

Like other forms of web activity data, browser and device fingerprints are aggregated into large marketing databases and linked with demographic or other data to even more closely identify and target specific users, the report found.

Fingerprinting itself isn’t inherently risky, according to researchers at both EFF and KU Leuven/iMinds.

It is inherently intrusive, however, and so far almost uncontrollable, even by security-conscious organizations that would prefer anonymous third parties not be able to track their employees’ locations and online activities without their knowledge.

There is no legislation in the works that could curtail the practice, and not much outcry against it, a situation that won’t change without some negative publicity, according to an advertising company executive quoted in a July Forbes story on condition he remain anonymous. “At the end of the day, there isn’t really a legal case against it, there isn’t really a privacy case against it,” he said about fingerprinting. “If you don’t want anybody to know anything you’ve done online, don’t go online. If you are going to commit murder, don’t research your weapons on Google.”

Maksim Kabakou