Stalking is unwanted and/or repeated surveillance by an individual or group towards another person.[1] Stalking behaviors are interrelated to harassment and intimidation and may include following the victim in person or monitoring them. The term stalking is used with some differing definitions in psychiatry and psychology, as well as in some legal jurisdictions as a term for a criminal offense.
This should also apply in the digital world, shouldn't it?
That is a fair point, for sure. But I don't think it's the whole story.
What about this: noticing and remembering people's faces, clothes, tics, etc. when they walk up to you, versus blanketing the planet in cameras that all feed back to a giant database that builds a detailed dossier on almost everyone, based on faces, clothes, tics, etc.
The way you put it sounds almost accidental, like no effort was made, with no real goal in mind. That's not a faithful analogy of the pervasiveness of Google Analytics, Facebook buttons, etc.
Noticing in itself isn't, but we're talking about amassing surveillance intel on people, bundling it into packages and selling it to the highest bidder, or selling it to hackers on the black market. Normally you'd need a private detective licence and a good reason for each tracked person.
Human memory isn't comparable to digital memory. When we recall the face of a person we've interacted with, we don't beam that information telepathically to others, and we (most of us) gradually forget the face over time.
Cynical mode: the lawsuit would be very expensive, but imho it would be possible to get to the next round. This escalates through several instances, until you're out of money and the lawyers can bury it.
You can choose your battles. There's no need or obligation to take on all targets, and the legal strategy of first pursuing a tractable, or even sympathetic (to your interests) target is well established.
The cyberstalking industry has many players. Most are small. A target-rich environment.
Shifting the investment or financing calculus, via increased investor risk, possibly including legal liability, would be huge.
This is really more like sousveillance[0]. I'm not sure how I feel about it. On the one hand, recording server logs has been standard and accepted practice pretty much since there were web servers. On the other hand, we have the capability to record and discern so much more information now and to do more with that information once we've recorded it.
Ultimately, I think it should require user consent. (And, speaking of consent, I wonder if simple fingerprinting and associating a browser with an ID would violate the GDPR. It's clearly intended to be "personal" to one particular user, but is it "identifiable" information? Does it matter?)
According to the paper the most useful attributes for browser fingerprinting are:
- User agent
- List of Plugins
- Screen resolution and color depth
- Canvas
- WebGL Renderer (~GPU Model)
Maybe Canvas can be fixed to a degree, but the others feel like things that will inherently be different. Firefox with the privacy.resistfingerprinting setting solves all of these at least partially (and more), but not without some usability tradeoffs.
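To make this concrete, collecting the attributes above takes only a handful of standard browser APIs. A rough sketch (hashing and exfiltration omitted; this is an illustration, not any particular tracker's code):

    // Sketch of reading the attributes listed above from a browser.
    // A real fingerprinter would hash and combine these; that step is omitted.
    function collectFingerprintAttributes() {
      // User agent and plugin list
      const userAgent = navigator.userAgent;
      const plugins = Array.from(navigator.plugins).map(p => p.name);

      // Screen resolution and color depth
      const screenInfo = `${screen.width}x${screen.height}x${screen.colorDepth}`;

      // Canvas: render some text and export the pixels
      const canvas = document.createElement("canvas");
      const ctx = canvas.getContext("2d");
      let canvasData = "";
      if (ctx) {
        ctx.font = "16px Arial";
        ctx.fillText("fingerprint test", 2, 20);
        canvasData = canvas.toDataURL(); // differs per GPU/driver/font stack
      }

      // WebGL renderer (~GPU model) via the debug_renderer_info extension
      const gl = document.createElement("canvas").getContext("webgl");
      let gpu = "";
      if (gl) {
        const ext = gl.getExtension("WEBGL_debug_renderer_info");
        if (ext) gpu = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
      }

      return { userAgent, plugins, screenInfo, canvasData, gpu };
    }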
Browser plugins really need to go away. If it's not implementable normally then it shouldn't be on the web. In my opinion at least.
GPU models also should not be given out; browsers need to abstract features and performance into something else.
Like if I'm writing a WebGL app, I should just be able to test a few baselines configs.
User agent is a tricky one; Chrome on Android gives too much info because it's built by an advertising company.
Even Firefox and Firefox Privacy on Android give away the Android version in the User Agent. The new Tor Browser for Android fakes it. I remember even the old Orfox used to give away the device name as well, and there used to be an open issue to address it in their issue tracker.
There’s a difference between plugins and extensions. Plugins add support for new types of media (think Flash, Java) while extensions are your adblockers and userscripts.
Plugins also provide support for new video codecs. Plugins are definitely the wrong place for it but I would be so happy to see codecs factored out of individual browsers.
Is there any reason why we couldn't have all browsers on all platforms use the same user agent header? I realize some sites use the user agent as a crude form of feature detection, but it's my understanding that that's generally considered to be bad practice.
Sometimes there are browser bugs that are impossible to do feature detection for, in which case you have no better option than looking at the user agent.
For instance, several versions of Firefox had a serious bug in shared workers, which appeared intermittently when users opened your site in multiple tabs https://stackoverflow.com/questions/51092596/feature-detecti... - I had to use the user agent to work around this.
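The workaround looked roughly like this (the version range below is illustrative, not the exact affected releases, and worker.js is a placeholder):

    // Fall back to a dedicated worker on Firefox versions affected by the
    // SharedWorker bug. The version range here is illustrative only.
    function sharedWorkersAreTrustworthy(): boolean {
      if (typeof SharedWorker === "undefined") return false;
      const m = navigator.userAgent.match(/Firefox\/(\d+)/);
      if (!m) return true;                 // not Firefox: trust feature detection
      const version = parseInt(m[1], 10);
      return version < 60 || version > 65; // skip the (hypothetical) broken range
    }

    const worker = sharedWorkersAreTrustworthy()
      ? new SharedWorker("worker.js")
      : new Worker("worker.js");           // degraded but working fallback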
Sure, this is a fair use case. But then can't the browser tell the end user that someone wants to read this browser data so the user has a chance to say "no"?
Apart from the difficulty of obscuring this information from JavaScript, it's politically impossible.
Chrome could change its header to the header of a competitor or a neutral one, but has no reason to (Google's main business is delivering ads). For other big browsers like Firefox and Safari it would be a huge loss to change the header: they would vanish from browser market share statistics, and with that the business case to make websites work well with their browser dwindles. That in turn would reduce their actual market share. That only leaves the browsers with a market share so tiny they are ignored by developers, and those browsers do often spoof the user agent header (or at least make it very convenient to do so).
You can get both the primitive values and settings, in addition to canvas-like fingerprint entropy, using shaders, which will render slightly differently across different devices.
That's like saying programs should go away if they're not implementable by the OS. It's just a different layer. Browser plugins are tremendously useful.
Edit: also, your comment seems to imply that plugins are used by websites trying to do something the browser doesn't support (e.g. Flash). While this is sometimes done, most plugins add additional functionality for the user, e.g. adblockers, usability improvements, etc.
The `navigator.plugins` used for fingerprinting is just for NPAPI/PPAPI stuff like Flash, Java, Silverlight. Just about all browsers are also in the process of deprecating support for them.
(it can also be possible to fingerprint browser extensions, but that is a bit trickier as it's not served up directly by the browser as a handy array)
Oh yes, my bad, thanks. I was thinking of extensions (maybe thanks to all the FF news!). Wrt fingerprinting plugins, it seems to be the least of the issues and hopefully not an issue for long :)
The first "attribute" is simply an HTTP header. As such it can be selected and changed by the user. The user may choose not to send a User-Agent header at all. IME, few sites actually require a User-Agent header in order to retrieve a page. Developers sometimes try to vary page content based on this header. For example, send a certain string containing the name of a "bot" and you might get nothing. The detection occurs before any content is returned.
The rest of these "attributes" are not HTTP headers, and their collection occurs after the resource is fetched. We do not need to give away this information in order to retrieve the resource. The fingerprinting problem arises because the software we use to do the retrieval does far more than retrieve web pages. These other attributes cannot be detected unless the software used to retrieve the web page also includes other features that process what is retrieved, including a Javascript engine that runs code straight from the internet ("browser").
We are looking to the authors of a popular browser to solve a problem they themselves created. Is it rational to expect that this is where the solution will come from?
If one is willing to make some "useability tradeoffs" then one might use a simple http client to retrieve web pages instead of a popular browser. After the pages are retrieved, then one can use a popular browser to view them if desired; the browser need not have access to the internet. This is what I do. Not to avoid "fingerprinting" but just because it is faster and more robust.
With the help of a trivial program that transforms URLs into raw HTTP requests (I wrote one), one can retrieve the text/html of web pages, without requesting all the third-party advertising cruft, with any TCP client such as netcat. HTTP pipelining becomes easy, making consumption of large quantities of information more efficient -- many pages, one connection. The two programs together form an "http client" with, relative to a popular browser, or even a program like curl, very low code complexity.
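The sketch below is not my program, just an illustration of how little is needed: it turns a list of same-host URLs into one pipelined HTTP/1.1 request block you can pipe into something like netcat.

    // Turn a list of URLs (same host) into one pipelined HTTP/1.1 request
    // block, suitable for e.g. `... | nc example.com 80`.
    // Minimal headers only; no cookies, no User-Agent.
    function urlsToPipelinedRequests(urls: string[]): string {
      const host = new URL(urls[0]).host;
      return urls.map((u, i) => {
        const { pathname, search } = new URL(u);
        const last = i === urls.length - 1;
        return [
          `GET ${pathname}${search} HTTP/1.1`,
          `Host: ${host}`,
          // close the connection after the final response
          `Connection: ${last ? "close" : "keep-alive"}`,
          "",
          "",
        ].join("\r\n");
      }).join("");
    }

    // Example: print requests for two pages on the same (example) host
    console.log(urlsToPipelinedRequests([
      "http://example.com/page1",
      "http://example.com/page2",
    ]));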
Whenever I see the "fingerprinting" debate come up, I always go through the same thought experiment: for situations where "fingerprinting" is a problem, web users all use the same simple http client. All send the same minimal headers. The web must accommodate users, not the other way around.
>The first "attribute" is simply an HTTP header. As such it can be selected and changed by the user.
Maybe if you have script disabled, but if you have javascript it's trivial to detect which browser you're on based on javascript implementation quirks alone.
>We do not need to give away this information in order to retrieve the resource.
You clearly haven't seen mandatory javascript sites.
You are making the assumption that I am using a browser to retrieve the page. I can use a simple http client with no features such as JS to retrieve the page. I have full control over what headers I send and the contents of those headers.
"You clearly haven't seen mandatory javascript sites."
I have seen many sites that prod the user to enable JS yet still return the same text/html even when JS is disabled. Sometimes I am only after the content of the page: text, URLs pointing to images or other resources, URLs pointing to other sites, etc.
I have seen some sites where there is "no content" in the initial page fetched, with or without JS enabled. These are dummy sites, hollow shells, skeleton pages. The technique is simply misdirection. All the useful content is pulled from another site via a second request (JS triggering the request). Maybe this is a CDN. I simply read the JS or use a debugger to find this "endpoint". Then use the simple http client to request the content as usual. These dummy sites are not the majority of sites on the web, nor even the majority of sites posted to HN.
The endpoints supporting these "no content", dummy sites often deliver the text with fewer or no tags, without HTML. Maybe it is JSON. This makes it easier for me to format it the way I like it. What many HN readers may perceive as a growing nuisance I believe could be a blessing in disguise. I prefer delivery of content in a more "raw" format allowing the user to process it as she pleases client-side, with the software of her choice. This leaves no room for advertising. It is more efficient and the user becomes the creative one, deciding how she wants to present the content to her own eyes.
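As a made-up example (the endpoint URL and field names are hypothetical), once the endpoint is found in the JS, something this small does all the "browsing" needed:

    // Pull content straight from the JSON endpoint a "hollow" page would
    // otherwise fetch via JS. URL and field names are made up.
    async function readArticles(): Promise<void> {
      const res = await fetch("https://api.example.com/v1/articles?page=1");
      const data = await res.json() as { items: { title: string; body: string }[] };
      for (const item of data.items) {
        // plain text out; format it however you like client-side
        console.log(`# ${item.title}\n\n${item.body}\n`);
      }
    }

    readArticles();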
For commercial use of the web, things like internet banking, purchasing, etc., I use what is "recommended". The popular web browser.
For non-commercial use of the web, I use what I want to use. A simple http client and/or text-only browser.
My day-to-day use of the internet is almost always 100% non-commercial.
I never want to use a Javascript-enabled browser for simply retrieving information from the web. For example, reading. This is non-commercial activity.
For simple information retrieval, I find a large, complex, popular, recommended "browser", a single, "do-everything" program, is overkill and, counterintuitively perhaps, such browsers loaded with "features" actually limit what I can do. Using smaller, simpler "one-purpose" programs to retrieve web pages allows me more flexibility. I can be more productive.
Using single-purpose programs, I also find the retrieval process to be more reliable and robust, not to mention more transparent. I know exactly what I am sending. Unlike the popular browsers, these programs are not fetching and running code from the internet automatically. I feel a greater sense of control.
What plug-ins I have shouldn’t concern the server on the other end. Why is it transferred at all?
Canvases should never be readable by the other end without explicit permission. Basically transmitting anything rendered to a canvas should be equivalent to using my web camera from an integrity standpoint.
Time zone, operating system (from a short list) and language is fine, because it’s not a lot of entropy and my ip is known anyway unless on a vpn.
Gl renderer/graphics card, desktop resolution - nah. The viewport dimension is all that should be known.
Basically the browser should by default have an entropy budget for my entire fingerprint that ensures I’m not personally identifiable.
How can you not transfer the info about plugins? You can test for them on the front end, in a million different ways, with JavaScript, and send the result to a server of your choice.
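For example, something as small as this (the collection endpoint is made up):

    // The plugin list is just sitting there in JS, and shipping it off
    // takes one call. The endpoint here is hypothetical.
    const pluginNames = Array.from(navigator.plugins).map(p => p.name);
    navigator.sendBeacon(
      "https://tracker.example.com/collect",   // hypothetical endpoint
      JSON.stringify({ plugins: pluginNames })
    );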
This could be fixed with a bit of courage. Firefox could use "Firefox", Safari "Safari," etc. UA sniffing and UA strings are completely absurd at this point, and need to go away.
> - List of Plugins
JavaScript should not see this, especially not their versions.
The last three are harder to avoid for "responsive" web design. One attribute you didn't mention was the font list. There is no good reason for JavaScript to be able to see this. The model should be that the page requests a font, and the browser uses either that or its best try at a substitute.
It's a nice idea, but major sites still do user agent sniffing. Try setting your UA to one of those strings and visiting google.com -- the SERP looks like it came from 2009.
If Firefox tried that, it would instantly become that weird browser that makes Google look bad. The internet would be flooded with tips on how to set a custom UA that made it pretty again.
> Try setting your UA to one of those strings and visiting google.com -- the SERP looks like it came from 2009.
I just tried, and it's great! I had forgotten that search could be so snappy and simple. The more likely outcome, unfortunately, is that Google would adjust its UA-sniffing to serve the same 2019 bloat to "Firefox."
You can't get around giving away nearly the entirety of the User Agent string, since it can be derived simply through feature detection (well, except for the phone model when on mobile). Canvas and WebGL can be fixed by limiting their functionality, or at least that's what Firefox does in the aforementioned mode. Canvas could conceivably be fixed to just behave the same everywhere, but for WebGL you have to limit it to some common denominator to hide the GPU model, so privacy necessitates less functionality. The list of plugins can be spoofed to be empty. Screen resolution (or more importantly, window size) has to be somewhat accurate because lots of websites use it for responsive design. Firefox can round the window size to the nearest multiple of 200x100, which doesn't break too many things, but that's still far from giving the same value for everyone.
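(For concreteness, the rounding amounts to something like the following; the exact Firefox behavior may differ.)

    // Snap the reported inner dimensions down to a 200x100 grid
    // instead of exposing the real window size.
    function roundedWindowSize(w: number, h: number): [number, number] {
      return [Math.floor(w / 200) * 200, Math.floor(h / 100) * 100];
    }
    // e.g. roundedWindowSize(1366, 728) -> [1200, 700]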
> Canvas could conceivably be fixed to just behave the same everywhere, but for WebGL you have to limit it to some common denominator to hide the GPU model
You can just measure performance of those things then.
And even if the browsers are identical, there are still some ways to detect different users based on client-side behavior (assuming JS is enabled, or some particularly crazy CSS).
Just giving the same value for everyone on the latest version of Firefox using Windows 10 with a maximized window on a 1920x1080 display would seriously hamper fingerprinting. It can still be enough when combined with other information (IP, visited sites etc), but at least it would dilute the signal a lot.
Chrome on Android reports your phone model and build number.
If you're using a lesser-known phone with a carrier-specific ROM, then you're in a really tiny population.
If you use a common device and browser kept up to date to the latest versions, then you are not going to be unique. Using Safari on an iPhone will make you similar to 30% of web traffic.
Browsers also update all the time changing the user agent so it isn't that reliable tracking over extended periods of time.
The "am I unique" websites seem to compare your user agent to a database full of outdated browser versions from past data, making your score look much scarier than it really is. If you keep things up to date, chances are that you are not that unique.
I'm a bit old school when it comes to browser tracking, and as such I'm not as familiar with Canvas. What exactly is Canvas? What information is provided which can be tracked? Are there ways to disable Canvas without installing browser extensions?
Basically different devices and browsers render high-level graphical objects, such as fonts, differently. When you visit a site that does canvas tracking, a small javascript snippet will ask your browser to render a few graphical things on an html5 canvas and export that canvas to a png. The hash of that png is a good fingerprint of your hardware, browser and rendering software.
Basically Canvas is a JavaScript API for programmatically drawing graphics and text. Subtle differences in color, font rendering, antialiasing etc will produce different renderings on different hardware depending on OS, browser, GPU, drivers, etc. This lets a site running JavaScript generate a hash unique to everyone with your specific combination of software and hardware and uniquely identify you without cookies.
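A stripped-down sketch of what such a snippet does; the drawing and the hash choice here are illustrative, not any particular tracker's script (real ones bundle their own hash and more elaborate drawing).

    // Minimal canvas-fingerprint sketch: draw something, export it as a
    // data URL, hash it. Needs a secure context for crypto.subtle.
    async function canvasFingerprint(): Promise<string> {
      const canvas = document.createElement("canvas");
      canvas.width = 220;
      canvas.height = 30;
      const ctx = canvas.getContext("2d")!;
      ctx.textBaseline = "top";
      ctx.font = "14px Arial";
      ctx.fillStyle = "#f60";
      ctx.fillRect(100, 1, 62, 20);
      ctx.fillStyle = "#069";
      ctx.fillText("How quickly daft jumping zebras vex", 2, 15);

      const png = canvas.toDataURL();                 // differs per machine
      const bytes = new TextEncoder().encode(png);
      const digest = await crypto.subtle.digest("SHA-256", bytes);
      return Array.from(new Uint8Array(digest))
        .map(b => b.toString(16).padStart(2, "0"))
        .join("");                                    // the "fingerprint"
    }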
I've given up on technological solutions to this. We need a "do not track" bit for browsers and it needs to carry the weight of law behind it. Violators need to be named, shamed, and fined.
In fact, it is _primarily_ used as a point of entropy for tracking, and does almost nothing to prevent tracking, so all-in-all it's a huge net negative. And because it's mostly turned off, turning it on to be tracked less actually causes you to be tracked more.
Can the major browser vendors not get together and implement a set of standard values and defaults that can be activated to reduce one's fingerprint, at least enough to make it non-unique? Something like what Tor Browser does, but expanded to include all the major browsers. The way is open; the question is whether there is the will.
You can have what Tor Browser does by using Firefox and setting privacy.resistfingerprinting in about:config. But be prepared to solve a lot of recaptchas, and don't hold your breath on Google implementing something similar in Chrome. Google is still an adtech company. They didn't spend millions (if not billions) on making and promoting Chrome only to support fingerprint resistance.
This will only shift efforts to extracting uniquely identifiable information through side channels instead of directly. So you also need to disable JavaScript, or redesign it to prevent side-channel attacks (it's possible, but it would mean a completely new JavaScript compiler with a not entirely compatible language; it would also mitigate Spectre attacks).
However first you need some serious regulations to split Google and take away its browser "business". As it is Google is doing everything to preserve and advance tracking and fingerprinting in web browsers.
Apparently one in 900 browsers has my list of system fonts, by far the most identifying thing Panopticlick could detect. However, the font list looked completely vanilla to me. I wonder if there's a telltale one in there that I installed manually.
Are they in alphabetical order? The order of installed fonts used to be a dominant attribute in browser fingerprinting, before some browsers took steps to normalize it.
Other than that, there are things like the presence/absence of common variants like bold, and possibly some common fonts that come packaged with office software rather than the OS.
How does this work? For instance I have an iPhone model, so why would screen resolution help increase identifying bits coupled with the user agent? Everyone with my phone version and OS version will have the exact same values.
>For instance I have an iPhone model, so why would screen resolution help increase identifying bits coupled with the user agent? Everyone with my phone version and OS version will have the exact same values.
The user agent for mobile Safari doesn't identify the iPhone model, only that it's an iPhone[1]. Knowing the precise model definitely helps to fingerprint more.
[1] random search: Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1
the marketing department probably decided that it needs to be done and the safari engineers realized this would completely destroy usability for a majority of users, so they quietly abandoned it.
Especially as it isn't as simple as a standardized list of fonts when the Canvas hash is 100% unique for everyone.
The Safari team has not "abandoned" this effort: they are still working on reducing the fingerprinting surface area (of recent note, Safari has removed its "do not track" preference, which was used as a fingerprinting datapoint, no longer presents installed third-party fonts to websites, and no longer supports most plugins). The issue is that they have not "solved" the issue yet.
Intelligent Tracking Prevention is a work in progress in which Apple actively checks which techniques trackers are using and disables them. The next iOS/macOS releases will have the next version with new restrictions on tracking.
The issue will be solved when all tracking companies have collapsed and that industry is dead. Clearly we aren’t there yet.
Here's a quote: "There will also be new security measures to prevent digital fingerprinting, or the use of things like installed fonts and plug-ins to help track users across the internet even with privacy settings active. Websites will be given a stripped down, simplified system configuration so every user's Mac looks like every other user's Mac."
But this is how big companies behave. They have this unwritten policy to justify any anti-competitive, anti-user behavior with "security" and "privacy" PR.
Probably not; that task is equivalent to coordinating between browser vendors to make their software behave in every way identically, and these are generally radically different pieces of software.
You can't even get TCP/IP implemented consistently, which is why we can passively identify operating systems just by observing packets. What hope do you have of getting browsers to be indistinguishable?
This is the problem with anti-fingerprinting. It's a genuinely hard problem, not in the "it takes a lot of code and a lot of will to get it done" sense, but in an "it's unclear what computer science has to say about this problem right now" sense. It's up there with DRM/antimalware (two sides of the same coin).
Out of curiosity, how are you guys handling anonymous user actions without fingerprinting? For example, if you want to set up a user poll without requiring a user account? Storing state in a cookie doesn't work, since the agent can just clear the cookie and resubmit.
You're looking for an anti-Sybil defence. Generally, some chain-of-trust involving nonduplicable demonstration of identity.
Depending on the strength of defence required, anything from a low-cost registration fee (see Metafilter) to some form of recommendation-and-vetting or simultaneous in-person token-granting event (simultaneous to avoid entities being in two places at once). And auditing for abuse.
There have been several proposals for such systems also assuring some level of anonymity.
It's unlikely any approach will be perfect, though an arbitrary level of assurance is likely possible at some cost.
Note that surreptitiously fingerprinting and preemptively certifying to establish entity uniqueness are vastly different from a user awareness and consent perspective.
I definitely think it would have been a good tool to counter fraud/abuse. The issue, of course, is that abuse of said tooling and the rise of ad-tech are what caused the major issues we see today. I'd imagine it would be useful to, perhaps, flag accounts that were created on the same machine, as a way to help detect users trying to circumvent abuse-control tools.
Sure, someone can create another account using a new browser, within a VM, from another computer, inside a VPN. It's all about making it much harder. If the primary use of fingerprinting were to protect a community from bad actors, like those violating a set of community guidelines, then maybe the extra effort it would take to get around those measures would buy enough time for things to defuse.
Fingerprinting is not an effective deterrent either, as you can just open a different browser. The user could also have multiple devices with internet access.
You could use IP address, although that only works if the user isn't on a public / shared network. It's also easily bypassed by spinning up a VM on a cloud service provider and using an SSH tunnel.
Since you used polls as an example: StrawPoll.me [0] is an online poll site which lets you select different duplication checks based on your requirements. The choices are: IP, browser cookie, none, or require user sign in. They also give you an option to add a CAPTCHA.
Browser fingerprints can be spoofed too. Sock puppet accounts are equally easy to create. There is no easy way to ensure that people don't cheat on on-line polls. This is why (for example) on-line voting in IRL elections is a Really Bad Idea.
* Cookies: Can be deleted
* IP address: Not unique, because ISPs rotate them; also VPNs
* Login: Well create a second one
* Methods from Universities using nth letter of name and nth digit of birthdate: Just make up a new name.
Sorry, but unless you are using an analog medium or asking the questions in person, the numbers can be inflated and there is no way to have 100% data quality.
But in most contexts this is ok. So I would probably go the easiest way: a cookie.
I would not, at least in the European Union, go with fingerprinting and such, as I am not sure how this plays out regarding the GDPR, since this would be PII you are storing.
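Going back to the cookie option, it really is only a few lines (the cookie name and endpoint below are made up):

    // Best-effort dedup with a random ID cookie. Trivially cleared, but
    // good enough for a casual poll. Names and endpoint are hypothetical.
    function getOrCreateVoterId(): string {
      const match = document.cookie.match(/(?:^|; )voter_id=([^;]+)/);
      if (match) return match[1];
      const id = crypto.randomUUID();        // needs a secure context
      document.cookie = `voter_id=${id}; max-age=31536000; path=/; SameSite=Lax`;
      return id;
    }

    async function vote(option: string): Promise<void> {
      await fetch("/api/poll/vote", {         // hypothetical endpoint
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ option, voterId: getOrCreateVoterId() }),
      });
    }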
It looks like the prevalence is low, and it can easily be blocked by blocking the scripts in question.
Nevertheless, what needs to happen is that all major browser makers come together and simply create a set of standard API values that do not harm daily browsing and make it possible for users to blend in with the masses, if they opt in to activate it.
It would be sufficient to create a couple of uniform user agents, font lists, plugin lists, canvas hashes, platform strings and WebGL data sets to bring the uniqueness down.
There are SaaS companies whose entire business model is built around creating databases of fingerprints, for example to detect bots. These scripts track you across multiple websites and are very well concealed, so you won't find them on any ad blocker blacklist.
Also, living in the EU won't protect you from getting fingerprinted.
I suspect it will always be possible to detect at least the brand of the browser, regardless of what the user-agent string says. It's just too damn hard to implement the entire web specification consistently, and malevolent actors can easily exploit any small deviations in the implementation. And of course things which are not in the web specification (such as timing) can be exploited.
After you've figured out the browser, it will also always be possible to find out the browser version, because newer versions implement features not present in older ones. Then you just need a hint about the OS (at least distinguishing Linux from Windows seems easy) and you have reconstructed the entire User Agent string.
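Some classic (and fragile) examples of the kind of quirks that give the brand away regardless of the UA string, purely as an illustration; vendors do remove these over time, but something distinguishing always seems to remain:

    // Guess the brand from implementation quirks rather than the UA.
    // These specific checks are well known and may already be obsolete.
    function guessBrowserBrand(): string {
      const w = window as any;
      if (typeof w.InstallTrigger !== "undefined") return "Firefox";
      if (typeof w.chrome !== "undefined" && !!w.chrome) return "Chromium-based";
      if (typeof w.safari !== "undefined" ||
          /constructor/i.test(String(w.HTMLElement))) return "Safari";
      return "unknown";
    }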
Can anyone with a legal background explain to me why I can't file a restraining order that prevents companies from stalking me online?
Primarily for companies that develop shadow profiles of users but also companies like google and cloudflare where their tracking is a result of an opt-in by the site operator.
Would it be difficult to prove I am in fear (rational) as a result of their stalking?
I don't believe you can get a restraining order against a company (just an actual person). You would ask for injunctive relief, which is basically the court telling the company not to do something, in this case, tracking you online.
I'm not aware of the law in this area. I'm a lawyer, and I could at least see a plausible argument, but the only way to know for sure would be to try to sue them, or find other cases where someone has successfully done so.
Can't we impose penalties on companies that use fingerprinting scripts (or include them via third parties)?
I'd like to see an organization that checks for this, and gives fines, and perhaps even withdraws the right to use a domain name. And I'd like to see more responsibility with site owners for using third party code.
By the way, I think a withdrawal of the right to use a name (brand) is a very appropriate way to penalize serious privacy violations. Brands are all about customer trust, and if that trust is violated, then it seems to me only fair that the right to use the name is taken away.
One of the referenced studies (the 1-million-site study with OpenWPM, the most recent, from 2016) notes that the usual fingerprinting scripts have basically disappeared since a public outcry and media attention following a lawsuit.
Prevalence:
- 1.4% for canvas fingerprinting
- 0.325% for canvas font probing
- 0.0715% for WebRTC
- 0.0067% for AudioContext
The above were found only on the most shady of all websites, and good content blockers block all those scripts.
My bet is GDPR was the death blow for this kind of scripts. For small companies without a room full of lawyers data has become a liability.
So fingerprinting is now basically in the hands of google, amazon, etc.
It is not in use because it's basically illegal, and due to the reliance on JavaScript, a simple script blocker can take down your entire business.
The industry used fingerprinters, but for one it didn't really help them make more money (because you want to track users, not systems), and there was a big backlash.
> My bet is GDPR was the death blow for this kind of scripts.
Curious, was the lawsuit in that study also based on GDPR?
So far I haven't seen much coverage of GDPR based lawsuits in the media. Also, I'd like to know if there have been studies that show that EU residents can now (after GDPR) browse the web without leaving a trail of information.
"Comparing our results with a 2014 study [1], we find three important trends. First, the most prominent trackers have by-and-large stopped using it, suggesting that the public backlash following that study was effective. Second, the overall number of domains employing it has increased considerably, indicating that knowledge of the technique has spread and that more obscure trackers are less concerned about public perception. As the technique evolves, the images used have increased in variety and complexity, as we detail in Figure 12 in the Appendix. Third, the use has shifted from behavioral tracking to fraud detection, in line with thead industry’s self-regulatory norm regarding acceptable uses of fingerprinting."
[1] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and C. Diaz. The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. In Proceedings of CCS, 2014.
I don't think GDPR results in lawsuits - it results in complaints to a regulator resulting in (escalating) fines. The regulators are still trying to apply the legislation to the complaints they have received.