Stalking is unwanted and/or repeated surveillance by an individual or group towards another person.[1] Stalking behaviors are interrelated to harassment and intimidation and may include following the victim in person or monitoring them. The term stalking is used with some differing definitions in psychiatry and psychology, as well as in some legal jurisdictions as a term for a criminal offense.
This should also apply in the digital world, shouldn't it?
That is a fair point, for sure. But I don't think it's the whole story.
What about this: noticing and remembering people's faces, clothes, tics, etc. when they walk up to you, versus blanketing the planet in cameras that all feed back to a giant database that builds a detailed dossier on almost everyone, based on faces, clothes, tics, etc.
The way you put it sounds almost accidental, like no effort was made, with no real goal in mind. That's not a faithful analogy of the pervasiveness of Google Analytics, Facebook buttons, etc.
Noticing in itself isn't, but we're talking about amassing surveillance intel on people, bundling it into packages and selling it to the highest bidder, or selling it to hackers on the black market. Normally you'd need a private detective licence and a good reason for each tracked person.
Human memory isn't comparable to digital memory. When we recall the face of a person we've interacted with, we don't beam that information telepathically to others, and we (most of us) gradually forget the face over time.
Cynical mode: the lawsuit would be very expensive, but imho it would be possible to get to the next round. This escalates through several instances, until you're out of money and the lawyers can bury it.
You can choose your battles. There's no need or obligation to take on all targets, and the legal strategy of first pursuing a tractable, or even sympathetic (to your interests) target is well established.
The cyberstalking industry has many players. Most are small. A target-rich environment.
Shifting the investment or financing calculus, via increased investor risk, possibly including legal liability, would be huge.
This is really more like sousveillance[0]. I'm not sure how I feel about it. On the one hand, recording server logs has been standard and accepted practice pretty much since there were web servers. On the other hand, we have the capability to record and discern so much more information now and to do more with that information once we've recorded it.
Ultimately, I think it should require user consent. (And, speaking of consent, I wonder if simple fingerprinting and associating a browser with an ID would violate the GDPR. It's clearly intended to be "personal" to one particular user, but is it "identifiable" information? Does it matter?)
According to the paper the most useful attributes for browser fingerprinting are:
- User agent
- List of Plugins
- Screen resolution and color depth
- Canvas
- WebGL Renderer (~GPU Model)
Maybe Canvas can be fixed to a degree, but the others feel like things that will inherently be different. Firefox with the privacy.resistfingerprinting setting solves all of these at least partially (and more), but not without some usability tradeoffs.
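To make this concrete, collecting the attributes above takes only a handful of standard browser APIs. A rough sketch (hashing and exfiltration omitted; this is an illustration, not any particular tracker's code):

    // Sketch of reading the attributes listed above from a browser.
    // A real fingerprinter would hash and combine these; that step is omitted.
    function collectFingerprintAttributes() {
      // User agent and plugin list
      const userAgent = navigator.userAgent;
      const plugins = Array.from(navigator.plugins).map(p => p.name);

      // Screen resolution and color depth
      const screenInfo = `${screen.width}x${screen.height}x${screen.colorDepth}`;

      // Canvas: render some text and export the pixels
      const canvas = document.createElement("canvas");
      const ctx = canvas.getContext("2d");
      let canvasData = "";
      if (ctx) {
        ctx.font = "16px Arial";
        ctx.fillText("fingerprint test", 2, 20);
        canvasData = canvas.toDataURL(); // differs per GPU/driver/font stack
      }

      // WebGL renderer (~GPU model) via the debug_renderer_info extension
      const gl = document.createElement("canvas").getContext("webgl");
      let gpu = "";
      if (gl) {
        const ext = gl.getExtension("WEBGL_debug_renderer_info");
        if (ext) gpu = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
      }

      return { userAgent, plugins, screenInfo, canvasData, gpu };
    }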
Browser plugins really need to go away. If it's not implementable normally then it shouldn't be on the web. In my opinion at least.
GPU models also should not be given out; browsers need to abstract features and performance into something else.
Like if I'm writing a WebGL app, I should just be able to test a few baselines configs.
User agent is a tricky one; Chrome on Android gives too much info because it's built by an advertising company.
Even Firefox and Firefox Privacy on Android give away the Android version in the User Agent. The new Tor Browser for Android fakes it. I remember even the old Orfox used to give away the device name as well, and there used to be an open issue to address it in their issue tracker.
There’s a difference between plugins and extensions. Plugins add support for new types of media (think Flash, Java) while extensions are your adblockers and userscripts.
Plugins also provide support for new video codecs. Plugins are definitely the wrong place for it but I would be so happy to see codecs factored out of individual browsers.
Is there any reason why we couldn't have all browsers on all platforms use the same user agent header? I realize some sites use the user agent as a crude form of feature detection, but it's my understanding that that's generally considered to be bad practice.
Sometimes there are browser bugs that are impossible to do feature detection for, in which case you have no better option than looking at the user agent.
For instance, several versions of Firefox had a serious bug in shared workers, which appeared intermittently when users opened your site in multiple tabs https://stackoverflow.com/questions/51092596/feature-detecti... - I had to use the user agent to work around this.
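The workaround looked roughly like this (the version range below is illustrative, not the exact affected releases, and worker.js is a placeholder):

    // Fall back to a dedicated worker on Firefox versions affected by the
    // SharedWorker bug. The version range here is illustrative only.
    function sharedWorkersAreTrustworthy(): boolean {
      if (typeof SharedWorker === "undefined") return false;
      const m = navigator.userAgent.match(/Firefox\/(\d+)/);
      if (!m) return true;                 // not Firefox: trust feature detection
      const version = parseInt(m[1], 10);
      return version < 60 || version > 65; // skip the (hypothetical) broken range
    }

    const worker = sharedWorkersAreTrustworthy()
      ? new SharedWorker("worker.js")
      : new Worker("worker.js");           // degraded but working fallback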
Sure, this is a fair use case. But then can't the browser tell the end user that someone wants to read this browser data so the user has a chance to say "no"?
Apart from the difficulty of obscuring this information from JavaScript, it's politically impossible.
Chrome could change its header to the header of a competitor or a neutral one, but has no reason to (Google's main business is delivering ads). For other big browsers like Firefox and Safari it would be a huge loss to change the header: they would vanish from browser market share statistics, and with that the business case to make websites work well with their browser dwindles. That in turn would reduce their actual market share. That only leaves the browsers with a market share so tiny they are ignored by developers, and those browsers do often spoof the user agent header (or at least make it very convenient to do so).
You can get both the primitive values and settings, in addition to canvas-like fingerprint entropy, using shaders, which will render slightly differently across different devices.
That's like saying programs should go away if they're not implementable by the OS. It's just a different layer. Browser plugins are tremendously useful.
Edit: also, your comment seems to imply that plugins are used by websites trying to do something the browser doesn't support (e.g. Flash). While this is sometimes done, most plugins add additional functionality for the user, e.g. adblockers, usability improvements, etc.
The `navigator.plugins` used for fingerprinting is just for NPAPI/PPAPI stuff like Flash, Java, Silverlight. Just about all browsers are also in the process of deprecating support for them.
(it can also be possible to fingerprint browser extensions, but that is a bit trickier as it's not served up directly by the browser as a handy array)
Oh yes, my bad, thanks. I was thinking of extensions (maybe thanks to all the FF news!). Wrt fingerprinting plugins, it seems to be the least of the issues and hopefully not an issue for long :)
The first "attribute" is simply an HTTP header. As such it can be selected and changed by the user. The user may choose not to send a User-Agent header at all. IME, few sites actually require a User-Agent header in order to retrieve a page. Developers sometimes try to vary page content based on this header. For example, send a certain string containing the name of a "bot" and you might get nothing. The detection occurs before any content is returned.
The rest of these "attributes" are not HTTP headers, and their collection occurs after the resource is fetched. We do not need to give away this information in order to retrieve the resource. The fingerprinting problem arises because the software we use to do the retrieval does far more than retrieve web pages. These other attributes cannot be detected unless the software used to retrieve the web page also includes other features that process what is retrieved, including a Javascript engine that runs code straight from the internet ("browser").
We are looking to the authors of a popular browser to solve a problem they themselves created. Is it rational to expect that this is where the solution will come from?
If one is willing to make some "useability tradeoffs" then one might use a simple http client to retrieve web pages instead of a popular browser. After the pages are retrieved, then one can use a popular browser to view them if desired; the browser need not have access to the internet. This is what I do. Not to avoid "fingerprinting" but just because it is faster and more robust.
With the help of a trivial program that transforms URLs into raw HTTP requests (I wrote one), one can retrieve the text/html of web pages, without requesting all the third-party advertising cruft, with any TCP client such as netcat. HTTP pipelining becomes easy, making consumption of large quantities of information more efficient -- many pages, one connection. The two programs together form an "http client" with, relative to a popular browser, or even a program like curl, very low code complexity.
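The sketch below is not my program, just an illustration of how little is needed: it turns a list of same-host URLs into one pipelined HTTP/1.1 request block you can pipe into something like netcat.

    // Turn a list of URLs (same host) into one pipelined HTTP/1.1 request
    // block, suitable for e.g. `... | nc example.com 80`.
    // Minimal headers only; no cookies, no User-Agent.
    function urlsToPipelinedRequests(urls: string[]): string {
      const host = new URL(urls[0]).host;
      return urls.map((u, i) => {
        const { pathname, search } = new URL(u);
        const last = i === urls.length - 1;
        return [
          `GET ${pathname}${search} HTTP/1.1`,
          `Host: ${host}`,
          // close the connection after the final response
          `Connection: ${last ? "close" : "keep-alive"}`,
          "",
          "",
        ].join("\r\n");
      }).join("");
    }

    // Example: print requests for two pages on the same (example) host
    console.log(urlsToPipelinedRequests([
      "http://example.com/page1",
      "http://example.com/page2",
    ]));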
Whenever I see the "fingerprinting" debate come up, I always go through the same thought experiment: for situations where "fingerprinting" is a problem, web users all use the same simple http client. All send the same minimal headers. The web must accommodate users, not the other way around.
>The first "attribute" is simply an HTTP header. As such it can be selected and changed by the user.
Maybe if you have script disabled, but if you have javascript it's trivial to detect which browser you're on based on javascript implementation quirks alone.
>We do not need to give away this information in order to retrieve the resource.
You clearly haven't seen mandatory javascript sites.
You are making the assumption that I am using a browser to retrieve the page. I can use a simple http client with no features such as JS to retrieve the page. I have full control over what headers I send and the contents of those headers.
"You clearly haven't seen mandatory javascript sites."
I have seen many sites that prod the user to enable JS yet still return the same text/html even when JS is disabled. Sometimes I am only after the content of the page: text, URLs pointing to images or other resources, URLs pointing to other sites, etc.
I have seen some sites where there is "no content" in the initial page fetched, with or without JS enabled. These are dummy sites, hollow shells, skeleton pages. The technique is simply misdirection. All the useful content is pulled from another site via a second request (JS triggering the request). Maybe this is a CDN. I simply read the JS or use a debugger to find this "endpoint". Then use the simple http client to request the content as usual. These dummy sites are not the majority of sites on the web, nor even the majority of sites posted to HN.
The endpoints supporting these "no content", dummy sites often deliver the text with fewer or no tags, without HTML. Maybe it is JSON. This makes it easier for me to format it the way I like it. What many HN readers may perceive as a growing nuisance I believe could be a blessing in disguise. I prefer delivery of content in a more "raw" format allowing the user to process it as she pleases client-side, with the software of her choice. This leaves no room for advertising. It is more efficient and the user becomes the creative one, deciding how she wants to present the content to her own eyes.
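As a made-up example (the endpoint URL and field names are hypothetical), once the endpoint is found in the JS, something this small does all the "browsing" needed:

    // Pull content straight from the JSON endpoint a "hollow" page would
    // otherwise fetch via JS. URL and field names are made up.
    async function readArticles(): Promise<void> {
      const res = await fetch("https://api.example.com/v1/articles?page=1");
      const data = await res.json() as { items: { title: string; body: string }[] };
      for (const item of data.items) {
        // plain text out; format it however you like client-side
        console.log(`# ${item.title}\n\n${item.body}\n`);
      }
    }

    readArticles();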
For commercial use of the web, things like internet banking, purchasing, etc., I use what is "recommended". The popular web browser.
For non-commercial use of the web, I use what I want to use. A simple http client and/or text-only browser.
My day-to-day use of the internet is almost always 100% non-commercial.
I never want to use a Javascript-enabled browser for simply retrieving information from the web. For example, reading. This is non-commercial activity.
For simple information retrieval, I find a large, complex, popular, recommended "browser", a single, "do-everything" program, is overkill and, counterintuitively perhaps, such browsers loaded with "features" actually limit what I can do. Using smaller, simpler "one-purpose" programs to retrieve web pages allows me more flexibility. I can be more productive.
Using single-purpose programs, I also find the retrieval process to be more reliable and robust, not to mention more transparent. I know exactly what I am sending. Unlike the popular browsers, these programs are not fetching and running code from the internet automatically. I feel a greater sense of control.
What plug-ins I have shouldn’t concern the server on the other end. Why is it transferred at all?
Canvases should never be readable by the other end without explicit permission. Basically transmitting anything rendered to a canvas should be equivalent to using my web camera from an integrity standpoint.
Time zone, operating system (from a short list) and language is fine, because it’s not a lot of entropy and my ip is known anyway unless on a vpn.
Gl renderer/graphics card, desktop resolution - nah. The viewport dimension is all that should be known.
Basically the browser should by default have an entropy budget for my entire fingerprint that ensures I’m not personally identifiable.
How can you not transfer the info about plugins? You can test for them on the front end, in a million different ways, with JavaScript, and send the result to a server of your choice.
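For example, something as small as this (the collection endpoint is made up):

    // The plugin list is just sitting there in JS, and shipping it off
    // takes one call. The endpoint here is hypothetical.
    const pluginNames = Array.from(navigator.plugins).map(p => p.name);
    navigator.sendBeacon(
      "https://tracker.example.com/collect",   // hypothetical endpoint
      JSON.stringify({ plugins: pluginNames })
    );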
This could be fixed with a bit of courage. Firefox could use "Firefox", Safari "Safari," etc. UA sniffing and UA strings are completely absurd at this point, and need to go away.
> - List of Plugins
JavaScript should not see this, especially not their versions.
The last three are harder to avoid for "responsive" web design. One attribute you didn't mention was the font list. There is no good reason for JavaScript to be able to see this. The model should be that the page requests a font, and the browser uses either that or its best try at a substitute.
It's a nice idea, but major sites still do user agent sniffing. Try setting your UA to one of those strings and visiting google.com -- the SERP looks like it came from 2009.
If Firefox tried that, it would instantly become that weird browser that makes Google look bad. The internet would be flooded with tips on how to set a custom UA that made it pretty again.
> Try setting your UA to one of those strings and visiting google.com -- the SERP looks like it came from 2009.
I just tried, and it's great! I had forgotten that search could be so snappy and simple. The more likely outcome, unfortunately, is that Google would adjust its UA-sniffing to serve the same 2019 bloat to "Firefox."
You can't get around giving away nearly the entirety of the User Agent string, since it can be derived simply through feature detection (well, except for the phone model when on mobile). Canvas and WebGL can be fixed by limiting their functionality, or at least that's what Firefox does in the aforementioned mode. Canvas could conceivably be fixed to just behave the same everywhere, but for WebGL you have to limit it to some common denominator to hide the GPU model, so privacy necessitates less functionality. The list of plugins can be spoofed to be empty. Screen resolution (or more importantly, window size) has to be somewhat accurate because lots of websites use it for responsive design. Firefox can round the window size to the nearest multiple of 200x100, which doesn't break too many things, but that's still far from giving the same value for everyone.
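(For concreteness, the rounding amounts to something like the following; the exact Firefox behavior may differ.)

    // Snap the reported inner dimensions down to a 200x100 grid
    // instead of exposing the real window size.
    function roundedWindowSize(w: number, h: number): [number, number] {
      return [Math.floor(w / 200) * 200, Math.floor(h / 100) * 100];
    }
    // e.g. roundedWindowSize(1366, 728) -> [1200, 700]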
> Canvas could conceivably be fixed to just behave the same everywhere, but for WebGL you have to limit it to some common denominator to hide the GPU model
You can just measure performance of those things then.
And even if the browsers are identical, there are still some ways to detect different users based on client-side behavior (assuming JS is enabled, or some particularly crazy CSS).
Just giving the same value for everyone on the latest version of Firefox using Windows 10 with a maximized window on a 1920x1080 display would seriously hamper fingerprinting. It can still be enough when combined with other information (IP, visited sites etc), but at least it would dilute the signal a lot.
Chrome on Android reports your phone model and build number.
If you're using a lesser-known phone with a carrier-specific ROM, then you're in a really tiny population.
If you use a common device and browser kept up to date to the latest versions, then you are not going to be unique. Using Safari on an iPhone will make you similar to 30% of web traffic.
Browsers also update all the time changing the user agent so it isn't that reliable tracking over extended periods of time.
The "am I unique" websites seem to compare your user agent to a database full of outdated browser versions from past data, making your score look much scarier than it really is. If you keep things up to date, chances are that you are not that unique.
I'm a bit old school when it comes to browser tracking, and as such I'm not as familiar with Canvas. What exactly is Canvas? What information is provided which can be tracked? Are there ways to disable Canvas without installing browser extensions?
Basically different devices and browsers render high-level graphical objects, such as fonts, differently. When you visit a site that does canvas tracking, a small javascript snippet will ask your browser to render a few graphical things on an html5 canvas and export that canvas to a png. The hash of that png is a good fingerprint of your hardware, browser and rendering software.
Basically Canvas is a JavaScript API for programmatically drawing graphics and text. Subtle differences in color, font rendering, antialiasing etc will produce different renderings on different hardware depending on OS, browser, GPU, drivers, etc. This lets a site running JavaScript generate a hash unique to everyone with your specific combination of software and hardware and uniquely identify you without cookies.
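A stripped-down sketch of what such a snippet does; the drawing and the hash choice here are illustrative, not any particular tracker's script (real ones bundle their own hash and more elaborate drawing).

    // Minimal canvas-fingerprint sketch: draw something, export it as a
    // data URL, hash it. Needs a secure context for crypto.subtle.
    async function canvasFingerprint(): Promise<string> {
      const canvas = document.createElement("canvas");
      canvas.width = 220;
      canvas.height = 30;
      const ctx = canvas.getContext("2d")!;
      ctx.textBaseline = "top";
      ctx.font = "14px Arial";
      ctx.fillStyle = "#f60";
      ctx.fillRect(100, 1, 62, 20);
      ctx.fillStyle = "#069";
      ctx.fillText("How quickly daft jumping zebras vex", 2, 15);

      const png = canvas.toDataURL();                 // differs per machine
      const bytes = new TextEncoder().encode(png);
      const digest = await crypto.subtle.digest("SHA-256", bytes);
      return Array.from(new Uint8Array(digest))
        .map(b => b.toString(16).padStart(2, "0"))
        .join("");                                    // the "fingerprint"
    }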
I've given up on technological solutions to this. We need a "do not track" bit for browsers and it needs to carry the weight of law behind it. Violators need to be named, shamed, and fined.
In fact, it is _primarily_ used as a point of entropy for tracking, and does almost nothing to prevent tracking, so all-in-all it's a huge net negative. And because it's mostly turned off, turning it on to be tracked less actually causes you to be tracked more.
Can the major browser vendors not get together and implement a set of standard values and defaults that can be activated to reduce one's fingerprint, at least enough to make it non-unique? Something like what Tor Browser does, but expanded to include all the major browsers. The way is open; the question is whether there is the will.
You can have what Tor Browser does by using Firefox and setting privacy.resistfingerprinting in about:config. But be prepared to solve a lot of recaptchas, and don't hold your breath on Google implementing something similar in Chrome. Google is still an adtech company. They didn't spend millions (if not billions) on making and promoting Chrome only to support fingerprint resistance.
This will only shift efforts to extracting uniquely identifiable information through side channels instead of directly. So you also need to disable JavaScript, or redesign it to prevent side-channel attacks (it's possible, but it would mean a completely new JavaScript compiler with a not entirely compatible language; it would also mitigate Spectre attacks).
However first you need some serious regulations to split Google and take away its browser "business". As it is Google is doing everything to preserve and advance tracking and fingerprinting in web browsers.
Apparently one in 900 browsers has my list of system fonts, by far the most identifying thing Panopticlick could detect. However, the font list looked completely vanilla to me. I wonder if there's a telltale one in there that I installed manually.
Are they in alphabetical order? The order of installed fonts used to be a dominant attribute in browser fingerprinting, before some browsers took steps to normalize it.
Other than that, there are things like the presence/absence of common variants like bold, and possibly some common fonts that come packaged with office software rather than the OS.
How does this work? For instance I have an iPhone model, so why would screen resolution help increase identifying bits coupled with the user agent? Everyone with my phone version and OS version will have the exact same values.
>For instance I have an iPhone model, so why would screen resolution help increase identifying bits coupled with the user agent? Everyone with my phone version and OS version will have the exact same values.
The user agent for mobile Safari doesn't identify the iPhone model, only that it's an iPhone[1]. Knowing the precise model definitely helps to fingerprint more.
[1] random search: Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1
the marketing department probably decided that it needs to be done and the safari engineers realized this would completely destroy usability for a majority of users, so they quietly abandoned it.
Especially as it isn't as simple as a standardized list of fonts when the Canvas hash is 100% unique for everyone.
The Safari team has not "abandoned" this effort: they are still working on reducing the fingerprinting surface area (of recent note, Safari has removed its "do not track" preference, which was used as a fingerprinting datapoint, no longer presents installed third-party fonts to websites, and no longer supports most plugins). The issue is that they have not "solved" the issue yet.
Intelligent Tracking Prevention is a work in progress in which Apple actively checks which techniques trackers are using and disables them. The next iOS/macOS releases will have the next version with new restrictions on tracking.
The issue will be solved when all tracking companies have collapsed and that industry is dead. Clearly we aren’t there yet.
Here's a quote: "There will also be new security measures to prevent digital fingerprinting, or the use of things like installed fonts and plug-ins to help track users across the internet even with privacy settings active. Websites will be given a stripped down, simplified system configuration so every user's Mac looks like every other user's Mac."
But this is how big companies behave. They have this unwritten policy to justify any anti-competitive, anti-user behavior with "security" and "privacy" PR.
Probably not; that task is equivalent to coordinating between browser vendors to make their software behave in every way identically, and these are generally radically different pieces of software.
You can't even get TCP/IP implemented consistently, which is why we can passively identify operating systems just by observing packets. What hope do you have of getting browsers to be indistinguishable?
This is the problem with anti-fingerprinting. It's a genuinely hard problem, not in the "it takes a lot of code and a lot of will to get it done" sense, but in an "it's unclear what computer science has to say about this problem right now" sense. It's up there with DRM/antimalware (two sides of the same coin).
Out of curiosity, how are you guys handling anonymous user actions without fingerprinting? For example, if you want to set up a user poll without requiring a user account? Storing state in a cookie doesn't work, since the agent can just clear the cookie and resubmit.
You're looking for an anti-Sybil defence. Generally, some chain-of-trust involving nonduplicable demonstration of identity.
Depending on the strength of defence required, anything from a low-cost registration fee (see Metafilter) to some form of recommendation-and-vetting or simultaneous in-person token-granting event (simultaneous to avoid entities being in two places at once). And auditing for abuse.
There have been several proposals for such systems also assuring some level of anonymity.
It's unlikely any approach will be perfect, though an arbitrary level of assurance is likely possible at some cost.
Note that surreptitiously fingerprinting and preemptively certifying to establish entity uniqueness are vastly different from a user awareness and consent perspective.
I definitely think it would have been a good tool to counter fraud/abuse. The issue, of course, is that abuse of said tooling and the rise of ad-tech are what caused the major issues we see today. I'd imagine it would be useful to, perhaps, flag accounts that were created on the same machine, as a way to help detect users trying to circumvent abuse-control tools.
Sure, someone can create another account using a new browser, within a VM, from another computer, inside a VPN. It's all about making it much harder. If the primary use of fingerprinting were to protect a community from bad actors, like those violating a set of community guidelines, then maybe the extra effort it would take to get around those measures would buy enough time for things to defuse.
Fingerprinting is not an effective deterrent either, as you can just open a different browser. The user could also have multiple devices with internet access.
You could use IP address, although that only works if the user isn't on a public / shared network. It's also easily bypassed by spinning up a VM on a cloud service provider and using an SSH tunnel.
Since you used polls as an example: StrawPoll.me [0] is an online poll site which lets you select different duplication checks based on your requirements. The choices are: IP, browser cookie, none, or require user sign in. They also give you an option to add a CAPTCHA.
Browser fingerprints can be spoofed too. Sock puppet accounts are equally easy to create. There is no easy way to ensure that people don't cheat on on-line polls. This is why (for example) on-line voting in IRL elections is a Really Bad Idea.
* Cookies: Can be deleted
* IP address: Not unique, because ISPs rotate them; also VPNs
* Login: Well create a second one
* Methods from Universities using nth letter of name and nth digit of birthdate: Just make up a new name.
Sorry, but unless you are using an analog medium or asking the questions in person, the numbers can be inflated and there is no way to have 100% data quality.
But in most contexts this is ok. So I would probably go the easiest way: a cookie.
I would not, at least in the European Union, go with fingerprinting and such, as I am not sure how this plays out regarding the GDPR, since this would be PII you are storing.
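Going back to the cookie option, it really is only a few lines (the cookie name and endpoint below are made up):

    // Best-effort dedup with a random ID cookie. Trivially cleared, but
    // good enough for a casual poll. Names and endpoint are hypothetical.
    function getOrCreateVoterId(): string {
      const match = document.cookie.match(/(?:^|; )voter_id=([^;]+)/);
      if (match) return match[1];
      const id = crypto.randomUUID();        // needs a secure context
      document.cookie = `voter_id=${id}; max-age=31536000; path=/; SameSite=Lax`;
      return id;
    }

    async function vote(option: string): Promise<void> {
      await fetch("/api/poll/vote", {         // hypothetical endpoint
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ option, voterId: getOrCreateVoterId() }),
      });
    }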
It looks like the prevalence is low, and it can easily be blocked by blocking the scripts in question.
Nevertheless, what needs to happen is that all major browser makers come together and simply create a set of standard API values that do not harm daily browsing and make it possible for users to blend in with the masses, if they opt in to activate it.
It would be sufficient to create a couple of uniform user agents, font lists, plugin lists, canvas hashes, platform strings and WebGL data sets to bring the uniqueness down.
There are SaaS companies whose entire business model is built around creating databases of fingerprints, for example to detect bots. These scripts track you across multiple websites and are very well concealed, so you won't find them on any ad blocker blacklist.
Also, living in the EU won't protect you from getting fingerprinted.
I suspect it will always be possible to detect at least the brand of the browser, regardless of what the user-agent string says. It's just too damn hard to implement the entire web specification consistently, and malevolent actors can easily exploit any small deviations in the implementation. And of course things which are not in the web specification (such as timing) can be exploited.
After you've figured out the browser, it will also always be possible to find out the browser version, because newer versions implement features not present in older ones. Then you just need a hint about the OS (at least distinguishing Linux from Windows seems easy) and you have reconstructed the entire User Agent string.
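Some classic (and fragile) examples of the kind of quirks that give the brand away regardless of the UA string, purely as an illustration; vendors do remove these over time, but something distinguishing always seems to remain:

    // Guess the brand from implementation quirks rather than the UA.
    // These specific checks are well known and may already be obsolete.
    function guessBrowserBrand(): string {
      const w = window as any;
      if (typeof w.InstallTrigger !== "undefined") return "Firefox";
      if (typeof w.chrome !== "undefined" && !!w.chrome) return "Chromium-based";
      if (typeof w.safari !== "undefined" ||
          /constructor/i.test(String(w.HTMLElement))) return "Safari";
      return "unknown";
    }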
Can anyone with a legal background explain to me why I can't file a restraining order that prevents companies from stalking me online?
Primarily for companies that develop shadow profiles of users but also companies like google and cloudflare where their tracking is a result of an opt-in by the site operator.
Would it be difficult to prove I am in fear (rational) as a result of their stalking?
I don't believe you can get a restraining order against a company (just an actual person). You would ask for injunctive relief, which is basically the court telling the company not to do something, in this case, tracking you online.
I'm not aware of the law in this area. I'm a lawyer, and I could at least see a plausible argument, but the only way to know for sure would be to try to sue them, or find other cases where someone has successfully done so.
Can't we impose penalties on companies that use fingerprinting scripts (or include them via third parties)?
I'd like to see an organization that checks for this, and gives fines, and perhaps even withdraws the right to use a domain name. And I'd like to see more responsibility with site owners for using third party code.
By the way, I think a withdrawal of the right to use a name (brand) is a very appropriate way to penalize serious privacy violations. Brands are all about customer trust, and if that trust is violated, then it seems to me only fair that the right to use the name is taken away.
One of the referenced studies (the 1-million-site study with OpenWPM, the most recent, from 2016) notes that the usual fingerprinting scripts have basically disappeared since a public outcry and media attention following a lawsuit.
Prevalence:
- 1.4% for canvas fingerprinting
- 0.325% for canvas font probing
- 0.0715% for WebRTC
- 0.0067% for AudioContext
The above were found only on the most shady of all websites, and good content blockers block all those scripts.
My bet is GDPR was the death blow for this kind of scripts. For small companies without a room full of lawyers data has become a liability.
So fingerprinting is now basically in the hands of google, amazon, etc.
It is not in use because it's basically illegal, and due to the reliance on JavaScript, a simple script blocker can take down your entire business.
The industry used fingerprinters, but for one it didn't really help them make more money (because you want to track users, not systems), and there was a big backlash.
> My bet is GDPR was the death blow for this kind of scripts.
Curious, was the lawsuit in that study also based on GDPR?
So far I haven't seen much coverage of GDPR based lawsuits in the media. Also, I'd like to know if there have been studies that show that EU residents can now (after GDPR) browse the web without leaving a trail of information.
"Comparing our results with a 2014 study [1], we find three important trends. First, the most prominent trackers have by-and-large stopped using it, suggesting that the public backlash following that study was effective. Second, the overall number of domains employing it has increased considerably, indicating that knowledge of the technique has spread and that more obscure trackers are less concerned about public perception. As the technique evolves, the images used have increased in variety and complexity, as we detail in Figure 12 in the Appendix. Third, the use has shifted from behavioral tracking to fraud detection, in line with thead industry’s self-regulatory norm regarding acceptable uses of fingerprinting."
[1] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and C. Diaz. The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. In Proceedings of CCS, 2014.
I don't think GDPR results in lawsuits - it results in complaints to a regulator resulting in (escalating) fines. The regulators are still trying to apply the legislation to the complaints they have received.