A particularly bad instance of link tracking I've found is in TikTok's link sharing feature.
If you share a link from the TikTok app, it gives you a vm.tiktok.com/[xyz] link to send/post elsewhere. It gives you no indication that this isn't a generic link to the post, nor does it give you an option to expose the generic link to the post.
Instead, when you share that link and someone who doesn't have the app clicks on it, it opens with a header saying "[First Last] is on TikTok." Only once you actually click that link (and only if you don't have the app installed) do you get redirected to the static link to the video and finally obtain it.
This is an anti-pattern that enables further tracking and potentially exposes user data without their knowledge when links are shared publicly. And there's no indication to the user that this is happening, since the link is structured as if it contains no tracking. I.e., a tool like this wouldn't be able to "strip out" the tracking, since it isn't tacked on in any way but embedded in the generated link itself.
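To make the difference concrete, here's a minimal sketch (the URLs and parameter names below are made up for illustration, not TikTok's actual format) of why query-string trackers can be stripped while an opaque share link can't:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    """Drop the query string - the usual way to remove tacked-on trackers."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Tacked-on tracking: the tracker lives in query parameters, so stripping works.
print(strip_query("https://www.tiktok.com/@someuser/video/123?u_code=abc&share_app_id=1"))
# -> https://www.tiktok.com/@someuser/video/123

# Embedded tracking: the opaque path segment *is* the tracking record.
print(strip_query("https://vm.tiktok.com/AbCdEfG/"))
# -> https://vm.tiktok.com/AbCdEfG/  (unchanged; nothing to strip)
```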
Any company running out of mainland China is going to have serious privacy problems due to CCP influence and their need to comply with both local laws and the government’s interest in influencing public sentiment.
I will not defend Urbit, though I count myself lucky to read fossuser's posts where I find them. I would argue about digital land ownership and what the internet's users (and possible future users) need though. I'm interested in what the internet is for (like everything else). Would you like to have a serious conversation about that? That does seem a worthy topic, even if Urbit does not address it effectively enough (I'm still not convinced it won't centralize power). I suggest the workers must own the means of production. What do you think?
- IDs stop the spam problem and give people control over something that keeps its reputation (and they're cheap).
- Federated systems normally suck because administering the servers and keeping decentralized versions in sync is hard. Urbit's design fixes this.
- Encrypted by default, with the potential to eventually be as easy to use as FB (not right now). Peer-to-peer, with the address space and key issues solved from first principles.
- Stability over long time horizons by design (the goal being indefinite): the Urbit abstraction layer doesn't change and state can always be recomputed - the changing pieces are implemented via jets that talk to whatever underlying OS is doing the normal stuff.
It's a clever design and solves a lot of problems with modern computing; people often dismiss it out of hand because Yarvin's politics are stupid (he's no longer involved in the project and hasn't been for some time). Peter Thiel's Trump support was stupid too, but that doesn't mean he doesn't get a lot of other stuff right.
Still, it’s obvious that the CCP is more competent at executing centralized, long-term plans, and has much less cultural/institutional pressure not to seriously screw over people.
You are correct that China cannot project power in the sense that they can't easily invade a country or levy shattering economic sanctions, but they have proven themselves quite capable of targeting individuals in other nations, both online and in the real world.
Either way, there is a moral imperative to prevent China from gaining the ability to project power the way the US can. The US being able to project that kind of power is shitty, and the two entities being able to do that is even shittier.
I appreciate that you recognize the US being able to do it is shitty.
I wonder how many non-Americans think two entities being able to do it is better than one, because at least they can counter-balance each other.
Not just with force; I was recently thinking about how the US during the Cold War tried to be "nice" to the "third world" to keep them out of the Soviet Union's "sphere of influence". Currently China is trying to project its "soft power" that way too, to get less developed countries into its patronage, but the US isn't really doing that at the moment (see for instance approaches to distributing the covid vaccine...).
I (a US citizen) personally am not really sure which is preferable, only one superpower or two. Either way the world is in for a rough ride.
Let me assure you, Chinese power is BAD. And I personally hate using that word, but here it is warranted.
At least in the U.S., we have the intent of moral fiber permeating through our founding document, and probably at least half the population still fervently believes in these principles or tries to behave as though they matter.
China doesn't give a fuck. Their government is a communist dictatorship, and their sole concern is the expansion of their power through force.
It's so hard to take you seriously when you make claims like this. Nobody cares about your founding documents, folks look at what the US does. And what it does is send drone strikes to hit schools, it bullies countries into doing what it wants without offering anything in return, etc. I genuinely cannot currently see a worse superpower.
We also invented air conditioning and cars and airplanes and refrigerators and the internet. Global poverty is at a historic low right now. Violent crime, at least here in the U.S., is at an all-time low.
With gratitude as a dominant force in one's life, one is able to step back and see pictures other than what one wishes to see. One is able to stop buying into hysteria.
It is sad that some politicians use military as their play tool for nefarious purposes. Biden sent troops into Syria almost immediately upon assuming his current position. Reprehensible.
The main reason global poverty is at a historic low is that China has lifted a record number of people out of extreme poverty. On the other hand, the Middle East has become more impoverished, especially in Yemen and Syria, where the US has been instrumental in destabilization: "These numbers confirm a downward trend in poverty rates in East Asia and the Pacific, reducing the poverty headcount ratio at the international poverty line from 2.1% in 2015 to 1.0 % in 2019, driven by decreases in poverty in China and the Philippines. In contrast, spurred by the conflicts in Yemen and Syria, the Middle East and North Africa region has seen a sharp reversal, with the poverty rate increasing from around 2.1% in 2013 to 4.3% in 2015 and 7% in 2018." Source: https://blogs.worldbank.org/opendata/march-2021-global-pover...
> With gratitude as a dominant force in one's life, one is able to step back and see pictures other than what one wishes to see. One is able to stop buying into hysteria.
But... this doesn't apply to looking at China, only to looking at the US, for some reason?
This is a very confusing conversation, you seem to be switching the parameters of what we're talking about.
We started out talking about the general dangers of a superpower in the world, especially to people who are not citizens of that superpower. Is that still what we're talking about? Are the purported inventions of air conditioning or the internet in the US relevant to that conversation? Is "gratitude as a dominant force in one's life" relevant to it? How does "gratitude as a dominant force in your life" affect your view of China? How should it affect the view of someone in a country getting significant investment or foreign aid (or cough vaccines) from China? Are you asking us to have a different attitude toward evaluating the danger of the US as a superpower to the rest of the planet vs evaluating the danger of China as a superpower to the rest of the planet? With one we should center gratitude and avoid hysteria, but with the other we should.... center hysteria and avoid gratitude?
That “intent of moral fiber permeating through our founding document” failed to achieve much good for the peoples of Yemen, Iraq, Iran, Afghanistan, Cuba, Vietnam, African slaves, or Native Americans.
Those who live in glass houses shouldn’t throw stones.
Many countries have their own domestic social networks with varying popularity - some are way more active than FB, others are more or less dead. Russia is an example of the former.
Other countries don't use social media much, because they are culturally just not as interested in it.
There are a few countries where you can't really avoid being on a social network and that social network is not domestic, but those you can probably count on one hand. Off the top of my head I can only come up with Australia, India and Indonesia.
Curious which category you think the majority of European countries fall into? The domestic social network one, or the don't-use-social-media-much one?
Because as far as I can tell US social networks are the norm there and there very much is a social expectation that you are reachable on them, even if it's not government mandated.
> there very much is a social expectation that you are reachable on them
I disagree. People will (somewhat) expect you to have a WhatsApp in many European countries, but hardly anyone will expect you to use Facebook.
At least in my circles Facebook is a wasteland. Many people haven't even posted anything in years, and if I was trying to reach anyone via Facebook I'd settle in for a long wait - until they check it in a month or two.
You won't notice it if you just open Facebook, because Facebook will fill your feeds with people who are active, but when I go through my list of contacts there it's obvious less than one in five are still actively using it.
I don't claim total knowledge of the situation everywhere, but I do keep in contact with people in a lot of different countries.
Europe has been deeply colonised by US tech companies.
I do not agree with the Chinese stance on democracy or human rights, but I admire their willingness to play by their own rules. Not opening their markets and rolling out the red carpet for Silicon Valley was wise.
Any company running out of mainland USA is going to have serious privacy problems due to USA influence and their need to comply with both local laws and the government’s interest in influencing public sentiment.
Yes, if you care about privacy both the large Chinese services and the large American services are bad.
If you use Facebook or Instagram assume that the NSA has all your data, and that someone might try to manipulate you. If you use TikTok assume that China has all your data, and someone might try to manipulate you. You either choose your poison, or you stay on services that aren't in the limelight
One big difference is that in the US, companies are not required to manipulate content to serve USG interests. TikTok may downrank or censor Hong Kong videos because the government forces them to - the same does not happen at American companies.
I think the 'assume they have all of your data' stance is paranoid (particularly for encrypted stuff like WhatsApp), but people should probably be more careful about this kind of thing than they are anyway. The US has laws and rules around access; you may not agree with them, but they are far and away better than the CCP's approach.
The CCP is running concentration camps for a minority population of their own citizens, invading and taking over neighboring countries (HK with an eye towards Taiwan), and censoring Pooh Bear from the internet because of a light-hearted comparison to Xi. The police call foreign students in the US to threaten them over their internet activity: https://www.vice.com/en/article/jgxdv7/chinese-police-are-vi...
End-to-end encryption is useless if, as in the case of WhatsApp, you don't control the client but a company beholden to US secret courts does. "“For the past decade, N.S.A. has led an aggressive, multipronged effort to break widely used Internet encryption technologies,” said a 2010 memo describing a briefing about N.S.A. accomplishments" [1]
> The US has laws and rules around access
I'm not a US citizen and reside outside the US, which from my limited legal understanding means that US law doesn't give a crap about me.
I agree that in recent decades China has a worse human rights record, which is a major factor when you "choose your poison".
All reasonable points, though I think WhatsApp is secure - I think for most people the best choice is Signal for general messaging, and assuming everything else is largely public.
Even in Signal people can and do take screenshots, so really probably just best to be cautious of anything in writing that you wouldn't want published.
This is one reason I'm excited about Urbit - I think it'll be cool to get out of the dependence on centralized services.
You're right about Chinese government behavior. But as for “the same does not happen at American companies” - no, but they do censor the internet in obedience to Pakistani demands.
That was excellent. Especially the dialog, totally on point. The horrible thing is that it is barely an exaggeration. US companies are in bed with a government that is conducting an actual genocide, as you point out. And then there is the Middle East....
But as far as Google and Pakistan goes, most people who have an inkling of this think that the censorship only affects results served within Pakistan. But, in fact, the censorship affects search results served within the US. Google has allowed the Pakistani government, as well as various pressure groups and other governments, to influence what US people see within the US.
Why is it frustrating when others point out that the most popular services, which are usually from the USA, have the same kind of problem of being under the influence of their respective government, yet nobody seems to be as worried about it? The criticism almost always comes up whenever a service provided by a Chinese company is mentioned in any context. China has shown no interest that I know of in spying on non-Chinese citizens, so I feel like it's probably less problematic to use a Chinese service than an American one if your only worry is that someone is spying on you, especially considering how there's plenty of evidence of the USA spying on the whole bloody planet, including heads of state of allied countries, for f'sake...
>It often feels like the work of bots or government shills
Do you think I'm a bot because I disagree with you? Maybe you are the bot... how can we verify you're not? :D good luck getting to the bottom of that.
> China has shown no interest that I know of in spying on non-Chinese citizens
Assuming that this assertion is true, what motivates China to be so authoritarian towards their own citizens but not towards the rest of the world? Is it altruism or inability?
Does China only spy on their own citizens but not the rest of the world because they like the rest of the world more than their own citizens and they want the rest of the world to have rights and freedoms that they believe their own citizens don't deserve?
If it comes down to an inability to spy on the rest of the world, what do you think will happen when China does develop that ability?
> China has shown no interest that I know of in spying on non-Chinese citizens
I believe China has kept tabs on two groups of non-Chinese citizens: 1. foreign nationals within China's borders, and 2. foreign nationals who are ethnically Chinese.
It's frustrating because it's not the topic of conversation, and it only serves to derail it as we're doing now.
If we're discussing the high cost of apples and someone brings up oranges, it doesn't change the fact that apples are expensive.
> Do you think I'm a bot because I disagree with you?
No, but a lazy comment doing s/China/USA/ certainly reads like it. And if you've seen some of the threads on Reddit or Twitter it becomes pretty clear some accounts search for any negative discussion about China and interject with whataboutism, which would be pretty easy to automate.
Yes it is, because the topic invariably includes "alternatives" or things that are "better". Those things are nearly always things the US has made. They are not better. Yes, it's bad. Everyone knows this. Adding value to that conversation means giving alternatives or offering some new insight into the nature of the badness and how its different flavours should be looked upon.
Introducing ideas associated to the topic that expand it and offer new perspectives is of course welcomed.
The issue with the China apologist comments is that their intent is damage control: attacking the negative sentiments that would make the country look bad and steering the conversation towards the evils of other countries, i.e. whataboutism. This behavior is so prevalent online that you'd be hard pressed to find any discussion criticizing China without it.
I'd also find it frustrating if in any thread that criticizes the US there were comments about how China is worse. That might be the case, but it doesn't minimize the US's problems and only distracts from discussing them. Yet such comments are much less frequent--I'm not sure if I've ever seen one.
With websites, at least you can just copy the URL from the address bar and clean it. Of course, people are slowly being dumbed down by browsers' (mostly Chrome's, but Firefox seems to follow its stupid trends not long afterwards) attempts at removing or hiding the URL, which is no surprise when you realise that herding the userbase toward dedicated "share" buttons (complete with tracking) is one of the reasons they're doing it.
The dedicated share buttons will often give you a link generated on the fly, with all the tracking info on the back end. For example, if Google were to do this (which they thankfully don't), the link might look like "google.com/?query=cce1602b-5af6-4d95-965b-e88450afc266", and in the database there would be all sorts of tracking info tied to it. I can't edit that URL to dissociate myself from that information, so if I share it, they would know it was me who shared it, and not someone else visiting it on their own.
Of course, companies can and do track you via less obvious means all the time, but this is just one small way you can foul a data point for them.
> They absolutely do that, but when you copy-paste them to share elsewhere, you can manually strip all the tracking info out.
If they create custom urls for everyone that look like https://website.com/uuid/ and don't redirect you to the real url... it is not possible to strip anything unless you do some research to find another URL that redirects you to the same page. Not sure what that would do to your search engine rankings though...
Stack Overflow does something similar, and adds a user tracking ID to any shared link, though apparently it's possible to remove it without breaking the link[1].
I only noticed when I received a badge for how many times it was clicked, and even though it's not nefarious I'd still prefer it to be opt-in rather than done by default.
Yes, I regularly warn people on Reddit that their full name is being leaked in the TikTok link they shared. I have an iOS shortcut that expands the URL and chops off the gross tracking stuff so I can share links in private/public without exposing my TikTok "name" (I don't link any accounts and my name is made up).
> I have an iOS shortcut that expands the URL and chops off the gross tracking stuff
Ooo, that's pretty neat. I wonder if something similar can be achieved on Android. I usually manually paste it into Chrome and copy the redirect, although I also enable desktop view to avoid getting the mobile link.
You might want to look at some basic Android automation tools. I'm pretty sure I've seen some before but I don't know any off the top of my head. It was really simple to write the iOS Shortcut, all I did was:
* Accept a URL as input
* Expand the URL to the full link
* Find the "?" in the new url and snip everything after and including it.
Originally I looked for ".html?" but some TikTok links don't have the ".html" anymore so I had to switch to just "?". Tasker for Android [0] might be what you are looking for but I can't be sure. You might want to ask on the subreddit [1] for help or search there for something similar.
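For what it's worth, the same three steps are easy to reproduce outside of Shortcuts too. Here's a rough Python sketch of the same logic (the vm.tiktok.com link is a placeholder, and note that expanding the link still registers one visit with TikTok, as discussed elsewhere in the thread):

```python
import urllib.request

def clean_share_link(short_url: str) -> str:
    # 1. Accept a URL as input.
    # 2. Expand it to the full link (urlopen follows redirects by default).
    with urllib.request.urlopen(short_url) as resp:
        final_url = resp.geturl()
    # 3. Find the "?" in the new URL and snip everything after and including it.
    return final_url.split("?", 1)[0]

print(clean_share_link("https://vm.tiktok.com/XXXXXXX/"))  # placeholder link
```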
VRBO is another egregious example. My friend asked what I thought about a house she was thinking of renting for a trip. VRBO wouldn't let me view the link on my phone unless I downloaded their app. I had her copy and paste the house's description which I then Googled to get to the right listing.
When Twitter's snowflake IDs were lengthened recently I was worried they might be doing this too. I'm afraid of the big ones moving to this: Spotify, Instagram, Twitter, etc.
Assuming any certificate pinning can be defeated, it is easy to manipulate URLs with a loopback-bound forward proxy. It would be great if someone provided an example of one of these TikTok URLs so we could investigate.
I have a self-written set of userscripts that does this, as well as unsetting javascript link rewrites and including bitly link expansion and Amazon URL decluttering. I would love to be able to use it on Firefox for Android again, but I don't see them enabling e.g. Tampermonkey any time soon.
If any shortlink uses bitly as a backend, you can expand it yourself by copying the link and adding a "+" at the end, bringing you to the bitly properties page for that link.
ClearURLs is being discussed. It changes the URL you are visiting to remove tracking info. There are preexisting plugins that do the same thing with shortened URLs—unmasking them and thus untrackifying them.
So mock and downvote all you want. I don't see why ClearURLs couldn't add this functionality.
Edit: Or am I just being downvoted by people who don't want anybody to know that it's possible to stop this form of tracking?
I fail to understand how you'd "unmask" a backend-obfuscated URL (where you just have an ID, and there's no way to get the target URL by just looking at the URL) without opening the URL, defeating the purpose of improving privacy we're discussing here.
Or maybe you and OP don't care about the privacy part of the problem, and you just want to automate getting the "canonical" / "non-personal" one from the "masked" one?
A service expanding that link one time to give you the underlying static URL without tracking before sharing is far better than even one real person clicking it, wouldn't you agree? The trackers would know at least one person clicked it, but that's about it.
To you and the sibling comment: oooookay, you're thinking from the position of an obfuscated link sharer/sender, not a receiver.
You want ClearURLs (or something else) to always resolve to a canonical link, so that you're easily able to share this canonical URL, and to never have a tracked URL in your URL bar so that you don't share it by mistake. Makes sense.
This works if the sender of the shortened link wants to protect others' privacy preemptively. Then they could certainly follow the link, log a single click, then grab the final URL and share that.
But the average person isn't going to do that. They will share the nice, short, pretty URL that TikTok gives them. And once someone gives you that shortened URL, there is no way for you to view the video on the other end without being tracked. You would need to follow the link, TikTok would track you, and only after they have logged the data will they send your browser a redirect to the proper URL.
1. When I go to share a link, automatically trace it and remove all tracking so I get the final URL without any tracking parameters attached.
2. When I am sent a link with tracking parameters as a part of it, or a shortened link, send it to a remote server which will follow the links until it finds the final destination and removes tracking parameters, then send it back to me.
Both approaches have downsides. The first is nice for when I send a link to a friend but not when I get a link in an email from a company. This happens to me all the time and since I use NextDNS to block trackers I often can’t even get to the final website because of the various trackers I would have to go through to get to it which are blocked at the DNS level. I am still trying to figure out a good solution to this.
The second has the obvious privacy problem: who is watching the watchers?
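For the second approach, here's a minimal sketch of such a resolver (assuming a small Flask service you run yourself or trust someone to run; the route and field names are made up):

```python
from urllib.parse import urlsplit, urlunsplit

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/resolve")
def resolve():
    short_url = request.args["url"]
    # The resolver follows the redirect chain, so it takes the tracking hit
    # instead of the person who was sent the link.
    final_url = requests.get(short_url, allow_redirects=True, timeout=10).url
    parts = urlsplit(final_url)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return jsonify({"clean_url": clean})

# e.g. GET http://localhost:5000/resolve?url=https://vm.tiktok.com/XXXXXXX/
```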
ronjouch explains below why it's not really possible to stop this form of tracking. In order to unmask the URL, you pretty much need to visit it, which registers the tracking data, so even if you, as a user, get a stripped URL that's safe to use, you will still have effectively clicked the link.
If you are "unmasking" the URL it's because either you already visited or you are going to visit it? The masked URL and the unmasked URL are hosted by the same entity.
Unmasking (by the sender or a trusted intermediary, such as Tor) removes the risk of leaking the sender data to the (transitive) recipient
>I don't see why ClearURLs couldn't add this functionality.
I think the problem is that, for security reasons, ClearURLs can't change URLs arbitrarily. It can only remove parts of them, so the actual URL would have to be a parameter. See [1] for a relevant comment by the extension's author.
> Or am I just being downvoted by people who don't want anybody to know that it's possible to stop this form of tracking?
I think you are confused how this works. Because it would NOT be possible to stop this type of tracking. That is why you are being downvoted. The downvotes are because you are simply wrong, not because there is a conspiracy on HackerNews of people that don't want other people to know that it is possible to stop tracking.
Here's how it works: In the example given above, you only have the URL vm.tiktok.com/[short-url-id]. This URL does not represent anything on its own. When you click the link, it goes to a TikTok server that looks up the `[short-url-id]` portion of the URL in a database, which contains the actual video id/url being shared, along with additional metadata about the share such as the person who shared it, the device the user is coming from, etc. This information is then logged in a data warehouse or sent down a data firehose for TikTok to eventually perform advanced analytics on. All of this happens while you are waiting to get the real URL of the video back. Yes, it's only a few milliseconds, but by the time you get the URL of the video back so that you can actually watch it, the data has already been logged. Your privacy is already compromised.
So your suggestion is to "unmask" the URL, "untrackify" it, and then give the user the end URL with the actual video. The problem is that the only way to get the real URL and "untrackify" it is to contact TikTok, and they will log the data before you can get the real URL back. You can't simply "unmask" it. Only TikTok knows what the real URL is. In order to get it you need to ask them (by following the short link), and they will log your data before they give you the real URL. There isn't any way around this (other than not using the vm.tiktok share links).
I am not sure if the "real url" that TikTok gives you contains URL parameters or not. It probably does. So you could theoretically remove those - for example, turn tiktok.com/video-url?sharing_user=username123&device=iphone into tiktok.com/video-url. This would be possible. But it wouldn't do anything to protect your privacy. It would simply remove the "[First Last] is on TikTok" message. The data already got logged when you exchanged the short URL for the long URL, so the privacy damage has already been done. This is why "unmasking" simply doesn't do anything other than give you the illusion of privacy, without any change to real privacy.
By contrast, when you see a URL like cnn.com/news-story-url?utm_source=facebook and you remove the parameters from that type of link, you can actually avert a certain level of tracking, because nothing has been logged yet at the point you remove the parameters. So removing the params to get cnn.com/news-story-url and following that will avoid the tracking, because the tracking happens on the actual visit with that specific URL. Since you removed the tracking parameters, the website now has no data to actually track.
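As a rough sketch of that kind of pre-visit cleanup (the parameter list here is just a small illustrative subset of what tools like ClearURLs actually cover):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Known tracking parameters to drop; a real list would be much longer.
TRACKING_PARAMS = {"fbclid", "gclid", "dclid", "igshid"}

def remove_tracking_params(url: str) -> str:
    """Drop tracking parameters but keep functional ones (e.g. a real ?page=2)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS and not k.startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))

print(remove_tracking_params("https://cnn.com/news-story-url?utm_source=facebook&page=2"))
# -> https://cnn.com/news-story-url?page=2
```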
As others have mentioned, it does depend on what exactly you're defending against.
Preemptively opening the link as the sender will send a request to TikTok, but they're not really gaining any useful data there since you just watched the video, hit share (this is what they know so far), and now you opened the link that you had generated. So their database only learned that you shared a video with yourself, which you immediately opened.
The more valuable data is when various intended recipients open the link, allowing TikTok to associate you with them to serve more targeted videos based on implicit social graph, etc.
Moreover, opening the link yourself to get the "canonical url" protects yourself if you're sharing the link broadly since others can't obtain your name [and potentially more?] from the shortlink.
Now, if you're the recipient, there's not much you can do to avoid the tracking link, besides opening it up in as much of an anonymous environment as possible. But interestingly enough, I find the privacy threat greater to the sender. The sender has a TikTok account to aggregate data quite straightforwardly, unlike the recipient. The sender is also being associated with a number of recipients, vs. the recipient with only one sender, and again only through cookies, IP, or something of that sort.
It's entirely possible for Apple or Mozilla (just for example) to run a service checking URLs. In fact they already do this IIRC. They could easily replace all of these redirect links with the real link. Thus every unique link would be visited exactly once. By Apple. Not tracked.
And that's actually stopping it. Even if you don't want to do that, there's real utility in an incremental step where if I go to re-share a Tiktok video I don't accidentally help them track others.
That's not the problem; you can easily expand the URL with curl (it will probably be a redirect) and manually remove the parameters. The problem is that it is not obvious to you that the link contains personally identifiable information.
A short video platform can hardly be expected to be a paragon of security and privacy. It has no utility whatsoever. I don't see where the concern comes from. A video of someone drinking coffee does not particularly invoke a point of concern.
What may be the real concern is China and the fact that the app is tied to it. That's more of a race/geo-politics/war-mongering issue than a privacy concern.
You can't be serious. If what the GP says is true, then TikTok leaks your full name to anyone you share a link with. I see that neither your HN username nor your bio mentions your full name. Perhaps you are comfortable sharing it with anyone you communicate with online, but I'm not.
Well, my grandmother's logic and wise advice still hold. You have a problem with it - don't use it. It's genius.
Just like you wouldn't stand there listening to a drunk person complain about alcohol-related health issues, I'm not about to entertain people complaining about privacy when they have agency and choice.
People don't know that they're sharing personal info when they're sharing the links. It's like spiking someone's drink and then blaming them for getting drunk.
I don't. This is one of the many reasons why I never will. I don't see how that is relevant to the discussion, however. Say hi to your grandmother for me.
Also, I see a few interesting comments in this HN thread; this evening when the dust settles, I'll aggregate & bring them to the bug for consideration if/when fixing this bug is considered.
I don't really know how I feel about having the browser mess with URLs without the user engaging it deliberately. It feels to me something that should perhaps be approached with caution. On the other hand, it does make sense. It's a tricky one.
It is at best neutral for the user. Sometimes these source trackers help companies learn that affiliate links drive more traffic. Other times they help with A/B testing, because they reveal that the main logo was clicked more often than the "click me" button or whatever.
Affiliate links are often hidden, and depending on the system might even lead to higher prices, because the shop is offsetting the affiliate program cost.
Whether the company does A/B testing, what does that have to do with me? That also can be implemented without external trackers and just be set in a session.
Firefox already positioned itself when it gave users the ability to block tracking cookies and fingerprinting techniques. It has gone even further now with Site Isolation.
If utm_* query arguments are used solely for tracking, then it only makes sense that Firefox takes the next step.
This is a fair point. At this point, it's unlikely to break anything. But if it became the default, then any sites that do use them for something important would likely stop. Although there's nothing preventing sites from just renaming these parameters and modifying tracking code to keep tracking anyway.
Well that's exactly the kind of job that an (opinionated) User Agent should do for you. Maybe configurable, maybe not. You can always change your agent (so, browser) if you don't like its opinion.
I'm not so sure, by this logic we should have ad blocking by default as well, however that's a recipe for getting your browser banned by popular sites.
Ad blocking by default is absolutely the way to go. Spoof the Chrome user agent if this actually becomes a problem (which would help with fingerprinting anyway).
This is a bit like antivirus software authors worrying about being "banned" by the virus creators.
I'm all for that approach, but it opens a serious cat and mouse game where any rendering difference between Firefox and Chrome is quickly turned into a major problem for average users. Firefox would no longer be something you can recommend to your parents as they'd be constantly fighting bans. Ad blocker detection is bad enough as is with the current number of users.
Popup blocking is definitely not standard and expected. There is a 90% chance that any website you visit will show a popup and not have it "blocked" by even the most privacy-conscious browsers. But: they're technically not "popups", they're just divs overlaying the content that you were served but can't see. Or they're little slide-in banners that nag you about signing up for a newsletter or agreeing to tracking-cookie nonsense. Oh, and let's not forget the popups asking you to allow "notifications" from this site, or to share "location info".
> that's a recipe for getting your browser banned by popular sites
Good luck with that. They have no choice but to believe whatever data the browser sends them, data that we control. If their precious content leaves their server at all they've already lost.
I think an automatic alert (which can be set to ignore by the user) which flags such links, and offers the option to turn on a config flag which enables URL manipulation like this, would be a good compromise.
Yes, maybe some flag that comes up at the end of the address bar to indicate that it sanitized your URL. Then you can click the flag to see details about what it changed and have the option to navigate to the original URL.
> On the other hand, it does make sense. It's a tricky one.
Any attempt by Mozilla to bake in addon-like behavior so we don't have to install 'yet another damn addon' is welcome, but as with any of these features, they come with the caveats already present in the addons.
For example, Firefox's HTTPS-Only mode (basically the HTTPS Everywhere addon) breaks some sites, and their anti-tracking feature will break some sites too. But then again: if a site is serving HTTP only, they're doing it wrong (with the exception of captive portals). As for the anti-tracking feature: I rarely see sites asking me to disable my ad blocker, and when I do I never give in, no matter how desperate I am to see the hidden content.
I think the problem with this is that ClearURLs can break legitimate uses for URL params. I need to disable it when I do things like online payment. That's not intuitive for users and means an integrated solution needs to take laypersons into account who wouldn't know how to solve the problem (or even what the actual problem is). Is that realistically solvable?
1. First, by Mozilla analysts & developers doing a good job of rolling out a potential implementation in a safe, progressive way, with the easiest stuff first (`fbclid`, `gclid`, etc.), and then going deeper / per-site, maybe re-using (part of) existing filterlists.
1.1. Also, note that ClearURLs is quite aggressive (as noted by a few commenters, and I confirm): it strips lots of non-URLbar requests, strips ETags, etc. A sibling comment mentions that alternative NeatURL is less aggressive. As with all cat-and-mouse games, this is a trade-off, and an implementation in core Firefox doesn't have to go as far as ClearURLs, at least initially. Offering a strictness knob to users is also an option.
2. Then, Firefox already has UI to disable Tracking Protection and work around sites broken by it: click the shield at the left of your URL bar, then toggle off "Enhanced Tracking Protection is ON for this site" to see if it was ETP that broke the site. This UI may need adjustments / more granularity (or maybe not), sure.
I am not a fan of making such functionality part of the browser.
I use the HTTPS only mode in Firefox - it breaks some sites, and telling Firefox to disable the mode for a specific site doesn't always work.
I feel like a plugin (HTTPS Everywhere) can deal with this a lot better than something that's integrated and reduced to a single checkbox in the settings.
And I am a fan of making such functionality part of the browser :P : one less addon to manage & trust! Aside: the amount of insecure code in addons is scary, see https://palant.info/categories/security/ , not to mention that addons are also a frequent cause of performance trouble (source: me, experienced several times). Thus, the more dubious addon code I'm able to replace with somewhat-well-maintained-and-audited Firefox code with many eyeballs on it, the better.
(At this point, you or a passerby will point at the Pocket fiasco and argue that there's too much stuff shoved into our browsers and just stahp it already. Fair, and I love lean software too. I'd still like this specific feature because A. it's not Pocket, B. it aligns well with what Firefox is doing these days, and C. it aligns with what I expect from my user agent of choice).
Then, supposing this ever makes its way into Fx, you can choose not to use it. And by the way, maybe like you, I will make the same choice if the Fx feature is too basic! But it will remain a win for the users for whom it's good enough and who would never have bothered with an addon in the first place. Just like ETP vs. uBlock / PrivacyBadger / etc: ETP is a good, basic, "80%/20%", risk-less step in the right direction, and addons remain way ahead if you, the user, decide to bother a bit more.
> "telling Firefox to disable [HTTPS only] mode for a specific site doesn't always work."
I understand the sentiment, but disagree with it in the case we're talking about, for reasons just as factual and pragmatic as yours. See my reply at https://news.ycombinator.com/item?id=27056751
I would love to see an 'educational' mode on this - rather than just removing the tracking elements, put some info on-screen that shows what was removed and why, so people can use this as a tool to learn more about what types of tracking exist online and how common it is. Hopefully that would lead to a more knowledgeable end user community online and we can have more nuanced discussions in the future about where tracking is benign, and where it is not.
Not exactly what you requested but there's the ability to log all requests that are processed: if you click the extension icon and then under "Configs" enable logging, then at the bottom of the ui there's a button for checking the logs. This will show you the before and after processing urls, the rules that were triggered, and when.
I agree. I uninstalled this add-on precisely because I couldn't quite figure out what it was doing or where it was doing it. Unlike an ad blocker, there's very little tangible difference when it's on or off.
While I greatly value my privacy to the point where I donate to noyb.eu, removing utm campaign tags feels too much. Those do not commonly contain private information. I believe that marketers should feel free to use those to measure the effectiveness of their campaigns, instead of relying on more privacy-intrusive and opaque methods (e.g. cookies, fingerprinting, IP address collecting, etc.).
> I believe that marketers should feel free to use those to measure the effectiveness of their campaigns
I don't. I believe marketers should have exactly zero ways to measure the effectiveness of their mind hacking efforts. Any data they try and collect should have negative value by virtue of being completely randomized by the browser.
Actually I believe marketers shouldn't even exist. Nothing they say is trustworthy by virtue of conflict of interest. The internet would be much better off without these constant attempts to subvert it for their purposes.
Think of your favorite site with the best experience possible. That is possible because people tested countless times what works, what didn't, what is the most efficient path to a rewarding UX, and so on.
Yes, there are a ton of garbage lazy marketers in the world. But saying that marketing shouldn't exist would immediately do away with every refined UX you have navigated, purchased from, or loyally streamed content from.
Throwing out the good because of the bad is too far of a reach IMO. Anywho, that's just little old me and my opinion doesn't mean much.
I disagree. You can improve your product without the extensive use of trackers, especially external ones. Hire UX and PM that know what they are doing, do UX research, talk to your customers, do competitive analysis.
Just accruing swaths of data doesn’t help, you need to interpret it correctly. I think qualitative data will bring you a long way. Once you need to do A/B testing, you can also do it privacy friendly.
If you market your product and run a campaign? Why not offer discount codes or something to figure out how you got them.
How do you think a UX person knows what works? It's not that they were born Jedi worthy of understanding good design and human intuition. They test, test, test, and you know what... they tested more.
Tracking what button or page layout works better from a conversion perspective is not a privacy issue. It's a user experience benefit.
Having a SaaS business and not understanding the exact user funnel, conversion, abandonment, etc. will directly translate into a loss of your job and/or the failure of your business.
This isn't about personal preference which you have every right to. This is about building a business, which is why we're all here, and understanding how to successfully delight our customers.
> Think of your favorite site with the best experience possible. That is possible because people tested countless times what works, what didn't, what is the most efficient path to a rewarding UX, and so on.
Funny; those kinds of sites are my least favorite. All those colors and buttons are information overload, and the animations make my laptop fans spin like crazy. Not everyone bought their computer within the last decade.
Please, blue links and black text aren't evil. We need to make interfaces functional and stop there, rather than continuously A/B testing them to maximize addictiveness ("engagement").
You ignored my statement. I asked you to think of your favorite site...not mine. I'm in no way saying that what I prefer is somehow supposed to be preferred by you.
"Think of your favorite site with the best experience possible."
Regardless of the site experience that you prefer, I can assure you that thought, testing, and iterations have occurred to deliver the experience that you personally prefer.
One of my least favorite aspects of websites is change and redesigns when the original design worked perfectly well. Given that change is bad when a site has already hit the "meh, good enough" threshold, I doubt that testing and iterations would do anything positive for existing users' experience.
Furthermore, I don't want hyper-optimized experiences. These experiences tend to be addictive, whether intended or not. Using an interface shouldn't feel "magical", it should work. I know that you weren't implying addictiveness or engagement, yet these values are (consciously or unconsciously) prevalent enough in the field that they've lowered my level of trust in analytics-driven iteration. Other responses in this thread should also show that I'm far from the only one who feels this way. Earning back broken trust in these situations typically requires going above and beyond past expectations.
User research is research, and should require informed consent held to the same standards of consent as actual research. In any human research, participation should be opt-in. Participants should be given complete information about analytics and how they will be used (with the option to see source code), own their data, be able to revoke their data, and see conclusions of the studies in a format they can understand. Any questions they have should be responded to before and after they opt-in. This shouldn't be buried in a confusing privacy policy but provided upfront, in a language they can speak. People frequently learn interfaces in languages that are foreign to them, but reading details of user research is a different story: you might need a translator. Otherwise, your sample will be even more heavily biased.
This is a lot of work, and might make analytics more trouble than they're worth.
I agree. I don't think marketers are trustworthy: their sole purpose is to "hack" my mind into buying stuff. Sure, it's a job someone has to do, but if I can prevent marketers from getting more data on me, I'm in.
That can be simplified even further: If their incentives don't align with mine at least tangentially, then I'm not using their product. Out goes marketers, social media, and most ads.
... and I thought I was harsh with marketers. I don't want their calls, their unsolicited email, etc. I don't want them to have my personal information or be able to buy or sell my personal information. But I don't begrudge their ability to get word out for their product or service while funding content that I would otherwise get nickeled and dimed for or not have produced at all.
> Nothing they say is trustworthy by virtue of conflict of interest
Everyone who says anything has that same conflict of interest. You do, I do, marketers do, salespeople do, engineers do, politicians do, scientists do. Completely dismissing value of an entire profession based upon self interest doesn't have a limiting principle.
Marketing, even if you naively limit the term to just cover advertising, is a rich and useful function of capitalism and society in general. The key to dealing with it is in protecting basic freedoms like a right to privacy.
> But I don't begrudge their ability to get word out for their product or service
I do because in 99% of cases it's a deliberate waste of my time and attention.
> Everyone who says anything has that same conflict of interest.
I don't think so. The information I receive from friends and peers is far more trustworthy. With marketing, I get selective truths at best.
Lots and lots of people on this site admit to adding "reddit" to their searches when looking for product reviews. Why? Because they don't trust marketers. We want real information from real people with real experiences, not some paid-for narrative. We especially want to know the risks, the negatives and the cons, precisely the kind of information marketers want to bury.
So how did your peers find out about robot vacuum cleaners, mobile phones, etc.? I guess you'll answer: from their peers. And those?
Since ads need to be conspicuous by law, I don't see a conflict of interest. We know this is a carefully crafted story from a person who has a stake in the product or service.
The fact that they are in the URL serves an additional purpose for these sites - to identify who is sharing exactly what, and with whom.
(I've noticed that TikTok does this explicitly, providing a different short URL for each share request - it's clean, which makes them easier to share without blocking out a whole chat, but still not wanted)
It's not that I don't mind the parameters, it's that I also mind the URL tracking. And I can do something directly about the URL tracking.
I doubt that’s the case - if you come from Twitter, you sharing the same link with 100 people probably skews their data. I just don’t think it’s that big of a problem yet for marketers to ask their devs to remove utm parameters after they’re logged.
It skews the data in the right direction (from their point of view). If you saw it on Twitter, and shared it with 100 people, then (from their perspective) they should spend more time advertising on Twitter.
AKA crap. When you walk into a store, does the clerk know which street you used to drive there? Do you tell him on purpose even if he didn't ask? I didn't think so.
I'm regularly asked for my zipcode and I give it happily. Feels like the right tradeoff between allowing marketers to do their job and preserving my privacy.
Every time any kind of measure to improve people's browsing experience is posted here someone comes along and explains how this one is too much. But they are always wrong. There is no "going too far" in optimizing the browser for the people who are using it.
That's not the only issue. The IDs are then fed back into Facebook.
Facebook can use it to link contacts together. I get a share link, it gives it an ID, I send it to someone, they open it and now they have linked my account with their account.
Same works if I click on a page and get the ID, share just that page, and someone clicks it (and there's some fb element on the page).
Now if several users a day share a link here on HN, facebook will know about us as belonging to a certain group.
The GP mentioned UTMs specifically, which are coarse-grained. Generally all links in a campaign will share the same UTM parameters. Sometimes different sizes or A/B variants will have distinguishing values. But as a matter of course, these are not unique to the user. Their primary purpose is to aggregate data, not to distinguish it.
This is different than the type of tracking you're talking about. ID-type parameters like gclid, dclid, and fbclid are all unique to the ad impression, and tied to the individual that ad was served to. Which means they can tie it back to other data sources they have about the individual. Like social graph data for Facebook, or demographic or interests data for other advertiser networks.
Personally I care about ID parameters a lot, and UTM-type parameters not at all. But that's just me.
As someone who is (sometimes militantly) pro-customer, I don't think tracking parameters are "war" - they are just a tool used to understand visitor flow, and ideally improve the visitor experience.
They are fundamentally first-party analytics - they show up in the server logs of the site visited, and that site needed to craft the link in order to place the parameters in the first place. There's a big difference between URL parameters and e.g. cookies attached to third-party javascript.
I definitely support the freedom of people to remove these URL parameters if they want. But it's not fair to classify them as a "war" - they are a tool used by scrupulous marketers, too.
I think the problem is that it's not possible for end users to know what's happening with the tracked data. Is the company on the other end creating a shadow profile of every single user (like Facebook)? Are they selling the data to companies that do user profiling? No idea. Who am I supposed to trust?
Even if there are scrupulous entities, the harm caused by the unscrupulous ones overshadows them.
Companies can only log information about your interaction with their sites. With very few exceptions, that just means on whatevercompany.com. Basically, they only have the data you explicitly give them.
In the case of some large sites that provide pervasive services like FB, Twitter, and Google, you interact with their sites incidentally as you surf in the internet. It's these sites that are a potential privacy risk IMO.
Consumers value their privacy and don't want to be tracked. Adtech keeps insisting on tracking users without consent. So users develop tools that neutralize the tracking. So adtech develops counter-measures. So users develop counter-counter-measures.
Consumers don't want to be tracked because it's become a meme to dislike tracking.
If you think about it... what's the problem with a URL tracking which advertiser you came from? Why would you insist that it be a secret which ad you clicked to come to a site?
This tool is removing URL parameters some of which are absolutely harmless and not violating anyone's "privacy". We really need to draw the line somewhere and decide what the heck means "privacy" at this point, because everything can be interpreted as violation of privacy.
Likewise, are those site owners allowed to exist, or should they just offer content at a loss, pay millions for ads, and have no clue which ads worked and which didn't? And when there's a paywall, of course everyone is SUPER ANNOYED by the paywall.
So to recap, the public wants absolutely everything, for free, and they want to disrupt as much as possible from the site's mechanism to understand what the other side of this communication is and what they want.
> what's the problem with a URL tracking which advertiser you came from?
It's additional bits of information used to identify me.
> We really need to draw the line somewhere and decide what the heck means "privacy" at this point, because everything can be interpreted as violation of privacy.
Okay. If I explicitly give you information and you use it for my benefit alone, it's not a violation of privacy. Everything else is.
Concrete example: people provide their addresses to companies so they can have packages delivered. This is obviously legitimate. Selling my address to marketers so they can spam my inbox with unwanted ads is obviously unacceptable.
Placing identifying information in URLs is unacceptable simply because I didn't explicitly choose to reveal that information. I don't even care if it's harmless, the sheer audacity of these people is offensive.
> This tool is removing URL parameters some of which are absolutely harmless and not violating anyone's "privacy".
Yeah, I'm not risking it. They'll probably find a way to abuse this information if they haven't already. Marketers are not supposed to get any data whatsoever. I'm increasingly convinced marketing shouldn't even exist to begin with.
> should they just offer content at a loss, and pay millions of ads, and have no even clue which ads worked and which didn't?
Don't pay for ads in the first place.
> And when there's a paywall of course everyone is SUPER ANNOYED by the paywall.
That's okay.
> the public wants absolutely everything, for free, and they want to disrupt as much as possible from the site's mechanism to understand what the other side of this communication is and what they want.
I guess. Just return 402 Payment Required if people are expected to pay. We refuse to be the product.
I feel GDPR drew a good line: Any information that can be used to directly or indirectly identify an individual needs a legal reason, one of which is consent.
Is the information you collect unusable for directly or indirectly identifying an individual? Go wild! :)
I guess they're anthropomorphizing "the industry", treating all of them as responsible for what any of them do. But to steelman this viewpoint, industry should be unsurprised that consumers are filtering out UTM in the process of filtering everything, just as consumers should be unsurprised that the industry does what it does.
I always remove them. They're like referer headers. Where the visitor came from is just like any other info that might be useful to the site operator, but is really not any of their business unless the visitor voluntarily discloses it.
I don't mind people stripping these tags manually for link sharing, but stripping them across the board would be a major issue for website that finance themselves through affiliate links. Suddenly your referrals are no longer tracked and your main source of revenue dies up.
> I don't mind people stripping these tags manually for link sharing, but stripping them across the board would be a major issue for websites that finance themselves through affiliate links.
Good riddance. Affiliate schemes just encourage people to spam low quality content full of affiliate links to products that are rarely good.
I agree, and I wonder if over the long term, an economy where no tracking is possible might not perform as well as an economy that tracks everything, for knowledge means better resource allocation.
(and then the tracking economy, let's say China for example, will just steamroll our economies. This is what I'm worried about; in a vacuum a slower-developing but ad/tracking-free society would be preferable, of course.)
Of course I despise all ads as much as the next hacker here on HN, I just wonder sometimes if they're a necessary evil.
So in the end I'm inclined to agree with your nuanced "some general statistical gathering is OK, just no fingerprinting etc".
> an economy where no tracking is possible might not perform as well as an economy that tracks everything, for knowledge means better resource allocation.
Such an economy is still very possible: just pay people for their data. Giving it away isn't economically efficient and imposes significant negative externalities, as we've seen.
> an economy where no tracking is possible might not perform as well as an economy that tracks everything
Oh well. Just let the economy perform slightly worse then.
> for knowledge means better resource allocation
Who cares about some corporation's resource allocation? That's their problem to solve. We should be caring about all the people whose privacy they are violating instead.
If they want to allocate resources efficiently, they should be required to do it in a way that doesn't invade anyone's privacy. If that means they'll make less money so be it.
I'm not disagreeing with the possibility, but this seems like a speculation on a possible risk. I can think of lots of reasons why this might not happen too... but the privacy invasion is happening now and is a direct threat to our public life. I think we should focus on the most pressing problem first.
Related, if you're looking to clean urls on the backend, here's my current pattern used on https://upstract.com and some other news aggregators I've built:
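The exact rule lists aren't reproduced here; as a rough TypeScript sketch of the general shape only (illustrative, not the production rules used on those sites), it boils down to dropping a blocklist of well-known tracking parameters:

    // Simplified sketch, not the actual upstract.com rules: drop a blocklist
    // of well-known tracking parameters from any URL.
    const TRACKING_PARAMS = new Set([
      "fbclid", "gclid", "mc_eid", "igshid", "ref_src",
    ]);

    export function cleanUrl(raw: string): string {
      const url = new URL(raw);
      for (const key of [...url.searchParams.keys()]) {
        if (TRACKING_PARAMS.has(key) || key.startsWith("utm_")) {
          url.searchParams.delete(key);
        }
      }
      return url.toString();
    }

    // cleanUrl("https://example.com/post?id=1&utm_source=news&fbclid=abc")
    // => "https://example.com/post?id=1"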
In your second list, are those the names of query params? I'm puzzled by the inclusion of @ in many of them, maybe you're saying that '_encoding' is a tracking param on any amazon domain, 'sk' is a tracking param on bing.com? What does the $ in the first entry indicate?
I don't see the point in blocking utm_ query string variables. They don't give up any personal information about you, they just help inform the landing site about which channels of marketing are working more effectively than others.
This isn't about personal data; removing the UTM codes isn't helping anybody, it just means that sites don't know where best to spend their marketing money, which ultimately results in seeing more ads in more irrelevant contexts in the future.
Now the tracking parameters are all encoded in the last segment of the url. The backend just has to decode it accordingly and it will have both the item id and the bag of tracking parameters.
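A hypothetical illustration of that scheme (the format is invented for this example, not any particular site's): bundle the item id and the tracking params into one opaque, base64url-encoded path segment.

    interface TrackedLink {
      itemId: string;
      tracking: Record<string, string>;
    }

    function encodeSegment(link: TrackedLink): string {
      return Buffer.from(JSON.stringify(link)).toString("base64url");
    }

    function decodeSegment(segment: string): TrackedLink {
      return JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
    }

    // A link like https://shop.example/p/<segment> now carries both pieces:
    const segment = encodeSegment({
      itemId: "12345",
      tracking: { campaign: "spring", source: "newsletter" },
    });
    const { itemId, tracking } = decodeSegment(segment);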
I'd also support just showing a warning when the apparent URI and the actual URI differ. Though a tool to automatically pick the canonical URL from the page source would also be neat.
Note that this addon requires the "Access your data for all websites" permission[0], which means:
> The extension can read the content of any web page you visit as well as data you enter into those web pages, such as usernames and passwords.
I'm sure the devs are super trustworthy, but there have been cases of legitimate extensions falling in the wrong hands, and this, coupled with automatic extension updates, could be a big security hole in your setup.
AFAIK it is not possible to get access to only the URL. I think the "tabs" permission triggers the "Access your data for all websites" warning. The "tabs" permission is required to get a tab's URL and change it.
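As a minimal sketch of why (assuming a Firefox WebExtension using the `browser` namespace; the cleaning rule here is a toy, not what ClearURLs actually does): even just reading and rewriting the active tab's URL goes through the tabs API.

    // Reading tab.url at all requires the "tabs" permission (or "activeTab"),
    // and rewriting it goes through tabs.update.
    async function cleanActiveTabUrl(): Promise<void> {
      const [tab] = await browser.tabs.query({ active: true, currentWindow: true });
      if (!tab?.id || !tab.url) return;

      const url = new URL(tab.url);
      for (const key of [...url.searchParams.keys()]) {
        if (key.startsWith("utm_")) url.searchParams.delete(key); // toy rule only
      }
      await browser.tabs.update(tab.id, { url: url.toString() });
    }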
This add-on, together with Firefox, Bitwarden, uBlock Origin, HTTPS Everywhere and EFF's Privacy Badger, is what I use to improve my privacy online. Once in a blue moon (a few times per year) I have to switch them off to get a site to work.
Besides that I only have the Tree Style Tab add-on installed, which is much recommended.
Those addons are very basic, just what I'd have done in 2010 --- before Snowden!
Since you have Firefox, you could sync with a community-developed user.js like Arkenfox (previously GHacks) [1], which seems to go much farther and still not break much! At least the settings privacy.resistFingerprinting and privacy.firstparty.isolate looked indispensable as soon as I learned what they do.
And without FPI (first party isolation), not getting LocalCDN [2] (Decentraleyes successor) and Temporary Containers [3] seems like a gross oversight. They have a great discussion on add-ons at the Arkenfox wiki [4].
I haven't noticed, but I admit my Firefox setup has been subtly broken for a decade (by choice). In any case, you can always unconform where it matters for you, but why not start from sane defaults?
It still prompts you to install an update when one is available. In Firefox you still have to accept a prompt about restarting the browser after that, so it doesn't make a whole lot of difference.
But fair point about "extreme" vs "sane". They're quite subjective terms.
I used privacy.resistFingerprinting for a long time, but it changes the timezone to UTC. As a web developer it just caused a little more confusion than I was willing to deal with when working on front-end stuff. There's a bug [0] to address this but it hasn't been acted on yet.
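For illustration, this is roughly the kind of front-end code that gets confused; with resistFingerprinting enabled, Firefox reports UTC here regardless of the machine's real timezone:

    const tz = Intl.DateTimeFormat().resolvedOptions().timeZone; // e.g. "Europe/Berlin", or "UTC" under RFP
    const offsetMinutes = new Date().getTimezoneOffset();        // 0 under RFP
    console.log(tz, offsetMinutes);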
I used to use Privacy Badger as well, but they've recently removed the learning feature (because it could be used to track you, IIRC), so it became similar to uBlock Origin, to the point where it feels redundant to run both.
Thanks for pointing this out. I just had a look at the settings and found that you can turn it back on, though it does come with the warning. Personally, I think the risk of the learning being a detectable event is smaller than the benefit of the tracking it blocks, so I turned it back on.
I use AdBlock Plus in tandem with Privacy Badger because each occasionally snags baddies that the other does not. To make this silly redundancy complete, I'm running them both on Brave, which catches almost all of what the two extensions used to handle.
I've been told that, but I'm comfortable with what I know at the moment. Given how aggressive Brave is, I'm not sure I'm at a disadvantage keeping the status quo.
Yes, I also turned it back on when I realized. I just noticed that I'm actually still using it, I just had the icon hidden. It's not as useful without the learning, though.
If a site doesn't work, open the link in a private tab. Usually that works, unless you have all these addons working in private mode as well of course. But I only use them in normal mode.
It should be noted that this extension strips ETag headers from all responses by default, which can break sites in surprising ways. As a developer of a web application that relies on ETag headers for vital functionality, I see not-infrequent support inquiries from ClearURLs users who don't understand the technical ramifications of this feature - nor do they understand why so many of the websites they use are so broken.
Have you considered using something other than ETag for your use case? It seems like ETag has been compromised by trackers, and unfortunately this is why we can't have nice things.
We use the ETag header to make use of browser caching - not just for performance, but as a component of offline support. Yes, we could add an additional header with the same information to work around this specific extension for application-specific functionality using it, but that would leave the browser-based features broken.
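To illustrate the legitimate use (a generic sketch, not our actual application code): the browser revalidates with If-None-Match and gets a cheap 304 when nothing changed; strip the ETag and every request becomes a full 200 again.

    import { createServer } from "node:http";
    import { createHash } from "node:crypto";

    // Serve a response with an ETag; answer revalidation requests with 304.
    const body = JSON.stringify({ items: ["a", "b", "c"] });
    const etag = `"${createHash("sha1").update(body).digest("hex")}"`;

    createServer((req, res) => {
      if (req.headers["if-none-match"] === etag) {
        res.writeHead(304, { ETag: etag });
        res.end();
        return;
      }
      res.writeHead(200, { ETag: etag, "Content-Type": "application/json" });
      res.end(body);
    }).listen(8080);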
While the ETag header may have been usable for cross site tracking at some point in the past [1], browser caches are isolated per-origin in Firefox, so there's no longer a cross-site tracking concern. That leaves it usable to identify you across sessions only in a first-party context, just like cookies, IP addresses (to a lesser extent), the Last-Modified header, and any number of other identification techniques ClearURLs doesn't block.
[1] I'd be interested to see any credible evidence of ETag headers being used for tracking in the wild - I've only seen theorizing that it _could_ be used as such, prior to cache isolation being implemented in Firefox and Chrome.
> ETags can be used to track unique users, as HTTP cookies are increasingly being deleted by privacy-aware users. In July 2011, Ashkan Soltani and a team of researchers at UC Berkeley reported that a number of websites, including Hulu, were using ETags for tracking purposes. Hulu and KISSmetrics have both ceased "respawning" as of 29 July 2011, as KISSmetrics and over 20 of its clients are facing a class-action lawsuit over the use of "undeletable" tracking cookies partially involving the use of ETags.
It appears that there have been at least a few cases of this in the wild.
The main distinction (at least to me) between ETag and the other tracking methods you mention is that ETag doesn't appear to be easily clearable by a user (although that sounds like something browsers should fix if they haven't already).
It's unfortunate that features like this end up getting co-opted by trackers, which leads to breaking legitimate use cases like your app in the process.
That's certainly credible evidence for past use I overlooked, though it remains unlikely to be useful with the advent of per-origin cache isolation.
The Last-Modified header can be used in exactly the same way, and isn't blocked by this extension, which harkens back to my original point: this is an extension that appears to see significant use by non-technical users, yet it breaks a browser feature by default. There are plenty of other methods of identifying a unique user that it doesn't prevent, so this seems like a pretty unexpected feature users should take note of.
There’s lots of rules and patterns in this implementation, but it’s worth bearing in mind that you can normally get a clean URL by looking at the <link rel=canonical> element.
Sites put this in because they want search engines to index a single clean URL rather than many tracking URLs, so it’s pretty reliable.
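A rough sketch of what that looks like, assuming a browser context that's allowed to fetch the page (e.g. an extension with host permissions); the obvious downside is that you have to request the dirty URL first:

    async function canonicalUrl(dirtyUrl: string): Promise<string | null> {
      const html = await (await fetch(dirtyUrl)).text();
      const doc = new DOMParser().parseFromString(html, "text/html");
      const href = doc.querySelector('link[rel="canonical"]')?.getAttribute("href");
      // Resolve against the original URL in case the canonical link is relative.
      return href ? new URL(href, dirtyUrl).toString() : null;
    }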
That works if you want to get a clean URL to share with others. But if instead you have received a link, then not using built-in patterns means you would first need to retrieve the site with the tracking parameters in order to get to the canonical URL.
Lovely extension, some discussions about its functionality can be found in this thread [0] after the removal of the extension from Chrome's Web Store.
One thing I noticed is that it can be too aggressive from time to time. I encountered this "issue" when creating a Bitwarden account: I was unable to verify my e-mail address because ClearURLs was (unbeknownst to me) removing some of the parameters from the activation URL. While similar cases will most likely not be frequent, it can be really frustrating to determine why something does not work (this also applies to ad blockers).
I love this just for the usability alone, never mind being anti-tracking.
I'm tired of having to strip 300 friggin' nonsense characters off the end every time I want to share a product page or post a URL or something.
Not all tracking is user-specific, but you do raise an interesting point: how (thinking as a site owner) to remove personal info from the URL but still pass that info around locally.
Cookies are one way (if we stick to the same domain); using sidecar AJAX requests and localStorage is possibly another.
Or maybe we can leave it all in the URL but encrypt it with a key stored in a cookie, so that without the cookie the info is recognized as foreign when passed around. Hmm, yeah, not bad.
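A purely hypothetical sketch of that idea (nothing any site actually does, as far as I know), using AES-GCM in Node, where the key would live in a first-party cookie so that anyone with only the bare URL sees an opaque blob:

    import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

    function encryptParams(params: string, key: Buffer): string {
      const iv = randomBytes(12);
      const cipher = createCipheriv("aes-256-gcm", key, iv);
      const data = Buffer.concat([cipher.update(params, "utf8"), cipher.final()]);
      return Buffer.concat([iv, cipher.getAuthTag(), data]).toString("base64url");
    }

    function decryptParams(blob: string, key: Buffer): string {
      const buf = Buffer.from(blob, "base64url");
      const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
      decipher.setAuthTag(buf.subarray(12, 28));
      return Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]).toString("utf8");
    }

    // const key = randomBytes(32);  // stored (encoded) in a first-party cookie
    // const blob = encryptParams("campaign=spring&src=mail", key);
    // decryptParams(blob, key) === "campaign=spring&src=mail"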
Another way to browse one-off sites is through a mirror like https://archive.is/ (I exclusively use mirrors to view posts on content aggregators like Medium, Substack, Buzzfeed, Blogspot, Wordpress; annoying news websites that download a gazillion files; and file-hosting websites like imgur).
A caveat: When you submit a request to archive a url, archive.is sends the client-ip (X-Forwarded-For) to the destination server.
archive.org archives any digital content, including larger (media) files. archive.org also has several other active archive projects besides the Wayback Machine. It also respects robots.txt. archive.org is a non-profit based in SF.
archive.is archives webpages by stripping them of any dynamic content, but it tries its best to capture dynamic (multi-page/SPA) content anyway (for example, Twitter threads, LinkedIn profiles). It limits file sizes to 50MB (I think?), and works best with text-heavy webpages (news and blogs, for example). archive.is is run by a for-profit company based in NY, but it isn't clear who is in fact behind it. Ref: https://en.wikipedia.org/wiki/wp:archive.is
This is one of those things where either few people use it and it works, or many start using it and the tracking just gets obfuscated.
I already see many sites use something like ?arg={BASE64 STRING OF ALL THE THINGS}, and no automatic tool can decipher that, as it's a custom list of bytes.
Removing utm_ parameters will probably always work, because they are standardized and shared between different applications (like the website and Google Analytics). If you try to obfuscate them, your analytics software can't read them either.
But yeah, home-grown analytics can't be reliably circumvented.
This is a neat extension but I think we should acknowledge that stripping parameters like these from affiliate links is going to cause major problems for websites that are financed through affiliate revenue, even if they are open and honest about it.
Your point is valid. Any business model that is dependent on collecting visitor data is flawed, and it's appropriate for these companies to either change or wither. I hear the concern, but IMO the need for privacy is greater.
There are ongoing discussions to include the first one as a stock list (i.e. present in "Filter lists"), though not enabled by default for now.
Addendum: to be clear, this is not a replacement for ClearURLs. ClearURLs has more capabilities than just removing query parameters from the URLs of outgoing network requests.
I have an idea for a project (I call it 'cannon' for now)[0] which would 'canonicalize' URLs and extract semantic information from them, ideally just by looking at the URL, without doing any extra requests. For example, a tweet URL usually encodes the tweet author and tweet ID; by extracting such entities one could determine 'relations' between URLs. I'm using a simple prototype in Promnesia [1], a browser extension aiming to make the web browsing history more useful and aid knowledge management.
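As a toy example of the kind of entity extraction I mean (the regex here is illustrative, not Promnesia's actual rule set):

    interface TweetEntity { author: string; tweetId: string; }

    function parseTweetUrl(url: string): TweetEntity | null {
      const m = new URL(url).pathname.match(/^\/([A-Za-z0-9_]+)\/status\/(\d+)/);
      return m ? { author: m[1], tweetId: m[2] } : null;
    }

    // parseTweetUrl("https://twitter.com/someuser/status/1234567890?s=20")
    // => { author: "someuser", tweetId: "1234567890" }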
This effort really ought to be shared; it's potentially a lot of manual work and could benefit many projects. ClearURLs seems like one of the most promising existing projects doing similar stuff; I have been meaning to approach the devs, since it feels like something we could cooperate on. ClearURLs has a somewhat narrower scope, but I still feel there is potential to share.
Yep, that's kind of the main problem :) Hence the need for some manual curation (e.g. ClearURLs seems to do it here: https://github.com/ClearURLs/Rules/blob/master/data.json). For 80% of sites just throwing away the query parameters works; for the rest, sadly, it's necessary to do more sophisticated normalizing.
I'm also thinking that it might be possible with some simple machine learning, by looking at a corpus of existing URLs. E.g. if a human looks at a corpus of different URLs, they would more or less guess what is useful and what's tracking garbage, so perhaps it's possible to automate that with high accuracy?
Then, I also feel if it's paired with some UI to allow the user to 'fix' the algorithm for entity extraction (e.g. by pointing at the 'relevant' parts of the URL), it would already be good enough for the user -- they would fix the sites that are worst offenders for them. Then these fixes could be optionally contributed back and merged to the upstream 'rules database'.
This will just lead to sites removing canonical URLs from marketing entirely. There will be no somesite.whatever/books/123?campaign=1 and somesite.whatever/books/123?campaign=2. Instead they will be somesite.whatever/guid1 and somesite.whatever/guid2. What's the point then?
I'm not in the affiliate landscape, but this kind of thing could have a detrimental impact on publishers driving traffic to various commerce sites. If you're a person that makes an occasional purchase through publishers (small or large) to support their content, this will immediately kill their earnings.
Publishers are desperate to monetize their audience any way possible. Affiliate revenue always seemed the lesser of evils, IMO, in comparison to programmatic/display. After all, the user is intentionally making a purchase vs. having their data sold out from under them with zero knowledge.
Here's hoping that I'm misunderstanding how aggressively this will strip parameters.
Check this out! I happened to see these folks just launched a business doing exactly what you described. I doubt they have any traction but there's hope:
Unfortunately still doesn't have a whitelist feature, which impacts usability severely.
If you use any site that's broken by the extension, that means you have to remember to turn it off (globally!) before using the site, and turn it on again after.
I think one thing that needs to be clarified is that Tracking Link !== Affiliate Link. If the extension gets rid of all the affiliate info in the link, it can destroy the income of YouTubers who put affiliate links in their content.
It looks like the add-on has an option to "Allow referral marketing" which is off by default. If you install this add-on and feel like enabling this, it looks like you can.
That said, it still results in some level of tracking and given the add-on's purpose, having to opt in to affiliate links seems like the right choice.
I like this idea. I've been using a bit of Ruby wrapped into my shell to remove Facebook's tracking from links [1], but it's nice to have so many rules in one place that we all can contribute to.
It seems the interesting bit (for me) is in the Rules[2] repo.
Would be neat if it also removed the anchor gunk google is adding that highlights the specific word you searched for. Very obnoxious when you're trying to find a wikipedia page to link on mobile.
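That "anchor gunk" is a text fragment of the form #:~:text=..., so stripping it is basically a one-liner (a sketch; as far as I can tell the extension doesn't do this yet):

    function stripTextFragment(url: string): string {
      // Text fragments sit after the ":~:" fragment directive delimiter.
      return url.replace(/:~:text=.*$/, "").replace(/#$/, "");
    }

    // stripTextFragment("https://en.wikipedia.org/wiki/Privacy#:~:text=data%20protection")
    // => "https://en.wikipedia.org/wiki/Privacy"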
Doing some more digging: apparently AdGuard just created a URL tracking filter a couple of weeks ago that we should eventually be able to enable in either the AdGuard or uBlock Origin (or perhaps other ad-blocking) Firefox mobile addons:
https://adguard.com/en/blog/adguard-url-tracking-filter.html
Or you can manually add the filter now. After installing the uBlock Origin addon in Firefox mobile, I clicked on the 3 dots -> Add-ons -> uBlock Origin -> Open the dashboard -> Filter lists -> Import... and pasted this URL (from the top of the above link):
https://raw.githubusercontent.com/AdguardTeam/FiltersRegistr...
I tested by sharing a URL with UTM and other parameters, and it did strip them.
I hate how some websites break when you get aggressive about cleaning tracking elements. One such example is Hey mail. There are many others, but for this one reason I moved to Chromium for Hey mail and any website that breaks with privacy protection enabled; for everything else, it's Firefox.
I built something like this years ago for personal use in Greasemonkey, with a bunch of hardcoded common tracker tags. Greasemonkey is the only thing I really miss since switching to Safari.
UTM params are sometimes used to make a site free when they come from a certain source, or might have some other hidden functionality (even if this may not be best practice). I wouldn't use this.
You can right-click a link in your email, copy it, then paste it in the browser bar and remove the unwanted stuff. A lot of work, especially if you need to do this many times a day.
It's not just privacy! When I want to share an article with someone, I don't want the link to have 1000 characters with urlencoded arguments! On mobile I often do it manually and it's very annoying!