
I hated it so much I migrated to ProtonPass, deleted my data, and set my account to expire.

Then the Proton CEO outed himself as a fascist sympathiser, so I re-activated my Bitwarden account, migrated back, and am now learning to love the changes.

The best I've got for tips are:

1. Settings > Appearance > Quick Copy

2. Settings > Appearance > Compact Mode

3. Settings > Appearance > Extension Width > Wide

I still don't love it, but it remains the best of the bunch.


Not any more; now the thing about Arsenal is that it's all set pieces and an over-reliance on corners.

Saka isn't all that either; he'd be nothing without Odegaard.

(this is convincing, and I know about as much as the IT Crowd did)


Beware: if you have UK users, you should read the UK's new law, the Online Safety Act.

Your site has young trans people, sexual content, and would also be a target for grooming from chasers. The risks, for you, are very high.


> The risks, for you, are very high.

Is the creator in the UK? Or do they visit regularly? It wasn't mentioned in the original post.


The law has ridiculous overreach, that's true. And we haven't seen what international enforcement of it looks like yet. But to deal with the facts as they are now, the law states that it applies if you have UK users, and that the personal liability for officers of the company can be up to £18M... The overreach continues because it also covers "harmful but not illegal" content.

The app's publishing didn't exclude the UK; it probably should have.


No, the law cannot possibly apply "if you have UK users".

The law applies if you're in the UK or in a country the UK can persuade to extradite you.


I don't agree with the law either, and there's no point in using emphasis to labour the point. The UK Government's advice on who it applies to is here: https://www.gov.uk/government/publications/online-safety-act... feel free to read it. As for enforcement: it's a new law, so there has been none, and hence no-one can speak to that. But you're right that extradition is the path, and on that front, if a similar law exists and agreements are in place, then most European countries, for example, would extradite. We don't know where the author is, but they posted here during the morning of European time zones. All I suggested is that they do their diligence on their personal risk, as these laws do claim to apply to their service. No-one should dismiss the risk; the person should speak to a lawyer.

The article doesn't even touch "people enter their email incorrectly when registering an account".

I've received magic links to my Gmail account that belong to other people, for accounts that have ordered flight tickets, or clothing, or digital services.

Those people, I guess they now have no way to access their online account, as they cannot password reset (if that was the fallback), or change their email (usually requiring confirmation), or receive their magic link.

There's nothing I can do here except delete the email. I have no indication of what the correct address should be; the person's name is the same as my legal name, and there are a lot of people with that name in the world.

Few services verify an email during sign-up, because I'm sure data shows that added friction during sign-up results in fewer people signing up.


You're right, I forgot to even cover that part because I was focused on how annoying they are to me as a user, not necessarily as a service provider. I also forgot to mention how they train people to click on links, and how my inbox now consists of dozens of emails per day either telling me to click to log in, or warning me that I logged in.

I have my own domains for email, so I haven't had the issue of someone else entering my address, but I keep hearing about it from friends.


This happens on my Gmail all the time.

Frankly, if somebody else uses my address for a service and I'm receiving anything other than email verification from that service, I'm reporting it as spam on both Gmail and Fastmail because that's what it is.


It's been 2025 for 6 months already...


Their appetite cannot be sated, and there is little to no value in giving them access to the content.

I have data... 7 days from a single platform with about 30 forums on this instance:

4.8M hits from Claude
390k from Amazon
261k from Data For SEO
148k from ChatGPT

That Claude one! Wowser.

Bots that match this (which is also the list I block on some other forums that are fully private by default):

(?i).*(AhrefsBot|AI2Bot|AliyunSecBot|Amazonbot|Applebot|Awario|axios|Baiduspider|barkrowler|bingbot|BitSightBot|BLEXBot|Buck|Bytespider|CCBot|CensysInspect|ChatGPT-User|ClaudeBot|coccocbot|cohere-ai|DataForSeoBot|Diffbot|DotBot|ev-crawler|Expanse|FacebookBot|facebookexternalhit|FriendlyCrawler|Googlebot|GoogleOther|GPTBot|HeadlessChrome|ICC-Crawler|imagesift|img2dataset|InternetMeasurement|ISSCyberRiskCrawler|istellabot|magpie-crawler|Mediatoolkitbot|Meltwater|Meta-External|MJ12bot|moatbot|ModatScanner|MojeekBot|OAI-SearchBot|Odin|omgili|panscient|PanguBot|peer39_crawler|Perplexity|PetalBot|Pinterestbot|PiplBot|Protopage|scoop|Scrapy|Screaming|SeekportBot|Seekr|SemrushBot|SeznamBot|Sidetrade|Sogou|SurdotlyBot|Timpibot|trendictionbot|VelenPublicWebCrawler|WhatsApp|wpbot|xfa1|Yandex|Yeti|YouBot|zgrab|ZoominfoBot).*

I am moving to just blocking them all, it's ridiculous.

Everything on this list got itself there by being abusive (either ignoring robots.txt, or not backing off when latency increased).


There's also a popular repository that maintains a comprehensive list of LLM- and AI-related bots to aid in blocking these abusive strip miners.

https://github.com/ai-robots-txt/ai.robots.txt


I didn't know about this. Thank you!

After some digging, I also found a great way to surprise bots that don't respect robots.txt[1] :)

[1]: https://melkat.blog/p/unsafe-pricing
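A classic variant of that trick, a robots.txt honeypot, can be sketched in a few lines of Go; the /trap path and in-memory ban list here are my own illustrative choices, not what the linked post does:

    // Hedged sketch: the path is disallowed for all crawlers, so anything
    // that fetches it is ignoring robots.txt and gets banned by IP.
    package main

    import (
    	"log"
    	"net"
    	"net/http"
    	"sync"
    )

    var (
    	mu     sync.Mutex
    	banned = map[string]bool{}
    )

    func main() {
    	http.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
    		w.Write([]byte("User-agent: *\nDisallow: /trap\n"))
    	})
    	http.HandleFunc("/trap", func(w http.ResponseWriter, r *http.Request) {
    		ip, _, _ := net.SplitHostPort(r.RemoteAddr)
    		mu.Lock()
    		banned[ip] = true // fetched a disallowed path: ban the client
    		mu.Unlock()
    		http.Error(w, "forbidden", http.StatusForbidden)
    	})
    	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    		ip, _, _ := net.SplitHostPort(r.RemoteAddr)
    		mu.Lock()
    		b := banned[ip]
    		mu.Unlock()
    		if b {
    			http.Error(w, "forbidden", http.StatusForbidden)
    			return
    		}
    		w.Write([]byte("hello"))
    	})
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }

The nice property is that well-behaved crawlers never trip it at all; only clients that ignore robots.txt ever see the trap.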


You know, at this point, I wonder if an allowlist would work better.


I love (hate) the idea of a site where you need to send a personal email to the webmaster to be whitelisted.


We just need a browser plugin to auto-email webmasters to request access, and wait for the follow-up "access granted" email. It could be powered by AI.


Then someone will require a notarized statement of intent before you can read the recipe blog.


Now we're talking. Some kind of requirement for government-issued ID too.


I have not heard the word "webmaster" in such a long time


Deliberately chosen for the nostalgia value :)


I have thought about writing such a thing...

1. A proxy that looks at HTTP Headers and TLS cipher choices

2. An allowlist that records which browsers send which headers and selects which ciphers

3. A dynamic loading of the allowlist into the proxy at some given interval

New browser versions or updates to OSs would mean the allowlist needs updating, but I'm not sure that's too inconvenient, and it could be done via GitHub so people could submit new combinations. A rough sketch of the idea follows.
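Something like this minimal Go sketch, where the fingerprint, file names, and allowlist source are all my own assumptions (a real version would hash the full ClientHello, JA3-style, rather than just the negotiated cipher suite):

    package main

    import (
    	"fmt"
    	"log"
    	"net/http"
    	"net/http/httputil"
    	"net/url"
    	"sync"
    	"time"
    )

    var (
    	mu        sync.RWMutex
    	allowlist = map[string]bool{} // "UA|cipher" -> allowed
    )

    // fingerprint pairs the claimed User-Agent with the negotiated TLS
    // cipher suite; real browsers produce a small, stable set of pairs.
    func fingerprint(r *http.Request) string {
    	var suite uint16
    	if r.TLS != nil {
    		suite = r.TLS.CipherSuite
    	}
    	return fmt.Sprintf("%s|%04x", r.UserAgent(), suite)
    }

    // reload would periodically refresh known-good combinations, e.g. from
    // a community-maintained file on GitHub (step 3 of the list above).
    func reload() {
    	for range time.Tick(10 * time.Minute) {
    		mu.Lock()
    		// ... fetch and replace the allowlist here ...
    		mu.Unlock()
    	}
    }

    func main() {
    	backend, _ := url.Parse("http://127.0.0.1:8080") // illustrative backend
    	proxy := httputil.NewSingleHostReverseProxy(backend)
    	go reload()

    	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		mu.RLock()
    		ok := allowlist[fingerprint(r)]
    		mu.RUnlock()
    		if !ok {
    			http.Error(w, "forbidden", http.StatusForbidden)
    			return
    		}
    		proxy.ServeHTTP(w, r)
    	})
    	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", handler))
    }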

I'd rather just say "I trust real browsers" and dump the rest.

Also, I noticed a far simpler block: just deny almost every request whose UA claims to be "compatible".


Everything on this can be programmatically simulated by a bot with bad intentions. It will be a cat and mouse game of finding behaviors that differentiate between bot and not and patching them.

To truly say “I trust real browsers” requires a signal of integrity of the user and browser, such as cryptographic device attestation of the browser... which has to be centrally verified. Which is also not great.


> Everything on this can be programmatically simulated by a bot with bad intentions. It will be a cat and mouse game of finding behaviors that differentiate between bot and not and patching them.

Forcing Facebook & Co to play the adversary role still seems like an improvement over the current situation. They're clearly operating illegitimately if they start spoofing real user agents to get around bot blocking capabilities.


I'm imagining a quixotic terms of service, where "by continuing" any bot access grants the site-owner a perpetual and irrevocable license to use and relicense all data, works, or other products resulting from any use of the crawled content, including but not limited to cases where that content was used in a statistical text generative model.


This is Cloudflare with extra steps


If you mean user-agent-wise, I think real users vary too much to do that.

That could also be a user login, maybe, with per-user rate limits. I expect that bot runners could find a way to break that, but at least it's extra engineering effort on their part, and they may not bother until enough sites force the issue.
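As a rough sketch of per-user limits, assuming Go with golang.org/x/time/rate and a made-up session header for illustration:

    package main

    import (
    	"net/http"
    	"sync"

    	"golang.org/x/time/rate"
    )

    var (
    	mu       sync.Mutex
    	limiters = map[string]*rate.Limiter{}
    )

    // limiterFor hands each user a token bucket: 1 request/second, bursts of 10.
    func limiterFor(user string) *rate.Limiter {
    	mu.Lock()
    	defer mu.Unlock()
    	l, ok := limiters[user]
    	if !ok {
    		l = rate.NewLimiter(rate.Limit(1), 10)
    		limiters[user] = l
    	}
    	return l
    }

    func withRateLimit(next http.Handler) http.Handler {
    	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		user := r.Header.Get("X-Session-User") // illustrative: derive from real auth
    		if user == "" || !limiterFor(user).Allow() {
    			http.Error(w, "slow down", http.StatusTooManyRequests)
    			return
    		}
    		next.ServeHTTP(w, r)
    	})
    }

    func main() {
    	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		w.Write([]byte("ok"))
    	})
    	http.ListenAndServe(":8080", withRateLimit(ok))
    }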


I hope this is working out for you; the original article indicates that at least some of these crawlers move to innocuous user agent strings and change IPs if they get blocked or rate-limited.


This is a new twist on the Dead Internet Theory I hadn’t thought of.


We'll have two entirely separate (dead) internets! One for real hosts who will only get machine users, and one for real users who only get machine content!

Wait, that seems disturbingly conceivable with the way things are going right now. *shudder*


You're just plain blocking anyone using Node from programmatically accessing your content with Axios?


Apparently yes.

If a more specific UA hasn't been set, and the library doesn't force people to do so, then the library that has been the source of abusive behaviour is blocked.

No loss to me.


Why not?


>> there is little to no value in giving them access to the content

If you are an online shop, for example, isn't it beneficial that ChatGPT can recommend your products? Especially given that people now often consult ChatGPT instead of searching on Google?


> If you are an online shop, for example, isn't it beneficial that ChatGPT can recommend your products?

ChatGPT won't 'recommend' anything that wasn't already recommended in a Reddit post, or on an Amazon page with 5000 reviews.

You have, however, correctly spotted the market opportunity. Future versions of ChatGPT will offer the ability to "promote" your e-shop in responses, in exchange for money.


Would you consider giving these crawlers access if they paid you?


Interesting idea, though I doubt they'd ever offer a reasonable amount for it. But doesn't it also change a site's legal stance if you're now selling your users' content/data? I think it would also repel a number of users from your service.


At this point, no.


No, because the price they'd offer would be insultingly low. The only way to get a good price is to take them to court for prior IP theft (as NYT and others have done), and get lawyers involved to work out a licensing deal.


This is one of the few interesting uses of crypto transactions at reasonable scale in the real world.


What mechanism would make it possible to charge for non-paywalled, non-authenticated access to public web pages? This is a classic "tragedy of the commons" type of issue.

The AI companies are signing deals with large media and publishing companies to get access to data without the threat of legal action. But nobody is going to voluntarily make deals with millions of personal blogs, vintage car forums, local book clubs, etc. and set up a micropayment system.

Any attempt to force some kind of micropayment or "prove you are not a robot" system will add a lot of friction for actual users and will be easily circumvented. If you are LinkedIn and you can devote a large portion of your R&D budget to this, you can maybe get it to work. But if you're running a blog on stamp collecting, you probably will not.


Use the ex-hype to kill the new hype?

And the ex-hype would probably fail at that, too :-)


What does crypto add here that can't be accomplished with regular payments?


What do you use to block them?


Nginx; it's nothing special, it's just my load balancer.

if ($http_user_agent ~* "(list|of|case|insensitive|things|to|block)") { return 403; }


403 is generally a bad way to get crawlers to go away - https://developers.google.com/search/blog/2023/02/dont-404-m... suggests a 500, 503, or 429 HTTP status code.


> 403 is generally a bad way to get crawlers to go away

Hardly... the linked article says that a 403 will cause Google to stop crawling and remove content... that's the desired outcome.

I'm not trying to rate limit, I'm telling them to go away.


That article describes the exact behaviour you want from the AI crawlers. If you let them know they’re rate limited they’ll just change IP or user agent.


From the article:

> If you try to rate-limit them, they’ll just switch to other IPs all the time. If you try to block them by User Agent string, they’ll just switch to a non-bot UA string (no, really).

It would be interesting if you had any data about this, since you seem like you would notice who behaves "better" and who tries every trick to get around blocks.


Switching to sending wrong, inexpensive data might be preferable to blocking them.

I've used this with VoIP scanners.


Oh I did this with the Facebook one and redirected them to a 100MB file of garbage that is part of the Cloudflare speed test... they hit this so many times that it would've been 2PB sent in a matter of hours.

I contacted the network team at Cloudflare to apologise, and also to confirm whether Facebook did actually follow the redirect... it's hard for Cloudflare to see 2PB, as that kind of number is too small on a global scale when it occurs over a few hours, but given that only a single PoP would've handled it, it would've been visible there.

It was not visible, which means we can conclude that Facebook were not following redirects, or if they were, they were just queuing it for later and would only hit it once and not multiple times.


Hmm, what about 1kb of carefully crafted gz-bomb? Or a TCP tarpit (this one would be a bit difficult to deploy).
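For what it's worth, the gzip idea is easy to sketch in Go; the sizes here are assumptions, but zeros compress at roughly 1000:1, so a ~10KB response body inflates to ~10MB on a client that naively decompresses it:

    package main

    import (
    	"bytes"
    	"compress/gzip"
    	"log"
    	"net/http"
    	"regexp"
    )

    var (
    	bomb []byte
    	bots = regexp.MustCompile(`(?i)(ClaudeBot|GPTBot|Bytespider)`) // illustrative subset
    )

    func init() {
    	// Pre-compress once at startup: 10MB of zeros -> roughly 10KB of gzip.
    	var buf bytes.Buffer
    	zw, _ := gzip.NewWriterLevel(&buf, gzip.BestCompression)
    	zw.Write(make([]byte, 10<<20))
    	zw.Close()
    	bomb = buf.Bytes()
    }

    func main() {
    	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    		if bots.MatchString(r.UserAgent()) {
    			// Claim the body is gzip; a decompressing client gets ~10MB of zeros.
    			w.Header().Set("Content-Encoding", "gzip")
    			w.Header().Set("Content-Type", "text/html")
    			w.Write(bomb)
    			return
    		}
    		w.Write([]byte("hello, human"))
    	})
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }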


4.8M requests sounds huge, but if it's over 7 days, and especially split amongst 30 websites, it's only about 0.26 requests per second per site, not exactly very high or even abusive.

The fact that you choose to host 30 websites on the same instance is irrelevant, those AI bots scan websites, not servers.

This has been a recurring pattern I've seen in people complaining about AI bots crawling their website: huge number of requests but actually a low TPS once you dive a bit deeper.


It's never that smooth.

In fact 2M requests arrived on December 23rd from Claude alone for a single site.

An average of 25qps is definitely an issue; these are all long-tail dynamic pages.


Curious what your robots.txt looked like, if you have a link?


I am the OP, and if you read the guidance published yesterday: https://www.ofcom.org.uk/siteassets/resources/documents/onli...

Then you will see that a forum that allows user-generated content, and isn't proactively moderated (approval prior to publishing, which would never work for even a small, moderately busy forum of 50 people chatting)... will fall under "All Services" and "Multi-Risk Services".

This means I would be required to do all the following:

1. Individual accountable for illegal content safety duties and reporting and complaints duties

2. Written statements of responsibilities

3. Internal monitoring and assurance

4. Tracking evidence of new and increasing illegal harm

5. Code of conduct regarding protection of users from illegal harm

6. Compliance training

7. Having a content moderation function to review and assess suspected illegal content

8. Having a content moderation function that allows for the swift take down of illegal content

9. Setting internal content policies

10. Provision of materials to volunteers

11. (Probably this because of file attachments) Using hash matching to detect and remove CSAM

12. (Probably this, but could implement Google Safe Browser) Detecting and removing content matching listed CSAM URLs

...

the list goes on.

It is technical work, extra time, never being able to go fully off-call when I'm on vacation, the need for extra volunteers, training materials for volunteers, appeals processes for moderation (in addition to the flak one already receives for moderating), somehow removing accounts of proscribed organisations (who has this list, and how would I know if an account is affiliated?), etc, etc.

Bear in mind I am a sole volunteer, and that I have a challenging and very enjoyable day job that is actually my primary focus.

Running the forums is an extra-curricular volunteer thing, something I do for the good it does... I don't do it for the "fun" of learning how to become a compliance officer, or to spend my evenings implementing what I know will be technically flawed efforts to scan for CSAM, and then spending more time correcting those mistakes.

I really do not think I am throwing the baby out with the bathwater, but I did stay awake last night dwelling on that very question. The decision wasn't easily taken and I'm not at ease with it; it was a hard choice, but I believe it's the right one for what I can give to it... I've given over 28 years, and there's a time to say that it's enough. The chilling effect of this legislation has changed the nature of what I was working on, and I don't accept these new conditions.

The vast majority of the risk can be realised by a single disgruntled user on a VPN from who knows where posting a lot of abuse material when I happen not to be paying attention (travelling for work and focusing on IRL things)... and then the consequences and liability come. This isn't a risk I'm in control of or that can be easily mitigated; the effort required is high, and everyone here knows you cannot solve social issues with technical solutions.


Thanks for all your work buro9! I've been an LFGSS user for 15 years. This closure as a result of bureaucratic overreach is a great cultural loss to the world (I'm in Canada). The zany antics and banter of the London biking community provided me, and the contacts I've shared it with, many interesting thoughts, opinions, points of view, and memes, from a unique and authentic London local point of view.

LFGSS is more culturally relevant than the BBC!

Of course governments and regulations will fail to realize what they have till it's gone.

- Pave paradise, put up a parking lot.


Which of these requirements is, in your opinion, unreasonable?


> The vast majority of the risk can be realised by a single disgruntled user on a VPN from who knows where posting a lot of abuse material when I happen not to be paying attention (travelling for work and focusing on IRL things)... and then the consequences and liability come. This isn't a risk I'm in control of or that can be easily mitigated; the effort required is high, and everyone here knows you cannot solve social issues with technical solutions.

I bet you weren't the sole moderator of LFGSS. In any web forum I know, there is at least one moderator online every day, and many more senior members able to use a report function. I used to be a moderator for a much smaller forum, and we had 4 to 5 moderators at any time, some of whom were online every day or almost every day.

I think a number of features/settings would be interesting for a forum software in 2025:

- deactivation of private messages: people can use instant messaging for that

- automatically blur a post when the report button is hit by a member (and by blur I mean replacing the full post server-side with an image, not doing client-side JavaScript; see the sketch after this list)

- automatically blur posts that haven't been seen by a member of the moderation team or a "senior" level of membership within a certain period (6 or 12 hours, for example)

- disallow new members from reporting and blurring stuff; only people who are known good members

All this does not remove the bureaucracy of the assessments/audits of the process mandated by the law, but it should at least make forums moderatable and provide a modicum of security against illegal/CSAM content.
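A minimal sketch of that first "server-side blur" idea, with a hypothetical Post type and placeholder image of my own invention:

    // The point is that a reported post's body is replaced server-side,
    // so nothing sensitive ever reaches the client.
    package main

    import (
    	"fmt"
    	"html"
    )

    type Post struct {
    	ID       int
    	Body     string
    	Reported bool
    }

    func render(p Post) string {
    	if p.Reported {
    		// Swap in a placeholder image pending moderator review.
    		return `<img src="/static/pending-review.png" alt="awaiting moderation">`
    	}
    	return "<p>" + html.EscapeString(p.Body) + "</p>"
    }

    func main() {
    	fmt.Println(render(Post{ID: 1, Body: "hello", Reported: false}))
    	fmt.Println(render(Post{ID: 2, Body: "bad stuff", Reported: true}))
    }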


> In addition to the cookie privacy pop-over when viewing that site

I don't know where you're seeing that, as the site does not have such things. The only cookies present are essential, and so nothing further was needed.

The site does not track you, sell your data, or otherwise treat you as a source of monetisation. Without such things, conforming with cookie laws is trivial... you are conformant by simply setting nothing that isn't essential to providing the service.

For most of the sites, only a single session cookie is set, and for the few behind Cloudflare, those cookies get set too.


The real risk I see is that, as it's written and as Ofcom are communicating it, there is now a digital version of SWATing available to disgruntled individuals.

The liability is very high, and whilst I would perceive the risk to be low if it were based purely on how we moderate... the real risk is what happens when you moderate another person.

As I outlined, whether it's attempts to revoke the domain names with ICANN, or fake DMCA reports to hosting companies, or stalkers, or pizzas being ordered to your door, or being signed up to porn sites, or being DOX'd, or being bombarded with emails... all of this stuff has happened, and happens.

But the new risk is that there is nothing about the Online Safety Act or Ofcom's communication that gives me confidence that this cannot be weaponised against me, as the person who ultimately does the moderation and runs the site.

And that risk changes even more in the current culture-war climate, given that I've come out, and that those attacks now take on a personal aspect too.

The risk feels too high for me personally. It's... a lot.


> The real risk I see is that, as it's written and as Ofcom are communicating it, there is now a digital version of SWATing available to disgruntled individuals.

I'm sorry, what precisely do you mean by this? The rules don't punish you for illegal content ending up on your site, so a user can't just upload something, report it, and get you in trouble.


Yes you can: https://www.ofcom.org.uk/siteassets/resources/documents/onli...

A forum that isn't proactively monitored (approval before publishing) is in the "Multi-Risk service" category (see page 77 of that link), and the "kinds of illegal harm" include things as obvious as "users encountering CSAM" and as nebulous as "users encountering Hate".

Does no-one recall Slashdot and the https://en.wikipedia.org/wiki/Gay_Nigger_Association_of_Amer... trolls? Such activity would make the site owner liable under this law.

You might glibly reply that we should moderate, take it down, etc... but "we" is me... a single individual who likes to go hiking off-grid for a vacation and to look at stars at night. There are enough times when I could not respond in a timely way to moderate things.

This is what I mean by the Act providing a weapon to disgruntled users, trolls, and those who have been moderated... a service providing user-generated content in a user-to-user environment can trivially be weaponised, and it will be a very short amount of time before it happens.

Forum invasions by 4chan and others make this extremely obvious.


> A forum that isn't proactively monitored (approval before publishing) is in the "Multi-Risk service" category (see page 77 of that link),

That's not true, you'd need to conclude you're at a medium or high risk of things happening and consider the impact on people if they do.

> and as nebulous as "users encountering Hate".

But users posting public messages can easily fit into the low risk category for this, it's even one of their examples of low risk.


edit - moved across now the comment is alive


> Not sure why the reply buro9 gave is dead

Oh I do... the link... HN must have a word-based deny list


Ahh that makes sense.


> Not sure why the reply buro9 gave is dead

I vouched for it, so it should be visible now.


thank you for removing the deadname.

