Show HN: Avoiding Imgur Link Rot (haasie.com)
68 points by rvavruch on May 5, 2023 | 50 comments
Over the last decade I've built a number of different digital asset managers (mostly media files) that met the needs of my companies at the time. It is an area I enjoy working in. A month ago, when asked what was next for me, I jokingly said I would build another DAM.

Then on Saturday, two weeks ago, I learnt that Imgur was going to delete all anonymous & NSFW files on the 15th of May. It was pointed out that this would mean broken links in communities that had relied on Imgur. By the Sunday I had decided that I would build another DAM, initially with the intent of avoiding Imgur link rot.

It was challenging to find time to spend on this; the project was put together over about 8 evenings. It still has rough patches, as this is an early MVP (a Michael Seibel "brick").

I have many ideas of where to take this project, but for now it only does one thing: back up Imgur files and produce new links that are easy to swap out for old soon-to-be-deleted Imgur links.
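
As a rough illustration of what "easy to swap out" could look like in practice, here is a minimal sketch that rewrites direct Imgur image links in a block of text. It assumes a backed-up file keeps its Imgur file name on the i.haasie.com host (the host that appears in an example URL in the comments); the real mapping may differ, and `swap_imgur_links` is a hypothetical helper, not part of the service.

```python
import re

# Assumption: a backed-up file keeps its Imgur file name on the new host.
# Only direct image links (i.imgur.com/<id>.<ext>) are handled here.
IMGUR_LINK = re.compile(r"https?://i\.imgur\.com/([A-Za-z0-9]+\.[A-Za-z0-9]+)")


def swap_imgur_links(text: str, new_host: str = "i.haasie.com") -> str:
    """Replace each direct i.imgur.com link with the same file name on new_host."""
    return IMGUR_LINK.sub(lambda m: f"https://{new_host}/{m.group(1)}", text)


post = "Before: http://i.imgur.com/abc123.png and https://i.imgur.com/XyZ789.jpg."
print(swap_imgur_links(post))
```

Album and gallery URLs are deliberately left alone in this sketch, since nothing here says how those would map.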




> Well done, you’ve defeated link rot

Until this site succumbs to the challenges of being a free image host?

FWIW ArchiveTeam is grabbing as much as possible so it could eventually show up in the Wayback Machine. Though what they have now, 47TB, is a drop in the ocean next to the petabytes Imgur probably has by now. There was an estimated 376 TiB in 2015.

- https://wiki.archiveteam.org/index.php/Imgur

- https://tracker.archiveteam.org/imgur/


Free image hosts just aren't sustainable, the costs are high and it's fairly hard to make that money back. I made a paid image host a while back (https://imgz.org) exactly because of that reason.


Hi, your site came up in my research. Although it wasn't clear if it was an actual service or an art project? Keen to hear more about your experiences.


It's both! It's an art project that works very well, which is the best kind of art project.


My intention is not to be a free host though - that's not sustainable. I hope to fund it through donations and potentially a subscription for enhanced features.


Sites like catbox[.]moe are popular and most months seem to fail to cover all their expenses with donations. And that's using cheap hosting (ovh, hetzner, etc, and no CDN), I imagine that your Fastly bill will be higher than theirs.

For this to work, I think you need to find revenue sources right away and run it as cheaply as possible, otherwise you'll run into problems, creating more link rot in the process.

Anyway, I hope you manage to make it work.


Thanks, agreed it will be a challenge.


Donations are not sustainable either. You need a business plan.


Could it be configured so if I go to, for example, https://i.haasie.com/something.png and you haven't already backed up from imgur, you back it up when that URL is hit?


Definitely agree. It also feels like this would be perfect in combination with something like ipfs for a decentralized way to store and access these images.


Agreed. I've played with IPFS before. Very interesting, definitely something to consider for the future of Haasie.


Yeah, a read through cache seems more obvious than opt-in.
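
For anyone unfamiliar with the term, here is a minimal sketch of the read-through idea: on a miss, fetch the file from the origin, keep a copy, and serve the copy from then on. Everything in it is an assumption for illustration (the `ReadThroughCache` class, the in-memory dict standing in for real storage, and the fake origin standing in for an HTTP request to Imgur); it is not how Haasie is implemented.

```python
from typing import Callable, Dict, List, Optional


class ReadThroughCache:
    """Serve from the local store; on a miss, fetch from the origin and keep a copy."""

    def __init__(self, fetch_origin: Callable[[str], Optional[bytes]]):
        self._fetch_origin = fetch_origin  # hypothetical: would wrap an HTTP GET to the origin
        self._store: Dict[str, bytes] = {}  # stand-in for durable storage (e.g. a bucket)

    def get(self, name: str) -> Optional[bytes]:
        if name in self._store:
            return self._store[name]  # already backed up
        data = self._fetch_origin(name)  # first request: go to the origin
        if data is not None:
            self._store[name] = data  # back it up for every later request
        return data


# Demonstration with a fake origin that records how often it is called.
calls: List[str] = []


def fake_origin(name: str) -> Optional[bytes]:
    calls.append(name)
    return b"png-bytes" if name == "something.png" else None


cache = ReadThroughCache(fake_origin)
first = cache.get("something.png")
second = cache.get("something.png")
print(first == second, len(calls))
```

Misses are not cached in this sketch, so a file that is already gone from the origin triggers a fresh origin request on every hit; a real service would probably want to remember those too.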


That's a fantastic idea, thanks!


Interesting service, but an earnest question about your Terms and Conditions[0].

How closely did you read this? Since I see you used a TOC generator (which was nice for me to learn about), I'm just curious how closely you read some of the conditions.

I may just be reading the Prohibited[1] section and the User Generated Content[2] section too closely, but they seem to conflict with your stated goal of protecting the NSFW content from Imgur, as I guess such content is potentially obscene, lewd, lascivious, filthy, violent, harassing, libellous, slanderous, or otherwise objectionable (as determined by [you and your team]).

I'm not here to trash your service or trying to "gotcha", just since you're responding in the thread, I'm only curious how closely you've read the TOC and if you stand by all the elements of it.

0 - https://haasie.com/terms_and_conditions

1 - https://haasie.com/terms_and_conditions#prohibited

2 - https://haasie.com/terms_and_conditions#ugc

edit: fixed link formatting


Fair question. I did read the TOCs, and studied the current TOCs of Imgur as well as my hosting company to make sure Haasie's covered the most important aspects.

Imgur atm still allows for NSFW, although presumably this will change. But as it stands the contents of these TOCs are not dissimilar from Imgur's or any other file hosts that allow for NSFW content.

To clarify, Haasie is not intended to be a specifically NSFW host - like the RedGIFs someone here mentioned - but rather to host any legal content, which includes NSFW.

Emphasis here is on legal. The intention behind the TOCs is to specifically prohibit the use of the service for anything illegal or otherwise prohibited by the hosting company. And to state that Haasie will act on any content that is found to be in violation of the TOCs.


thanks for the answer :) really, not trying to "gotcha", just was curious if your read of your current TOC was compatible with your goals. I didn't mean to indicate you were exclusively looking for nsfw (but I see how what I wrote implied it), just more a coffee thought while reading this thread :)

good luck on the project!


Thanks! I'm glad someone read the T&Cs after all the effort I put into them. :)


First off: great work!

The age old question is how you afford to run this, and how if you get popular do you not become an imgur yourself and have their same pressures to delete, moderate and make money.


I say we need basic universal internet infrastructure in the same way we have roads, libraries, hospitals, mail service, etc. Fundamental services like email, image and video hosting, backup storage, small cloud nodes, and so on. All partially or fully publicly funded, with no bloat or wasted overhead trying to commercialize it into an advertising platform filled with spyware and dark patterns.


It's sad to me that we went away from everyone just having their personal web space as part of their ISP package. Ditto email. It meant more smaller providers had to play nice and we didn't have a few major companies arbitrarily deciding what to host and what to drop.

I don't believe it should be government funded, but I wish there was a protocol that would let my ISP allocate a few GB of resources on my behalf that stay online even when my PC is powered down, and then I could reference them anywhere, and there'd be a succession protocol in case I move to a different ISP so links don't break, etc.


You can't because some people hosted extremely popular files (often starting with p- and ending in -orn) on the few GB that your local ISP gave you. And your local ISP suddenly needed to serve hundreds of terabytes they didn't have in their peering budget.

So now you need caps. And enforcement. And exceptions. And suddenly, you have a cloud service provider. And they've invested a ton into making the place nice, and there are teams drawing salaries keeping this running, and "succession protocols" aren't making them any money.

What you're asking for is infrastructure. Humanity to date has not found a way to finance long-term working infrastructure that doesn't involve a government, because that's the only body that invests in the commons. (Yeah, sometimes we pretend we can privatize, and then you get PG&E, and everybody suffers)


Yeah, so we'd need something sort of like bittorrent-with-web-seeds for any file more than a certain size, and maybe some pre-concatenated optimization where the whole page and its assets become a single file under such a system. Maybe IPFS might be a useful layer?

Anyway, "hosting" would mean two things: 1: "I attach my name to this content and it's always locatable as a tag under my name", and 2: "I dedicate some static resources to always host this content, whether or not it's popular enough to also propagate to other nodes in the swarm, I will always seed it".

Then simply having that seed-box allocate a certain amount of its space to each subscriber would be all you'd need.

Succession protocol would have internal uses too; they could say "Hey this subscriber has some really popular content, let's move it to a faster node", and the same primitives would track that move.

I feel like most of these pieces already exist, the trouble is there's no longer an expectation for an ISP to provide hosting to customers, so providing better hosting isn't a differentiating factor. (Not that there's meaningful competition in many places anyway, which may be the root of the problem.)

Which all adds up to people not realizing there is or even could be an alternative to hosting their images on perpetually-unprofitable-and-thus-inevitably-transient services like photobucket, imgur, and their ilk.


Sorry what is PG&E? I googled it, is this a reference to the power company in California? I don’t live in California so am not sure. I thought initially maybe it was some acronym for Privatize Gains and Externalize losses?


Yep, the California power company. There's a whole long saga, but the short of it is that California decided to privatize electricity companies, that horribly imploded, and in the long run we ended up with a quasi-monopoly for a horrible company that's so bad at maintenance, they set the state on fire on a regular basis.

But sure, privatize gains and externalize losses works too :)


Shouldn't a whole bunch of upload bandwidth even out the traffic flows and make such an ISP a better candidate for peering?


Yes, but it's still not free. It's especially not free if it's unexpected load.


Doing it without government is unviable. You missed my entire point.


Why shouldn't image and video hosting be a paid service?

I don't think there ever was any free method of mass distributing information. Or storing it if someone didn't find it interesting. If you need to share something, you should bear the cost. Just because some idiots thought they had a model that didn't work doesn't mean there aren't real costs involved. And those should be paid. In a free market they should approach margin.


The world lacks a good system for microtransactions to make paying the actual costs feasible.

Also I'm already paying my ISP for a connection, and so is everyone that views the things I host, so personally I hope for advances in P2P to leverage that existing already-paid-for bandwidth for distribution instead of having to toss more money on top.


Or people could pay the pittance that it would be if just a fraction of users chipped in. Why is "the government should pay for it" always a comment in every thread on HN? Is there something I haven't understood about this community?


> Or people could pay the pittance that it would be if just a fraction of users chipped in.

If only there was some way to gather and organize this rich fraction of users' contributions. Maybe we could create some kind of assembly where we discuss and vote on how to do so.

>Why is "the government should pay for it" always a comment in every thread on HN? Is there something I haven't understood about this community.

Because the purpose of government is to organize and manage society and that requires funds to do so? I'm not sure how to answer such a simple question with such a self-evident answer. How old are you?


No way do we want the internet to suck even more like those other services


You're going to need to do a lot more than that to convince me that the internet is a human right and public need.

And yes, you do need to convince me: If it's "publicly funded" then I (and everyone else) am/are paying for it. Personally, I would sooner pay for another US navy aircraft carrier.


I would say connectivity to the internet is a public need. So many government services are online, and bank branches and post offices will keep closing, so internet connectivity is to society what trains and buses are. So the government needs to ensure everyone can connect. It may be that 99% of the time there is nothing to do: John pays an ISP or telco and gets connected. But for rural or poor people, the government needs to help those who can't connect for whatever reason.

In terms of a government funded imgur backup: nah not convinced!


But would you pay for another, say, French Navy or Turkish Navy aircraft carrier? There are few things which western democracies spend taxes on which don't benefit at least some proportion of the tax-paying population, and most of the cost of running those public services also goes back into private industry of the same country. A private internet company can employ foreign workers, pay foreign taxes, award dividends to foreign owners, and after all that, can be just as corrupt and expensive as a tax-funded service!


> then I (and everyone else) am/are paying for it

Don't you already pay for it?


Thank you! My thinking is to start with donations. There is no such thing as a free service. If people get value from using it to upload anonymously and they want to keep it going, makes sense to donate.

In the future I could also add a subscription account for additional features, access to an API, bigger upload limits, transformations, greater ability to organise, etc.


Is there a backup plan? Not saying that relying on donations is impossible, but realistically there is a very small chance that enough donations will flow in continuously.

As a sidenote, reddit is (was?) one of the main users of imgur, however https://www.google.com/search?q=%22haasie.com%22+site%3Aredd... yields no results - maybe it is worth promoting it there too.


Your question honestly got me thinking. If everything goes completely sideways there are a number of shuttering options I can think of.

One is to turn off any new uploads, and just serve the existing content, presumably the bandwidth on old content would be much lower - until some sustainable plan can be devised.

Last resort is to approach the ArchiveTeam mentioned here to see if we could set up a maintenance service.

To your side note, yes, I want to promote it on Reddit, as soon as I've added a few more features - like donations.


Could you export a list of all imgur URLs saved, so that ArchiveTeam can help save those to archive.org too?

https://wiki.archiveteam.org/index.php/Imgur


Thanks, I will check them out.


> Well done, you’ve defeated link rot!

Except you haven't, because anyone that wants to view these images where they are linked needs to know to change the URL to Haasie instead of Imgur. So all the links are still broken.

And on top of that, even if the users know about Haasie, it only works if someone went to this website before the original image was deleted and entered the URL in a form. Non-popular images will be long gone before people do this manually. Perhaps accepting any URL and transparently fetching it from Imgur would improve this a little. From my manual testing, this doesn't seem to be the case: it just gives an S3 error from bucket ID haasie01, and nothing shows up there even if I wait and refresh.

All of this is assuming this website won't go down or go bad. Probably the Archive Team, and Archive.org can handle this task better.


Interesting work! Thanks for putting this together.

One potentially problematic thing about the new Imgur policy is that it will break machine learning projects that depend on Imgur for NSFW data. For example, here are projects that rely on Imgur links in order to train a classifier that detects adult content:

https://github.com/alex000kim/nsfw_data_scraper

https://github.com/EBazarov/nsfw_data_source_urls


Thanks for all the questions. It has really got me thinking about how to navigate the future of the project.

I have a question for all of you. My approach is that we all understand that there is no such thing as a free service, and that for something like this to be sustainable long term, donations (or some source of revenue) are crucial. If people get value from the service then it only makes sense to give back to the service to sustain it.

With this in mind, I'm toying with an idea that I would like to get a temperature check from HN on.

How would you react if, when donations are too low, Haasie serves a single image that requests donations to sustain the service in place of the original anonymously uploaded content, until the required donation level is reached? Once there are enough donations normal service is resumed.
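
To make the question concrete, here is a minimal sketch of that rule as a pure function. The names (`Funding`, `choose_response`, `donate.png`) and the idea of comparing a received amount against a required amount are assumptions made up for illustration; nothing like this exists in the service yet.

```python
from dataclasses import dataclass


@dataclass
class Funding:
    received: float  # donations received in the current period (hypothetical unit)
    required: float  # amount needed to cover costs for the same period


def choose_response(
    requested: str,
    funding: Funding,
    is_anonymous_upload: bool,
    appeal: str = "donate.png",  # hypothetical placeholder image name
) -> str:
    """Return the file to serve: the appeal image while underfunded, else the original."""
    if is_anonymous_upload and funding.received < funding.required:
        return appeal
    return requested


print(choose_response("cat.png", Funding(received=40.0, required=100.0), True))
print(choose_response("cat.png", Funding(received=120.0, required=100.0), True))
```

Only anonymously uploaded content is swapped out in this sketch, matching the wording of the question; normal service resumes as soon as the received amount reaches the required level.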


That makes me think something like this could be a nice phpbb plugin.

Let forum admins compile a list of linked imgur posts that should be archived. Storage and bandwidth are cheap compared to what they used to be, so some may be able to self-host images now.


Is RedGIFs not Imgur for porn?


Imgur hosts a lot of porn and was the preferred image host for some popular NSFW subreddits.


Only a matter of time until the same thing happens there.


RedGIFs is a gfycat product though, not an Imgur one.


Curious why not a "recycling" approach where images that are about to expire on Imgur are simply reposted to Imgur gaining a fresh URL?



