
That's impressive indeed, but if we boil it down to just the things that are worth saving, removing duplicates, long and pointless livestreams, long videos that are just endless loops, etc., it could be done with much less space. If we also ignore auto generated spam[1] and harmful content (Elsagate, etc.), it would require even less.

It's a nice thought experiment, but we really don't need to archive all of YT. That's why I appreciate projects like this and yt-dlp that allow me to not just archive what I'm interested in, but to watch it when and how I want, without Google tracking my every move, and interrupting every few minutes with ads. Paying for YT Premium only partially solves the second issue. I don't want to see sponsored content either.

[1]: https://youtube.fandom.com/wiki/Roel_Van_de_Paar




I would argue that backing up selectively instead of just grabbing everything makes the job harder, not easier. You might need less storage, but who will decide (and how) what gets stored and what doesn't?


In a world where backing up to IPFS were as easy and widespread as hitting Ctrl+D to bookmark, and sharing backups carried no legal risk, your question would have an easy answer.

Each person gets to decide what they find important, backup and commit resources to.

We already have the sharing technology (BitTorrent, IPFS) and the backup technology (ArchiveBox, TubeArchivist). But they are not integrated, and they are not easy for a nontechnical person to configure and use. And they are unlikely to become mainstream, thanks to the copyright cartel.
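To make the missing integration concrete, here is a minimal sketch of the glue that doesn't exist today: a single "bookmark" action that saves a video locally and publishes the archive to IPFS. The yt-dlp and ipfs CLIs and their flags are real, but this pipeline and the function name are invented for illustration; it only composes the commands, which a real tool would run via subprocess.

```python
# Hypothetical glue between existing tools: back up a video with yt-dlp,
# then pin the archive directory to IPFS so others can fetch it.
from pathlib import Path


def archive_commands(video_url: str, archive_dir: str) -> list[list[str]]:
    """Return the two shell commands that back up one video and share it."""
    out_template = str(Path(archive_dir) / "%(title)s.%(ext)s")
    return [
        # 1. Save the video (with embedded metadata) locally.
        ["yt-dlp", "--embed-metadata", "-o", out_template, video_url],
        # 2. Recursively add the archive directory to IPFS.
        ["ipfs", "add", "--recursive", archive_dir],
    ]


# A real "Ctrl+D for videos" would pass each command to
# subprocess.run(cmd, check=True); here we just inspect them.
for cmd in archive_commands("https://youtu.be/example", "my-archive"):
    print(" ".join(cmd))
```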

Many years ago there was the dream of every home having a home computer giving people ownership over their digital life. A world where everyone has their own email server, their own diaspora pod for their family, their own blog, etc. etc. and all they had to do was buy or build a box and plug it in.

Projects like FreedomBox, sandstorm.io, YunoHost, and others tried to deliver exactly that. There are even newer projects like Umbrel, and of course NASes that now run apps.

Arguably, NASes and Umbrel are really easy to configure and use. OMV is somewhat harder, but not unreasonably so.

Alas, a self-sovereign world is not what people desire. People are content to let themselves be controlled by a few megacorps. Everything is in the cloud (someone else's computer).

Also, with the rise of botnets and DDoS attacks, it became infeasible to self-host anything public without something like Cloudflare in front.

I miss the old internet.


The current decentralized trend ("web3", etc.) becoming mainstream is a pipe dream only tech enthusiasts care about. The sad reality is that it has very slim chances of ever gaining mass adoption. The general public and non-technical users couldn't care less about owning and managing their data, running their own services, paying for services, and everything that entails. Even if they're aware that their privacy is being violated and that their personal data is sold on shady adtech markets, they see it as a cost worth paying in exchange for the services they get for "free".

So even if all these technical solutions to problems only technical users care about become as easy to use for laypeople as modern web browsers are, the general public just won't care about it.

I've long believed that the blame for this lies mostly with the early WWW architects. If the focus from the very start had been on sharing content as much as on consuming it, and user-friendly tools analogous to the web browser had been built, then the general public would have learned that using the web means being in control of your data and sharing it selectively with specific people, companies, or the world. ISPs would have been forced to deliver symmetrical connections to enable this, centralized services would be much less influential, and the web landscape would look very different today.

This was actually planned as a second phase in the original HyperText proposal[1], but was never completed for some reason. I'd be very interested to know what happened to this effort. If someone has insider knowledge, or can contact TBL, I'd be very grateful.

Alas, it's too late for this now. The centralized web is how most people experience the "internet", and that train has no chance of stopping.

[1]: https://www.w3.org/Proposal.html


I fully agree with what you wrote, but I just want to mention that, as you also imply, the ideas behind "the current decentralized trend ("web3", etc.)" really aren't new. They just build on older ideas about decentralization.

There are many ideas that fit under decentralization: torrents, the fediverse, crypto, even the old idea of the semantic web (because it was about standardized formats for metadata and carrying that metadata with the data instead of having it siloed in a central entity).

All of the hype around web3 is really only about crypto, because web3 is a marketable term for speculators.

Currently I am very cautiously hopeful about where the hype surrounding Mastodon (caused by Twitter's self-immolation) will lead.


On the contrary. Instead of everyone grabbing everything, each person would only archive what's important to them. This not only distributes the workload naturally, but serves as an implicit filter of content people find enjoyable and would actually watch, rather than archiving content nobody cares about.

But this is all hypothetical. We don't need a global YouTube archive. We need to stop using it altogether, and replace it with decentralized services. In the meantime, the existing personal archiving solutions work well.


Perhaps the backup algorithm should be coupled to users' browser history. Anything that isn't visited (or is only visited for a few seconds) can be skipped.



