This is a good idea, I will certainly look at this. There are some planned features WRT caching coming up.
While it is probably a bug, it's probably not a serious one that many people would run into nowadays. Now that https is ubiquitous, there aren't many caching proxies around to cause grief. Probably the only proxies people will experience are where they are behind a paranoid company's firewall, one that is configured to decrypt (and then re-encrypt) all their web traffic. And in those situations, they don't tend to do caching much now. (Because even though you can cache HTTP, you'll hit problems with misconfigured sites and users will blame your proxy for it.)
But programmatically, how do you advertise alternate representations? I'm not sure. Suggestions appreciated.
Sorry, I don't have a good answer for this. I only nit-pick problems in web comments :)
You could set an HTTP header listing the available variants, but there isn't a standard for that AFAIK, so it would only help developers who spotted the header.
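For what it's worth, the closest thing to a standard I know of is the `Link` header (RFC 8288) with `rel="alternate"` and a `type` parameter per variant. A minimal sketch of building such a header — the URL and media types below are made-up examples, not csvbase's actual API:

```python
# Sketch: advertising alternate representations via a Link header
# (RFC 8288, rel="alternate"). Illustrative only.

def alternates_header(url, media_types):
    """Build a Link header value listing alternate representations."""
    links = [
        '<{0}>; rel="alternate"; type="{1}"'.format(url, mt)
        for mt in media_types
    ]
    return ", ".join(links)

header = alternates_header(
    "https://example.com/table/my-table",
    ["text/csv", "application/json", "application/parquet"],
)
print(header)
```

It has the same discoverability problem, of course: it only helps clients that think to look for it.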
But the other thing is that the HTTP client you use could decide to change its default Accept header. If curl changed its default to "application/json,*/*;q=0.9" then suddenly you'd get JSON (I didn't mention it in the blog post, but that is also implemented)!
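To make that concrete, here is a toy sketch of server-side Accept parsing showing how a client changing its default flips the chosen representation. The media types are placeholders, and a real server should use a proper parser (e.g. the one in werkzeug) rather than this simplified q-value handling:

```python
# Toy Accept-header negotiation: pick the highest-q media type the
# server can offer. Deliberately simplified (no wildcards like text/*).

def pick(accept, available):
    prefs = []
    for part in accept.split(","):
        bits = part.strip().split(";")
        mtype = bits[0].strip()
        q = 1.0  # q defaults to 1 when absent
        for param in bits[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                q = float(value)
        prefs.append((q, mtype))
    for q, mtype in sorted(prefs, reverse=True):
        if mtype == "*/*":
            return available[0]  # server's preferred default
        if mtype in available:
            return mtype
    return None

available = ["text/csv", "application/json"]
print(pick("*/*", available))                         # -> text/csv
print(pick("application/json,*/*;q=0.9", available))  # -> application/json
```

Same server, same URL, different default Accept header: the scraper silently starts receiving JSON.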
That's cool! Aeons ago, I was involved in developing a web server where we added support for properly handling all kinds of content negotiation (Accept-Encoding, Accept-Language, etc), so you could configure it to deliver the right file based on the user's language, file type preference, and so on. It was a large chunk of code, but in the end nobody really used it. In theory, web browsers and sites could co-operate to deliver the right page in the right language for all their users automatically. In practice, it never works. No-one sets up their web browser to pick the language properly (who even knows how to change it?). As a result, multi-lingual sites offer to switch languages by clicking a link, and if they choose a default language, they mostly do it based on IP address (and assumed location).
That's my main usability case and I wanted that to be as smooth as possible.
I think it's the right choice for csvbase; my original comment reads far too critical in retrospect. It's neat that if you curl a URL, you get the csv. But if I were writing code to scrape some csv data, I would still prefer to download URLs with a .csv extension, because you know what you're getting 100% of the time, and you avoid any unpleasant surprises if some third-party library or tool changes its behaviour.
> Now that https is ubiquitous, there aren't many caching proxies around to cause grief.
Well, there are still CDNs. csvbase is designed so that some pages can be served from a public cache. I haven't done much on this yet except for the blog pages, which use the CDN heavily.
I also have vague plans for client libraries that include a caching forward proxy, as my experience is that most people export the same tables repeatedly. Likely that will be based on ETags, so that the cache is always validated.
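The always-validated cache idea above can be sketched roughly like this. This is my own illustrative sketch, not csvbase's client library: `fetch` stands in for the HTTP transport (e.g. a wrapper around requests), and every request revalidates with If-None-Match, so a 304 serves the cached body without re-downloading:

```python
# Sketch of an ETag-validated client-side cache. `fetch` is any
# callable fetch(url, headers) -> (status, response_headers, body),
# keeping the caching logic independent of the HTTP library.

class EtagCache:
    def __init__(self, fetch):
        self.fetch = fetch
        self.entries = {}  # url -> (etag, cached_body)

    def get(self, url):
        headers = {}
        if url in self.entries:
            # Revalidate on every request: correctness over latency.
            headers["If-None-Match"] = self.entries[url][0]
        status, resp_headers, body = self.fetch(url, headers)
        if status == 304:
            return self.entries[url][1]  # validated, serve cached body
        etag = resp_headers.get("ETag")
        if etag:
            self.entries[url] = (etag, body)
        return body
```

The nice property is that a stale export is impossible: the server is consulted every time, but an unchanged table costs only a tiny 304 round-trip instead of a full re-download.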
The designers of HTTP 1.1 clearly thought a lot about a lot of things, including caches.
Thanks for your thoughts. :) Keep in touch via email if you like (same goes for anyone else reading this): cal@calpaterson.com