The curl-wget Venn diagram (haxx.se)
285 points by TangerineDream on Sept 4, 2023 | 150 comments



I would also add at least "sane default options", "continues downloads" and "retries on error" to the Wget column. I recently had to write a script that downloads a very large file over a somewhat unreliable connection. The common wisdom among engineers is that you need to use Wget for this job. I tried using curl, but out of the box it could not resume or retry the download; I would have had to study the manual and specify multiple options with arguments for behaviour that really sounds like something that should just work out of the box.

Wget needed one option to enable resuming in all conditions, even after a crash: --continue

Wget's introduction in the manual page also states: "Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved."

I was sold. Even if I by some miracle managed to get all the curl options for reliable performance over a poor connection right, Wget seems to have those on by default, and the sane defaults make me believe it will also have the expected correct behaviour enabled even for error scenarios I did not think to test myself. Or, if the HTTP protocol ever receives updates, newer versions of Wget will also support those by default, but curl will require new switches to enable the enhanced behaviour, something I cannot add after the product has been shipped.
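For reference, roughly what the two invocations looked like (the URL is a placeholder and the curl retry numbers are just an assumption, not a recommendation):

    # wget: retrying is on by default, --continue resumes a partial file
    wget --continue https://example.com/big.iso

    # curl: roughly equivalent behaviour has to be spelled out explicitly
    curl -L -O -C - --retry 10 --retry-delay 5 --retry-all-errors https://example.com/big.iso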

To me it often seems like curl is a good and extremely versatile low-level tool, and the CLI reflects this. But for my everyday work, I prefer to use Wget as it seems to work much better out of the box. And the manual page is much faster to navigate - probably in part due to just not supporting all the obscure protocols called out on this page.


I agree on the "sane defaults".

Just the fact that `wget url` downloads a URL and saves it makes it a winner for me in command-line use.


Well, the point of the article is that they are not competitors and are used differently. For me, 99% of the time I'm curl-ing some API and I definitely don't want to save the result to disk (but often want to pipe it to grep/jq).


Agree. I also use both: wget to download files, and curl to talk to APIs, and as a "no magic" HTTP client.

This is not about "sane defaults", but about use cases.


> I tried using curl but out of the box it could not resume or retry the download.

Maybe I'm misunderstanding, but curl has exactly that feature, it's the `-C` flag. If you want retries, there's `--retry`. I find curl's defaults pretty sane, personally; I wouldn't want either of those by default for a tool like curl.


You recall incorrectly. curl's -C flag does not work as-is. You must specify the offset from where it should continue. Why doesn't it take the resumed file's existing length as the guess by default? What else could the user want outside of some very exotic cases?

Yes, I want retries. They should be the default for a user-facing tool. Try searching the curl's manual page for "retry". There are no less than 5 different interdependent flags for specifying retry behaviour: --retry-all-errors, --retry-connrefused, --retry-delay <seconds>, --retry-max-time <seconds> and --retry <num>

If I "just" want it to retry, surely there's a boolean flag like "--retry" that enables sane defaults? Nope! --retry takes a mandatory integer argument of maximum number of retries. Surely I can set it to zero for a sane default? Nope again: "Setting the number to 0 makes curl do no retries."

curl is a good tool if you know it through & through and want exact control over the transfer behaviour. I don't think it's a good tool if you just want to fetch a file and expect your tool of choice to apply some sane behaviours for you to that end, behaviours that would probably make sense if you are a human rather than an application using a library.


> You recall incorrectly. curl's -C flag does not work as-is. You must specify the offset from where it should continue. Why doesn't it take the resumed file's existing length as the guess by default? What else could the user want outside of some very exotic cases?

But... it does, though. From the man page ( https://curl.se/docs/manpage.html#-C )

> Use "-C -" to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.

I just tried it, works perfectly. I don't really see a difference between writing `--continue` in wget and `-C -` in curl. And the use case for specifying it is not so exotic, you might want to do range requests for all sorts of reasons.

Look, it's fine if you prefer wget's command line: I don't think retrying a request is a reasonable default for a tool like curl, but reasonable people can disagree on that for sure. But curl is perfectly capable of resuming downloads automatically, you're just (very arrogantly) wrong on that one.


No, `-C -` is not a flag. It is specifying the `-C` argument with the obscure special value `-`, which causes curl to determine the offset to continue from the output file length. This might be obvious to you if you are well-versed in the curl command line, but it's by no means expected or obvious like a simple flag.

> But curl is perfectly capable of resuming downloads automatically, you're just (very arrogantly) wrong on that one.

I've never claimed it doesn't. I've only demonstrated that the default options don't do it and enabling the behaviour is more difficult than it maybe should be for a simple tool. I fail to see the arrogance.


> I've never claimed it doesn't.

Yes you did:

> You must specify the offset from where it should continue

No, you mustn't, you can specify - and it does exactly what you want. The docs are very clear and even provide examples. At some point you should stop blaming curl for your inability to read a man page and admit that you were simply mistaken.


You still fail to understand that curl's -C does not behave as a simple flag but as a switch with a mandatory argument. And there's a magic special value for that argument that finally enables the expected behavior. It's unintuitive, hard to remember and bad for usability. While I agree that curl is powerful, I will not concede that its CLI is user friendly.


It's not hard to remember if you're familiar with Unix tools and syntax. But no one is demanding you concede anything. The point of the conversation is to explain the difference in expectations between how you expect a command line tool to act and how most people expect a command line tool to act. If I try to `cp src dest` and it fails, I don't want the tool to guess how to fix the issue; that's not its job. Ditto for `dd`: it shouldn't try to guess offsets. curl exists as a knife; you're expecting it to do the job of a food processor and blender combination. You're not wrong to expect a tool to behave that way, you're wrong to expect curl to behave that way. And no one is asking you to concede that a knife is easier than a blender when you want a blender. Everyone is pointing out it's not a blender, it's a knife, but you can still do everything you want to do.


> It is specifying the `-C` argument with obscure special value of `-`

It is not an “obscure special value”. Not only is `-C -` (or `--continue-at -` for the long form) well documented in the correct place in the manual, `-` is a common value in command-line tools (e.g. when specifying that a tool’s input will be STDIN instead of a file).


In what sense is "read offset to continue at from STDIN" a meaningful interpretation of `-C -`? That's not what it does.


That’s not what I said it does. An example (marked by the use of “e.g.”) can be something similar.


What is a sane default for retries? Is it to loop indefinitely? Should it retry against the same TCP connection or establish a new one? To the same IP it picked the first time, or a different one, or re-resolve the DNS entirely? Against the same resolver?

IMO there's too much complexity for 'sane defaults' to not just be 'surprising behavior' for someone else's use case.


It's indeed miraculous how Wget got this right


Not only are you wrong, I had the opposite experience - wget was all well and good for downloading pages, but it fell over for making e.g. authenticated POST requests. I think that might be possible now/have gotten better, but wget has always been too restrained for me.

I am in agreement that wget has "sane" defaults i.e. it acts like a bot or web crawler, or basic browser. Curl has always been easier to get things done with, though. At least in the land of http requests.


I thought that `-C -` checks the output file for the offset


Yes it does. How does the wget --continue work?


I don't know


Strong agree. The only misbehaviour I believe curl displays out of the box is globbing, which has burned me enough times that I’ve come to believe it would’ve been better disabled by default and enabled with -g instead of vice versa.


do you have an example of the globbing that burns you?


Also add -i which lets wget read URLs from a file. In particular wget -i - which makes it read from standard input, and is very useful in pipelines.

curl cannot, AFAIK, do this. People usually suggest using xargs, which is a mediocre substitute because it waits for all the URLs to arrive before invoking curl, giving up any chance at parallelism between the command generating the URLs and the one downloading them.
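A sketch of the wget pipeline I mean (generate-urls is a stand-in for whatever command produces the URLs):

    # wget starts downloading as soon as the first URL arrives on stdin
    generate-urls | wget -i -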


`--config filename` allows this. `--config -` for stdin. Not only URLs, but any config options

  echo '--url https://google.com/' | curl --config -
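A config file works the same way; a sketch (download.cfg is a made-up name) pairing each URL with an output name, read with `curl --config download.cfg`:

    # contents of download.cfg
    url = "https://example.com/a.txt"
    output = "a.txt"
    url = "https://example.com/b.txt"
    output = "b.txt"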


xargs doesn't have to wait, you can specify the number of items to include in a single sub-command and it'll batch things as they come in. For instance:

    ds@swann3:~# (for x in {1..100}; do sleep 0.1s; echo $x >&2; echo $x; done) | xargs -L5 echo
    1
    2
    3
    4
    5
    1 2 3 4 5
    6
    7
    8
    9
    10
    6 7 8 9 10
    11
    12
    [... and so on ...]
If the xargs call uses -I then --max-lines=1 is implied anyway.

If you replace echo with something that sleeps you'll see that the pipe doesn't stall waiting on xargs so the process producing the list can keep pushing new items to it as they are found:

    ds@swann3:~# (for x in {1..100}; do sleep 0.1s; echo $x >&2; echo $x; done) | xargs --max-lines=5 ./echosleepecho
    1
    2
    3
    4
    5
    starting 1 2 3 4 5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    done 1 2 3 4 5
    starting 6 7 8 9 10
    15
    16
    17
    18
    19
    [... and so on until ...]
    98
    99
    100
    done 46 47 48 49 50
    sleeping for 51 52 53 54 55
    done 51 52 53 54 55
    sleeping for 56 57 58 59 60
    [... and so on until xarg's stdin is exhausted]

And you can stop the calls made by xargs being sequential too for more parallelism with the --max-procs option (or use parallel instead of xargs):

    ds@swann3:~# (for x in {1..100}; do sleep 0.1s; echo $x >&2; echo $x; done) | xargs --max-lines=3 --max-procs=10 ./echosleepecho
    1
    2
    3
    sleeping for 1 2 3
    4
    5
    6
    sleeping for 4 5 6
    7
    8
    9
    sleeping for 7 8 9
    10
    11
    12
    sleeping for 10 11 12
    done 1 2 3
    13
    14
    15
    sleeping for 13 14 15
    done 4 5 6
    16
    [... and so on ...]
(I adjusted max-lines in that last example because my current timings made things line up in a manner that made the effect less obvious; adjusting the timings would have been equally valid. In a less artificial example, like calling curl to get many resources, timings will of course be less regular. Perhaps these examples could be improved by randomising the sleeps.)

I'm not sure what you would do about error handling in all this though, more experimentation necessary there before I'd ever do this in production!


Reply to self to add a note of something that coincidentally came up elsewhere¹ and is relevant to the above: of course, xargs being able to push existing things forward while the list of actions is still being produced relies on it getting a stream of the list instead of the whole thing in one block. If your earlier stages cause a pipeline stall it can't help you.

For an artificial example, change

    (for x in {1..100}; do sleep 0.1s; echo $x >&2; echo $x; done) | xargs -L5 echo
to

    (for x in {1..100}; do sleep 0.1s; echo $x >&2; echo $x; done) | sort | xargs -L5 echo
The sort command will, by necessity, absorb the list as it is produced and spit it all out at once at the end. xargs can still use multiple processes (if max-procs is used) to make use of concurrency to speed the work, but can't get started until the full list is produced and sorted.

----

[1] An unnecessary sort in an ETL process causing the overall wall-clock time to increase significantly


Right, but then you are invoking curl several times, and so not reusing a single connection, as you would with wget -i, so it still loses.


Valid criticism, ish, but that wasn't in what was previously asked for so well done on being like my day-job clients and failing to specify the problem completely :)

You can specify multiple URLs on the same command in curl so using xargs in this way would do what you ask to an extent (the connection would be dropped and renegotiated at least between each batch) as long as you don't use any options that imply --max-lines=1.

With the --max-procs option you could be requesting multiple files at once which may improve performance over wget -i – though obviously take care doing this against a single site (with wget -i too for that matter) as this can be rather unfriendly (if requesting from multiple hosts this is moot, as is the multiple files-from-one-connection point).
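A rough sketch of what I mean (generate-urls stands in for whatever command produces the URLs; --remote-name-all saves every URL in a batch under its remote name):

    # five URLs per curl invocation; the connection can be reused within each batch
    generate-urls | xargs -L5 curl -sSL --remote-name-all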


Both tools have their use-cases. I think with the advent of LLMs like ChatGPT it has also become a lot easier to get the proper command line incantation for whatever tool you're using. Even if you've read the manual previously, it's easy to forget the exact flags you want to use, and validating the generated command line is usually less effort than having to build it up yourself from scratch by reading the manual.

With the modern web, sometimes it's easier to use a tool like Puppeteer from a custom script. Especially if the sites you're interacting with are using a lot of JS.


I just look at my curl / wget note for some sane examples. I can't believe one needs ChatGPT to be productive.


You don't need much of anything, it's just another tool to help people figure stuff out. If you've already invested the time in reading through the manuals for these tools then maybe you don't get much out of something like ChatGPT, but consider that there's thousands of new people entering the industry every year and the number of tools which they're expected to learn to use has increased over time.


It's fine when you've read the man page at least once. If you just cowboy your way through every task, you'll never even be aware of what your tools can do. Which results in comments we've all seen: "I can't believe curl/Firefox/vim/readline/whatever can do something like that!" about something so trivial that every good poweruser/sysadmin has known it for decades (because it's in the man page).


Sure, but that's knowledge you pick up over years. You can't reasonably expect people to frontload all of it on the first pass with every single tool they interact with, especially when starting off.

I've read through most of the man pages of every tool I use at least once, but it has taken me years, and I've done it incrementally.


Personally I use my memory. It's a bit disturbing that some people can't write code without reading the reference for basics.


I just memorise the man page. I can't believe etc etc etc

(Seriously, there's a million nix flags, and only so many brain cells. ChatGPT's better than Google for simple "what's the magic incantation?" searches, and laziness is a virtue. If you don't want to be lazy that's fine, but I think you're missing out).


If chatgpt makes someone more productive, then why not. It's just a tool as any other tool.


I agree with this sentiment. But I also have personal anecdotes that make me cautious about it. When I was a child I memorized dozens of phone numbers of people I called a couple times a month. Now my phone memorizes any phone number for me. We used to memorize poems and such. I can't give you any scientific reason why it is good to have memorized the Gettysburg Address. But I feel intuitively that there is some benefit to exercising your brain, similar to the intuition I have about exercising muscles.

If a person thinks they benefit from 100 situps a day, I'm not going to disagree. And if they think there is some benefit in reading man pages, well, thanks to those who take the time to write all that documentation.


When it backfires you're screwed; I have experienced it once, using curl -L.


I would love to see your curl / wget note if you don't mind.


I like tldr for quick examples. If it doesn't have what I need, then I fall back to phind or some other llm. Nice thing about tldr is there's an offline version.

https://tldr.inbrowser.app/pages/common/curl

https://tldr.inbrowser.app/pages/common/wget


Another pet peeve of mine is that curl's URL parser is a lot more strict compared to wget.

For example:

  $ curl -sSLOJ 'example.com/file name.txt'
  curl: (3) URL using bad/illegal format or missing URL

  $ curl -sSLOJ 'example.com/file%20name.txt'
  $ ls
  file%20name.txt
On the other hand, wget (without any additional flags) will produce a file called "file name.txt" for both URLs. Well, technically you also need to add a --content-on-error flag to wget because this example URL 404s.


I thought curl had the ability to encode html entities if you asked it to?


HTML entities (&.....;) are distinct from URL encoding (%..).


Retry with `wget` was one of the most incredible Linux distro included features when I started running it at home. Pretty crucial thing on 56K dialup, and it worked better than the Windows tools I was aware of at the time.


Our dialup used to disconnect every 2 hours by design. wget and wvdial were the only alternative to mail-ordering CDs


We were way out in the sticks, so interruptions were very common. Still nothing but dialup, satellite, and LTE (no 5G) out there.


I think our setup was very particular to the UK. We didn't (and still don't?) have free local calls like the US, so we paid per minute for ISPs.

Almost all ISPs went through a scheme setup by British Telecom - you could either have free internet but you paid for your calls, or you could pay for your internet, and have access via a freefone number - so effectively flat-rate.

But the flat-rate option disconnected after two hours, on the dot. Which was hugely frustrating because we had a voicemail variant that was hosted by the telco, and let you know you had messages waiting by pulsing the dialtone. And my modem did not recognise the pulsed dialtone as a valid dialtone, and refused to connect until we called the number and marked them read.

Which lead to one of my most UK-centric retro stories. I tried to connect to the internet, and it refused to dial. I blew away my wvdial config, and it refused to dial. I blew away my ppp config, and it refused to dial. I grepped / for the error message and it didn't exist. I ended up blowing away my OS (and accidentally installing onto the wrong drive, and blowing away everything non-OS too), and it still wouldn't dial.

So I dragged the modem & extension cord to my mother's PC, and shot off a mail to my preferred mailing list (one hosted by John @ linuxemporium, my preferred source of mail-order distros), and swiftly received the response that in order to be certified by BT to operate on their network, one of the rules equipment had to obey was to refuse to redial the same number x many times. And that all I needed to do was power-cycle the modem. Which I'd done by dragging it upstairs to my mother's PC. And my own machine had been wiped twice over needlessly.

Aside, there was a lady named Helen on that mailing list who knew everything about everything, and is everything I aspire to be today. She had opinions on which harddrives best survived salt/sea air, and why they weren't deathstars. Just an incredible amount of lived experience. I miss mailing lists.


Interesting information w.r.t. UK telephone practices! In the USA, it was usual to get free local calls, so ISPs would set up modem banks to try and get maximum coverage for a given NPA-NXX range. There was some arrangement with CLECs and ILECs where it was extremely profitable for IIRC CLECs to pass data-only calls through to ILECs, so one or the other was subsidizing a lot of the early dialin ISPs, to the point of buying them modem banks and whatnot!

That was probably one of the biggest death-bringers for the BBS era, no more long distance calls to get to what you wanted.

> Which was hugely frustrating because we had a voicemail variant that was hosted by the telco, and let you know you had messages waiting by pulsing the dialtone. And my modem did not recognise the pulsed dialtone as a valid dialtone, and refused to connect until we called the number and marked them read.

Yeah, some VM providers in the USA did that too, and it similarly confused modems. It's called "stutter dialtone" here, and the usual fix was to put some delay elements in the dial string, which were commas for Hayes command set modems.

> and why they weren't deathstars

They sure did earn that name! I was so hesitant to switch to HGST for ZFS pools due to my 90s/2000s deathstar experiences. Wouldn't run them in production for a while, of course now that I'm over it and trust them as well as any other enterprise brand, they'll screw it up again!


For downloading large files I would rather just use aria2c TBH.


I only used aria2c once a long time ago, but it was awesome for huge files. As I recollect one thing it does is download different sections of a file in parallel over multiple connections, which speeds up downloads from servers that throttle per connection.


Aria2c currently looks unmaintained https://github.com/aria2/aria2/pulse



Also following redirects by default. "Run curl, be confused, run curl -L" is something I do many times a month.


I mean, it all depends on who you are and what you are primarily doing. “Sane defaults” for one person could be everything another doesn’t want.


One point where I would argue wget doesn't have a sensible default is on filenames - it really should make --content-disposition the default, at least for single file downloads. Otherwise it will often use the wrong name if there is a redirect or similar in the chain, which seems increasingly common.
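For a one-off download, opting in looks something like this (placeholder URL):

    # take the filename from the Content-Disposition header instead of the request URL
    wget --content-disposition "https://example.com/download?id=42"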


I'll argue not trusting the server to dictate the saved file name is the only correct default behavior, and taking the filename from the user input (URL) is reasonable.


A server configured with a docroot to serve a static site will map requested URLs to hierarchical filesystem paths, but that isn't the only possibility; it's a common but quite loose coupling of ideas.

But the filename directive of the content-disposition response header is entirely coupled to the idea of a filename. Therefore, it ought to take precedence.


I don't want the stranger deciding to save the content as .bash_profile or such.


Fair enough, but do you have that fear when using a browser? I guess a Downloads folder is lower stakes than whatever other working directory you're wgetting from, though.


Also (annoyingly) URL params


curl is an excellently powerful library and utility but I agree that wget has better defaults. I am almost certain to get the behavior that I want by just throwing a URL at wget, including retrying from the point where it had an issue. I actually ran into a case where our corporate firewall was a little too eager to block a download being performed by the Visual Studio installer because of a signature match partway through a specific download. All I had to do to grab the file was have wget download it. No magic incantations, it was just smart enough to not start the download from the beginning after being cut off, and since it started midway it no longer tripped the signature match rule.


Indeed - and curl requires `-L` to follow redirects whereas wget just does that by default too. So for ad-hoc CLI use, I turn to wget rather than remember all the curl options required.


I think of it like this: curl makes HTTP requests, wget downloads files.

(Though both can be made to do the other thing in some capacity)


I would use GNU Wget2 for this. It's supposed to be a wget successor.


For many of us, I bet the key distinction is "the one that writes to stdout by default" vs "the one that makes a file by default".


or "the one that can be piped to `sh` by default" ;-)


For me the killer feature of wget is that by default it downloads a file with a name derived from the url.

You do:

    wget url://to/file.htm
and a file named "file.htm" appears in your cwd.

Using curl, you would have to do

    curl url://to/file.htm > file.htm
or some other, less ergonomical, incantation.



Also -OJ: with -O you get the name derived from the URL (the initial one, I think, even if redirects are being followed); with -OJ you get the one from the Content-Disposition header or the final URL, the way browsers do it. Of course, plain -O is safer. (For parity with Wget, you might also want to add -R to set the downloaded file’s mtime according to the Last-Modified header.)
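Put together, something like this (boolean short options can be combined):

    # follow redirects, use the server-suggested name, set mtime from Last-Modified
    curl -sSLOJR https://example.com/download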


Wget will give you an equivalent with the --content-disposition flag. I would like for it to be the default, but it would likely break backwards compatibility with some scripts that expect a different output filename.


-OJ is nice but doesn't work with -C- for some reason


True. AFAIU the reason is that Curl wants to make a single request (modulo redirects), whereas making -OJC- work would require two: issue a HEAD to receive the Content-Disposition header and learn the file name, then look at that file and see how long it is, then issue a GET with a Range header to request the suffix you still need to download. With other methods I don’t think you could make this work at all. I don’t know if Stenberg is opposed to a GET-specific solution; perhaps that could be a fun project. (Although I’ve encountered noncompliant servers that couldn’t handle HEADs.)


> curl -O

Yes. But the GP said by default.


No, OP said…

> you would have to do `curl url://to/file.htm > file.htm` or some other, less ergonomical, incantation

… which invites the conclusion that OP is unaware of `-O`.


I'm well aware of curl's -O (long form: --remote-name), but it has an unfortunate clash with an option of the same name from wget (long form: --output-document). These options have closely related yet very different meanings. Using these options without looking at the manpage fills me with a sense of dread that I prefer to avoid.


    man curl | grep -C 2 " -O"

--remote-name-all This option changes the default action for all given URLs to be dealt with as if -O, were used for each one. So if you want to disable that for a specific URL after --remote-name-all has been used, you must use "-o -" or --no-remote-name.

    alias curl='curl --remote-name-all'


I was obviously responding to the curl command they posted. I didn't say anything about default behaviour.


I much prefer the fact that curl doesn't do this by default (but has the option), it much closer matches the behavior of most unix-y tools. Makes it so much easier to put it into pipelines.


What's "better" depends on whether you 'just' want to download a file, or if you want to build an larger workflow.

For me, I usually want to download files and I'm usually not doing any more processing, so I tend to type "wget" first.


I usually use `curl -OL <url>` or `curl -L -o <local-filename> <url>` instead of wget.

wget is better for download lists, but I'd accept the argument that a simple shell script is similarly easy.


Yeah, but:

    wget "url://to/file.htm?uid=foo&q=bar&rnd=4"


`&unused=.htm`. It usually works.


At this point you might as well use the -o option (-o file.htm). It's easier, and easier to understand.

I'd prefer wget to be a bit more clever when handling URL query strings, though I guess changing this behavior now might break some scripts.


The -O option, not the -o option. The capital O sets the output file, while the small o in your comment sets the log filename.


Yep, thanks for the correction. I meant big -O, I don't know how I ended up writing small -o.


Well, it depends on the use case. Sometimes you want the whole URL, like when I want to mirror a site and it has stuff like foo.html?page=1 foo.html?page=2 ...

wget does have options to use the name proposed by the server, and so another option to remove the query arguments would be useful, and in line with those.


A new option to strip query parameters from the output filename would be interesting. But it's not so simple. When combined with recursion, one will often see a lot of pages with the same name but different query parameters. How should they be stored on disk? There are a couple of different issues I can think of.

However, if the potential issues can be resolved with sane defaults, I think this would be a great new switch to add.


Yes, exactly. I think the option would have to be ignored when doing recursion, or alternatively use the .1 .2 ... method, like in all cases where a file of that name already exists.


So... you're adding more noise to the filename?

What?


It's a simple solution to give the file the right extension, and preserving query parameters can be the right thing to do if you hit the same path repeatedly e.g. for pagination.


> It's a simple solution to give the file the right extension,

Oh, I see now.

Do you work with many tools that can't work with files if they don't have the "right" extension? I thought that was mostly a Windows problem.


I’ve always seen this as a misfeature of wget, on the general principle that command-line utilities should write their principal result to stdout unless otherwise instructed.


Or curl -O, which is more ergonomic.

that "killer feature" for cat would be turn `cat file.html` into `cat file.html > file.html` which means if you actually wanted to cat instead of cp you'd also need `cat file.html -o -` kinda glad curl doesn't have that killer feature.


Daniel Stenberg is among that rare breed of developers who put their heart and soul into their creation, a fading trait in the modern world of big tech, where shadowy developers seem to be replaceable cogs in a money-making machine.

It's as if he treats curl as his mark on the world of IT.


Free software is full of people like this. That's why I use free software even if it's technically inferior. Of course, a lot of it is actually technically superior these days which makes it an even easier choice.


Maybe if you work for a company you don't put your heart into your creation, but if you have a popular personal project that brings you a lot of cash I'm sure you'll be as dedicated as him.


Curl is a money maker? It makes me very happy to hear that. I hope it is true.


From Daniel’s homepage¹:

> I work for wolfSSL doing commercial curl support. If you need help to fix curl problems, fix your app's use of libcurl, add features to curl, fix curl bugs, optimize your curl use or libcurl education for your developers... Then I'm your man. Contact us!

From Wikipedia’s wolfSSL page²:

> In February 2019, Daniel Stenberg, the creator of cURL, joined the wolfSSL project.

Given that, saying cURL is “a popular personal project that brings [Daniel] a lot of cash” seems like a bit of a stretch.

¹ https://daniel.haxx.se

² https://en.wikipedia.org/wiki/WolfSSL#History


Seems maybe dated. For example, it excludes both of these from wget in the diagram

> HTTP PUT

    wget --method=PUT --body-data=<STRING>

> proxies ... HTTPS

    wget --use-proxy=on --https_proxy=https://example.com

Curl consistently has more options and flexibility, but there's several things on the right side of the venn diagram where wget does have some capability.


Looks like there's FTP support also (based on the man page).


Ok, wow, I didn't know that curl supported so many protocols - but the fact remains that that small intersection area is probably what > 90% of curl/Wget users are using the tools for. So, from a developer's perspective, the overlap is not that big, but from a user's perspective it might appear much bigger...


The best part of the post for me is:

"""I have contributed code to wget. Several wget maintainers have contributed to curl. We are all friends."""


Mandatory mention for the comparison made by Daniel Stenberg

https://daniel.haxx.se/docs/curl-vs-wget.html


This newer comparison is also by Daniel Stenberg and is hosted on the same domain, but it's on his blog instead of the curl docs.


Oops. There was also a link to this in the original post.


In the olden times we used wget when we wanted to mirror a website. It is a specialized tool.

Curl is a general purpose request library with a cli frontend (also used embedded from other programs, or as a standard library API in PHP etc).


Personally I'm a fan of httrack for mirroring, although wget has some href/src translation capabilities that are occasionally a better match for particular goals.


I guess the most common usage is the overlap between the two. That's why I'd love to see a Venn diagram of where (OS and docker images) each is installed by default!


What are happy eyeballs in the curl circle?


A fast(er) & standards compatible way of doing ipv6+ipv4 connections.

- https://en.wikipedia.org/wiki/Happy_Eyeballs


Interestingly, it would make more sense for Wget to support these, since it is more of a "user" agent.


Looks like wget 2 introduces an equivalent to libcurl: libwget [0][1].

[0]: https://gitlab.com/gnuwget/wget2

[1]: https://en.wikipedia.org/wiki/Wget#Wget2


wget's "downloads recursively" is worth half the features of curl.


I read “downloads recursively GPLv3 licensed” and wondered whether even Stallman would really claim that a file downloaded by wget becomes retroactively GPLv3.


I've never seen them as competitors!

wget is my goto if I need to download a file now, with the minimum of fuss.

curl is used when I need to do something fancy with a url to make it work, or when I'm fiddling with params to make an API work/debug it.


It's even in the name: wget will do everything reasonable to just get something to you, while curl will do a huge portion of things that use URLs.


curl is the ffmpeg of url fetching


Couple more things wget can do that curl can't.

1. wget can resolve onion links. curl can't (yet). You'll get a

    curl: (6) Not resolving .onion address (RFC 7686)

2. curl has problems with Unicode characters in URLs

    curl -s -A "Mozilla/5.0 (Windows NT 10.0; rv:102.0) Gecko/20100101 Firefox/102.0" https://old.reddit.com/r/GonewildAudible/comments/wznkop/f4m_mi_coño_esta_mojada_summer22tomboy/.json
will give you a

    {"message": "Bad Request", "error": 400}
 
wget, on the other hand, automatically percent-encodes the ñ as UTF-8 (%C3%B1) and resolves the link perfectly.

I've searched the curl manpage and couldn't find a way to solve this. Please help.

I'm having to use `xh --curl` [1] to "fix" the links before I pass them to curl.

[1] https://github.com/ducaale/xh
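One workaround sketch, if you don't mind shelling out to Python for the encoding (placeholder URL; the list of safe characters is an assumption):

    # percent-encode non-ASCII characters in the URL before handing it to curl
    url='https://example.com/path/coño.json'
    curl -sSL "$(python3 -c 'import sys, urllib.parse as u; print(u.quote(sys.argv[1], safe=":/?&=%"))' "$url")"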


Curl can fetch over imap/imaps? Can I use it to download and back up my entire mailbox?


Yes, you can :)
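A hedged sketch of what that can look like (host, folder and credentials are placeholders):

    # list messages in INBOX, then fetch one message over IMAPS
    curl -u 'user:password' 'imaps://mail.example.com/INBOX?ALL'
    curl -u 'user:password' 'imaps://mail.example.com/INBOX/;UID=1' -o message-1.eml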


This is... wild


For the FreeBSD users out there, ‘fetch’ is also available.

Don’t know what the advantages/disadvantages are, but it comes with the default install. It’s usually what I use.


curl is a connection tool while wget is an app. At a basic level they do the same thing, but they excel in different areas.

This diagram is clearly and unapologetically biased towards curl. Feels strange that the author of curl doesn’t know what wget actually offers.


I recently found [axel], which is a very impressive wget-like tool for larger files.

[axel]: https://github.com/axel-download-accelerator/axel


Used the parallel downloading of axel a lot 20 years ago when, with long fat pipes, window scaling wasn’t always enabled, and it wasn’t on the corporate proxy I had to use.


On the cURL side: a ridiculous manual.

I regularly forget the order of the values for --resolve; try searching for that word and figuring it out quickly.

I've been relegated to grepping a flippin' manual
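For the record, the order I keep forgetting is host:port:address, e.g. with a documentation IP:

    # force resolution of example.com:443 to a specific address
    curl --resolve example.com:443:203.0.113.7 https://example.com/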


A trick I've found useful when searching large man pages for a flag --foo is to search for `␣␣--foo` (note the two leading spaces). In my experience this always hits the line where the flag is defined instead of irrelevant mentions of that flag, and it's faster than paging through the manual by hand.


Ah, good call - I've tried a variation of that with one space and been left disappointed; two works a treat!


Or for something more flexible: ^ *--foo. But unfortunately this kind of pattern will be defeated by -f, --foo indexes.


Another thing I forget that wasn't supported in wget (but worked in curl) last I checked: IPv6 link-local address scopes (interface names on linux).


I find wget is more likely to be on a given system than curl by default so I usually reach for that first. But I am squarely in the middle of the venn.


On macOS, curl ships by default but wget does not.


Apple will not ship any GPLv3 code.


Same on Windows


don't forget the weekly security fix on the right side ;)


Curl is very widely used and has a ton of features which means that it gets a lot of CVEs, but their severity is often significantly overstated for users outside of specific niche configurations - for marketing purposes, it’s nice to be able to say that you found a HIGH in libcurl without mentioning that it only affected Windows domain authentication on ARM. The lead developer has written about this providing a lot of noise without much tangible security benefit:

https://daniel.haxx.se/blog/2023/08/26/cve-2020-19909-is-eve...


Looks like cURL and SQLite have the same woes: https://www.sqlite.org/cves.html

Previously I worked on an open source project that pulled in many third party libraries. Users would run their corpo vulnerability scanners on the project and find dependencies with open CVEs and demand fixes, not understanding that in our usage of the libraries, the vulnerability is not exposed.

I think in 4 years, we had users open roughly 50 issues like this, which corresponded to exactly 0 real world exploitable issues.

A central vuln DB makes sense for sysadmins, but too many make it the end-all-be-all.


I think this ends up devolving to Goodhart’s law: once CVEs became marketing, a ton of people had a huge incentive to game their stats at the expense of everyone else’s time.


I don't know about weekly but the security record seems alright for something as complex as curl: https://curl.se/docs/security.html



I have never seen an example of curl working with SFTP. Does anyone know of, or have used, curl over SFTP?
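For what it's worth, a hedged sketch of what it should look like, assuming a curl build with SSH support (host, path and credentials are placeholders):

    # download a single file over SFTP, saving it under its remote name
    curl -u 'user:password' -O 'sftp://example.com/home/user/backup.tar.gz'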


WARC support is something wget-specific, worth mentioning.
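Roughly (the WARC basename and crawl depth are arbitrary):

    # record the crawl into a WARC archive (example.warc or example.warc.gz) alongside the normal files
    wget --warc-file=example --recursive --level=1 https://example.com/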


Can anyone explain "happy eyeballs"? Did find one page about it, but wasn't 100% clear what the use case for it being an option was, or where on earth the name came from...


Happy Eyeballs makes a simultaneous connection over IPv4 and IPv6 to an HTTP server, and uses the first connection that gets a server acknowledgement. This is useful because many networks have noticeably different response times for IPv4 and IPv6, and many have one of them configured but not working properly (usually IPv6).

Without Happy Eyeballs web browsers can be slow fetching some web pages, for some users, waiting for a request timeout on IP addresses that don't work before trying one that works, or working but with the slower IP.

It's called Happy Eyeballs because it improves the visible page load time in web browsers for many users.

https://en.m.wikipedia.org/wiki/Happy_Eyeballs


Thanks...so curl always does this but wget never does? I got the impression initially it was an option.


Neat. Love it.

Is there a feature matrix to Venn diagram converter?

(Deep down) on my To Do list is comparing Ansible, Puppet, Chef, Docker, etc.

Which ultimately means some kind of feature matrix, right?

With a converter, we'd get Venns for free.


To me the real takeaway here isn't related to wget or curl. It's related to using the right tool for the job, whatever that is.


Does the webpage parsing functionality of wget only come into play when doing something like an entire site backup?


For the intersection area, I see no reason to use curl or wget over requests / urllib. Assuming one is inside a script.


Within the python ecosystem, I find httpx to be more similar to curl, and requests to be more like wget. For example, when following redirects or handling connection issues.




