It wouldn't surprise me if the O(n log n) sorting solution is faster than the O(n) hashing solution, because of better memory locality.
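For comparison, the hash-table version is a single pass over both logs with tables keyed by customer ID; awk's associative arrays are hash tables, so a sketch could look like the following. The whitespace-separated "day customer_id page_url" layout and the file names are assumptions for illustration, not part of the original problem.

    # One-pass hash-table version: track distinct days and distinct pages per customer.
    # Assumes whitespace-separated "day customer_id page_url" lines (hypothetical format).
    awk '
      !(($2, $1) in seen_day)  { seen_day[$2, $1] = 1;  ndays[$2]++ }
      !(($2, $3) in seen_page) { seen_page[$2, $3] = 1; npages[$2]++ }
      END {
        for (id in ndays)
          if (ndays[id] == 2 && npages[id] >= 2) print id
      }
    ' day1.log day2.log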
The first answer that popped into my head was a shell pipeline, "cat file1 file2 | sort -k [pattern for customer ID] | awk -f ...", where the awk part just scans the sort output and checks for both dates and two pages within each customer-ID cluster. So maybe 10 lines of awk. It didn't occur to me to use hash tables. Overall it seems like a lame problem, given how big today's machines are: 10,000 page views per second for 2 days is about 1.7 billion records, and crunching each record into 64 bits puts that around 14 GB, so you can sort everything in memory. If it were 10 million views per second then maybe we could talk about a Hadoop cluster. But 10k a second is an awfully busy site.
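Just for concreteness, here's roughly what that awk stage could look like. This is only a sketch: it assumes whitespace-separated log lines of the form "day customer_id page_url" (a hypothetical layout; the real sort key and field numbers would differ), that the day field has exactly two distinct values, and that "two pages" means at least two distinct pages.

    # Cluster the combined logs by customer ID, then scan each cluster in awk.
    cat day1.log day2.log | sort -k2,2 | awk '
      function emit() {
        # a qualifying customer showed up on both days and hit at least 2 distinct pages
        if (prev != "" && ndays == 2 && npages >= 2) print prev
        split("", days); split("", pages); ndays = 0; npages = 0
      }
      $2 != prev { emit(); prev = $2 }            # a new customer-ID cluster begins
      !($1 in days)  { days[$1] = 1;  ndays++ }   # count distinct days for this customer
      !($3 in pages) { pages[$3] = 1; npages++ }  # count distinct pages for this customer
      END { emit() }
    '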
I actually had a real-life problem sort of like this a while back, with around 500 million records, and it took a few hours using the Unix command line sort utility on a single machine. That approach generally beat databases solidly.
It uses whatever amount of RAM you tell it to. I think the default is 1MB, which is way too small. It uses external sorting, which means it needs only O(1) RAM and O(N) temporary disk space. Oversimplified: it reads fixed-size chunks from the input, sorts each chunk in RAM, writes each sorted chunk to its own temp disk file, then merges the sorted disk files. If there are a huge number of temp files, it can merge them recursively, converting groups of shorter files into single longer ones, then merging the longer ones. I'd set the chunk size to a few GB, depending on the amount of RAM available.
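In case it's useful, this is the kind of invocation I mean. The sizes, paths, key, and file names are made up, but -S (buffer size), -T (temp directory), and --parallel are standard GNU sort options:

    # Sort a big log by the customer-ID field (field 2 here, hypothetically),
    # using a few GB of RAM per in-memory run and a scratch directory with
    # room for the temporary run files. Paths and sizes are placeholders.
    sort -S 4G -T /scratch/sort-tmp --parallel=4 -k2,2 big_input.log > sorted.log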
That is basically how everything worked back when 1MB was a lot of memory. The temp files were even on magtape rather than disk. Those computer rooms full of magtape drives jumping around in old movie clips were probably running a sorting job of some kind. E.g., if you had a telephone in the 1960s, they ran something like that once a month to generate your phone bill with itemized calls. A lot of Knuth Volume 3 is still about how to do that.
These days you'd do very large sorts (say for a web search engine indexing thousands of TB of data) with Hadoop or MapReduce or the like. Basically you split the data across thousands of computers, let each computer sort its own piece so you can use all the CPUs and RAM at the same time, and then do the final merge stage between the computers over fast local networks.
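You can see the same split / sort-locally / merge shape on a single machine with plain GNU tools; in the distributed case the pieces just live on different machines and the final merge runs over the network. File names and the chunk count here are made up:

    # Single-machine miniature of the distributed pattern: split the input,
    # sort the pieces in parallel, then merge the already-sorted runs.
    split -n l/8 huge_input.log part.            # 8 pieces, split on line boundaries (GNU split)
    for f in part.*; do sort -k2,2 "$f" > "$f.sorted" & done; wait
    sort -m -k2,2 part.*.sorted > all_sorted.log # -m merges pre-sorted files without re-sorting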
I've used the Unix sort program on inputs as large as 500GB and it works fine with a few GB of memory. It does take a while, but so what.