Definitely don't underestimate a good database's ability to stream large queries...

btilly · on Dec 22, 2022

The applicability of this advice definitely depends on the database, drivers, and query.

For example, with PostgreSQL you need to declare a cursor, then run FETCH FORWARD 1000 over and over again in a loop. This is a bit of a pain, but it is the difference between processing data as it arrives, with only small buffers everywhere, and waiting for all of the data to arrive before doing anything.
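
To make the cursor loop concrete, here is a minimal sketch in Python. It assumes `conn` is a DB-API connection to PostgreSQL (for example one from psycopg2) with a transaction open, since a cursor declared without WITH HOLD only lives inside a transaction. The helper name `stream_rows` and the cursor name `stream_cur` are made up for illustration, and the query text is interpolated directly, so it must come from trusted code.

```python
from typing import Any, Iterator, Sequence


def stream_rows(conn: Any, query: str, batch_size: int = 1000,
                cursor_name: str = "stream_cur") -> Iterator[Sequence[Any]]:
    """Yield rows of `query` one at a time, fetching them in small batches.

    Hypothetical helper: `conn` is assumed to be a DB-API connection to
    PostgreSQL that is already inside a transaction.
    """
    cur = conn.cursor()
    try:
        # Server-side cursor: the result set stays on the server.
        cur.execute(f"DECLARE {cursor_name} NO SCROLL CURSOR FOR {query}")
        while True:
            # Pull the next batch; only `batch_size` rows are buffered here.
            cur.execute(f"FETCH FORWARD {int(batch_size)} FROM {cursor_name}")
            rows = cur.fetchall()
            if not rows:
                break  # an empty batch means the cursor is exhausted
            yield from rows
        cur.execute(f"CLOSE {cursor_name}")
    finally:
        # If the caller stops early, the server-side cursor is released
        # when the surrounding transaction ends.
        cur.close()
```

Usage would look like `for row in stream_rows(conn, "SELECT id, payload FROM events"): ...`, where the `events` table is again hypothetical. Some drivers wrap this pattern for you; psycopg2's named cursors, for instance, are server-side cursors, so check your driver before writing the loop by hand.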

What exactly you need to do and how to make it work is very much database specific.

jerf · on Dec 22, 2022

Yeah, I wish this were more standard. SQL is not so much a standard as a skeleton of a standard. Better than nothing, maybe, but with every database I walk up to, I pretty quickly hit issues like this.

I'm not trying to promise that every database will stream a petabyte without a problem; I'm more trying to help people get out of an early 2000s mindset and, if nothing else, check what their DB will do. A lot of old programmers' tales about how to baby old databases along are actively pessimal and unnecessary in 2022/almost 2023. Don't spend days writing code to correctly slice and dice a query into tiny pieces when you could just send it in one shot and get better performance in every way.
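
As a rough illustration of the contrast, here is a sketch using Python's built-in sqlite3 module as a stand-in for whatever database you actually run. The `orders` table and both function names are invented for the example. Whether your own driver streams rows this way is exactly the thing to check.

```python
import sqlite3


def total_by_paging(conn: sqlite3.Connection, page_size: int = 1000) -> int:
    """The slice-and-dice approach: one query per page via LIMIT/OFFSET."""
    total, offset = 0, 0
    while True:
        rows = conn.execute(
            "SELECT amount FROM orders ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            return total
        total += sum(amount for (amount,) in rows)
        offset += page_size


def total_in_one_shot(conn: sqlite3.Connection) -> int:
    """The one-shot approach: a single query, consumed row by row."""
    total = 0
    # Iterating the cursor pulls rows as they are needed instead of
    # building the full result list in memory first.
    for (amount,) in conn.execute("SELECT amount FROM orders ORDER BY id"):
        total += amount
    return total
```

Both return the same answer; the second is less code, issues one query instead of many, and avoids rescanning skipped rows for every OFFSET.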