Definitely don't underestimate a good database's ability to stream large queries. One of the many ways an ORM-centric view of the world can mess you up, since ORMs have a constitutional bias towards instantiating the entire result of the query in memory. (They don't have to, it isn't completely impossible for them to stream, but even if your ORM can stream it probably doesn't take much to convince it not to, even perhaps accidentally.) It is generally better all the way around to send a single query that is everything you want, if at all possible, let the database do its thing, and then spew a stream of all the results you want at you as fast as the network can carry it, than to be sitting there constantly harassing the poor thing with tiny query after tiny query, adding latency every step of the way. Match it with code that can consume the result as a stream and you can do a lot of work without using a lot of simultaneous resources.
This does break down eventually but I feel this is another one of the several places where developers still sometimes subconsciously have an early-2000s view of the world, as if all relational databases start panting and sweating if you ask them to return more than a couple hundred rows of any kind. No, set them up with the right indexes and foreign keys and they'll happily stream gigabytes at you, without the CPU even hardly doing anything. It's just as likely to be the consuming code that is the bottleneck!
You get up to "big data" and this approach stops working but what constitutes "big data" has also gotten a lot bigger since the early 2000s. Even in the engineering-centric company I work for, a lot of engineers & management assume that things are "big data" way before they should.
The applicability of this advice definitely depends on the database, drivers, and query.
For example with PostgreSQL you need to create a cursor, then FETCH NEXT 1000 over and over again in a loop. This is a bit of a pain, but is the difference between processing as data arrives, with only small buffers everywhere, versus waiting for all data to arrive before doing anything.
What exactly you need to do and how to make it work is very much database specific.
Yeah, I wish this was more standard. SQL is not so much a standard as a skeleton of a standard. Better than nothing, maybe, but still every database I walk up I pretty quickly hit issues like this.
I'm not trying to promise that every database will stream a petabyte without a problem; I'm more trying to help people get out of an early 2000s mindset and if nothing else, check what their DB will do. A lot of old programmer's tales about how to baby old databases along are actively pessimal and unnecessary in 2022/almost 2023. Don't spend days writing code to correctly slice and dice a query into tiny pieces when you could just send it in one shot and get better performance in every way.
This does break down eventually but I feel this is another one of the several places where developers still sometimes subconsciously have an early-2000s view of the world, as if all relational databases start panting and sweating if you ask them to return more than a couple hundred rows of any kind. No, set them up with the right indexes and foreign keys and they'll happily stream gigabytes at you, without the CPU even hardly doing anything. It's just as likely to be the consuming code that is the bottleneck!
You get up to "big data" and this approach stops working but what constitutes "big data" has also gotten a lot bigger since the early 2000s. Even in the engineering-centric company I work for, a lot of engineers & management assume that things are "big data" way before they should.