> Why would you expect that there's no index?
I don't. Whilst I didn't word it particularly eloquently, I did note that it would be an index scan if one existed. And like the full table scan, this is a 50 vs 1 cost (unless the queried ids are well sorted, in which case you'd maybe get a 5 vs 1 cost at best).
> I have never seen a single database system where the most basic primary key A.id wasn't indexed.
Nobody has said it is a primary key. In fact it very likely isn't. All we really know is that roughly 50k rows were selected from B; we don't know how many are matched in A.
Since they are using BigQuery and the queries are taking so long, it would be reasonable to assume A is a large dataset clustered on some other value (e.g. a timestamp). But that is itself an assumption.
> [streaming] does require you to keep a client connection open for 50x longer than batching would,
It does not. Streaming the single joined query holds the connection open for less total time, not more.
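To make the streamed-vs-batched comparison concrete, here's a toy sqlite3 sketch. The table names A/B and the row counts are assumptions, not known from the thread; the point is just that one streamed join and N batched IN-list round trips return the same rows, with batching paying the extra trips.

```python
import sqlite3

# Toy reproduction of the scenario (names and sizes are assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO A VALUES (?, ?)", ((i, f"v{i}") for i in range(20_000)))
conn.executemany("INSERT INTO B VALUES (?)", ((i * 2,) for i in range(10_000)))

# Streamed: one query; the server joins once and we iterate the cursor.
streamed = sum(1 for _ in conn.execute(
    "SELECT a.v FROM A AS a JOIN B AS b ON a.id = b.id"))

# Batched: pull the ids client-side, then issue 20 IN-list round trips.
ids = [r[0] for r in conn.execute("SELECT id FROM B")]
batched = 0
for i in range(0, len(ids), 500):
    chunk = ids[i:i + 500]
    placeholders = ",".join("?" * len(chunk))
    batched += len(conn.execute(
        f"SELECT a.v FROM A AS a WHERE a.id IN ({placeholders})",
        chunk).fetchall())

assert streamed == batched == 10_000  # same rows either way; batching just adds trips
```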
---
Looking at pg.
I'm trying really hard to see your point. As far as I can tell, you're bothered by the query's working-set memory from the join exceeding a limit and causing contention; that is the only case where the streamed join is worse than batching. On an index join, that would require a very large table.
As for CPU contention: it's a non-issue.
There may be a point related to time-to-execute with respect to lock contention.
Regardless, if either lock contention or index-memory contention is a problem for you, then you will still want to `JOIN` - just join against a subquery/CTE with `LIMIT` and `OFFSET`.
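A minimal sketch of that paged-join pattern, again with assumed table names and sqlite3 standing in for the real database: the paging happens server-side inside the subquery, so the ids never round-trip through the client.

```python
import sqlite3

# Assumed schema: A is the big table, B holds the ids we want.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO A VALUES (?, ?)", ((i, f"v{i}") for i in range(10_000)))
conn.executemany("INSERT INTO B VALUES (?)", ((i * 2,) for i in range(5_000)))

PAGE = 1_000
total = 0
offset = 0
while True:
    # Join against a limited subquery: each page is still a server-side join.
    rows = conn.execute(
        """
        SELECT a.v
        FROM A AS a
        JOIN (SELECT id FROM B ORDER BY id LIMIT ? OFFSET ?) AS b
          ON a.id = b.id
        """,
        (PAGE, offset),
    ).fetchall()
    if not rows:
        break
    total += len(rows)
    offset += PAGE

assert total == 5_000  # every B id matches an A row in this toy data
```

Keyset pagination (`WHERE id > ?` instead of `OFFSET`) would scale better on deep pages, but the shape is the same: the join stays in the database.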
Roundtripping is not the answer!