But in a real-world application, of course, I have to think about the queue's capacity and bandwidth limits and monitor them accordingly. Assuming infinite anything in production is a ticking time bomb of tech debt.
While I see where you're coming from, the whole "scalability" story is about assuming infinite resources and scaling your usage as demand spikes. Kubernetes, if I'm not mistaken, specializes in exactly that: assuming you can create a potentially unlimited number of instances, you build a system that consumes as many resources as it needs to meet demand.
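For a concrete sense of "scaling usage with demand": the Kubernetes Horizontal Pod Autoscaler documents its core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below is a simplified illustration of that rule only; the function name is hypothetical, and it deliberately omits the autoscaler's tolerance band and the configured min/max replica bounds.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Simplified HPA-style rule (hypothetical helper, no tolerance or min/max bounds):
    desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by how far the observed metric is from the target.
    return math.ceil(current_replicas * current_metric / target_metric)


# Demand spike: 4 replicas running at 90% CPU against a 50% target -> scale out to 8.
print(desired_replicas(4, 90, 50))
# Demand drops: 8 replicas at 25% CPU against a 50% target -> scale back in to 4.
print(desired_replicas(8, 25, 50))
```

Because the replica count is proportional to observed load, the number of instances follows demand up and down instead of being fixed in advance.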