QUIC's promise is fantastic, and latency-wise it's great, which is probably what matters most for the web.
However, I have run into issues with it for high-throughput use cases. Since QUIC is UDP-based and runs in user space, it ends up being more CPU-bound than TCP, where processing is often done in the kernel, or even in hardware.
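To make the CPU-bound point concrete, here's a rough sketch of the send pattern involved; the function names, the 1200-byte packet size, and the loop are illustrative, not taken from any real QUIC stack:

```c
/* Illustrative sketch (not from any real QUIC stack) of why user-space
 * UDP framing costs more CPU: each ~1200-byte QUIC packet is its own
 * syscall plus its own crypto pass, while kernel TCP takes one large
 * write and lets the kernel/NIC (TSO) segment it. Error handling and
 * partial writes are ignored. */
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define QUIC_PKT 1200   /* typical QUIC packet payload size */

/* User-space QUIC style: roughly one send() per packet (~870 per MiB). */
static void send_per_packet(int udp_fd, const char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += QUIC_PKT) {
        size_t n = len - off < QUIC_PKT ? len - off : QUIC_PKT;
        send(udp_fd, buf + off, n, 0);
    }
}

/* Kernel TCP style: one write(); segmentation, checksums, and often
 * offload happen below the syscall boundary. */
static void send_bulk(int tcp_fd, const char *buf, size_t len)
{
    write(tcp_fd, buf, len);
}
```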
In testing in a CPU-constrained environment, QUIC (and other UDP-based protocols like tsunami) capped out at ~400 Mbps with the CPU pegged at 100%, whereas TCP+TLS on the same hardware could push 3+ Gbps.
It'll be interesting to see how it plays out, since a goal of QUIC is to be an evolving spec that doesn't get frozen in time, yet baking it into the kernel/hardware might negate that.
Luckily, there are ways to reduce syscalls (like Generic Segmentation Offload and other tricks[1]). But I agree that not having things run in the kernel makes it more challenging for high-throughput scenarios.
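For reference, the GSO trick on Linux (4.18+) looks roughly like this; a minimal sketch with error handling omitted, and the 1200-byte segment size and port number are just illustrative:

```c
/* Minimal sketch of UDP Generic Segmentation Offload (Linux 4.18+).
 * One send() hands the kernel a large buffer that it splits into
 * wire-sized datagrams, so ~50 packets cost one syscall instead of 50.
 * Error handling omitted; sizes and port are illustrative. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103        /* from <linux/udp.h> */
#endif

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(4433);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));

    /* Tell the kernel to cut outgoing buffers into 1200-byte datagrams. */
    int gso_size = 1200;
    setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));

    /* 50 packets' worth of data, one syscall. */
    char buf[50 * 1200];
    memset(buf, 'x', sizeof(buf));
    send(fd, buf, sizeof(buf), 0);

    close(fd);
    return 0;
}
```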