Hacker News

By throwing away TCP, they are throwing away decades of optimisations, and hardware offloading that network hardware makers made to handle TCP well

Indeed. I work at Netflix on optimizing CPU efficiency on our Open Connect CDN nodes, largely to reduce power use and capital expenses. We use FreeBSD, nginx & TCP, and make heavy use of offloads like async sendfile(), TSO, LRO, kTLS and, more recently, hardware kTLS offload.
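To make the sendfile() point concrete, here is a minimal sketch of the zero-copy idea using Python's os.sendfile wrapper over a local socket pair. This is an illustration of the general technique, not Netflix's FreeBSD/nginx stack; the function names and the demo file are made up for the example.

```python
import os
import socket
import tempfile

def serve_file_zero_copy(sock, path):
    """Send a file over a connected socket with sendfile(2), so the
    kernel moves pages from the page cache to the socket without
    copying the data through userspace buffers."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset

# Demo over a local socket pair (stands in for a TCP connection).
a, b = socket.socketpair()
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 65536)
sent = serve_file_zero_copy(a, tmp.name)
a.close()
received = b""
while chunk := b.recv(65536):
    received += chunk
b.close()
os.unlink(tmp.name)
```

With kTLS the same path extends to encrypted traffic: the kernel (or, with hardware kTLS offload, the NIC) encrypts on the way out, so the CPU still never touches the payload. QUIC implementations today live in userspace, which is a large part of the cost gap described below.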

Right now, I have a single socket 32c/64t AMD Rome server delivering over 350Gb/s of real Netflix customer traffic. This traffic is all TLS encrypted, and is served across hundreds of thousands of TCP connections.

From measurements we've done, current QUIC would cost about 3x as much as TCP when using software crypto. So my back-of-the-envelope guess is that this box would do about 77Gb/s with QUIC (230Gb/s is the limit when disabling hardware TLS offload and using software crypto).
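The back-of-the-envelope arithmetic behind those numbers, using only the figures stated above (3x is the QUIC cost relative to software-crypto TCP):

```python
tcp_hw_ktls_gbps = 350     # measured with hardware kTLS offload
tcp_sw_crypto_gbps = 230   # ceiling with software crypto, no offload
quic_cpu_cost_factor = 3   # QUIC ~3x the cost of software-crypto TCP

quic_estimate_gbps = tcp_sw_crypto_gbps / quic_cpu_cost_factor
energy_ratio = tcp_hw_ktls_gbps / quic_estimate_gbps
print(round(quic_estimate_gbps, 1), round(energy_ratio, 1))  # 76.7 4.6
```

That is where "about 77Gb/s" comes from, and why the per-stream energy comparison against the 350Gb/s offloaded TCP figure lands in the 4-5x range.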

Are the benefits of QUIC really worth a 4x increase in the amount of energy required per stream?

Once QUIC has optimizations similar to TCP's in place, the story will obviously be different. But we're not there yet.



At the moment, I think QUIC has a lot more benefit for many small requests, rather than for one very large streaming request.

QUIC would have the advantage that you could maintain that one stream and multiplex both video data and control information over it without the problems you'd encounter doing so on HTTP/1.1 or HTTP/2, but that definitely isn't worth the performance loss you'd get by deploying it today.
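The multiplexing advantage comes down to head-of-line blocking. A toy model (not a real QUIC implementation; the stream names and packet lists are invented for illustration): TCP presents one ordered byte stream, so a single lost packet stalls everything behind it, while QUIC orders each stream independently.

```python
def tcp_deliverable(packets):
    """TCP: one globally ordered stream. A lost packet (None) stalls
    everything after it, whichever logical stream it belongs to."""
    out = []
    for stream, data in packets:
        if data is None:   # gap in the byte stream: nothing later is deliverable
            break
        out.append((stream, data))
    return out

def quic_deliverable(packets):
    """QUIC: streams are ordered independently. Loss on one stream
    only stalls that stream; the others keep delivering."""
    out, stalled = [], set()
    for stream, data in packets:
        if data is None:
            stalled.add(stream)
        elif stream not in stalled:
            out.append((stream, data))
    return out

# One wire arrival order; the second video packet was lost.
wire = [("video", "v0"), ("ctrl", "c0"), ("video", None),
        ("video", "v2"), ("ctrl", "c1")]
print(tcp_deliverable(wire))   # [('video', 'v0'), ('ctrl', 'c0')]
print(quic_deliverable(wire))  # [('video', 'v0'), ('ctrl', 'c0'), ('ctrl', 'c1')]
```

Under loss, the control stream keeps flowing over QUIC but stalls behind the video gap over a single TCP connection, which is the problem HTTP/2-over-TCP multiplexing cannot escape.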

I think it makes perfect sense for you to wait for QUIC to be more heavily optimized. Once optimized, I think QUIC has the potential to be even faster, but it isn't there yet for your use case. (For that matter, some of that optimization is work you've put into optimizing existing HTTP and TCP, so it isn't surprising that your existing optimized stack beats current QUIC.)


QUIC is already getting those optimisations, since the existing large deployments are incentivised to make them. As one example, Kazuho Oku from Fastly made changes to their TLS implementation that showed improvements to AEAD and header encryption [1]. I suspect QUIC's performance will improve at a faster pace than the TLS optimisations that eventually made ubiquitous use of TLS trivial.

[1]: https://mailarchive.ietf.org/arch/msg/quic/OnBhC4gCosmlU3VKo...


Would you say that QUIC might not be worth it for video content, as it's the transfer of large files over the network?

Whilst QUIC shines when you have a lot of small assets that you want to fetch as quickly as possible?

Like, could we have a system where we choose HTTP/2 or HTTP/3 depending on the type of data?


> Would you say that QUIC might not be worth it for video content, as it's the transfer of large files over the network? Whilst QUIC shines when you have a lot of small assets that you want to fetch as quickly as possible?

The way video is generally served now is actually as a large number of dynamically-selected chunks of the video and associated audio. QUIC makes perfect sense for YouTube/Netflix/Vimeo type VOD, and especially the MPEG-DASH style of streaming.
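A sketch of what "dynamically-selected chunks" means in DASH-style streaming: the client measures throughput and picks the next segment's rendition from a bitrate ladder. The ladder values, the safety margin, and the segment-naming template below are all illustrative assumptions, not any particular player's or manifest's actual scheme.

```python
# Hypothetical bitrate ladder (kbit/s), as a DASH manifest might advertise.
LADDER = [235, 750, 1750, 3000, 5800]

def pick_bitrate(throughput_kbps, ladder=LADDER, safety=0.8):
    """Pick the highest rendition that fits within a safety margin of
    the measured throughput; fall back to the lowest when none fits."""
    budget = throughput_kbps * safety
    fitting = [r for r in ladder if r <= budget]
    return max(fitting) if fitting else min(ladder)

def segment_url(stream_id, bitrate, index):
    """Illustrative segment naming; real manifests define their own templates."""
    return f"/{stream_id}/{bitrate}k/seg-{index:05d}.m4s"

print(pick_bitrate(4000))              # 3000
print(segment_url("show42", 3000, 7))  # /show42/3000k/seg-00007.m4s
```

Because each segment is a separate small-to-medium request, VOD traffic looks more like the "many small assets" case than a single giant transfer, which is why QUIC can still make sense here once the per-byte cost comes down.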


So you’re using hardware offload of the crypto now? And that wouldn’t be available if you switch to QUIC?

It seems a bad reason to knock the protocol because people haven't implemented something unrelated (hardware acceleration) for it.

I appreciate it obviously would be stupid for you to switch given those stats. But it’s not related to the protocol design or some fundamental shortcoming with QUIC.


Is it worth it now for your use-case? Maybe not. Is it useful for others? Probably. I'm also curious why Google chose to deliver QUIC for everything they can, including YouTube. I guess the cost tolerance is there for them because they believe it improves the experience enough to be a net revenue generator.


A lot of it depends on whether the content you're delivering is a static file, or whether you're transcoding something. When it's a static file, your job is "easier" in that you can use sendfile, and (with hardware kTLS offload) avoid having the CPU touch any data being sent (the Netflix case). When you have a gigantic long tail, and are transcoding a lot of traffic, then optimizations like sendfile and kTLS hardware offload matter a lot less. I imagine Google falls more on one side of the spectrum, and we fall on the other.


> When you have a gigantic long tail, and are transcoding a lot of traffic [...]

Do you mean to say “YT does JIT transcode” or simply that the huge long-tail corpus means that transcode optimization >>> delivery benefits? :)

Other things to factor in: different mix of mobile users & geographies (incl. access networks), software stacks, hardware costs/goals/footprint, etc.


What hardware are you using with hardware kTLS offload?


We've tried both Chelsio T6 and Mellanox ConnectX-6 Dx in prototypes like this.


YouTube is architecturally quite similar to Netflix, I'm sure (precompute ahead of time), and it uses QUIC.


Sendfile is great for The Netflix Workload, but for more dynamic web stuff it feels like the "other way around", i.e. netmap makes more sense... and QUIC is kind of a natural fit for that :)



