The study reports surprising performance problems with Java's virtual thread implementation. Their throughput test was also hilarious: they pitted 2000 OS threads against 2000 virtual threads, when most of the time OS threads don't start falling apart until 100k+ threads. You can architect an application to handle 200k simultaneous connections using platform-thread-per-core, but it's harder to reason about than the linear, blocking code that virtual threads and async allow for.
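For what it's worth, a benchmark at that scale is trivial to sketch yourself. This is a minimal, hedged version (not the study's actual harness; the count and timing method are my own choices) that spawns the same number of virtual and platform threads via the Java 21 `Thread.Builder` API:

```java
import java.util.concurrent.CountDownLatch;

public class ThreadScaleSketch {
    // 2000 matches the study's scale; OS threads handle this easily,
    // which is why it's an odd place to compare.
    static final int N = 2000;

    // Start N threads from the given builder and wait for all to finish.
    static long run(Thread.Builder builder) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(N);
        long start = System.nanoTime();
        for (int i = 0; i < N; i++) {
            builder.start(done::countDown);
        }
        done.await();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long virtualNs = run(Thread.ofVirtual());
        long platformNs = run(Thread.ofPlatform());
        System.out.println("virtual:  " + virtualNs / 1_000_000 + " ms");
        System.out.println("platform: " + platformNs / 1_000_000 + " ms");
    }
}
```

At only 2000 threads with no I/O, don't expect virtual threads to win; the point of the sketch is that the interesting divergence only shows up orders of magnitude higher.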
> Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.
Except it's not a context switch? You're jumping to another instruction in the same program, one that should be very predictable. You might lose your cache, but that depends on a ton of factors.
> there's a new cost caused by allocating the internal continuation object, and copying state into it.
This is more of a problem with the implementation (not every virtual thread language does it this way), but yeah, this is more overhead on the application. I assume there are improvements that can be made to ease GC pressure, like using object pools.
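A minimal sketch of the pooling idea: reuse buffers instead of allocating a fresh one per continuation capture, so the GC never sees the churn. The names here (`StackBufferPool`, `BUFFER_SIZE`) are illustrative, not JDK internals, and a real pool would need sizing and eviction policy:

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical pool of byte[] buffers standing in for continuation state.
final class StackBufferPool {
    private static final int BUFFER_SIZE = 4096; // arbitrary for the sketch

    private final ConcurrentLinkedDeque<byte[]> free = new ConcurrentLinkedDeque<>();

    byte[] acquire() {
        byte[] buf = free.pollFirst();
        return buf != null ? buf : new byte[BUFFER_SIZE]; // allocate only on a miss
    }

    void release(byte[] buf) {
        free.offerFirst(buf); // LIFO reuse keeps recently-used buffers cache-warm
    }
}
```

The tradeoff is the usual one with pools under GC: you trade allocation/collection pressure for retained memory and the risk of use-after-release bugs.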
Usually virtual threads are a memory vs CPU tradeoff that you typically use in massively concurrent IO-bound applications. Total throughput should overtake platform threads at hundreds of thousands of connections, but below that they'll probably perform worse; I'm not that surprised by that.
> Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.
Java virtual threads are stackful; they have to save and restore the stack every time they mount a different virtual thread to the platform thread. They do this by naive[0] copying of the stack out to a heap allocation and then back again, every time. That's clearly a context switch that you're paying for; it's just not in the kernel. I believe this is what the person you're replying to is talking about.
[0] Not totally naive. They do take some effort to copy only subsets of the stack when they can get away with it. But it's still all done by copies. I don't know enough to understand why they need to copy and can't just swap stack pointers. I think it's related to the need to dynamically grow the stack while the thread is active vs. having a fixed-size heap allocation to store the stack copy.