Hacker News
Lthread is a multicore/multithread coroutine library written in C (github.com/halayli)
58 points by nkurz on April 23, 2012 | 25 comments



So, last time this was posted I commented that I was interested, but couldn't use this at work because of the GPL license.

Since then I implemented my own M:N userspace:kernel threading library (which is what lthread boils down to) based on Russ Cox's BSD-licensed libtask. You know what I found?

1) Many libc routines require a surprisingly large amount of stack. 64kiB was the smallest power of 2 I could find where I wouldn't overflow the stack somewhere in a libc call. (This isn't a problem with stack-copying, but it is a problem if you have dedicated per-lthread stacks).

2) It was slow! Just dropping the M:N library and using blocking network I/O with 200+ kernel pthreads was vastly faster.

3) Many pthread-functions get very, very confused if they are used from different pthreads (same logical "lthread"). (This happens when a userspace scheduler swap happens between matching operations.) For example, pthread mutexes don't like to be locked in one pthread and unlocked in another pthread (same "lthread"). I ran into other things that match this category, but can't remember them off of the top of my head.

I ended up scrapping it and just using a pool of kernel pthreads. It works, the code is pretty clear (blocking IO! whatever, man), and it's fast enough.

Edit: By "fast enough", I mean "can fill 2x 1Gbit pipes from disk" (without any perf-focused work thus far).


Since I can't edit this post anymore, here's the relevant code:

https://github.com/cemeyer/taskmn

(Buyer beware: this is basically a snapshot of it at the point we decided to drop it entirely; I poked it enough to get it to compile under GCC / linux, but haven't verified functionality at all.)

More edits:

4) Re: Stack-copying; I found that without any attempt at optimizing memory use, my "light" threads were consuming on the order of ~600 bytes of stack (when de-scheduled); with a stack copying approach and a run-time stack of 128kiB, calling large-stack-using libc functions was fine.

5) Re: per-task dedicated stack memory; the default amount of memory allocated per-pthread on our platform is 128kiB, so 64kiB isn't great savings. I detected stack corruption at 32kiB and below by setting a canary value (at the top of the allocated stack) in the scheduler before running a ready "lthread."
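
A rough sketch of the canary check from point 5, not the actual taskmn code. It assumes a downward-growing stack, so the canary sits at the lowest address of the allocation (the end an overflow reaches first); the struct, the function names, and the canary constant are all made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary marker value (assumption); any unlikely bit pattern works. */
#define STACK_CANARY UINT64_C(0xDEADBEEFCAFEF00D)

struct task_stack {
    unsigned char *base;   /* lowest address of the allocation */
    size_t size;           /* total bytes allocated */
};

/* Allocates a dedicated stack for one task; returns 0 on success. */
int task_stack_init(struct task_stack *s, size_t size)
{
    if (size < sizeof(uint64_t))
        return -1;
    s->base = malloc(size);
    if (s->base == NULL)
        return -1;
    s->size = size;
    return 0;
}

/* The scheduler would call this right before running a ready task. */
void task_stack_arm(struct task_stack *s)
{
    uint64_t canary = STACK_CANARY;
    memcpy(s->base, &canary, sizeof canary);
}

/* The scheduler would call this when the task yields; 0 means the
 * canary was overwritten, i.e. the task ran off the end of its stack. */
int task_stack_intact(const struct task_stack *s)
{
    uint64_t canary;
    memcpy(&canary, s->base, sizeof canary);
    return canary == STACK_CANARY;
}

void task_stack_free(struct task_stack *s)
{
    free(s->base);
    s->base = NULL;
    s->size = 0;
}
```

This only catches overflows that actually write over the canary bytes; a large frame that skips past them goes unnoticed, so a guard page is the sturdier option where mmap/mprotect are available.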


Thanks for sharing your code and experience.

600 bytes per de-scheduled thread sounds excellent; it makes a million mostly-dormant threads feasible in less than a gig of memory -- I think that's the use case lthread-style threads shine in. You wouldn't be able to do that with blocking kernel threads (at 64 KiB of stack per thread you'd need roughly 64 GB of RAM just for stack, and that assumes the kernel scales well enough to handle that).

I think lower per-thread efficiency, even if it's 50% slower, is acceptable in such a use case, given the reduction in memory requirements.
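
The back-of-the-envelope numbers above, spelled out. The helper name is hypothetical; the 600-byte and 64 KiB figures are the ones quoted in this thread.

```c
#include <stdint.h>

/* Total stack memory for `threads` tasks at `bytes_per_thread` each. */
uint64_t total_stack_bytes(uint64_t threads, uint64_t bytes_per_thread)
{
    return threads * bytes_per_thread;
}
```

For a million threads, 600 bytes each comes to 600,000,000 bytes (about 572 MiB), while 64 KiB each comes to 65,536,000,000 bytes (about 61 GiB).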


No problem :-).

Re: "lower efficiency… is acceptable in such a use case": Maybe — it really depends on your use case. In the case you describe, sure. In my particular (specialized) case:

1) My clients are connected by 1Gb or 10Gb switched ethernet, and they are typically on firewalled local networks. I don't worry about trickle-style DoS attacks. (In other words: my threads are mostly-active, not mostly-dormant.)

2) My clients really only care about sustained streaming throughput. So, if they can max out the underlying disk with 20-30 simultaneous connections, they won't bother throwing 1000s of connections at my server. pthread-per-connection works great for this circumstance (with a pre-allocated thread pool).

Use the right tool for the job ;-).
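
For anyone curious what "pre-allocated thread pool" looks like in practice, here is a minimal sketch, not the production code. Connections are stood in for by plain ints, and every name here (struct pool, pool_submit, sum_handler, the sizes) is invented for the example. Each worker pulls one connection off a bounded queue and handles it with ordinary blocking calls.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_THREADS 4    /* assumed pool size */
#define QUEUE_CAP    64   /* assumed queue depth */

typedef void (*conn_handler)(int conn, void *ctx);

struct pool {
    pthread_t threads[POOL_THREADS];
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    int queue[QUEUE_CAP];          /* ring buffer of pending connections */
    size_t head, count;
    int stopping;
    conn_handler handler;
    void *ctx;
};

static void *pool_worker(void *arg)
{
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->stopping)
            pthread_cond_wait(&p->not_empty, &p->lock);
        if (p->count == 0) {       /* stopping and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        int conn = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        pthread_cond_signal(&p->not_full);
        pthread_mutex_unlock(&p->lock);
        p->handler(conn, p->ctx);  /* blocking I/O would happen in here */
    }
}

/* Creates all workers up front; returns 0 on success. A failure partway
 * through is not cleaned up, to keep the sketch short. */
int pool_start(struct pool *p, conn_handler handler, void *ctx)
{
    p->head = p->count = 0;
    p->stopping = 0;
    p->handler = handler;
    p->ctx = ctx;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    pthread_cond_init(&p->not_full, NULL);
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&p->threads[i], NULL, pool_worker, p) != 0)
            return -1;
    return 0;
}

/* Hands a connection to the pool; blocks while the queue is full. */
void pool_submit(struct pool *p, int conn)
{
    pthread_mutex_lock(&p->lock);
    while (p->count == QUEUE_CAP)
        pthread_cond_wait(&p->not_full, &p->lock);
    p->queue[(p->head + p->count) % QUEUE_CAP] = conn;
    p->count++;
    pthread_cond_signal(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
}

/* Lets workers drain the queue, then joins them. */
void pool_stop(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->stopping = 1;
    pthread_cond_broadcast(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(p->threads[i], NULL);
    pthread_cond_destroy(&p->not_full);
    pthread_cond_destroy(&p->not_empty);
    pthread_mutex_destroy(&p->lock);
}

/* Stand-in handler: adds the connection id to an atomic counter. */
void sum_handler(int conn, void *ctx)
{
    atomic_fetch_add((atomic_long *)ctx, (long)conn);
}
```

In a real server the accept loop would call pool_submit() with each accepted socket, and the handler would do read()/write() until the client goes away.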


Since you can consider libc routines as blocking code, you could switch back to the C stack (of the original pthread), so you would need only one 64kB stack per pthread.


Out of interest, what implementation of libc was that?


A hacked up FreeBSD libc (we make a FreeBSD-derived operating system). Some of the authentication/credentials code, for example, creates a bunch of large stack frames.


The trouble with this sort of C coroutine library in my experience is portability. Lthread seems to have taken the "write a bit of inline asm" approach, which means it's x86/x86-64 only. If you avoid the assembly and try to do things only with C library functions, you end up with something like QEMU's coroutines, which have four different backends: makecontext/setcontext based, win32 fibers, the nasty sigaltstack trick used by GNU Pth[1], and a last-ditch fallback using a separate GThread per coroutine. Multiple backends means more code, and the less-used backends are more liable to bitrot and undetected bugs, which is the last thing you want in a key bit of infrastructure.

Also some libc implementations don't take kindly to programs messing with the stack pointer behind their backs. Early Linux NPTL implementations put thread-local-storage just above the stack and used "round ESP up to 2MB boundary" to access it, which meant you had to switch back to the libc-created thread stack before calling just about any libc function. I hear at least one of the BSDs still does something similar.

All things considered I'd really rather just use threads...

[1] http://www.gnu.org/software/pth/rse-pmt.ps
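
For reference, the makecontext/swapcontext flavour of backend looks roughly like the sketch below. This is neither QEMU's nor lthread's code, just the bare pattern using the standard <ucontext.h> calls; the 64 KiB stack size and all names are assumptions for the example.

```c
#include <stddef.h>
#include <ucontext.h>

#define CORO_STACK_SIZE (64 * 1024)   /* assumed; see the stack-size notes in this thread */

static ucontext_t main_ctx, coro_ctx;
static unsigned char coro_stack[CORO_STACK_SIZE];
static int coro_steps;   /* how many times the coroutine body advanced */
static int coro_done;    /* set just before the coroutine returns */

/* Coroutine body: does one unit of "work", then yields to the caller. */
static void coro_body(void)
{
    for (int i = 0; i < 3; i++) {
        coro_steps++;
        swapcontext(&coro_ctx, &main_ctx);   /* yield */
    }
    coro_done = 1;
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

/* Runs the coroutine to completion, resuming it after every yield.
 * Returns the number of steps the body executed, or -1 on error. */
int run_coroutine_demo(void)
{
    coro_steps = 0;
    coro_done = 0;

    if (getcontext(&coro_ctx) != 0)
        return -1;
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = &main_ctx;            /* where to go when the body returns */
    makecontext(&coro_ctx, coro_body, 0);

    while (!coro_done) {
        if (swapcontext(&main_ctx, &coro_ctx) != 0)   /* resume */
            return -1;
    }
    return coro_steps;
}
```

Note that swapcontext also saves and restores the signal mask, which typically costs a system call per switch; that overhead is one reason libraries reach for a few lines of assembly instead.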


Previously posted on HN 50 days ago, there are some interesting critiques in the comments, especially regarding the GPL license, the consensus being it is ridiculous for a library like this and probably a good reason it's not as popular as it could be:

http://news.ycombinator.com/item?id=3661038


Thanks, I hadn't seen the earlier discussion. I'm more interested in the approach than in the specific code, so GPL doesn't concern me terribly. It seems to be a significant block for others, though.

   -----	
halayli 50 days ago | http://news.ycombinator.com/item?id=3661584

I am not married to the license. I noticed several complaints, so I am going to reconsider it. :)

   -----
Halayli: Any further thoughts on licensing?


I've been pretty busy in the past month and I never got a chance to go ahead and change the license. But yes I'll be changing it to BSD license this week.


and done!


Cool! Thanks.

Edit: looks like you missed COPYING; it's still GPLv2 (whereas LICENSE has BSD-looking text).


fixed. Thanks for catching it!


I wonder how this library compares against Apple's Grand Central Dispatch.

I guess they don't contain 1:1 functionality, but both libraries seem to be meant to make threading easier. I guess GCD is a more complete solution due to the addition of queues.


They're different beasts entirely. One key distinction is that GCD requires kernel support, while lthread and friends simply sit on top of any pthread-compatible system. As I understand it, GCD is based on an old systems idea (1991):

https://en.wikipedia.org/wiki/Scheduler_activations

It also has some auxiliary stuff (compiler-level support for C closures (Clang blocks), etc…) to "make threading easier," like you say.


Any thoughts on this HP research paper

http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html

"Threads Cannot be Implemented as a Library"?

That paper states that library-based threading implementations that don't also involve the compiler can't guarantee correctness of the resulting threading.


Not relevant here.

The paper talks about correctness for parallel kernel-level threads. In contrast, lthreads are about concurrency. These user-level threads are about software architecture, because they are more elegant than an event-loop. Parallelism (exploiting multicore hardware for a speedup) is a secondary goal, but it is much easier to parallelize a user-level thread program compared to an event loop program.


When I saw this repost, I thought the license had changed. Too bad it hasn't yet.


What's wrong with the BSD license?


I changed the license to BSD one hour ago after I was reminded by this post :). The comment was made before the change.


This was previously submitted to HN several times: http://news.ycombinator.com/item?id=3331474



According to SF, it was last updated in 2009 (is it actively developed?). I don't see much in the way of examples on the SF page, which is a bit off-putting. I guess I'm curious about the model it uses and how it performs, but my best guess is that it's not radically different from lthread or any other M:N userspace threading library. Do you know more?


Neat.

I wonder what happens if you use swapcontext/getcontext?



