I have a hard time getting past it being mostly unusable on 32-bit Linux [1]. That seems like a fundamental flaw in the garbage collector, and moving to 64-bit simply kicks the can down the street.
And before anyone suggests "just use 64 bit", that's actually a really crappy solution. In most cases it merely (almost) doubles your memory footprint for no real gain.
There's a patch to make the current conservative GC a precise GC, which will totally fix this problem. It's in the process of being reviewed and integrated into the core.
FWIW, the panelists on the recent "Go in production" panel at I/O responded that garbage collection wasn't an issue for them on 64-bit systems. http://www.youtube.com/watch?v=kKQLhGZVN4A
That's awesome, kudos. How do you deal with maintaining the distinction between pointers and values on the stack and in registers? I didn't see any code to do that, skimming through that patch. Since it's a precise GC, you need to maintain the distinction.
In Rust we have a very large patchset to LLVM to allow us to do this; it involves modifying nearly every stage of the LLVM pipeline. Curious how you did that in gccgo.
Edit: I just looked at the source and it still seems conservative for the stack and registers. So it's not a precise GC yet and will not eliminate false positives. (This is of course totally fine, I just wanted to clarify for myself.)
That's too bad. Hopefully someone will improve `gccgo` as well. I actually started out experimenting with Go using the `gc` compiler, and found performance (basically manipulating large arrays sequentially) atrocious. Moving to `gccgo` improved performance four-fold, making my unoptimized Go test code in some cases faster than my unoptimized C test code.
Go 1.0.2, which is the newest stable release. I will see if I have some extra time to file a report.
Slightly off topic (but relevant to my comment above), do you know why this:
    func TraverseInline() int {
        ints := make([]int, N)
        sum := 0
        for i := range ints {
            sum += i
        }
        return sum
    }
should be slower than:
    func TraverseArray(ints []int) int {
        sum := 0
        for i := range ints {
            sum += i
        }
        return sum
    }

    func TraverseWithFunc() int {
        ints := make([]int, N)
        return TraverseArray(ints)
    }
Seeing a huge difference here. With an array of 300000000 ints, TraverseInline() takes about 750ms, whereas TraverseWithFunc() takes 394ms. With gccgo (GCC 4.7.1), the difference is much smaller, but it's still there: 196ms compared to 172ms. (These are best-case figures; I'm doing lots of loops.) Go is compiled ahead of time, not JIT-compiled, so I don't see why a function call should offer better performance.
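For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness. It is not the original test code: the `n` parameter, the lowercase function names, and the `timeIt` helper are my own assumptions, and `n` is far smaller than the 300000000 quoted above so it runs quickly.

```go
package main

import (
	"fmt"
	"time"
)

// traverseInline allocates the slice and sums the indexes in one function.
func traverseInline(n int) int {
	ints := make([]int, n)
	sum := 0
	for i := range ints {
		sum += i
	}
	return sum
}

// traverseArray sums the indexes of a slice it receives from the caller.
func traverseArray(ints []int) int {
	sum := 0
	for i := range ints {
		sum += i
	}
	return sum
}

// traverseWithFunc allocates the slice, then delegates the loop.
func traverseWithFunc(n int) int {
	return traverseArray(make([]int, n))
}

// timeIt (hypothetical helper) runs f several times and keeps the best time,
// matching the "best-case figures" approach described above.
func timeIt(runs int, f func() int) (best time.Duration, result int) {
	for r := 0; r < runs; r++ {
		start := time.Now()
		result = f()
		if d := time.Since(start); r == 0 || d < best {
			best = d
		}
	}
	return best, result
}

func main() {
	const n = 1000000 // assumption: scaled down from the figure in the post
	d1, s1 := timeIt(5, func() int { return traverseInline(n) })
	d2, s2 := timeIt(5, func() int { return traverseWithFunc(n) })
	fmt.Println("inline:   ", d1, s1)
	fmt.Println("with func:", d2, s2)
}
```

Both variants include the allocation in the timed region, as in the original snippets, so part of what gets measured is `make`, not only the loop.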
I don't have any immediate insight for the difference in performance you're seeing (my hunch is that it's to do with where and how memory is being allocated), but I would suggest you ask about it on the Go-Nuts mailing list: https://groups.google.com/forum/?fromgroups#!forum/golang-nu...
The Go authors should be very receptive and informative on such an issue.
Not sure why the difference in performance between function and non-function, particularly considering this looks inline-able.
Two theories on why gcc is doing better. Maybe it's unrolling the loop and avoiding a bunch of jumps. Alternatively, it could be using SSE, but that theory is much less likely, because I would expect it to be 4 times faster, not twice as fast. gc doesn't use SSE instructions yet.
I'm not seeing any difference between the two functions using the 1.0.2 x64 gc compiler. Here's the code I used (slightly modified to compile): http://play.golang.org/p/gcf0BOGncQ
If you want to dig deeper and are not averse to assembly, you can pass -gcflags -S to 'go build' and see the compiler output. Here's what I got for the above code:
The numbers are from Go 1.0 as that's what is available on Ubuntu Precise, my test box. (Getting similar numbers with 1.0.2 on a different box where I don't have gccgo.)
I realized that the first loop was using "i < len(ints)" as a loop test, which turns out to be more expensive than I thought, and the compiler doesn't optimize it into a constant (which would be expecting too much, I guess). After rewriting the test, the function call case is only slightly faster, although it is still significantly faster with gccgo.
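The loop shapes in question look roughly like the sketch below. The function names are illustrative, and whether a given compiler hoists `len(ints)` out of the loop test depends on the compiler and version, so this only shows the forms to measure, not which one wins.

```go
package main

import "fmt"

// sumLenInCondition evaluates len(ints) in the loop test on every pass,
// unless the compiler manages to hoist it.
func sumLenInCondition(ints []int) int {
	sum := 0
	for i := 0; i < len(ints); i++ {
		sum += i
	}
	return sum
}

// sumHoistedLen reads the length once, before the loop starts.
func sumHoistedLen(ints []int) int {
	sum := 0
	n := len(ints)
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

// sumRange uses range, which evaluates the slice expression once.
func sumRange(ints []int) int {
	sum := 0
	for i := range ints {
		sum += i
	}
	return sum
}

func main() {
	ints := make([]int, 1000)
	// All three compute the same sum of indexes 0..999.
	fmt.Println(sumLenInCondition(ints), sumHoistedLen(ints), sumRange(ints))
}
```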
I've talked with Chris Lattner about upstreaming it and the LLVM developers seem open to it when it's ready. In the meantime you can find the most recent work here:
Currently, you are mostly out of luck if you try to use mainline, unless you're okay with the llvm.gcroot intrinsics, which pin things to the stack so they can never live in registers (causing a corresponding performance loss). Probably your best bet is to do what Go will do once the patch enneff was referring to is merged: scan conservatively on the stack and precisely on the heap. You can still get false positives and leaks, and you can't do moving GC that way, but it's better than fully conservative GC.
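To make the false-positive problem concrete, here is a toy model of conservative versus precise marking. It is only a sketch: the `word`, `heapRange`, and `mark` names are hypothetical, the addresses are made up, and nothing here reflects how the Go runtime or LLVM actually represents roots.

```go
package main

import "fmt"

// word is one machine word found on a simulated stack.
type word struct {
	value     uint32 // raw bits
	isPointer bool   // type information that only a precise collector has
}

// heapRange is a hypothetical allocated block covering [start, end).
type heapRange struct{ start, end uint32 }

func (h heapRange) contains(addr uint32) bool {
	return addr >= h.start && addr < h.end
}

// mark reports which heap blocks are kept alive by the given words.
// In conservative mode every word whose bits land inside a block counts
// as a pointer; in precise mode known non-pointers are skipped.
func mark(words []word, heap []heapRange, precise bool) []bool {
	marked := make([]bool, len(heap))
	for _, w := range words {
		if precise && !w.isPointer {
			continue // known non-pointer: cannot retain anything
		}
		for i, h := range heap {
			if h.contains(w.value) {
				marked[i] = true
			}
		}
	}
	return marked
}

func main() {
	heap := []heapRange{
		{0x10000000, 0x10001000}, // block A, genuinely referenced
		{0x40000000, 0x50000000}, // block B, 256 MB, no longer referenced
	}
	words := []word{
		{0x10000010, true},  // real pointer into block A
		{0x41c64e6d, false}, // an integer whose bits happen to land in block B
	}
	fmt.Println("conservative:", mark(words, heap, false)) // [true true]
	fmt.Println("precise:     ", mark(words, heap, true))  // [true false]
}
```

The larger a block is relative to the address space, the more likely some unrelated integer lands inside it, which is why big allocations in a 32-bit address space are the painful case.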
By far, the hardest part of GC is precisely marking the stack and registers. Yet you have to be able to do it if you want to prevent leaks. Our goal is to put in the hard work so that new languages will be able to have proper GC without having to roll a custom code generator or target the JVM or CLR.
> Our goal is to put in the hard work so that new languages will be able to have proper GC without having to roll a custom code generator or target the JVM or CLR.
As someone who has ambitions of writing a fun little language this is exactly what I want out of LLVM. Can't wait for your patches to hopefully get to mainline.
In practice very few people have issues on 32bit systems, and for those that do there are workarounds.
Also, Go works much better on 64-bit systems for many other reasons, if nothing else because that is what most of the core team uses. 64-bit certainly does not double your memory footprint, and it increases performance dramatically (in part because of the extra registers, etc., and in part because the Go 64-bit compilers are much better).
And as Andrew mentioned, there is a patch already to fully solve things for the few people stuck on 32bits that have any issues.
Finally, all this is an implementation detail, and has very little to do with the qualities of the language, which is what the article is really about.
> In practice very few people have issues on 32bit systems,
I'm not sure what the proportion of people who experience issues is (whether it's "some", "most", or "very few"), but in my anecdotal experience running several long-lived Go processes on a 32-bit VPS it hasn't been a problem.
I think it just comes down to the allocation patterns in your program. Some programs trigger the pathological behavior and others don't.
Regardless of the numbers of affected programs, I'll be glad when the issue is behind us for good.
To be fair, it's kicking the can 60 years down the street, to when we're fully populating a 64-bit address space with 16 exabytes of RAM. Considering the change will be backwards compatible, 64-bit is for all intents and purposes a permanent solution.
You do have a point about 32 bit, though (I use 32 bit Redis for instance, for the same reason).
It's not really a fuck up, it's a "we didn't have time to do that yet because instead we were focusing on making an awesome language and standard library".
Lots of issues with the runtime had to wait because they can wait: it doesn't matter if it takes 5 years to fix the GC, because fixing it won't break any code.
[1]: https://groups.google.com/group/golang-nuts/browse_thread/th...