The suggestion to reuse objects rather than reallocate temporaries (e.g. inside a loop body) was intriguing. Coming from C/C++, where stack allocations are approximately 'free', I tend to scope stack variables as narrowly as possible for readability and to help the compiler break dependencies. This is an interesting paradigm change that I wouldn't have expected from Go.
Most generational GCs will work better if you don't reuse objects: reuse extends object lifetimes and increases the probability of promotion to an older generation, which is slower to collect. (Go's GC is not generational today, AFAIK.) I'd treat any recommendation along those lines as a last-resort technique to reduce pressure in critical-path loops. And a more sophisticated runtime may allocate such temporaries in registers or on the stack anyway.
In other words, you're tuning to today's runtime if you reuse heavily. Those techniques may not age well, and will harm readability and maintainability.
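To make the trade-off concrete, here's a minimal sketch of the reuse pattern being discussed: one version allocates a fresh buffer per iteration, the other resets and reuses a single buffer. The function name `buildReuse` is mine, purely for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// buildReuse formats n items using one bytes.Buffer, resetting it
// between iterations instead of declaring a fresh buffer each time.
func buildReuse(n int) []string {
	out := make([]string, 0, n)
	var buf bytes.Buffer
	for i := 0; i < n; i++ {
		buf.Reset() // keep the backing array; drop only the contents
		fmt.Fprintf(&buf, "item %d", i)
		out = append(out, buf.String())
	}
	return out
}

func main() {
	fmt.Println(buildReuse(3))
}
```

The narrowly-scoped version (declaring `var buf bytes.Buffer` inside the loop) reads better; whether the reuse version is actually faster depends on whether the buffer escapes to the heap in the first place, which is exactly the runtime-specific tuning the comment above warns about.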
It isn't, and probably never will be. The Go runtime developers have actually implemented and tested a generational GC, but found that it wasn't useful, and was even harmful at times, to the goal of a fast, low-latency, concurrent garbage collector. That's mostly because most short-lived objects are allocated on the stack, and escape analysis is getting better with each release.
The generational GC that the Go team implemented does not offer one of the primary benefits of generational GC: bump allocation in the nursery. Without that benefit, it's an unfair comparison.
This benefit is directly relevant to this thread, because bump allocation in the nursery makes the allocation fast path somewhere on the order of 6 instructions.
As far as I understand this, Go will always try to keep objects on the stack because that is practically free (just as in C/C++). However, if an object escapes to the heap, allocation gets expensive (just as in any other language), and it can be beneficial to reuse objects instead of allocating new ones on every iteration.
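A small sketch of the stack-vs-escape distinction, with hypothetical names (`sumLocal`, `newNode`) chosen just for this example:

```go
package main

import "fmt"

type node struct{ val int }

// sumLocal's node never outlives the call, so the compiler's escape
// analysis can keep it on the stack.
func sumLocal(a, b int) int {
	n := node{val: a + b}
	return n.val
}

// newNode returns a pointer that outlives the stack frame, so the
// node must be heap-allocated (it "escapes").
func newNode(v int) *node {
	return &node{val: v}
}

func main() {
	fmt.Println(sumLocal(1, 2), newNode(7).val)
}
```

Building with `go build -gcflags=-m` prints the compiler's escape-analysis decisions for each variable, which is how you can check which case you're in.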
Allocation is not expensive in every other language. In Java HotSpot, JavaScript V8/SpiderMonkey, etc. it is somewhere on the order of 6 instructions. That is because those language implementations use generational garbage collectors with bump allocation in the nursery.
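To illustrate why the fast path can be that short, here's a toy bump allocator. This is not Go's allocator or HotSpot's; it's only a sketch of the idea: the common case is a pointer bump plus a limit check, and everything else lives on the slow path.

```go
package main

import "fmt"

// nursery is a toy bump allocator over a fixed byte region.
type nursery struct {
	buf []byte
	off int
}

// alloc is the fast path: one bounds check and one pointer bump.
// A real collector would fall into a minor GC or fetch a new chunk
// on the nil ("slow path") branch.
func (n *nursery) alloc(size int) []byte {
	if n.off+size > len(n.buf) {
		return nil // slow path: nursery is full
	}
	p := n.buf[n.off : n.off+size]
	n.off += size
	return p
}

func main() {
	n := &nursery{buf: make([]byte, 16)}
	fmt.Println(n.alloc(8) != nil, n.alloc(8) != nil, n.alloc(1) != nil)
}
```

Freeing is equally cheap in aggregate: a minor collection evacuates survivors and resets `off` to zero, so dead nursery objects cost nothing individually.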
Unnecessary allocation is expensive in all languages. Even if the allocation itself is instantaneous, it creates more work for the GC and thus has a performance impact on the program.
The Go dev team tried multiple alternative approaches as described here: https://blog.golang.org/ismmkeynote and the generational GC didn't compare so well against the current GC.
They did not compare a generational GC with bump allocation in the nursery to their current GC. Without that, it's not a fair comparison.
Furthermore, there isn't much of a relevant difference between stack allocated and nursery allocated objects, because they have to be scanned either way--either as roots or via a Cheney scan. The difference is only in sweeping, which is incredibly fast for nursery objects.
What's really important about generational GC is that heap allocation becomes nearly as cheap as stack allocation. That's a game changer.
There must have been a very good reason they tested the generational GC as they did.
Your comparison of generational-GC heap allocation with stack allocation is not correct. Yes, allocating in the nursery is very fast, only a couple of cycles, and a stack frame is allocated once per function call. And yes, when a GC runs, it has to scan the heap, with the stacks as roots.
But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform. After a collection of the nursery, all surviving objects are promoted to the older generations. This promotion not only takes work, but also grows the older generations, which are more expensive to collect. So while heap allocation with a generational GC is very cheap, it is not free. A high allocation count causes more frequent GC runs, and objects may be promoted to older generations prematurely. As a consequence, a program that allocates less will perform better. Avoiding a large number of heap allocations is a good way to increase your program's performance, whether by allocating on the stack or by reusing buffers, for example.
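The allocation-count difference is directly observable in Go via `runtime.MemStats`. A rough sketch (the helper `mallocsDuring` and the package-level `sink`, which forces values to the heap, are my own constructions for this demo):

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // storing into a global forces the value to escape to the heap

// mallocsDuring reports how many heap allocations f performed,
// using the cumulative Mallocs counter from runtime.MemStats.
func mallocsDuring(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

func main() {
	fresh := mallocsDuring(func() {
		for i := 0; i < 1000; i++ {
			sink = make([]byte, 1024) // a new heap allocation every iteration
		}
	})
	reuse := mallocsDuring(func() {
		buf := make([]byte, 1024) // one allocation, reused each iteration
		for i := 0; i < 1000; i++ {
			sink = buf
		}
	})
	fmt.Println(fresh >= 1000, reuse < 10)
}
```

Fewer mallocs means the GC trigger threshold is reached less often, which is the "more frequent GC runs" cost described above.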
> There must have been a very good reason they tested the generational GC as they did.
The reason was that they didn't have time to implement copying GC, per the talk. That's fair as far as engineering schedules are concerned. It says nothing about how good generational GC is in general.
> But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform.
It causes a minor GC only. Those are very cheap.
Yes, there are potential add-on costs. But it's been repeatedly shown that with a fast generational GC, the benefit of escape analysis for garbage collection is marginal. That's why Java HotSpot took so long to implement it. The main benefit of escape analysis in HotSpot, in fact, is that it enables optimizations like scalar replacement (SROA) and lock elision, not that it makes garbage collection faster. Generational GCs really are that good.
IMHE, generations are a nightmare to operate for high-performance servers at scale, because you have to balance the sizes of those generations manually, and the right balance can change abruptly with code changes or workload fluctuations.
Go allocations are indeed costlier but the performance critical sections of applications can be profiled and optimized accordingly to remove allocations.
I'd rather have Go's amazing low GC latency and slightly higher allocation costs vs the operational nightmare from HotSpot.
Automatic management of generations has never fully worked in Java. Every new JDK version just adds more knobs. Sounds like you have a different experience?
As a heuristic it would be OK. However, there are scenarios the escape analysis algorithms don't detect, so a linter may lead you to believe things are not escaping when in reality they are. It's also implementation-specific: something that escaped in Go 1.10 may no longer escape in 1.11, for example.
It's easy enough to profile and benchmark in go, so I would always treat that as the source of truth.
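One common example of this implementation-specific behavior is interface conversion: passing a value through `interface{}` (as `fmt` does) can force it to the heap even when it's logically dead afterwards, and whether a given call site escapes has changed across compiler releases. A small illustration (types and names are hypothetical; `go build -gcflags=-m` shows the compiler's actual decisions for your version):

```go
package main

import "fmt"

type point struct{ x, y int }

// dist2 takes p by value and never converts it to an interface,
// so p can stay in the caller's stack frame.
func dist2(p point) int { return p.x*p.x + p.y*p.y }

// describe passes p's fields to fmt.Sprintf through interface{}
// arguments; that conversion may heap-allocate, depending on the
// compiler version, even though nothing is retained after the call.
func describe(p point) string {
	return fmt.Sprintf("(%d,%d)", p.x, p.y)
}

func main() {
	fmt.Println(dist2(point{3, 4}), describe(point{1, 2}))
}
```

Which is why benchmarking the real code, rather than trusting a static rule of thumb, is the right source of truth.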
It would be better to spend this effort to add generational GC with a bump allocator to Go. That way memory allocation becomes faster for everyone, not just the tiny subset of people who use fancy tools.
I think you've made yourself clear. You've jumped into half a dozen threads in this post with some variation of the same comment as you do with every Go GC post. Your comment is interesting, but repeating it over and over is tiresome. Have you reached out to the Go GC team? What is their response?
There are heuristics (and they improve from version to version), but they're necessarily conservative: heap-allocating something that doesn't escape is a performance hit; stack-allocating something that does escape breaks the program.
In C++, if a class has a non-trivial constructor, neither stack nor heap allocation is free, so in C++ reusing objects is still often better for performance.