The borrow checker has become smarter, but not at handling cyclic data structures.
The problem you're describing still exists, though I'd say the advice isn't quite right.
Generally the better advice is to use a graph library, or something like slotmap [1] if you're rolling your own, than to use a Vec or HashMap. That's superior to a vec/hashmap for a few reasons. It handles keeping indicies stable/writing your own index allocator. It also makes it possible with a feature flag to detect all use after frees at low-ish cost (instead of only detecting them if no-one is re-using the slot). It makes it convenient to use typed keys instead of integers. The underlying datastructure is basically the same though.
You can also use Rc/Weak for cyclic data, for ref counting "GC", it's an approach I have less experience with so I can't speak towards it as well.
In the end not that much data is cyclic (especially in idiomatic rust), and this is trading off a small amount of debugging convenience on cyclic data, for guarantees of memory handling correctness in the rest of your code, and for guarantees about the amount of damage that mistakes can do in the code that does need to handle cyclic data (which in turn is a debugging win on cyclic data, so...).
I agree it's a bit of a rough edge, but it's a case where rust solved the easy 95% of the problem with only slightly questionable tradeoffs on the remaining 5%, and that's a win in my book.
This is my main gripe with Rust. Things that should be part of the standard library are relegated to third parties, thus requiring developers to audit yet another dependency (and it's dependencies, and sub-dependencies, and sub-sub-dependencies, ...). Realistically, this just means I can't use most third-party crates, greatly limiting what I can do with Rust.
It's been a long time since I've been able to write anything in Rust despite loving the language. My last hopes for Rust are now in gcc-rs splitting the Rust ecosystem into two; the current npm-style crates.io ecosystem and a more destro-centric properly vetted package ecosystem. If this doesn't happen, I'll just continue to use other languages or limit my dependencies to those with sane package management (usually this means libraries written in C) until something better pops up.
Rust being a low-level library, adding something means inherently choosing a preferred approach to a problem rather than another, which may be disagreeble.
The consequence is that there would be still the sub-sub-dependencies problem, because the author of a crate may decide that the stdlib implementation is not appropriate for the use-case (Rust is low-level; low-level development is typically "pickier" than web development).
I personally think that it's good not to have higher level APIs in the stdlib. My only exception to this is the exclusion of fast hashing, because this choice had side-effects beyond simple need for an API.
Having two distinct "products" could help here. There could be a "bare rust" that is minimal and unopinionated, and used by those that need a small footprint, for example. A "battery included rust" could have common solutions to common problems that are hard to solve with "bare rust", at the expense of making choices that are best avoided for "bare rust".
My current outsider's perspective is that there's some kind of ever-shifting, never quite established consensus on which batteries to include, which tends to make it harder than necessary to switch project, interoperate between libraries etc.
> Having two distinct "products" could help here. There could be a "bare rust" that is minimal and unopinionated, and used by those that need a small footprint, for example
There was a time when it felt as if every piece of interesting java code out there also came in variation badge-engineered into its matching "spring-whatever". Perhaps there is some merit to that kind of business model?
I just glanced through the small number of open issues, and none seem like actual issues so much as potential improvements. The maintainer seems to still be around, and I've never found that a standard library is faster at merging improvements than third party libraries, rather the exact opposite (for good reason, the standard library needs to avoid breaking changes at nearly all costs).
While I think there are valid arguments in both directions for putting more things like this in std, I don't think "non-bug issues have been open for over a year" is one of them. The exact same is true of std.
> the standard library needs to avoid breaking changes at nearly all costs
IIRC in Rust this is a little bit looser than elsewhere (let's say in C++) as the ABI is not stable. This allows improvements that change the internal structure of structs but not the API. As a counter example, in C++11 there was a breaking change that introduced a size field to std::list, making size() O(1) instead of O(n), breaking linkage between older and newer versions of C++ binaries. In Rust there is no such guarantee so changes like that could be introduced anytime.
Yes but everything is versioned so it's not like this will break any existing applications unless the user/owner explicitly upgrades (but then they should be ready for breakages).
The absence of versioning hell with Rust is one of the many things I love about Rust.
On the other hand it means you don't automatically get security fixes and have to manually set up CVE monitoring and rebuild your application every time a CVE appears.
Of course if it appears in an old version the author won't bother to backport the fix so you will have to bump the dependency to the latest, doing all the API changes that you didn't want to do.
Yeah, I'd emphasize that moving something to the standard library doesn't magically solve maintenance issues. In fact if the community can't even maintain something then that's a strong argument for it not to be moved somewhere that's much harder to contribute to, whose maintainers have a lot else on their plate and where any API mistakes are for forever.
Last year has been a bit rough and I didn't have a whole lot of free time and energy to work on projects at the same time.
That said, I don't believe any of the open issues on slotmap are of immediate need of attention, they are mostly minor feature requests/incremental improvements.
I do have a slotmap 2.0 in the planning that will include most of these improvements and allow further customization and control of the number of bits used for index/generation as well as what to do when the generation would wrap around (either leak that specific slot or allow a spurious reference with a (very) old key).
Thank you for creating slotmap. It helps for folks who don't want to dig into nitty-gritty laborious unsafe code themselves. I only expressed the wish that something fundamental like this was was designed and incorporated as part of Rust stdlib.
As a Rust user, I have to say I think this is the right choice. Keep things out until they are practically dead/done. Rust covers a very large number of language use cases from C++ to Python. C++ doesn't have a std lib JSON parser either. And serde is fantastic!
Generally rust prefers to keep things out of the standard library because it's hard to change/improve a standard library. They try to give enough useful stuff and then leave all the other stuff to easily downloadable packages.
More importantly, it means that version upgrades are decoupled.
They don't have to download and install version 1 of an API for backwards compatibility if you're on version 2, and you don't have to download a new compiler version to upgrade a package as long as the package's maintainer is still willing to support the version you're on.
(That latter one being the same as with support for new revisions of C or C++ in in GCC.)
What is the advantage of a slotmap style approach vs references? It seems a bit manual and error prone, and the errors risk being silent references to wrong objects that may result in security vulnerabilities or data corruption.
The advantage w.r.t references is that with Slotmap, you can express cyclical data structures, whereas with references you can't.
The advantage w.r.t plain Vec and juggling integer indices is that unlike integers, Slotmap keeps track of the "identity" of the stored object; the keys are unique to that object, and you can't accidentally refer to a wrong object with them.
> errors risk being silent references to wrong objects
Slotmap is specifically designed to prevents this. I recommend you to read its documentation; the generational indices system is quite nice!
> Slotmap keeps track of the "identity" of the stored object
(I am not sure if slotmap uses this strategy)
To give more details some of these data structures use generational indexes, a pair (generation, index) where index is a plain index of the underlying vector and generation is a bookkeeping counter of how many times you have allocated a value to that index. These two values can be combined in a single 32bit-64bit value but additional memory would be required to keep track of the counter.
E.g. with a vector of length 2
{meta: [0,0], data:[...]}
malloc -> (1,0)
{meta: [1,0], data:[...]}
malloc -> (1,1)
{meta: [1,1], data:[...]}
free(0)
{meta: [2,1], data:[...]}
malloc -> (3,0)
{meta: [3,1], data:[...]}
free(0)
{meta: [4,1], data:[...]}
malloc -> (5,0)
{meta: [5,1], data:[...]}
free(0)
free(1)
{meta: [6,2], data:[...]}
malloc -> (7,0)
malloc -> (3,1)
{meta: [7,3], data:[...]}
This way if you tried to access the pointer (5,0) the library can check that at index zero of the meta array the generation is 7 and conclude that you are doing a use after free (in this example even generations denote unallocated memory).
This is a description of a very simplified algorithm.
I meant counted references that were mentioned as an alternative in the GP comment.
A generation count indeed could mitigate "use after free" style bugs, but may have a high false negative ratio if many/most objects have same generation. But a glance at slotmap docs didn't yield any hits for a generation I'd being used, do you have a specific link?
> I meant counted references that were mentioned as an alternative in the GP comment.
The main problem with reference counted pointers is detecting cycles. Rust doesn't have a cycle detector, but only a concept of "weak" pointers that don't prevent the object they're pointing at from being destroyed, just detect if it has been. This works fine if you have a tree with parent pointers, you make the parent pointers weak and everything works. It doesn't work so well if you don't have a clear tree structure though, because then you can't really tell which pointers should be strong and which should be weak.
> but may have a high false negative ratio if many/most objects have same generation. But a glance at slotmap docs didn't yield any hits for a generation I'd being used, do you have a specific link?
Apart from wrapping around after 2^32 frees of a specific slot, I don't believe any false positives are possible. The trade off is that there's more overhead than I think you're imagining.
Once I get around to slotmap 2.0 (as noted by my inactivity, I haven't had much of the good free time + energy combination last year), the default behavior will be to leak the memory of a slot after 2^31 alloc/free pairs in that slot rather than wrapping around. You can still get the old wrapping behavior, but the default will be to never, ever allow spurious references, at the cost of leaking a couple bytes of memory once in a blue moon if you have heavy, heavy churn. That means, amortized, each insertion will leak 3.726 nanobytes if you're storing u32s, which is a fun unit. Note that if you destruct the slotmap the memory is reclaimed of course, I just mean that slotmap itself will not use that piece of memory again.
> The trade off is that there's more overhead than I think you're imagining.
I don't think the overhead is as large as this implies. For SlotMap the memory overhead is ~4bytes plus alignment per element, and insertion/deletion is only a handful instructions and one branch extra compared to a vector push:
> I don't think the overhead is as large as this implies. For SlotMap the memory overhead is ~4bytes plus alignment per element, and insertion/deletion is only a handful instructions and one branch extra compared to a vector push:
I think they were imagining one generation counter for the entire map instead of one per slot. I agree it's not particularly high.
The problem you're describing still exists, though I'd say the advice isn't quite right.
Generally the better advice is to use a graph library, or something like slotmap [1] if you're rolling your own, than to use a Vec or HashMap. That's superior to a vec/hashmap for a few reasons. It handles keeping indicies stable/writing your own index allocator. It also makes it possible with a feature flag to detect all use after frees at low-ish cost (instead of only detecting them if no-one is re-using the slot). It makes it convenient to use typed keys instead of integers. The underlying datastructure is basically the same though.
You can also use Rc/Weak for cyclic data, for ref counting "GC", it's an approach I have less experience with so I can't speak towards it as well.
In the end not that much data is cyclic (especially in idiomatic rust), and this is trading off a small amount of debugging convenience on cyclic data, for guarantees of memory handling correctness in the rest of your code, and for guarantees about the amount of damage that mistakes can do in the code that does need to handle cyclic data (which in turn is a debugging win on cyclic data, so...).
I agree it's a bit of a rough edge, but it's a case where rust solved the easy 95% of the problem with only slightly questionable tradeoffs on the remaining 5%, and that's a win in my book.
[1] https://crates.io/crates/slotmap