I was actually just looking at this problem last week, when I realized I needed to keep items in a database table positioned arbitrarily, ideally without needing to manipulate the rest of the list. So if a user adds a new element after element 5, that element becomes 6, without needing to update the original item that came after element 5. There are indeed very sophisticated algorithms that manage this problem and improve the theoretical bounds. But it also seemed like the simplest solution for this particular version of the problem is to just use fractional positions, and occasionally pay a penalty to re-lay out the list.
Basically it works as long as your label space is large compared to the number of items. The more sophisticated methods are necessary when that isn’t the case. For example, say you have 4 bytes for the label and 1 billion items.
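A minimal sketch of that idea, assuming a float position column (the helper names are mine, not from any particular library): each insert takes the midpoint between its neighbors, and a relayout pass re-spaces everything once a gap gets too small to split.

    def position_between(before, after):
        """Midpoint position for an item inserted between two neighbors."""
        return before + (after - before) / 2

    def gap_exhausted(before, after, epsilon=1e-9):
        """True when two neighbors are too close to split reliably."""
        return (after - before) < epsilon

    def relayout(n, spacing=1.0):
        """The occasional penalty: rewrite all n positions with even gaps."""
        return [i * spacing for i in range(1, n + 1)]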
This sort of problem also occurs when you're trying to do CRDTs, which can also roughly be described as "design something that does Git better."
So e.g. to frame this, one approach to a CRDT is to just treat the document as a list of facts, "line 1 is 'foo', line 2 is 'bar'", where each fact carries a count of how many times it has been asserted. To "merge" you just add together the assertion counts, and you can then detect conflicts when a fact has been asserted more than once or fewer than zero times. So a patch that says "change line 2 to 'baz'" becomes "unassert that line 2 is 'bar', assert that line 2 is 'baz'", and it conflicts with a patch that says "change line 2 to 'quux'" because after merging both, the fact "line 2 is 'bar'" has an assertion count of -1.
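A toy version of that merge rule, assuming facts are (line, text) pairs and patches are signed counts (this representation is my own, purely for illustration):

    from collections import Counter

    def merge(*patches):
        """Merge by summing per-fact assertion counts."""
        total = Counter()
        for patch in patches:
            total.update(patch)
        return total

    def conflicts(facts):
        """A count above 1 or below 0 means the fact was over- or under-asserted."""
        return {fact: n for fact, n in facts.items() if n < 0 or n > 1}

    base = Counter({("line 2", "bar"): 1})
    to_baz = Counter({("line 2", "bar"): -1, ("line 2", "baz"): 1})
    to_quux = Counter({("line 2", "bar"): -1, ("line 2", "quux"): 1})
    print(conflicts(merge(base, to_baz, to_quux)))
    # {('line 2', 'bar'): -1}  -- unasserted twice: conflict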
But anyway, in this context you might want to allow inserting lines, and then you have the list-labeling problem: you don't want the patch to unassert lines 4, 5, 6 just to insert a new line after line 3. So then an obvious thing is to just use a broader conception of numbers, say "line 3.5 is <X>" when you insert. And we hide the line numbers from the user anyway; they don't need to know that internally the line numbers of the 7 lines go "1, 2, 3, 3.5, 4, 5, 6".
So then you need a relabeling step because you eventually have some line at 3.198246315 and you want to be able to say "yeah, that's actually line 27, let's have some sanity again in this thing."
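Sketched out, with exact fractions standing in for that broader conception of numbers (Fraction sidesteps float-precision issues; the helper names are my own):

    from fractions import Fraction

    lines = {Fraction(i): text for i, text in enumerate(["foo", "bar", "baz"], 1)}

    def insert_after(lines, label, text):
        """Label a new line midway between `label` and its successor."""
        later = [k for k in lines if k > label]
        new_label = (label + min(later)) / 2 if later else label + 1
        lines[new_label] = text
        return new_label

    def relabel(lines):
        """The relabeling step: back to sane integer line numbers."""
        ordered = sorted(lines.items())
        return {Fraction(i): text for i, (_, text) in enumerate(ordered, 1)}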
This also maybe hints at the fun of adding randomization: consider that one person might add line 3.5, then add line 3.75, and then remove line 3.5; meanwhile someone else might add a different line 3.5, add a line 3.25, and then remove their line 3.5. Together these patches amount to "assert line 3.25 is A, assert line 3.75 is B", and they would merge without conflict. This means that in general, if two people are messing with the same part of the same document asynchronously, this model is not able to consistently flag a merge failure; it will sometimes instead just randomly order the lines that were added.
We can then just make that a feature rather than a fault: you don't insert at 3.5, which is 3 + (4 - 3) / 2; rather, you insert at 3 + (4 - 3) * rand(). Then when two people both try to insert 12 lines between lines 3 and 4 independently, and you merge them together, you get all 24 lines, each run in its original order but interleaved randomly. That's not the end of the world; it might be legitimately better than just declaring a merge failure and harrumphing at the user.
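A sketch of that randomized insert and what merging two independent runs looks like (again my code, assuming float labels):

    import random

    def label_between(lo, hi):
        """Insert at lo + (hi - lo) * rand() instead of the midpoint."""
        return lo + (hi - lo) * random.random()

    def insert_run(lo, hi, texts):
        """One person's run of inserts between the same two lines."""
        labels = sorted(label_between(lo, hi) for _ in texts)
        return list(zip(labels, texts))  # labels increase, so order is kept

    mine = insert_run(3, 4, [f"mine {i}" for i in range(12)])
    theirs = insert_run(3, 4, [f"theirs {i}" for i in range(12)])
    merged = sorted(mine + theirs)  # 24 lines, each run in order, interleaved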
> This sort of problem also occurs when you're trying to do CRDTs, which can also roughly be described as "design something that does Git better."
Aren't the goals of git and CRDTs different? With git you want the merged result to be semantically correct. With CRDTs you want to achieve convergence (so no merge conflicts); as far as I know, semantically correct convergence (not sure what the correct term is) is not really possible, as it is too difficult to encode for CRDTs. Isn't that why CRDTs are mostly used for multiplayer interactive applications, where these kinds of mismatches are quickly seen?
Ha, I had this exact problem asked as an interview question.
IIRC their real life solution was to leave gaps between elements (eg index 0, 100, 200… instead of 0, 1, 2) and reindex when needed. That probably works well enough. What I came up with is (as you say) the idea of fractional indexing, but dealing with decimals is a pain, so you can instead treat them as vectors of digits, represented as strings that you sort lexicographically.
So an element inserted between elements 1 and 2 gets the index 11 (anything between 11-19 is fine). Between 1 and 11 is 101. Between 11 and 2 is 12 (or anything between 12-19). Note that these indexes are not numbers; they're strings that are compared lexicographically.
I’m sure there’s downsides, eg it takes a lot more memory to sort these indexes (strings are much larger than numbers). It feels a bit too smart to not have unexpected downsides.
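A sketch of generating such a key (my reconstruction of the scheme; it assumes digit-string keys that never end in '0', so there is always room to go below an existing key):

    def key_between(a, b):
        """Digit-string key with a < key < b, compared lexicographically."""
        assert a < b
        key, i = "", 0
        while True:
            da = int(a[i]) if i < len(a) else 0   # treat a as padded with 0s
            db = int(b[i]) if i < len(b) else 10  # 10 = "no upper constraint"
            if db - da > 1:
                return key + str((da + db) // 2)  # room for a digit in between
            key += str(da)  # copy the low digit and go one level deeper
            if db - da == 1:
                b = ""      # prefix is now strictly below b; only a constrains
            i += 1

    print(key_between("1", "2"))   # 15
    print(key_between("1", "11"))  # 105
    print(key_between("11", "2"))  # 15

Anything in the open interval works; the 11/101/12 picks above are just other valid choices.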
> IIRC their real life solution was to leave gaps between elements (eg index 0, 100, 200… instead of 0, 1, 2) and reindex when needed. That probably works well enough. What I came up with is (as you say) the idea of fractional indexing
Those are the same thing. Leaving gaps is fractional indexing. It's just fixed-point rather than floating point.
> an element inserted between elements 1 and 2 gets the index 11 (anything between 11-19 is fine). Between 1 and 11 is 101. Between 11 and 2 is 12 (or anything between 12-19). Note that these indexes are not numbers; they're strings that are compared lexicographically.
This reminds me of the most interesting method of generating random integers in an arbitrary range from random bits: interpret the bitstream as a string representing a real number (in binary) between 0 and 1. If, for example, you want to use bits to generate a number between 0 and 5, you divide the unit interval into sixths, and examine bits until you've determined conclusively that your number lies within one of those intervals.
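For example, a sketch using integer arithmetic to track the shrinking interval (the function is my reconstruction of the idea, not code from the comment):

    import random

    def rand_below(n, next_bit=lambda: random.getrandbits(1)):
        """Uniform integer in [0, n) from a stream of random bits.

        After k bits encoding the integer v, the number x lies in
        [v/2^k, (v+1)/2^k); return r once that whole interval fits
        inside one of the n equal subintervals [r/n, (r+1)/n).
        """
        v, k = 0, 0
        while True:
            v = (v << 1) | next_bit()
            k += 1
            r = (n * v) >> k                    # bucket of the low endpoint
            if n * (v + 1) <= (r + 1) << k:     # high endpoint in same bucket?
                return r

Each returned value is exactly uniform, with no modulo bias; the loop just occasionally needs a few extra bits when the interval straddles a boundary.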
Very similar to my first intuition on how to approach this problem. An interesting question that comes to mind is how you should pick index sizes and how often you should rebalance the whole thing. Make the index too large and you're wasting a lot of space you'll never need; too small and you're forced to rebalance too often. I'm guessing an ideal index size is one where you can rebalance at most once a night with a cron job and then avoid rebalances the whole rest of the day.
To be fair, this sounds like one of those classic problems that someone for sure already figured out in the 50s or 60s, just under a different name and/or context. Hash chaining is a similar idea, but not quite the same.
The "real life" solution of leaving gaps & reindexing is actually my earliest programming memory (as a teenager)/lesson in cleverness, of feeling like I should have been able to come up with a more clever/optimal/~~scalable~~ solution but settling for this awkward 100 200 300 thing. Nowadays I've used that approach like...dozens of times over the decades of real world projects since, and I'm still maintaining that web app I made in 2003, so I'm very glad I never came up with something unduly clever.
That is how I implemented it at work around 9 years ago. Use strings for ordering and if you use the full byte range they end up fairly compact (rather than just 1-9 as in your example). There are some edge cases that can cause it to balloon in size so there is a separate reordering step but it almost never needs to be invoked.
Theoretically, using fractional labels requires unbounded amounts of memory to store the fractions. In practice that memory is extremely limited. The difference really becomes a problem if you are not just assigning ordered labels to a collection, but want to use these labels directly as array indices to store the elements. That would be a more literal modelling of the library sorting problem.
So you believe that your experience is a universal one for all learners? Cognitive load is a real consideration in teaching, and having to ignore and filter text is challenging for some folks.
The boilerplate around a main function is 10 units of load. Everything else you have to know to write a simple program is 500-600 units of load. The boilerplate is a rounding error, and just does not matter.
I wonder what can be done. It's terrifying to realize how dependent students are going to become on these tools. Too many people are just not willing to live with the discomfort that comes with learning something difficult, when the alternative is so readily accessible. Short term gain over long term gain, exemplified.
I like the use of the word discomfort here. It does take an acceptance of some level of discomfort to engage with material you don't yet understand. Similar to how you experience some physical discomfort / strain when pushing your limits exercising. As you engage with discomfort your tolerance builds and what was once a difficult exercise becomes routine. The reward is worth the effort but I worry what the future looks like with many just opting out.
It really is spot on. I've been reading "The Coddling of the American Mind" [0] by Greg Lukianoff and Jonathan Haidt. The book's premise feeds directly into this idea and has been a fun read thus far. It seems that LLMs will feed into that "coddling" described in the book in a very negative way, as they provide discomfort avoidance.
Why must every educational python library insist on teaching kids with global variables? I know parameters aren't easy for novices, but it feels like it's missing a lot of the value when we don't attempt to teach them...
Because it's perfectly fine for 100 line programs. Beginners don't need to write extensible code that will be maintained for 30 years on business critical systems. You start by learning what a fuckin for loop does.
Not only that, it's a relatively gentle introduction to object-oriented concepts, where instances are just like independent copies of modules as they're using them: module.function(), with the function accessing some global, conceptually turns into instance.function(), where the function accesses some instance "global".
I was always a bit surprised that python didn't allow "module instances" that automatically did that.
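As a sketch of the analogy (a hypothetical module and class, just for illustration):

    # counter_module.py -- a module whose state lives in a global
    count = 0

    def bump():
        global count
        count += 1

    # The "module instance" idea: the same code, with the global
    # turned into per-instance state.
    class Counter:
        def __init__(self):
            self.count = 0

        def bump(self):
            self.count += 1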
Honestly the way I learned not to use a ton of global variables was because I used so many and would forget what they'd do. But that was back in the days of Apple // BASIC so when I was introduced to C I rejoiced. People need to learn from mistakes.
For me, I learned when I wanted all my function calls to do the same thing no matter how many times, or order, I called them.
I often wonder how much "pain" is required to really learn something. I would claim that 90% of how I write code is the result of pain, not patterns, even though they often end up being similar.
True, but having a single scope is more intuitive at first for people who don't have a good grasp of what a function is and are still trying to understand programming. Also processing/p5 kind of focuses on the speed and accessibility of getting an idea going, for which a game-loop and a single func scope are quite powerful!
Ten days from now will mark the one year anniversary of my wife's death. We would have been together for 11 years last November. It's tough. I really understand this letter, I think. You can't stop loving them. Why would you want to? It hurts. I send her Facebook messages, although earlier today it said it couldn't send them. I could write a letter, but that is not how we communicated. I may have to think of some other form of "communication" soon. An email, perhaps.
FWIW, I still "call" my sister. I pick up the phone and dial it. Pop my earbuds in. Then I just hit the button to lock the phone instead of the one to call the number. And we talk.
It's been 6.5 years since she was alive to answer the phone, but my heart knows her voice and what she would say. I don't always actually dial the actual phone these days, but we still catch up.
I spent over a week upgrading a project last January, over what was supposed to be my winter break... Had to rewrite some of the core internals from scratch to prepare for this. I'm not saying they weren't good changes, but the urgency is not nice haha.