There was an article posted here a few years back which visually shows how a git repo’s source code changes over time. Projects have varying degrees of downward slopes in the banding, which indicates the rate at which old code is overwritten.
One commenter (puredanger) ran the tool against the Clojure repo and remarked on its stability. After 2011, so past the time examined in the notes from this article, the banding is almost flat: new layers get added while very few changes are made to old code. The image stands out, and indicates how sound the language core library is. They keep adding new features, and don’t have to mess with the plumbing to make them work.
It's really neat to be able to empirically verify this. If you watch Rich Hickey's talks, especially Spec-ulation, you'll see this deep commitment to compatibility and progress through growth. This is exemplified by Clojure itself (which unlike many languages is famously "just a library") and many times in third party work. If you find a ten year old Python library, it's unlikely to work verbatim. A ten year old Clojure library is probably still idiomatic.
This also works in the other direction. People regularly run RCs and alphas for Clojure itself in production. Unless you're doing something pretty novel and experimental, like, say, GraalVM native images (where manually rejiggering is common in every environment, and Clojure is no exception), odds are everything Just Works.
Both are true, but usually that means the latter. Clojure was designed to make it as easy as possible to rub a little bit of parens on an existing (JVM, though now also other platforms) project. That means being a hosted language that embraces interop very closely, and shipping the language as just a jar.
I think this extends beyond the standard library. In my experience, once I've written a Clojure function, I rarely need to go back and edit it. The obvious, natural way to implement anything also tends to be correct, concise, generic, and efficient.
It certainly helps that the core library is stable, but that's not the only reason. Even when I'm not upgrading to a new version, Clojure code tends to be stable in a way that other languages are not. With languages like Python, Swift, or C#, I'll be constantly taking something I wrote, and modifying it to add a feature, and again to add another feature, and then re-architecting the whole thing to work in a fully generic way. Or I'm taking something obvious and simple, and changing it all around to make it more efficient.
It's not simply an attribute of Lisp. Common Lisp makes changes which are simple to describe also simple to implement. Clojure quite often makes them unnecessary.
There is a good talk here; a series of talks, really. Clojure, Rich, and the thinking of the entire Cognitect team (Fogus included, of course) have been deeply influential- intellectually- and many of their tenets have become mainstream, even dogma. That being concerned with "immutability" is standard practice these days is at least partly due to Rich. Can you imagine programming without immutability?
But a big part of Clojure's story has been not just in the ideas, but also in the details of the design and implementation. From the performance guarantees of the persistent data structure implementation, to the idioms around parameter ordering (both of which I was appreciating yet again earlier today), there are ergonomics that only make themselves known through use, like easter eggs.
So there is much to mine in tracing through the history of the implementation, revealing those easter eggs in nascent form.
I think that's focusing on the non-unique parts of Clojure. Clojure was a product of the 2000s, a generation of interop-focused FP languages (Scala, F#, and Clojure foremost among them) that were deeply influenced by the functional programming languages of the 90s (mainly thinking of Haskell and OCaml here). Immutability and FP are not the new things that Clojure brought to the table. While Clojure transients are a fascinating tool and derived from its implementation of immutable data structures, otherwise its immutable data structures are roughly in the same ballpark as other languages. And parameter ordering is something that I think Rich has walked back occasionally (I remember a talk at some point where he talked about his skepticism of the entire idea of parameter ordering vs just named parameters or its equivalent, e.g. passing in a map).
Clojure's main shtick (at least its main influence on me) is a combination of image-oriented programming (that is a live REPL that lets you edit a program you're writing on the fly without needing to restart the program) with a strong distrust of any abstractions that extend beyond immutable collections (Clojure's mantra of focusing on data).
This combination is a bit unique. You had some of the latter in "classical" Haskell (that is Haskell from the 90s and 2000s that was quite skeptical of user-defined typeclasses compared to modern-day GHC Haskell where some codebases use typeclasses out the wazoo), that emphasized using algebraic data types over typeclasses and higher-order functions and demanded that any typeclasses had associated laws. OCaml still has a lot of that today.
Then you had other languages with the former, such as Smalltalk and Common Lisp. But those two languages never really treated immutable collections as essentially the end-all-be-all of abstractions (certainly not Smalltalk where the idea of independent data is almost antithetical to its premise!). This is definitely an exaggeration; Clojure does have facilities for other abstractions and being an FP language it certainly has higher-order functions, but the community shies away from those as much as possible.
But Clojure in my mind is the first to marry those two ideas together and see how far things can go.
I agree with the grandparent comment. The parent comment sounds like a random and non-representative sampling of what makes Clojure unique.
Both parent and grandparent correctly identify that many Clojure ideas come from elsewhere. Lisp and hosting on the JVM were not new. Immutability and persistent data structures weren't new ideas.
But putting them all together in one language, and making it performant, was the subject of a paper published by people in the Clojure community (Rich was the lead author).
Clojure describes itself as data-oriented. That means not just the primacy of data, but of plain data, unadorned: https://youtu.be/VSdnJDO-xdg
What I find almost unique about Clojure is that there's a philosophy that resonates throughout its design decisions: simplicity. Data is simpler than code, and code is simpler than macros, so prefer data and leave macros as a last resort. Libraries are simple, frameworks are complex. Plain data is simple, and the knock-on effects of every library converting to and from plain data are powerful. But that only reveals itself after using Clojure for a while. Immutability allows for simpler code that you can reason about. Isolating state into containers allows you to characterize, control, and reduce your touch points with state, which is necessary yet a source of complexity.
You can see the same set of ideas that underpin the design of Clojure present in follow-on additions (reducers/transducers, core.async, Datomic, etc). There's simplicity, often manifested in immutability and the associative model of information (everything is a map), at least.
This I strongly agree with. Clojure's insistence on carrying through with its core immutable collections everywhere it can is a defining feature of the language IMO.
I mean at the end of the day we're talking about personal influence so I can't really argue with you thinking that what I'm talking about is random.
Nonetheless personally I don't find the JVM to be a definitional feature of Clojure. E.g. Clojurescript feels just as Clojure-y to me as JVM Clojure.
Likewise I think simplicity is in the eyes of the beholder.
Personally I think (the current presentation of) transducers and core.async are both too complex (and in the former's case "complected", to use a Clojure-ism).
The latter, I think, is better served by manifold. As for the former (man, I really need to write this up at some point, since it keeps coming up): transducers don't need to be higher-order functions on reducers. Every transducer is exactly equivalent to a function from a -> List b (and yes, this presentation is agnostic of source, so it holds even for something like channels, despite the presence of a concrete list). The only thing you get from the presentation as a higher-order function is reuse of function composition, which I view as bearing all the hallmarks of complection: it goes the "opposite" direction you'd expect, and it's hardly ever used as "function" composition (when's the last time you composed a transducer with something that wasn't another transducer?).
> Clojure's main shtick (at least its main influence on me) is a combination of image-oriented programming (that is a live REPL that lets you edit a program you're writing on the fly without needing to restart the program) with a strong distrust of any abstractions that extend beyond immutable collections (Clojure's mantra of focusing on data).
With you there.
> While Clojure transients are a fascinating tool and derived from its implementation of immutable data structures, otherwise its immutable data structures are roughly in the same ballpark as other languages.
The immutability was genuinely innovative though. It's true other languages had immutable data structures. But Clojure's hash-mapped tries were genuinely new and a massive performance improvement that made "just make everything immutable all of the time" feasible and not just a giant exercise in copying (that you hope the compiler is smart enough to work around).
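To make the structural-sharing point concrete, here's a toy path-copying trie in Python (my own sketch, not Clojure's actual implementation, which adds tails, bitmap-compressed nodes, and more): an update copies only the handful of nodes on the path to the changed slot, so the "old" and "new" values share nearly all of their structure.

```python
BITS = 5
WIDTH = 1 << BITS  # 32-way branching, as in Clojure's tries

def assoc(node, index, value, shift):
    """Return a new trie with `index` set to `value`, sharing untouched subtrees."""
    new = list(node) if node is not None else [None] * WIDTH
    if shift == 0:
        new[index & (WIDTH - 1)] = value
    else:
        slot = (index >> shift) & (WIDTH - 1)
        new[slot] = assoc(new[slot], index, value, shift - BITS)
    return tuple(new)

# A two-level trie (shift=5) addresses 32 * 32 = 1024 slots.
v0 = assoc(None, 0, "a", 5)
v1 = assoc(v0, 40, "b", 5)  # copies 2 nodes, shares the other 31 subtrees with v0
```

The old version `v0` is untouched by the update, and `v1[0] is v0[0]` holds: that whole subtree is shared, not copied. This is why "just make everything immutable" stops being a giant exercise in copying.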
> Then you had other languages with the former, such as Smalltalk and Common Lisp. But those two languages never really treated immutable collections as essentially the end-all-be-all of abstractions (certainly not Smalltalk where the idea of independent data is almost antithetical to its premise!). This is definitely an exaggeration; Clojure does have facilities for other abstractions and being an FP language it certainly has higher-order functions, but the community shies away from those as much as possible.
Perhaps I'm misreading you here. Clojure does focus on data, but the community shying away from things like higher-order functions?! No, I don't think people shy away from map, filter, reduce, or transducers, or update, or... and all of those are HOFs. Can you elaborate? If you mean like "complex type-like hierarchies" then yes, nobody really uses those, but HOFs specifically?
RE HAMTs that's sort of true. Transients are a real leap forward that the FP community didn't have before (things like Haskell's ST generally require at least one full copy of an immutable data structure into a mutable one). Clojure's choice of dedicated syntax for each of its main data structures is also an interesting ergonomic choice (which sadly few languages have chosen to copy).
However the presence of a commonly available vector-like data structure (a data structure with logarithmic or better cons, snoc, and indexing) is not new. For example Haskell's vector-like Data.Sequence predates Clojure (as do its PATRICIA trie and Map implementations).
RE higher-order functions I was being imprecise. I meant user-defined higher order functions. It's very common to use higher order functions, but it's rather uncommon to define your own new ones.
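For contrast, here's what defining your own higher-order function (as opposed to merely using map/filter/reduce) looks like; a hypothetical Python sketch with names of my own invention:

```python
# A user-defined higher-order function: wrap any function so it is
# retried a few times before giving up. Takes a function, returns a function.
def with_retries(f, attempts=3):
    def wrapped(*args, **kwargs):
        last_err = None
        for _ in range(attempts):
            try:
                return f(*args, **kwargs)
            except Exception as e:
                last_err = e
        raise last_err
    return wrapped

calls = []
def flaky(x):
    # Fails on the first two calls, succeeds on the third.
    calls.append(x)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return x * 2

safe_flaky = with_retries(flaky)
result = safe_flaky(21)  # returns 42 on the third attempt
```

Using built-in HOFs is everyday code in most languages now; writing new ones like this is the part that stays comparatively rare.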
Incidentally, I think Clojure transducers are one instance where they've used a higher-order function when a first-order function would do.
It turns out that transducers can be implemented as functions of the form `a -> List b` rather than the usual presentation as higher-order functions on reducers. Yes, even though there's a List in the type, it works on non-collection sources.
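To make that claim concrete, here's a Python sketch of the `a -> List b` presentation (my own names, and limited to stateless transducers; stateful ones like partitioning need extra machinery): each transducer is a plain function from one element to a list of outputs, and composition is flat-mapping rather than ordinary function composition.

```python
def mapping(f):
    # a -> [f(a)]: one input, exactly one output
    return lambda x: [f(x)]

def filtering(pred):
    # a -> [a] or []: one input, zero or one outputs
    return lambda x: [x] if pred(x) else []

def comp(t1, t2):
    # Composition is flat-mapping t2 over t1's outputs,
    # not ordinary function composition.
    return lambda x: [z for y in t1(x) for z in t2(y)]

def transduce(xform, source):
    # Source-agnostic driver: nothing here depends on `source` being a
    # list; it could just as well be a generator or a channel-like feed.
    return [y for x in source for y in xform(x)]

xform = comp(filtering(lambda n: n % 2 == 0), mapping(lambda n: n * 10))
transduce(xform, range(6))  # [0, 20, 40]
```

The per-element function never sees the source or the accumulator, which is the same decoupling the reducer-based presentation buys, without the extra order of higher-order-ness.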
Wasn’t Clojure the first language to make use of immutable array mapped tries? I’d say that’s a pretty significant contribution to the realm of immutable data structures.
HAMTs by themselves are interesting, but not game-changing. In both wall-clock time and asymptotic analysis Clojure's immutable data structures are plenty fast, but not ahead of the pack of other immutable data structures when disallowing transients. I'm not sure, but I think that's the main reason that even though Bagwell came out with HAMTs in 2000, it wasn't until Clojure, 7 (or 8?) years later, that they were finally picked up.
Transients on top of HAMTs are a major step forward.
HAMTs probably weren't all that interesting on their own, because their main benefit over other mutable hash maps was better worst-case scaling (no need to re-hash all keys to fit a larger underlying array) and reduced memory use, at the cost of slower access times. I believe Clojure was the first language to use them as an _immutable_ hash map, and HAMTs were picked specifically for performance. Transients didn't come until later (1.1.0?), and transients don't draw any specific benefit from a HAMT; they would have the same effect on any other data structure.
Clojure also made use of the underlying HAMT data structure for vectors (skip hashing and use the index as a key).
I believe red-black trees were the state-of-the-art immutable data structure at the time, and HAMTs are much faster. Using two levels of a HAMT you can store 1024 elements; a red-black tree needs about 10 levels for the same number, if I'm not mistaken. A HAMT also doesn't require rebalancing.
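The level arithmetic works out as a quick sketch: each HAMT level consumes 5 bits of the key (32-way branching), while a binary tree consumes roughly 1 bit per level.

```python
import math

def depth(n, bits_per_level):
    """Levels needed to address n elements, consuming bits_per_level per level."""
    return math.ceil(math.log2(n) / bits_per_level)

depth(1024, 5)  # 2 levels for a 32-way HAMT
depth(1024, 1)  # 10 levels for a binary tree
```

So a lookup in a 32-way trie touches about a fifth as many nodes as in a balanced binary tree of the same size (a red-black tree can be up to 2x deeper still than a perfectly balanced one).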
Not quite, there were a bunch of other persistent data structures. Finger trees, PATRICIA tries, etc. And the perf was competitive. See http://blog.ezyang.com/2010/03/the-case-of-the-hash-array-ma... for example. In there Haskell's IntMap (powered by a PATRICIA trie) just beats out Clojure's HAMT without transients. In practice they're probably more or less equal, because in order to make IntMap usable for arbitrary keys and not just ints, you need to have a hashing step first that generates an int from an arbitrary key, so you have the overhead of a single hash.
Ah you're totally right RE transients. I don't know why I had the notion they were tied to HAMTs... Nonetheless I view it as one of the big improvements that Clojure brought to the scene.
Thanks for the link. Interesting. I have implemented finger trees in Elm, and for the purposes of working as a hash map, its performance did not get near that of a HAMT. Works great as a deque, though.
I also believe I looked into a PATRICIA implementation, but I could be wrong. Will definitely take a new look :)
"I remember a talk at some point where he talked about his skepticism of the entire idea of parameter ordering vs just named parameters or its equivalent, e.g. passing in a map."
As Rich himself wrote, none of these ideas or implementation strategies were novel or original in clj. What appeals to me (as well) is how tastefully it was put together. A bit like the typical clj project that composes smallish libraries. The strength of the ecosystem is in how well those snap together, which comes back to the tasteful bit - great synergy and universal use of the core language features.
> Can you imagine programming without immutability?
Yes.
Don't get me wrong; I'm a big fan of immutable data and think it is often the best approach. But I also think it is far from the default in many production systems. If you cannot imagine programming without it, it is possible that you may be working in a (wonderful, functional) bubble.
Context- I very clearly remember arriving at the benefits of the idea of immutability long before Clojure, but I didn't have the word "immutability" or a hook in the public discourse to connect it to. There were of course related concepts and projects from the late 1990s and early 2000s- idempotent operations in HTTP, log-structured file systems, package managers like Nix, some PL papers- but prior to Rich's early popularity I remember difficulties on various projects in explaining this approach to those for whom it was a new idea.
I feel like shortly after Clojure and Rich's talks, the idea- the term "immutability"- achieved R > 1 spread and it quickly became a default design principle.
So the point would be better stated- can you imagine (or remember) a time when you didn't have the term "immutability" as a shorthand to refer to?
Code half-life: https://news.ycombinator.com/item?id=13112449
Clojure repo chart: https://m.imgur.com/a/rH8DC