The main takeaway from this is that having one somewhat ill-defined definition of equality over the whole language ecosystem is a bad idea. There is a reason why typical LISP has 5+ different notions of equality in the base language.
The typical reason for having some kind of language-wide notion of equality (and hash code) is to support some kind of general collections library. If the objects that have some kind of general equality relationship defined by anything else but their identity are mutable then the whole thing invariably breaks down. And on the other hand you often want collection that uses some completely arbitrary notion of equality. This means that good design of collections library should not dictate any interface for the equality concept on the elements and use plain object identity by default while allowing overriding of that not per class of the elements, but per usage of the collection.
Completely agree. I’ve been working on a Common Lisp collection library, and I went out of my way to make the element comparison function a parameter at data structure init time. If someone disagrees with my very strict definition of equality, they are welcome to provide their own comparison function! It always bothered me very much that FSet (the most prominent collections library for CL) used a definition of equality that is based on the a specific generic function that couldn’t be replaced. The icing on the cake is that FSet’s equality generic function messes up with derived classes. Oops.
For immutable value types, AutoValue[1] is a nice library for automating this. Lombok[2] is also popular, but a lot of orgs avoid it because it has some pretty deep dependencies on implementation details of the JDK that aren't guaranteed to be stable across releases. It seems well-maintained in practice, though. Once Java 16 is widely adopted, Java records[3] will probably be the preferred approach, but it will be a while before most teams are on a version of the JDK that supports them.
For mutable objects, the notion of equality is hairy in general, because what you want often depends on why you'd be comparing the two objects. You might have two mutable objects that can be meaningfully compared in your business logic, but absolutely should not be used as keys in a map. Often you can satisfy your goals by implementing a separate method that does not attempt to conform to the 'equals' contract, which ensures it will not unintentionally be used for e.g. data structure ordering.
This is one where composition really does shine. A colored point is a point and a color. Make it so that it is explicit, and the code is much easier to reason about. Two colored points only equal if both components equal. And you have obvious ways of comparing the points and the colors.
It doesn't much make a difference. If you want to write an equality method for ColoredPoint (containing a Color and a Point) then you're doing all the same work.
This is true. The solves for Pitfall #4. But it solves for Pitfall #4 but just explicitly being different types. That works when you really want them to be different types. For this example, I agree -- ColoredPoint and Point should not be designed as the same type.
In cases where inheritance makes more sense, the equals relationship is also easier to specify.
Reference equality's inability to survive a persistence round trip is the big killer. And sneaking in a round trip in a process that used to be fully in memory is where dragons lurk.
Sometimes you are seeing if a constructed instance equals an existing one. I would presume that is rather common. Especially with points, a basic collision system is often basically an equality check. Right?
A "point" would be a value type and you wouldn't use reference equality or inheritance for a value type.
But if this were a game you might have a "BigBoss" instance than inherits from "Enemy". If you had one boss, you'd only have one instance of it, and you would use reference equality. If you round-tripped and ended up with 2 instances of the same boss then you're already in trouble.
OK, can you go into detail why equality between Point and ColoredPoint written using composition is simpler than equality between Point and ColoredPoint written using inheritance. Specifically, how does composition make equality transitive?
A point can never equal a colored point. You can extract the point from a colored one to compare it to a point. That is, you can compare a point to the point of a colored point. But to compare a point to a colored point is essentially false, by definition.
Ah, so your point (ba-dum-ts) is that equality should just be type-invariant. That's a reasonable take, but it's a bit orthogonal to the larger conversation.
In your invariant world, is the integer 2 equal to the floating-point number 2.0. Or is 2.0 actually a composition of an integer part and a fractional part, while 2 and 2.0 are not directly comparable?
Comparing equality of color seems like a trap. If they aren't from the same color space, it's more like comparing a float, which will need some precision specified.
https://news.ycombinator.com/item?id=26999909 has the right answer, IMO: you can have multiple equality comparators (and/or orderable comparators) and specify which to use in your collections like sets.
It's not clear that there should be a single notion of equality for any class/struct/whatever, but that there might need to be context-specific notions of equality.
Is there a legitimate scenario for equals method or its counterparts to be defined on anything but final (sealed in C#) classes, that only have immutable fields?
Two-place equality methods are a general solution, but not a very good idea. I have hit upon the a better abstraction for handling custom equality, which I call "equality substitution". An objects equality method doesn't perform the equality comparison, but rather nominates an alternative value which is used as that object's representative for equality comparisons. That alternative is the "equality substitute".
The key idea is that we have a reasonably rich set of built-in objects in the language that support equal equality useful ways.
Instead of trying to bend other objects to also support equal equality from scratch, we require objects to calculate a representative of themselves which is some language built-in data type that natively supports equal.
Equality substitution, though not a perfect concept, removes much of the error proneness of defining equality.
- The application developer is never required to write a hashing function. Not only does that relieve the application programmer of burden, but it also relieves the system of the burden of having to support custom hashing functions everywhere such as in hash tables.
- Inequality comes for free also. For instance, suppose a user-account structure has an equality method that produces the user-id (a character string) as the substitute. This means that we can now compare two user-account objects using less. One user-account is less than another if their respective user-id fields are less, which is a lexicographic comparison.
- The equality substitute can be the result of a complex calculation, which is cached in the object. I.e. the equal method can produce some compound key the first time it is invoked, and then use the cached value subsequently.
Demo:
This is the TXR Lisp interactive listener of TXR 257.
Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
Join TXR Rewards now, and get 15000 closing parentheses you can use anywhere.
1> (defstruct account ()
user-id
(:method equal (me) me.user-id))
#<struct-type account>
2> (defvarl a (new account user-id "aardvark"))
a
3> (defvarl b (new account user-id "bat"))
b
4> (defvarl z (new account user-id "zebra"))
z
5> (less a b z)
t
6> (less b a)
nil
7> (equal a b)
nil
8> (equal a a)
t
9> (equal a (new account user-id "aardvark"))
t
10> (hash-equal a)
327970753
11> (hash-equal "aardvark")
327970753
12> (defvarl h (hash))
h
13> (set [h a] 100)
100
14> (set [h b] 200)
200
15> 4
4
16> (set [h z] 300)
300
17> h
#H(() (#S(account user-id "bat") 200) (#S(account user-id "aardvark") 100)
(#S(account user-id "zebra") 300))
18> [h a]
100
19> [h b]
200
20> [h "aardvark"]
100
A key observation here is that the object a, of account type, is equal to the character string "aardvark", as well as to any object of any type which produces that string as its equality substitute.
However, this is a very clear aspect of the model, and not a hidden pitfall.
We are saying that a participates in equality as if it literally were the string "aardvark" (though not satisfying the stringp predicate).
This is clear from the beginning and you program your logic accordingly, taking advantage of that while anticipating whatever risks it may cause.
The system is very easy to understand and use; so far in every program where I took advantage of equality substitution, it was a pleasure to work with.
I have seen this in the wild, in the form of code that compared objects by asking them to serialize themselves to JSON and then comparing the JSON. It was a classic case of a system where behavior and data had no reason to be mixed, except the programmers had been taught that the only correct way to view data was as the behavior of an object. Luckily, each object in the system needed to be able to reproduce the record it represented in JSON form for the purpose of interacting with other systems. That allowed us to check whether two objects created by different subsystems represented the same record.
I have been out of that mindset for so long it is strange to remember, but I do remember: records coming out of a database or an API request were weird and gross. They weren't encapsulated, and they didn't do anything. You weren't supposed to touch the insides of an object, and records were NOTHING BUT INSIDES. It was unsettling, like seeing a picture of a grisly accident. You're not supposed to see the insides. The first thing we always did was hide those fields safely in an object. Whew. Only touch them with getters and setters. Much nicer; much safer. Thank goodness for OOP.
How is this superior to the standard relational algebra method of using immutable keys associated with mutable attributes?
The fun example is ZIP CODE for houses. Since one house can have multiple values, it makes sense that the house had another key representation. And that using equality based on what is ultimately a mutable attribute just doesn't really make sense.
Even your "aardvark" example fails if you for some reason also reference it as an "Orycteropus afer". Right?
Now, if you expose that at all equal spots, it makes sense. Sorta "a.equals(b, by(get name))". Or as common lisp does it in most spots, ":key #'person-name". :)
Firstly, whatever problems you anticipate with the aardvark also afflict the Java way of defining a two-place equal method, with the hashing burden and all.
Ultimately, what do these operations do? They bottom out on some built-in data types like strings and numbers: stuff made of bits.
Equality substitution says: please, just produce the representative bits, and let the regular equality deal with it, instead of keeping those bits encapsulated and trying to implement hashing, equality and inequality yourself.
Imagine the account object implemented in a Java-like way. All that the binary equal method is going to do is compare the darned user-id strings. The hash method will hash the string. The less comparison method will compare the strings. At the end of all that, due to the encapsulation, you still don't have the useful property that a and "aardvark" are equal: the equal method is only called for other account objects.
Equality substitution doesn't enforce the immutability of keys. That's up to the programmer and almost always a good idea.
In a small "throwaway" program, if someone finds some clever use for a mutating equality substitute, I mean, more power to them.
(That program probably won't be, or shouldn't be, using hash tables, because the documentation explicitly states that hash table behavior becomes unspecified if objects are inserted into it whose equality substitutes mutate.)
The leeway in the system allows an object to come to life without an equality substitute value, which is then lazily calculated by the equal method and cached. Since that happens on the first call, it doesn't look like a mutation from the equality point of view.
Objects can have more than one key. Objects of the same kind can be considered equal in more than one way. Well, there is only one equal function: you get to pick one of those equalities to be the one that equal uses.
I thought about ways to work-in multiple equality relations, but that's a work in progress.
> you still don't have the useful property that a and "aardvark" are equal
That sounds more like a footgun to me. The blogpost slug "aardvark" and the username "aardvark" aren't equivalent. The type carries crucial semantic intent.
There can easily be a form of equality substitution whereby the types of the original objects remain relevant for the purposes of the comparison. So that is to say, equal still demands some sort of type match on the original objects, and if that is satisfied, then it proceeds with the equality subs.
Just, not in my language, thanks.
Problem is, how exact of a match do we require? And the exact match is out of the question, too, since it ignores inheritance. If the blogpost slug is derived from username by inheritance, I might want them to be comparable. Thus left has to be a subtype of right, or vice versa.
However, a key property of my object system is that it does not require inheritance for substitutability, you see. The programmer expectation is that they can write some new type from scratch which just has the right properties and methods to be used wherever some existing type is applicable, without inheriting from any common base. And "right properties and methods" includes the equal method for equality substitution.
Also read as: "This language is so poor at expressing common abstractions that we need to litter our code with not-code so that a compile-time code generation tool can make it tolerable."
Though, to be fair, Java finally has records (or will?).
I've heard more than one Java programmer dismiss macro based compile-time meta-programming as way to complicated while happily littering codebases with annotation and classpath based runtime meta-programming.
"... Making this language actually quite pleasant to use due to its gigantic ecosystem." but I see your point. Lombok is really just a collection of bandaids and some cool stuff and @SneakyThrows as mandatory footgun.
The typical reason for having some kind of language-wide notion of equality (and hash code) is to support some kind of general collections library. If the objects that have some kind of general equality relationship defined by anything else but their identity are mutable then the whole thing invariably breaks down. And on the other hand you often want collection that uses some completely arbitrary notion of equality. This means that good design of collections library should not dictate any interface for the equality concept on the elements and use plain object identity by default while allowing overriding of that not per class of the elements, but per usage of the collection.