Hacker News new | past | comments | ask | show | jobs | submit login
How to Write an Equality Method in Java (2009) (artima.com)
31 points by Tomte on April 30, 2021 | hide | past | favorite | 46 comments



The main takeaway from this is that having one somewhat ill-defined definition of equality over the whole language ecosystem is a bad idea. There is a reason why typical LISP has 5+ different notions of equality in the base language.

The typical reason for having some kind of language-wide notion of equality (and hash code) is to support some kind of general collections library. If the objects that have some kind of general equality relationship defined by anything else but their identity are mutable then the whole thing invariably breaks down. And on the other hand you often want collection that uses some completely arbitrary notion of equality. This means that good design of collections library should not dictate any interface for the equality concept on the elements and use plain object identity by default while allowing overriding of that not per class of the elements, but per usage of the collection.


OCaml solves this elegantly by making the equality functions parameters to the constructor of the collection type.


Completely agree. I’ve been working on a Common Lisp collection library, and I went out of my way to make the element comparison function a parameter at data structure init time. If someone disagrees with my very strict definition of equality, they are welcome to provide their own comparison function! It always bothered me very much that FSet (the most prominent collections library for CL) used a definition of equality that is based on the a specific generic function that couldn’t be replaced. The icing on the cake is that FSet’s equality generic function messes up with derived classes. Oops.


For immutable value types, AutoValue[1] is a nice library for automating this. Lombok[2] is also popular, but a lot of orgs avoid it because it has some pretty deep dependencies on implementation details of the JDK that aren't guaranteed to be stable across releases. It seems well-maintained in practice, though. Once Java 16 is widely adopted, Java records[3] will probably be the preferred approach, but it will be a while before most teams are on a version of the JDK that supports them.

For mutable objects, the notion of equality is hairy in general, because what you want often depends on why you'd be comparing the two objects. You might have two mutable objects that can be meaningfully compared in your business logic, but absolutely should not be used as keys in a map. Often you can satisfy your goals by implementing a separate method that does not attempt to conform to the 'equals' contract, which ensures it will not unintentionally be used for e.g. data structure ordering.

[1] https://github.com/google/auto/blob/master/value/userguide/i...

[2] https://projectlombok.org/features/EqualsAndHashCode

[3] https://blogs.oracle.com/javamagazine/records-come-to-java


Java records are out of preview now in Java 16


Thanks, I edited with your correction.


This is one where composition really does shine. A colored point is a point and a color. Make it so that it is explicit, and the code is much easier to reason about. Two colored points only equal if both components equal. And you have obvious ways of comparing the points and the colors.


It doesn't much make a difference. If you want to write an equality method for ColoredPoint (containing a Color and a Point) then you're doing all the same work.


You dodge all of the difficulties of inheritance that they reference.


This is true. The solves for Pitfall #4. But it solves for Pitfall #4 but just explicitly being different types. That works when you really want them to be different types. For this example, I agree -- ColoredPoint and Point should not be designed as the same type.

In cases where inheritance makes more sense, the equals relationship is also easier to specify.


I've seen frighteningly few places where the inheritance equality acted as was desired after more than a single collaborator. :(


I think the issue only exists for "value" types anyway. Reference equality works perfectly well for all types where I've used inheritance.


Reference equality's inability to survive a persistence round trip is the big killer. And sneaking in a round trip in a process that used to be fully in memory is where dragons lurk.


Equality is the least of your problems if you multiple instances of the same entity floating around.


Sometimes you are seeing if a constructed instance equals an existing one. I would presume that is rather common. Especially with points, a basic collision system is often basically an equality check. Right?


A "point" would be a value type and you wouldn't use reference equality or inheritance for a value type.

But if this were a game you might have a "BigBoss" instance than inherits from "Enemy". If you had one boss, you'd only have one instance of it, and you would use reference equality. If you round-tripped and ended up with 2 instances of the same boss then you're already in trouble.


OK, can you go into detail why equality between Point and ColoredPoint written using composition is simpler than equality between Point and ColoredPoint written using inheritance. Specifically, how does composition make equality transitive?


A point can never equal a colored point. You can extract the point from a colored one to compare it to a point. That is, you can compare a point to the point of a colored point. But to compare a point to a colored point is essentially false, by definition.


Ah, so your point (ba-dum-ts) is that equality should just be type-invariant. That's a reasonable take, but it's a bit orthogonal to the larger conversation.

In your invariant world, is the integer 2 equal to the floating-point number 2.0. Or is 2.0 actually a composition of an integer part and a fractional part, while 2 and 2.0 are not directly comparable?


Numerical comparisons are already tough. Multi dimensional comparisons are easiest just dodged.

Edit: forgot to say kudos on the joke. :)


Comparing equality of color seems like a trap. If they aren't from the same color space, it's more like comparing a float, which will need some precision specified.


I dodged that dubious point. :)

And, the same goes for points, all told. Does the 2d point (x,y) equal the 3d point (x,y,0)?

Basically, equality works best when you greatly limit the scope and don't try to get fancy.


https://news.ycombinator.com/item?id=26999909 has the right answer, IMO: you can have multiple equality comparators (and/or orderable comparators) and specify which to use in your collections like sets.

It's not clear that there should be a single notion of equality for any class/struct/whatever, but that there might need to be context-specific notions of equality.


Is there a legitimate scenario for equals method or its counterparts to be defined on anything but final (sealed in C#) classes, that only have immutable fields?


Discussed at the time:

How To Write an Equality Method in Java - https://news.ycombinator.com/item?id=649258 - June 2009 (11 comments)


These articles used to appear in JavaWorld in 1998/99.


Two-place equality methods are a general solution, but not a very good idea. I have hit upon the a better abstraction for handling custom equality, which I call "equality substitution". An objects equality method doesn't perform the equality comparison, but rather nominates an alternative value which is used as that object's representative for equality comparisons. That alternative is the "equality substitute".

This is documented in the TXR Lisp reference manual: https://www.nongnu.org/txr/txr-manpage.html#N-02CDB347

The key idea is that we have a reasonably rich set of built-in objects in the language that support equal equality useful ways.

Instead of trying to bend other objects to also support equal equality from scratch, we require objects to calculate a representative of themselves which is some language built-in data type that natively supports equal.

Equality substitution, though not a perfect concept, removes much of the error proneness of defining equality.

- The application developer is never required to write a hashing function. Not only does that relieve the application programmer of burden, but it also relieves the system of the burden of having to support custom hashing functions everywhere such as in hash tables.

- Inequality comes for free also. For instance, suppose a user-account structure has an equality method that produces the user-id (a character string) as the substitute. This means that we can now compare two user-account objects using less. One user-account is less than another if their respective user-id fields are less, which is a lexicographic comparison.

- The equality substitute can be the result of a complex calculation, which is cached in the object. I.e. the equal method can produce some compound key the first time it is invoked, and then use the cached value subsequently.

Demo:

  This is the TXR Lisp interactive listener of TXR 257.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  Join TXR Rewards now, and get 15000 closing parentheses you can use anywhere.
  1> (defstruct account ()
       user-id
       (:method equal (me) me.user-id))
  #<struct-type account>
  2> (defvarl a (new account user-id "aardvark"))
  a
  3> (defvarl b (new account user-id "bat"))
  b
  4> (defvarl z (new account user-id "zebra"))
  z
  5> (less a b z)
  t
  6> (less b a)
  nil
  7> (equal a b)
  nil
  8> (equal a a)
  t
  9> (equal a (new account user-id "aardvark"))
  t
  10> (hash-equal a)
  327970753
  11> (hash-equal "aardvark")
  327970753
  12> (defvarl h (hash))
  h
  13> (set [h a] 100)
  100
  14> (set [h b] 200)
  200
  15> 4
  4
  16> (set [h z] 300)
  300
  17> h
  #H(() (#S(account user-id "bat") 200) (#S(account user-id "aardvark") 100)
   (#S(account user-id "zebra") 300))
  18> [h a]
  100
  19> [h b]
  200
  20> [h "aardvark"]
  100
A key observation here is that the object a, of account type, is equal to the character string "aardvark", as well as to any object of any type which produces that string as its equality substitute.

However, this is a very clear aspect of the model, and not a hidden pitfall.

We are saying that a participates in equality as if it literally were the string "aardvark" (though not satisfying the stringp predicate). This is clear from the beginning and you program your logic accordingly, taking advantage of that while anticipating whatever risks it may cause.

The system is very easy to understand and use; so far in every program where I took advantage of equality substitution, it was a pleasure to work with.


I have seen this in the wild, in the form of code that compared objects by asking them to serialize themselves to JSON and then comparing the JSON. It was a classic case of a system where behavior and data had no reason to be mixed, except the programmers had been taught that the only correct way to view data was as the behavior of an object. Luckily, each object in the system needed to be able to reproduce the record it represented in JSON form for the purpose of interacting with other systems. That allowed us to check whether two objects created by different subsystems represented the same record.

I have been out of that mindset for so long it is strange to remember, but I do remember: records coming out of a database or an API request were weird and gross. They weren't encapsulated, and they didn't do anything. You weren't supposed to touch the insides of an object, and records were NOTHING BUT INSIDES. It was unsettling, like seeing a picture of a grisly accident. You're not supposed to see the insides. The first thing we always did was hide those fields safely in an object. Whew. Only touch them with getters and setters. Much nicer; much safer. Thank goodness for OOP.


How is this superior to the standard relational algebra method of using immutable keys associated with mutable attributes?

The fun example is ZIP CODE for houses. Since one house can have multiple values, it makes sense that the house had another key representation. And that using equality based on what is ultimately a mutable attribute just doesn't really make sense.

Even your "aardvark" example fails if you for some reason also reference it as an "Orycteropus afer". Right?

Now, if you expose that at all equal spots, it makes sense. Sorta "a.equals(b, by(get name))". Or as common lisp does it in most spots, ":key #'person-name". :)


Firstly, whatever problems you anticipate with the aardvark also afflict the Java way of defining a two-place equal method, with the hashing burden and all.

Ultimately, what do these operations do? They bottom out on some built-in data types like strings and numbers: stuff made of bits.

Equality substitution says: please, just produce the representative bits, and let the regular equality deal with it, instead of keeping those bits encapsulated and trying to implement hashing, equality and inequality yourself.

Imagine the account object implemented in a Java-like way. All that the binary equal method is going to do is compare the darned user-id strings. The hash method will hash the string. The less comparison method will compare the strings. At the end of all that, due to the encapsulation, you still don't have the useful property that a and "aardvark" are equal: the equal method is only called for other account objects.

Equality substitution doesn't enforce the immutability of keys. That's up to the programmer and almost always a good idea.

In a small "throwaway" program, if someone finds some clever use for a mutating equality substitute, I mean, more power to them.

(That program probably won't be, or shouldn't be, using hash tables, because the documentation explicitly states that hash table behavior becomes unspecified if objects are inserted into it whose equality substitutes mutate.)

The leeway in the system allows an object to come to life without an equality substitute value, which is then lazily calculated by the equal method and cached. Since that happens on the first call, it doesn't look like a mutation from the equality point of view.

Objects can have more than one key. Objects of the same kind can be considered equal in more than one way. Well, there is only one equal function: you get to pick one of those equalities to be the one that equal uses.

I thought about ways to work-in multiple equality relations, but that's a work in progress.


> you still don't have the useful property that a and "aardvark" are equal

That sounds more like a footgun to me. The blogpost slug "aardvark" and the username "aardvark" aren't equivalent. The type carries crucial semantic intent.


There can easily be a form of equality substitution whereby the types of the original objects remain relevant for the purposes of the comparison. So that is to say, equal still demands some sort of type match on the original objects, and if that is satisfied, then it proceeds with the equality subs.

Just, not in my language, thanks.

Problem is, how exact of a match do we require? And the exact match is out of the question, too, since it ignores inheritance. If the blogpost slug is derived from username by inheritance, I might want them to be comparable. Thus left has to be a subtype of right, or vice versa.

However, a key property of my object system is that it does not require inheritance for substitutability, you see. The programmer expectation is that they can write some new type from scratch which just has the right properties and methods to be used wherever some existing type is applicable, without inheriting from any common base. And "right properties and methods" includes the equal method for equality substitution.


Makes sense. I particularly like the "more power to them". :)


Just not, like, power over people or world economies or anything of the sort. :)


Easy: @EqualsAndHashCode


Also read as: "This language is so poor at expressing common abstractions that we need to litter our code with not-code so that a compile-time code generation tool can make it tolerable."

Though, to be fair, Java finally has records (or will?).


As a Lisp programmer, I feel personally attacked by that.


At least Java doesn't have those pesky macros! /s


I've heard more than one Java programmer dismiss macro based compile-time meta-programming as way to complicated while happily littering codebases with annotation and classpath based runtime meta-programming.


The difference is that Lisp isn't a programming language- it's a programming language language. ;)


I thought Lisp stands for "Lots of Irritating Single Parentheses" ;)


"... Making this language actually quite pleasant to use due to its gigantic ecosystem." but I see your point. Lombok is really just a collection of bandaids and some cool stuff and @SneakyThrows as mandatory footgun.


It might be a good start in most cases but it is not foolproof in every case.

A foolproof automated method would have to read your mind and Lombok can't do that.


Just for some perspective - this problem does also exists in other languages.

In JavaScript writing:

{a: 7} == {a: 7}

Returns false.

A common way of checking for object equality is to go JSON.stringify(a) == JSON.stringify(b).

And finally now with Java 16 (or something) we get records which often will be what programmers in this situation would like to use.


This is a pretty bad comparison, == doesn't check equality for objects in Javascript and doesn't promise to. Similar to how Java's == doesn't.


According to MDN it does[1]:

> The equality operator (==) checks whether its two operands are equal, returning a Boolean result.

Additionally, the ECMAScript spec calls it the "equality operator" (although admittedly it spells out a 14 step list[2] of how to do this).

1: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

2: https://tc39.es/ecma262/#sec-abstract-equality-comparison




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: