I'd encourage people to also read Niels Möller and Torbjörn Granlund's division ...

remcob · on March 24, 2021

IIRC, I tried this method, but did not see any performance improvement. The method replaces a single `divq` instruction with a table lookup and several multiplications. On modern processors this is no longer a worthwhile trade off.

chjj · on March 24, 2021

The 2-by-1 and 3-by-2 division functions described in the paper result in a very measurable speedup in my code. I think you're confusing those with the reciprocal calculation itself (which can be computed with a lookup table). I agree that part doesn't really lend itself to any significant performance benefit and is probably better calculated with a single hardware division instead.

I feel it necessary to point out that the 3-by-2 division actually has multiple benefits which are easy to miss:

1. The quotient loop can be skipped as I mentioned.

2. The "Add back" step is less likely to be triggered.

3. Since a 2-word remainder is computed with the division, you can skip 2 iterations on the multiply+subtract step.

My reimplementation of GMP documents both the 2-by-1 and 3-by-2 divisions pretty thoroughly[1][2].

[1] https://github.com/bcoin-org/libtorsion/blob/master/src/mpi....

[2] https://github.com/bcoin-org/libtorsion/blob/master/src/mpi....

zmodem · on March 24, 2021

I had not seen that GMP reimplementation before, but it looks very readable. Thanks!

I used the Möller & Granlund paper for a portable implementation of 2-by-1 division [1], and on my machine (which is admittedly not exactly new) it runs much faster than the DIV instruction.

[1] https://www.hanshq.net/eprime.html#div

mratsim · on March 24, 2021

The 3-by-2 division is from Burnikel and Ziegler [1][2] with a nice comparison from Hasselstrom between division approaches.

OpenSSL has introduced another 3 by 2, but this one is written with constant-time concerns for cryptography [4]

[1]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47....

[2]: https://github.com/status-im/nim-stint/blob/bae3fc6/stint/pr...

[3]: http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Hass...

[4]: https://github.com/openssl/openssl/pull/7589