Don't fancy x86 addressing modes provide most of those multiplications and offse...

cfallin · on Feb 3, 2020

Yeah, this should be roughly the same overhead as an ADD:

    LEA rDest, [rBase + 8*rPtr]

(The "load effective address" instruction computes an effective address like a load or store would, but just gives the address without doing a memory access.)
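To make that concrete, here is a minimal sketch in C. The function name `decompress` and the scheme itself (a 32-bit value counting 8-byte units from a heap base) are assumptions for illustration, not taken from any real implementation. The point is only that the whole computation is one `base + 8*index` expression, which an x86-64 compiler can typically emit as a single LEA or fold into the addressing mode of the load that follows.

```c
#include <stdint.h>

/* Hypothetical scheme: `compressed` is an index in 8-byte units
   relative to `base`. Both names are illustrative assumptions. */
static inline uint64_t decompress(uint64_t base, uint32_t compressed) {
    /* Widen first so the shift cannot overflow 32 bits, then
       compute base + 8*index (the same shape as the LEA above). */
    return base + ((uint64_t)compressed << 3);
}
```

For example, `decompress(0x100000000, 2)` yields `0x100000010`: two 8-byte units past the base.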

the8472 · on Feb 3, 2020

AIUI mov supports these things directly[0], and if I read the instruction tables correctly, then at least on Skylake the latency/throughput is the same for all addressing modes[1].

[0] http://www.c-jump.com/CIS77/ASM/Addressing/lecture.html#R77_...

[1] https://www.agner.org/optimize/instruction_tables.pdf (page 238)

verwaest · on Feb 7, 2020

Decompression isn't the problem; compression is. Compression is currently just a mov. With this scheme we'd need additional shifts.
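A rough C sketch of the difference, under stated assumptions: both function names are hypothetical, the unscaled variant assumes the heap base is 4 GiB aligned so the low 32 bits are the offset, and the scaled variant assumes a 32 GiB aligned base with 8-byte aligned pointers. Neither is meant as the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Unscaled (assumed 4 GiB-aligned base): keep the low 32 bits.
   This is a plain 32-bit mov/store. */
static inline uint32_t compress_unscaled(uint64_t ptr) {
    return (uint32_t)ptr;
}

/* Scaled by 8 (assumed 32 GiB-aligned base): the pointer has to be
   shifted right by 3 before the low 32 bits are stored. */
static inline uint32_t compress_scaled(uint64_t ptr) {
    assert((ptr & 7) == 0); /* the scheme requires 8-byte alignment */
    return (uint32_t)(ptr >> 3);
}
```

With a base of `0x800000000` (32 GiB), the pointer `0x800000010` compresses to `0x10` unscaled and to `0x2` scaled. The extra shift sits on every pointer store, which is the cost being described here.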

verwaest · on Feb 7, 2020

Also we'll probably lose some cache benefits from compression due to larger alignment.