This paper [1] (though it is a tad old now) shows that the register machine approach still outperforms the stack machine by a fair margin, largely because far fewer instructions are needed.
Wonder which is a better source representation for a JIT, though.
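To make the instruction-count point concrete, here's the textbook `a = b + c` comparison, unrolled by hand in C so it actually runs (this is just the standard illustration, not taken from the paper; the opcode names in the comments are made up):

```c
/* Sketch: the same statement "a = b + c" as a stack machine would execute it
   (four dispatches) vs. a register machine (one dispatch, operands encoded
   in the instruction). Opcode names in the comments are invented. */
#include <stdio.h>

int main(void) {
    int locals[3] = { 0, 2, 3 };              /* a, b, c */

    /* stack machine: 4 instructions */
    {
        int stack[8], sp = 0;
        stack[sp++] = locals[1];              /* LOAD  b */
        stack[sp++] = locals[2];              /* LOAD  c */
        sp--; stack[sp - 1] += stack[sp];     /* ADD     */
        locals[0] = stack[--sp];              /* STORE a */
    }

    /* register machine: 1 instruction */
    locals[0] = locals[1] + locals[2];        /* ADD a, b, c */

    printf("a = %d\n", locals[0]);
    return 0;
}
```

Four fetch/decode/dispatch rounds versus one is the reduction the paper is talking about; each stack op is simpler, but the interpreter overhead is paid per instruction.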
JIT compilers aren't really an argument either way, as you can do tracing or transform to an SSA graph from either representation quite easily. All the real complexity is further down the compilation pipeline.
I (half) wrote a toy trace compiler based on a register bytecode a while back. Writing the front end was quite pleasant - the difficult parts were back end stuff like figuring out how to support trace exit information. (The difficulty there is that you might want to perform stores, materialize objects etc as you go off trace as part of store and allocation sinking.)
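To back up the SSA point: you can go from stack bytecode to SSA just by abstractly interpreting it over a stack of value ids instead of values. A minimal sketch (opcodes, names and output format are all invented, not from any real compiler); the same trick works from register bytecode by keeping an array of value ids per register instead of a stack:

```c
/* Sketch: build SSA-style three-operand tuples from stack bytecode by
   abstract interpretation. The "stack" holds SSA value ids, not values.
   Everything here (opcodes, emit format) is invented for illustration. */
#include <stdio.h>

enum op { OP_PUSH_LOCAL, OP_ADD, OP_MUL, OP_STORE_LOCAL };
struct insn { enum op op; int arg; };

static int next_id = 0;

/* print one SSA tuple and hand back its value id */
static int emit(const char *name, int a, int b) {
    int id = next_id++;
    printf("v%d = %s v%d, v%d\n", id, name, a, b);
    return id;
}

int main(void) {
    /* stack bytecode for: local2 = local0 + local1 * local0 */
    struct insn code[] = {
        { OP_PUSH_LOCAL, 0 }, { OP_PUSH_LOCAL, 1 }, { OP_PUSH_LOCAL, 0 },
        { OP_MUL, 0 }, { OP_ADD, 0 }, { OP_STORE_LOCAL, 2 },
    };
    int n = sizeof code / sizeof code[0];
    int vstack[32], sp = 0;                /* abstract stack of value ids */
    int locals[8];
    for (int i = 0; i < 8; i++) {          /* entry definitions for locals */
        locals[i] = next_id++;
        printf("v%d = param %d\n", locals[i], i);
    }
    for (int pc = 0; pc < n; pc++) {
        struct insn in = code[pc];
        switch (in.op) {
        case OP_PUSH_LOCAL: vstack[sp++] = locals[in.arg]; break;
        case OP_ADD: { int b = vstack[--sp]; int a = vstack[--sp];
                       vstack[sp++] = emit("add", a, b); break; }
        case OP_MUL: { int b = vstack[--sp]; int a = vstack[--sp];
                       vstack[sp++] = emit("mul", a, b); break; }
        case OP_STORE_LOCAL: locals[in.arg] = vstack[--sp]; break;
        }
    }
    return 0;
}
```

Control-flow joins are the only part that needs real work (placing phis); straight-line code falls out of the abstract stack for free.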
That would be very inefficient, because physical registers can't be indexed dynamically. The dispatch you would have to introduce to jump to code that references the right physical registers would murder you with mispredictions.
Also, register VMs typically have three operands. Specialising each instruction for every possible register combination across three operands would result in a ridiculous volume of code.
Special purpose instructions that access fixed registers would be fine, but general purpose operand references cannot be sanely implemented in this way. Existing register VMs use memory because it's faster.
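For what it's worth, "use memory" in practice means the operand fields index into an array of slots that usually sits in L1. A minimal sketch (encoding, opcodes and field widths are all made up): one generic ADD handler covers every register combination, with no per-register specialisation and no second dispatch:

```c
/* Sketch of a register VM whose operands index an in-memory register file.
   The instruction encoding and opcode set are invented for illustration. */
#include <stdint.h>
#include <stdio.h>

enum { OP_LOADK, OP_ADD, OP_HALT };

/* one 32-bit word: 8-bit opcode and three 8-bit operand fields */
#define INSN(op, a, b, c) \
    (((uint32_t)(op) << 24) | ((uint32_t)(a) << 16) | ((uint32_t)(b) << 8) | (uint32_t)(c))

int main(void) {
    int64_t regs[256];                       /* the "register file", in memory */
    int64_t konst[] = { 2, 3 };
    uint32_t code[] = {
        INSN(OP_LOADK, 1, 0, 0),             /* r1 = konst[0] */
        INSN(OP_LOADK, 2, 1, 0),             /* r2 = konst[1] */
        INSN(OP_ADD,   0, 1, 2),             /* r0 = r1 + r2  */
        INSN(OP_HALT,  0, 0, 0),
    };

    for (const uint32_t *pc = code;;) {
        uint32_t i = *pc++;
        uint32_t a = (i >> 16) & 0xff, b = (i >> 8) & 0xff, c = i & 0xff;
        switch (i >> 24) {
        case OP_LOADK: regs[a] = konst[b]; break;
        case OP_ADD:   regs[a] = regs[b] + regs[c]; break;  /* operands index the array */
        case OP_HALT:  printf("r0 = %lld\n", (long long)regs[0]); return 0;
        }
    }
}
```

The only dispatch is on the opcode, same as in a stack VM; decoding the operands is just a couple of shifts and masks.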
> Also, register VMs typically have three operands.
Wrong. Only SSA VMs (with an optimizing compiler) need three operands.
Simple register VMs, as well as hardware CPUs, have only two operands, and then you can easily fit an instruction into one word. You can't do the better SSA-based optimizations then, but the VM is much faster than with two-word or multi-word ops, which blow up your instruction cache.
Lua has to use shorter operand ranges to get 3 operands into one word.
Writing a JIT with two-address ops is also trivial. It's a straightforward conversion using fixed registers.
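To make the encoding point concrete, here's the tradeoff in one toy macro pair (field widths invented; Lua 5.1's actual iABC layout is a 6-bit opcode with 8/9/9-bit operand fields, if I remember right). Three operands still fit in a 32-bit word, each field just covers a shorter range than a two-operand layout would allow:

```c
/* Sketch of the one-word encoding tradeoff. Both forms are 32 bits; the
   field widths are made up and not taken from any real VM. */
#include <stdint.h>
#include <stdio.h>

/* two operands: 8-bit opcode + two 12-bit register fields */
#define INSN2(op, a, b)    (((uint32_t)(op) << 24) | ((uint32_t)(a) << 12) | (uint32_t)(b))

/* three operands: 8-bit opcode + three 8-bit register fields */
#define INSN3(op, a, b, c) (((uint32_t)(op) << 24) | ((uint32_t)(a) << 16) | ((uint32_t)(b) << 8) | (uint32_t)(c))

int main(void) {
    printf("2-operand form: %d addressable registers per field\n", 1 << 12);
    printf("3-operand form: %d addressable registers per field\n", 1 << 8);
    printf("add r1, r2     -> %08x\n", (unsigned)INSN2(0x10, 1, 2));
    printf("add r0, r1, r2 -> %08x\n", (unsigned)INSN3(0x10, 0, 1, 2));
    return 0;
}
```

That's the "shorter ranges" tradeoff: same one-word budget, narrower fields per operand.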
> Specialising each instruction for each possible register for three operands would result in a ridiculous volume of code.
Nobody does that, it's not needed.
> Special purpose instructions that access fixed registers would be fine, but general purpose operand references cannot be sanely implemented in this way. Existing register VMs use memory because it's faster.
No. You need general purpose operand insns for only one or two fixed registers (eax, edx typically). Nobody implements all possible combinations.
And you use the stack to shuffle values around, not memory.
Multiple real implementations of register VMs have three-operand instructions. Lua, LuaJIT and Guile are examples that I'm familiar with. For that matter, many ISAs feature three-operand instructions. Even some of the recent extensions to x86-64 are three-operand versions of (some of) the vector instructions.
If you think three-operand instructions aren't common in both hardware and interpreter design, you are simply misinformed.
None of this has anything to do with SSA. SSA IR can be reasonably constructed from any of stack, accumulator, two-operand register or three-operand register bytecode. All of the usual optimizations follow from that.
> the VM speed is much faster than with two or multi word ops
While variable-length encodings are best avoided, that doesn't rule out three-operand instructions. For a good example of an extremely fast interpreter that uses such instructions, have a look at LuaJIT.
> Nobody does that, it's not needed.
That's what the OP was suggesting, so that's what I was discussing.
Registers are stored in memory, but ideally "memory" means L1 cache, and it seems to me like register VMs would have better cache locality. This might be why they're getting more popular relative to stack VMs as the speed advantage of cache over main memory grows.
Stack machines are constantly using the same absolute offsets into the stack when evaluating in a loop, or evaluating different branches of a big expression. In fact, one of the simplest register allocators you can use in an expression code generator is a stack; pop to allocate, push to deallocate, and spill if you run out. On a CPU, this will have benefits, but in a VM, it's just a relabeling scheme for the same memory slots a stack VM would use.
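Here's that stack-as-allocator trick as a minimal sketch (spilling omitted, all names invented): pop a register to hold each result, push it back once the value has been consumed.

```c
/* Sketch: expression codegen that allocates registers with a stack.
   "Pop to allocate, push to deallocate"; spilling is left out. */
#include <stdio.h>

typedef struct Expr { char op; int value; struct Expr *l, *r; } Expr;

static int free_regs[8] = { 7, 6, 5, 4, 3, 2, 1, 0 };
static int nfree = 8;

static int  alloc_reg(void) { return free_regs[--nfree]; }   /* pop  */
static void free_reg(int r) { free_regs[nfree++] = r; }      /* push */

/* emits code for e, returns the register holding its result */
static int gen(Expr *e) {
    if (e->op == 'k') {                          /* constant leaf */
        int r = alloc_reg();
        printf("  loadk r%d, %d\n", r, e->value);
        return r;
    }
    int a = gen(e->l), b = gen(e->r);
    printf("  %s r%d, r%d, r%d\n", e->op == '+' ? "add" : "mul", a, a, b);
    free_reg(b);                                 /* b is dead, reuse it */
    return a;
}

int main(void) {
    /* (1 + 2) * 3 */
    Expr k1 = {'k', 1}, k2 = {'k', 2}, k3 = {'k', 3};
    Expr add = {'+', 0, &k1, &k2}, mul = {'*', 0, &add, &k3};
    printf("result in r%d\n", gen(&mul));
    return 0;
}
```

On a register VM the emitted code ends up touching the same few slots a stack VM would; the offsets are just written into the instructions instead of implied by the stack pointer.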
What you potentially gain with a register VM is reduced accounting overhead; a stack machine will be constantly adjusting the stack pointer on every operation. It's a tradeoff for less complexity at codegen time. It's swings and roundabouts in the lowlands of performance, though; no loop-and-switch VM will be super-fast.
* In interpreted environments, registers are stored in memory anyway, so the advantage of simulating them isn't as great
* It is easier to generate code for stack machines, because you don't need to run register allocation (see the sketch after this list)
* There's a tradeoff in instruction complexity versus number of instructions between stack and register machines
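For contrast with the register sketch further up, here's the second bullet in code form (same invented Expr shape): stack-machine codegen for an expression tree is just a post-order walk, with no allocator at all.

```c
/* Sketch: stack-machine codegen is a post-order tree walk.
   The Expr shape and opcode names are invented, as above. */
#include <stdio.h>

typedef struct Expr { char op; int value; struct Expr *l, *r; } Expr;

static void gen(Expr *e) {
    if (e->op == 'k') { printf("  push %d\n", e->value); return; }
    gen(e->l);                     /* operands land on the stack  */
    gen(e->r);                     /* in exactly the right order  */
    printf("  %s\n", e->op == '+' ? "add" : "mul");
}

int main(void) {
    /* (1 + 2) * 3 */
    Expr k1 = {'k', 1}, k2 = {'k', 2}, k3 = {'k', 3};
    Expr add = {'+', 0, &k1, &k2}, mul = {'*', 0, &add, &k3};
    gen(&mul);
    return 0;
}
```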