Registers are stored in memory, but ideally "memory" means L1 cache, and it seems to me like register VMs would have better cache locality. This might be why they're getting more popular relative to stack VMs as the speed advantage of cache increases.
Stack machines are constantly using the same absolute offsets into the stack when evaluating in a loop, or evaluating different branches of a big expression. In fact, one of the simplest register allocators you can use in an expression code generator is a stack; pop to allocate, push to deallocate, and spill if you run out. On a CPU, this will have benefits, but in a VM, it's just a relabeling scheme for the same memory slots a stack VM would use.
What you potentially gain with a register VM is reduced accounting overhead; a stack machine will be constantly adjusting the stack pointer on every operation. It's a tradeoff for less complexity at codegen time. It's swings and roundabouts in the lowlands of performance, though; no loop and switch VM will be super-fast.