You can do it by treating everything as a ByteBuffer, but then you pay conversion costs (int -> float is especially painful).
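A minimal sketch of what that looks like: packing records into one flat direct ByteBuffer so the data is contiguous, with the int -> float conversion cost showing up on every mixed-type access. The layout (two floats and an int per record) is just an assumption for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PackedRecords {
    // Hypothetical layout: x (float), y (float), id (int) per record.
    static final int STRIDE = Float.BYTES * 2 + Integer.BYTES; // 12 bytes

    public static void main(String[] args) {
        int n = 1000;
        // One contiguous allocation instead of n separate heap objects.
        ByteBuffer buf = ByteBuffer.allocateDirect(n * STRIDE)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < n; i++) {
            buf.putFloat(i * STRIDE,     i * 0.5f); // x
            buf.putFloat(i * STRIDE + 4, i * 0.25f); // y
            buf.putInt(i * STRIDE + 8, i);           // id
        }
        // The conversion cost: using the int field in float math
        // means an explicit widening cast on every single access.
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += buf.getFloat(i * STRIDE) + (float) buf.getInt(i * STRIDE + 8);
        }
        System.out.println(sum);
    }
}
```

The upside is that a linear scan over the buffer is cache-friendly; the downside is that you've given up the type system and are doing manual offset arithmetic.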
It's fundamentally a data structure + execution flow problem, so it's not something a VM/compiler can help with. The fact that everything in Java is a reference (to be fixed at some point in the distant future) just means that every object fetch is a cache miss (really bad).
So, we could conceivably claim that if that is fixed, Java might be able to JIT-compile code that runs a bit faster than C++ across a wider range of applications than it currently can, but then again, by the time that happens, C++ might have a plethora of other tricks up its sleeve to make execution a little faster.
Isn't this what Dalvik does? (And by extension, ART)
IIRC, a register-based virtual machine would alleviate the cache-miss behavior the GP talks about. (Because larger, more complex instructions = fewer hits to cache)
It's been a while since I've mucked around with this type of stuff, though.
Nope, you're thinking one level too low in the stack.
The issue is that each object is a reference, and because it's a reference, its position in memory is by nature ambiguous (compacting GCs make this even worse by moving things around).
Until you can guarantee memory location, you don't know whether the N+1th object you're going to access is in the same cache line (or prefetched), and any sort of cache optimization is shot to hell. Only by knowing your standard execution flow and data set size can you write code that's as efficient about cache misses as possible.
Hotspot tends to focus only on tight inner loops, whereas the stuff I'm talking about involves looking at the larger program (and the structures you choose to put data into). Think Structure of Arrays (SoA) rather than Array of Structures (AoS). Possible to do in Java, but very painful.
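To make the SoA-vs-AoS contrast concrete, here's a small sketch (the `Particle` name and fields are just an invented example):

```java
// AoS: an array of Particle references. Each element is a separate
// heap object, so iterating chases a pointer per element and each
// dereference can land in a different cache line.
class Particle {
    float x, y;
    Particle(float x, float y) { this.x = x; this.y = y; }
}

// SoA: the same data as parallel primitive arrays. xs[i] and xs[i+1]
// are adjacent in memory, so a linear scan prefetches cleanly.
class Particles {
    final float[] xs, ys;
    Particles(int n) { xs = new float[n]; ys = new float[n]; }

    float sumX() {
        float s = 0f;
        for (float x : xs) s += x; // sequential, cache-friendly
        return s;
    }
}
```

The pain point is that Java gives you no struct-like value type today, so the SoA version forces you to abandon the natural one-object-per-entity model and manage parallel arrays by hand.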
Not only that, but IIRC, at the beginning each ref was a pointer into a handle table that stored the actual address of the object, so every object access was a double pointer dereference.