You don’t even solve the problem by moving the memory on-die, because the majority of the time is not spent propagating the signal along traces; it’s spent waiting for the relatively slow DRAM to find, read, and send the data over the bus.
Expanding the cache so everything fits in it is one way to get the performance of cache hits everywhere, but cache (SRAM) is far more expensive per bit than DRAM, and current cache sizes are tiny compared to RAM. If cache got 50x bigger tomorrow, chips would get dramatically hotter, more power-hungry, and more expensive.
Apple RAM is not on the same silicon[1] as the CPU, just on the same package. It also doesn't have significant latency advantages. It has higher throughput, possibly because of a custom controller. The biggest advantage is that the GPU doesn't have to go through the PCIe bus to access it.
[1] my understanding is that DRAM and CPU logic fabrication processes are very different, and it is hard to produce a single die with both while keeping either one optimal.
Distance from the CPU. And traditionally, RAM sits waaaaaay across the motherboard. If you put it on the CPU die, it’s much closer and much faster. Still slower than L1 and L2 cache, sure, but much faster than waiting on the fetch across the motherboard.
Apple machines are already pretty much not RAM-upgradeable, and they design their own chips too. So if there were benefits to putting RAM on the die (I don't know myself; others in this thread suggest there are in fact not), Apple would seem like a plausible first mover, since they aren't subject to the usual downsides.
I know, sounds crazy. Who’s gonna make such a drastic change to the industry?