I suppose it's possible this could be considered "emulating the whole memory space," though I wouldn't consider it that, but you could just generate an offset table for all jumpable memory locations and use them to calculate the correct offset. It's quite likely you could even do that in less than O(n) space with some time trade-offs.
Upon rereading it, it seems the real insurmountable challenge is interrupts! It always seems like at that level of detail with this stuff, it becomes a decision to go slower in order to simulate NES hardware accurately. But my question is: how much code depends on that accuracy, and how much can you compromise for speed? A further question I have is can you do a cross game analysis of the whole NES library and find common patterns, reused functions, that you decompile into a kind of high level conversion that would hopefully or gloss over the need for specific functions to have that low level accuracy.
I'm pretty sure Nesticle had nothing resembling accurate timing, and it could run most games.
You can go well beyond several of the NES's limitations by fiddling with PPU parameters between scanlines, but that takes careful timing and almost nothing actually does that. Might only be democoders. If you're willing to ignore the possibility that someone did that, you should be able to just process a whole frame's worth of PPU at NMI time.
Super Mario Bros spends most of its time in an infinite loop, waiting for NMI to pull it out so it can start processing the next frame. If NMI triggers before this, you get a lag frame. Plenty of popular emulators get the wrong lag frames, but only speedrunners and TASers really care. So, for this game, you could possibly toss out the timing entirely and trigger NMI when you hit a busy loop. Your emulator will completely ignore all lag frames, but most people won't notice for at least this game.
(It's been over a decade since I was into ROM hacking so my memory could be faulty on any of this.)
SMB also relies on polling the sprite0 flag to know when to write to the x-scroll value. If the timing isn't right, most of the playfield jitters back and forth. Same thing happens with other ROMs that use the same method (like Balloon Fight's "Trip" mode and, bizarrely, Arkanoid). My emulator assumes that there won't be a lot of shenanigans (PPU writes) before the sprite0 flag, and pre-calculates the cycle that it'll hit on. That seems to work.
One of the Ninja Turtles games shifts the x scroll back and forth a couple times per line during some of the opening screens. Skate or Die changes which CHROM bank to use for background tiles a couple times per frame in its inter-level screens. I solve that by logging PPU writes (address, value, cycle it would be received). In practice, most of those mid-frame writes don't change anything that matters to the CPU, but I need the info for rendering.
Rendering the scene at NMI without logging writes to the PPU works for a lot of games, but there are plenty that do mid-frame PPU changes (and it starts getting more difficult with more-advanced mappers including their own interrupts).
I wonder if for things that happen between and among scan lines, how difficult it would be to statically analyse the code to find out what's running during that time, and precalculate when the writes would fire.
another approach could be getting runtime information from a running emulator, and record it onto a file the compiler could use to close the gaps you need but can't get from static analysis.
The issue is that nothing is stopping the developers from doing all kinds of crazy loops and jumps in the code that's invoked by the interrupt, making it into a halting problem.
Unless the interrupt handler is trivially simple, you can't know how many cycles it will take to finish.
the advantage the compiler writer has in this case is it doesn't have to deal with any possible code, just this set of well known already released and frozen in time programs.
I don't think that really avoids the problem. I mean, what if you do a JMP to an address decided based on a value gained from player input? How is the compiler supposed to guess that in advance?
the mathematical proof of the halting problem, as I understand it, would in this context be based on the notion that user input could somehow direct the program to write a program in ram and get it to execute.
This has in fact been achieved in games like super mario world or pokemon!
It seems to me sometimes that educated programmers call halting problem too early on situations like this where you don't actually need a rock solid "provably correct" compiler, just a compiler that does something reasonable in this bounded set of cases, and we don't mind that much if it crashes sometimes, since it's not like an NES game is a rocket or a therac.
Interrupts isn't something you can compromise on though, it would break all but the most trivial NES games (SMB and SMB3 both rely on interrupts).
I assume the trouble with interrupts is the "interrupting", as the location to jump to could be kept in a prefilled table like stormbrew suggested for JMPs.
If the target instruction-set doesn't have any way of "interrupting" that could be used, could it instead be solved by inserting a compact interrupt check between every "line" of assembly?
My understanding is the trouble with interrupting is that from the point of view of mainstream code, the state of all the global variables just changed out from under you. as if by another thread with shared memory.
Which makes me naively wonder if you could just run interrupts on a parallel thread.
The NES just let developers deal with that, pretty much all interrupt code starts with the developer frantically saving the registers somewhere so that he can restore them at the end of the interrupt.
Is it? I was so sure that modern architectures could just push your entire set of registers away to a stack with a single instruction, since it's so common.
I had a fun time playing Prince of Persia on DOSBox. You can change the number of CPU cycles the emulator uses[0], which can cause interesting glitches when it executes too fast.