IMHO that's a little backwards. I mean, yes, in its ultimate expression in systems like VAX that's how it ended up. But the "goal" of processor designers was never programmer comfort, and I think expressing it that way makes the whole thing seem like a mistake when in fact microcode was a critically important innovation.
What microcode really does is make it possible to share hardware resources inside the execution of the same instruction. Hardware is really (really) expensive, and a naive implementation of a CPU needs a lot of it that you can't really afford.
Take adders. You need adders everywhere in a CPU. You need to compute the result of an ADD instruction, sure, but you also need to compute the destination of a relative branch, or for that matter just compute the address of the next instruction to fetch. You might have segment registers like the 8086 that require an addition be performed implicitly. You might have a stack engine for interrupts that needs to pop stuff off during return. That's a lot of adders!
But wait, how about if you put just one adder on the chip, and share it serially by having each instruction run a little state machine program in "micro" code! Now you can have a cheap CPU, at the cost of an extra ROM. ROMs are, technically, a lot of transistors; but they're dense and cheap (both in dollars for discrete logic designs and in chip area for VLSI CPUs).
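To make that concrete, here's a toy sketch in C (all names invented; no real machine's microcode looks like this) of one adder being time-multiplexed across the micro-steps of a single instruction, with the sequence of steps coming out of a little ROM-like table:

    /* Toy sketch: one shared adder, time-multiplexed across the micro-steps
     * of a single instruction. The "microcode ROM" is just a table of steps.
     * All names (uop, ADD_PC, ADD_EA, ADD_ALU) are invented for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    static uint16_t pc, ea, acc;          /* program counter, effective address, accumulator */
    static uint16_t base, disp, operand;  /* inputs for this toy instruction */

    /* The one and only adder on the "chip": every micro-step that needs an
     * addition routes its operands through here, one step per cycle. */
    static uint16_t adder(uint16_t a, uint16_t b) { return a + b; }

    typedef enum { ADD_PC, ADD_EA, ADD_ALU, DONE } uop;

    /* "Microcode ROM" for a single ADD-from-memory style instruction:
     * bump the PC, form the effective address, then do the architectural add. */
    static const uop rom[] = { ADD_PC, ADD_EA, ADD_ALU, DONE };

    int main(void) {
        pc = 0x0100; base = 0x1000; disp = 0x20; acc = 35; operand = 7;

        for (int step = 0; rom[step] != DONE; step++) {
            switch (rom[step]) {                            /* one micro-step per "cycle" */
            case ADD_PC:  ea = ea; pc  = adder(pc, 2);        break; /* next instruction address */
            case ADD_EA:  ea  = adder(base, disp);            break; /* address calculation */
            case ADD_ALU: acc = adder(acc, operand);          break; /* the architecturally visible ADD */
            default: break;
            }
        }
        printf("pc=%04x ea=%04x acc=%u\n", (unsigned)pc, (unsigned)ea, (unsigned)acc);
        return 0;
    }

The point is just that the adder hardware exists exactly once, and the "program" in the ROM decides what it's adding on each step.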
That is, microcode started as a size optimization for severely constrained devices. It was only much later that it was used to implement features that couldn't be pipelined. Only the last bit was a mistake.
I'm not really following that explanation. You can get the same sharing of hardware resources with hardwired control circuitry as with microcode. (In fact, probably more with hardwired control circuitry since micro-instructions put some limits on what you can do.) It's just a matter of how you implement the state machine.
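To illustrate, here's the same sort of toy example reworked with "hardwired" control: the step sequence comes out of ad-hoc next-state logic instead of a ROM table, and the single adder gets shared exactly the same way (again, invented names, not any real design):

    /* Same one-adder sharing, but sequenced by hardwired next-state logic
     * (standing in for random gates on a die) rather than a microcode ROM. */
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { S_FETCH, S_EA, S_EXEC, S_HALT } state;

    static uint16_t adder(uint16_t a, uint16_t b) { return a + b; }

    /* Combinational next-state logic: the "control circuitry". */
    static state next_state(state s) {
        switch (s) {
        case S_FETCH: return S_EA;
        case S_EA:    return S_EXEC;
        default:      return S_HALT;
        }
    }

    int main(void) {
        uint16_t pc = 0x0100, base = 0x1000, disp = 0x20, ea = 0, acc = 35, operand = 7;

        for (state s = S_FETCH; s != S_HALT; s = next_state(s)) {
            switch (s) {                          /* the one adder is reused on every step */
            case S_FETCH: pc  = adder(pc, 2);        break;
            case S_EA:    ea  = adder(base, disp);   break;
            case S_EXEC:  acc = adder(acc, operand); break;
            default: break;
            }
        }
        printf("pc=%04x ea=%04x acc=%u\n", (unsigned)pc, (unsigned)ea, (unsigned)acc);
        return 0;
    }

Whether the step sequence lives in a ROM or in gates is an implementation detail; the resource sharing happens either way.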
(As an aside, the 8086 has a separate adder for address computations, independent of the ALU. You can see this in the upper left of the die photo.)
> You can get the same sharing of hardware resources with hardwired control circuitry as with microcode.
Only with some other kind of state machine, though. I was maybe being a little loose with my definition of "microcode" vs. other state-machine paradigms.
Maybe the converse point makes more sense: "RISC", as a design philosophy, only makes sense once there's enough hardware on the chip to execute and retire every piece of an instruction in a single cycle (each piece in its own cycle, of course, and there were always instructions that broke the rules and required a stall). Lots and lots of successful CPUs (including the 8086!) were shipped without this property.
My point was just that having 50k+ transistor budgets was a comparatively late innovation and that, given the constraints of the time, microcode made a ton of sense. It's not a mistake even if it seems like one in hindsight.
There's another implicit assumption here, though, which is that code size optimization is important. That's obviously true given the era we're discussing, but it makes clear why the implicit third possibility between "microcoded CISC" and "fully pipelined RISC" wasn't an option. Microinstructions themselves tend, especially in earlier designs, to be "simple" in the RISC sense (even if not pipelined, they're usually a fixed number of cycles/phases each). Why, then, did developers write instructions that are then decoded into microinstructions, instead of writing microinstructions directly in an early equivalent of VLIW? Initially, as a code size optimization so essential it went beyond mere optimization; but later, once the cost of the microcode decoder and state machine is already paid, because the chosen ISA, now isolated from the micro-ISA, is a better and more pleasant ISA to develop for.
Not just better and more pleasant, but also because having an abstraction layer between the code and the microarch is nice: it lets Intel modify their CPU's internals as they see fit without worrying about backwards compatibility; it allows Intel to make CPUs of many different speeds/complexities and, regardless of their insides, they all get to be compatible; and, as Intel builds more sophisticated microarchitectures, it allows old code to see speed improvements via faster microcode that has more resources to operate with.