IIRC, the original PS3 design was rumoured to have dual Cell processors. It didn't meet performance targets, so the Nvidia GPU was wedged in quite late on in the development cycle.
I think the story was a bit different. Sony had envisioned the final Cell chip being more powerful than what they launched (4 PPEs and 32 SPEs @ 4GHz versus 1 PPE and 8 SPEs @ 3.2GHz), and they thought that would be enough to render graphics with it like they did on the PS2. They only realized later in development that it wouldn't be enough, so they went to Nvidia and asked for a discrete GPU.
AIUI it's basically that the plan was similar to the PS2, where you have a "smart" CPU (Emotion Engine / Cell) and a "dumb" rasteriser (Graphics Synthesizer on PS2), with the Cell SPEs taking the role of the vector units on the EE for things like transform + lighting. But the rasteriser chip project failed.
Then they tried the 2 Cells and software rendering approach, to try and keep it as an "in house" solution.
Finally they went to Nvidia and we got the final Cell + RSX solution.
This is what I've heard as well. They started with one Cell; when it failed internal expectations they switched to two Cells; then, when they heard the rumors of what Microsoft was doing with the Xbox 360, they ran to Nvidia to get a real GPU.
There's an interesting book on the development of the Cell processor called The Race for a New Game Machine: Creating the Chips Inside the Xbox 360 and the PlayStation 3, by David Shippy and Mickie Phipps.
I can't recall if it had all the details regarding adding a GPU to the PS3, but I remember it had stories about the awkwardness of having the IBM Cell team also working on Microsoft's Xenon CPU for the 360, which IIRC used modified Cell PPE cores.
They absolutely did. The PPEs were far too anemic to run anything like Half-Life on their own.
Gabe Newell famously sounded off about how useless it was and how it wouldn't transfer to any other hardware, but he ended up being wrong. The architectural model forced on you by the SPEs is the model that the industry (not just games, but any compute-intensive work) has embraced for heavily multicore but coherent systems.
Rather than the "thread per vague activity" model, we've embraced one thread per core, with a work-stealing scheduler walking a DAG of chunked compute tasks.
For a modern example, this is essentially how Rust's rayon library works.
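To make the "one thread per core pulling chunked tasks" idea concrete, here's a rough std-only sketch. Big caveat: this is not rayon's implementation. As I understand it, rayon gives each worker its own deque and lets idle workers steal from others; the version below cheats with a single shared atomic counter that every worker pulls the next chunk index from. The function name, the chunking scheme, and the sum-of-squares workload are all made up for illustration.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::thread;

/// Sums the squares of `data` using a fixed set of worker threads.
/// Hypothetical helper: workers repeatedly claim the next unprocessed
/// chunk from a shared counter (a simplification of real work stealing).
fn chunked_sum_of_squares(data: &[u64], chunk_size: usize, workers: usize) -> u64 {
    assert!(chunk_size > 0 && workers > 0);

    // Pre-split the input into independent tasks.
    let chunks: Vec<&[u64]> = data.chunks(chunk_size).collect();
    let next = AtomicUsize::new(0); // index of the next unclaimed chunk
    let total = AtomicU64::new(0); // accumulated result

    // Scoped threads may borrow `chunks`, `next`, and `total` directly;
    // all of them are joined before `scope` returns.
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim a chunk; stop once every chunk has been handed out.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(chunk) = chunks.get(i) else { break };
                let partial: u64 = chunk.iter().map(|x| x * x).sum();
                total.fetch_add(partial, Ordering::Relaxed);
            });
        }
    });

    total.load(Ordering::Relaxed)
}
```

In real code you'd probably size `workers` from `std::thread::available_parallelism()`, and a fast worker naturally ends up processing more chunks than a slow one, which is the load-balancing property that matters here.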
The whole thing reminds me of a lot of the complaints about the N64, many of which came down to the fact that DRAM was no longer a single cycle away like it was for the SNES. Yes, cache-conscious code is more difficult to write, but that memory hierarchy was more a harbinger of the new world than a one-off weirdness of a single console.
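A toy illustration of what "cache conscious" means in practice (my own example, nothing N64-specific): both functions below compute the same sum over a grid stored row-major in one flat slice, but the first walks memory sequentially while the second jumps `cols` elements on every step. On cached hardware the second version tends to be slower for large grids, though by how much depends entirely on the machine and the grid size.

```rust
/// Sums a `rows x cols` grid stored row-major in a flat slice,
/// visiting elements in the order they sit in memory.
fn sum_row_major(grid: &[u32], rows: usize, cols: usize) -> u64 {
    assert_eq!(grid.len(), rows * cols);
    let mut total = 0u64;
    for r in 0..rows {
        for c in 0..cols {
            total += u64::from(grid[r * cols + c]);
        }
    }
    total
}

/// Same result, but the loops are swapped, so consecutive accesses
/// are `cols` elements apart and reuse much less of each cache line.
fn sum_col_major(grid: &[u32], rows: usize, cols: usize) -> u64 {
    assert_eq!(grid.len(), rows * cols);
    let mut total = 0u64;
    for c in 0..cols {
        for r in 0..rows {
            total += u64::from(grid[r * cols + c]);
        }
    }
    total
}
```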
Some games are built the way you've described, with job threads and queues, but regrettably most are built with a main thread, a render thread, and sometimes a few other threads for some compute.
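For contrast with the job-queue style, the main thread + render thread layout looks roughly like this. It's a hypothetical skeleton, not taken from any particular engine: `FrameCommands`, the draw-call strings, and the function name are all invented, and "rendering" is just counting the commands received.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical per-frame payload handed from simulation to rendering.
struct FrameCommands {
    frame: u32,
    draw_calls: Vec<String>,
}

/// Simulates `frames` frames on the calling ("main") thread while one
/// dedicated render thread consumes them. Returns how many draw calls
/// the render thread received.
fn run_two_thread_loop(frames: u32) -> usize {
    // Capacity 1: the main thread can run at most one frame ahead.
    let (tx, rx) = mpsc::sync_channel::<FrameCommands>(1);

    let render = thread::spawn(move || {
        let mut submitted = 0;
        // The loop ends when the sender is dropped.
        for cmds in rx {
            let _frame = cmds.frame; // stand-in for real GPU submission
            submitted += cmds.draw_calls.len();
        }
        submitted
    });

    for frame in 0..frames {
        // Stand-in for game logic producing this frame's draw list.
        let draw_calls = vec![
            format!("draw world (frame {frame})"),
            format!("draw ui (frame {frame})"),
        ];
        tx.send(FrameCommands { frame, draw_calls })
            .expect("render thread should still be running");
    }
    drop(tx); // signal the render thread that no more frames are coming

    render.join().expect("render thread panicked")
}
```

The limitation is visible in the structure: no matter how many cores the machine has, this design keeps about two of them busy unless you bolt extra worker threads on the side.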
> I think I heard the games Valve published for PS3 didn’t really use the cell stuff
Maybe - the Orange Box at least was outsourced to EA and the performance was reportedly not great. I think Portal 2 was done in-house though and I don't remember anyone complaining about it.
A lot of early games were something like that. They’d mostly ignore it, or maybe use one SPU, or shove tasks onto them that really didn’t take advantage of their power at all, leaving a lot of performance untouched.