I have a tube of IDT R3041s and R3051s. I remember using GCC compiled for a DECStation 2000 to write code for them. (I made a hand-held computer based on R3041).
I also have a tube of ARM 610s (VY86C060s I think) from the same project, but R3041 was PLCC, whereas ARM was fine pitch PQFP. PLCC was easier to deal with at the time...
In the sense that it's part of the history of the decline and fall of MIPS. In terms of performance, software availability, desirability or anything else, well not so much :-)
In 2010 MIPS wanted $2 million from Berkeley to allow them to use the MIPS instruction encodings for processor cores that Berkeley would design entirely themselves. So they made up their own encodings instead.
The rest is history.
In many ways modern MIPS and RISC-V are pretty much just different binary encodings of the same ideas.
[this was already posted as a comment in a thread, but on reflection it probably deserves its own]
> In 2010 MIPS wanted $2 million from Berkeley to allow them to use the MIPS instruction encodings
Do you have a source for that? I see that you're a RISC-V expert, and I know MIPS, Inc. is notorious for patent lawsuits, so I trust you. I'm just curious about the Berkeley project.
Patterson always smirks when people ask him what the "V" is for, and mumbles an uncharacteristically vague reply (something about it being the fifth chip project he's worked on or something like that...)
It's no coincidence that MIPS used roman numerals for its architectures, MIPS-I, MIPS-II, MIPS-III, MIPS-IV, and MIPS-V.
So instead of using MIPS-V Berkeley created RISC-V.
You aren't going to get anybody who was involved to say this on the record -- that they changed just barely enough to evade the licensing (instruction encoding) and trademark ("MIPS-V") problems. Openly admitting in print that it was a "minimum noninfringing change" simply invites lawsuits claiming that they were one hair's width on the wrong side of that line.
I think you understate the level of IP vetting RISC-V has undergone. All but 6 instructions in the RV32G instruction set could be found in implementations that were at least 20 years old; the remaining six were novel [1].
RISC-V owes MIPS about as much as MIPS owes Patterson. MIPS-V was already ~15 years old when RISC-V was first used in the classroom. RISC-V purposefully pared back the layers of marketing cruft and failed experiments that had grown over Patterson et al.'s original RISC architecture.
There is less confidence in the extensions, but as RISC-V turns 10 ... patent trolls had better hurry if they want to extort anyone based on the ISA alone.
Part of joining RISC-V International is signing a document irrevocably certifying that RISC-V doesn't infringe any of your patents, so that door is definitely closed for MIPS now anyway.
It is a little weird that, when proposing instructions for RISC-V extensions, there is an actual preference for demonstrating that the instruction was patented but the patent has expired, or at least that the instruction was publicly documented in some ISA a couple of decades ago.
By and large, RISC-V is not trying to be novel, but to bring together and simplify established best practice.
I did invent what is believed to be an entirely novel instruction for the RISC-V B extension: GORC. The other instruction GORC shares circuitry with, GREV, is also believed to not have been implemented in an ISA before, though it has been proposed in the literature.
There have been "minimum noninfringing change" MIPS clones, by doing something like leaving out the patented unaligned load and store instructions, but otherwise being identical and compatible with MIPS software and compilers etc.
RISC-V is completely different and incompatible with MIPS at the binary level. The opcodes are all different. The opcode and register fields are built from opposite ends of the word. The conditional branching model is different (MIPS r6 later copied RISC-V's version). The sizes of immediate and offset values are only 12 bits vs 16 in MIPS – which is a major reason for RISC-V having a lot more encoding space free for future instructions. There are of course no branch or load delay slots in RISC-V – something that MIPS again copied in r6.
There is a certain flavour that is similar, but the details are utterly different.
Krste (or maybe Dave?) said so in a video somewhere not that long ago -- I assume in the RISC-V International channel on youtube but I don't recall which one.
Cheers! Dylan will yet have its day :-) I was actually surprised how much positive response it got when I posted on /r/riscv about OpenDylan having been ported to RISC-V (no thanks to me :-( )
This is more or less analogous to Blackberry moving to Android, isn’t it? Storied, old-guard tech company loses most of its market share, trades in its first-party stack for a rising open-source alternative.
Is MIPS still a big enough name to make this much of a coup for RISC-V? Or is this the last-ditch effort of a fallen star of the semi market?
One big difference is most consumers, and I think even many companies using the chips, don't really care what architecture they are using, unlike with an end-user OS/"ecosystem". So if they can use their abilities and experience from MIPS to make RISC-V chips with good price to performance, they could do OK.
> Is MIPS still a big enough name to make this much of a coup for RISC-V? Or is this the last ditch effort of a fallen star of the semi market?
MIPS has seemingly been on life support since the late 90's. I kind of think the SGI buyout and later spin-off doomed them, as they were focused on building high performance workstation processors while Arm was busy focusing on low power and embedded systems. Guess who was better prepared for the mobile revolution of the 00's?
I imagine MIPS (via Atheros, Broadcom, etc) was broadly deployed in things like home routers because it was royalty free, power efficient, and already had Linux kernel mainline support. Though probably losing share to ARM now.
Indeed, Asus moved from MIPS to ARM between the RT-AC66U and RT-AC68U, and the various *pkg repos have since dropped support. MIPS may as well be dead.
Powered all SGI workstations up to the end (excluding the Windows NT adventure SGI had), and powered both the Nintendo 64 and the PlayStation (1). MIPS was quite hot back then.
Home routers used to be one of the last holdouts of MIPS, but all the modern ones have been switching to ARM. It's pretty much on its last legs there aside from the really cheap, low-end stuff.
It is not that way today, but for a long time the entry-level Cisco IOS router was PowerPC based, while the high end was invariably MIPS. On the other hand, both architectural choices had nothing to do with the architecture and everything to do with the choice of usable SoCs for that particular application (with Motorola/Freescale/NXP's "m68k Cisco router/Sun2 on a chip" SoCs being somewhat ironic in this regard).
I have a few of them blinking next to me. They are fine albeit documentation is severely lacking at the moment, but once you realize how cheap they are compared to the alternatives, new possibilities open up.
Interestingly it's based on a 3 stage core from SiFive, not the RocketChip.
Most of the newer-generation, upcoming replacements for lower-cost SKUs, including NAS, are going to ARM.
At one point there were a few more MIPS and even OpenPOWER solutions in NAS and router products, but in the end having everything on ARM is just so much easier. A little sad to see it go, especially since good old Broadcom MIPS tended to be exceptionally stable.
Ho, ho. Graphene has left the lab, but it's only got as far as the hype mill, not the factory. https://www.manchester.ac.uk/discover/news/the-university-un... was a notable success though; and allegedly graphene-powered trainers there on a different date.
In contrast, with RISC-V you can put your hand in your pocket for early models, can't you?
2021 to me really marked a turning point for RISC-V embedded cores. $4 (+ S/H) for a BL602-based pinecone (or any of the many other BL602 boards). Pretty nice.
> Is MIPS still a big enough name to make this much of a coup for RISC-V?
No, not at this point. RISC-V already has plenty of momentum without this. The only thing that might change that analysis is if MIPS has some architecture patents that they could leverage to get some kind of RISC-V performance advantage. But I doubt they have anything like that now. And it's not like they have a stable of CPU designers that are now going to be switched from working on MIPS to working on RISC-V - they likely haven't had any of those folks working there since the 90s.
> Is MIPS still a big enough name to make this much of a coup for RISC-V? Or is this the last-ditch effort of a fallen star of the semi market?
Not really, I think the only thing they have going for them is that they probably have IP cores that would port easily and have history. However, places like SiFive [1] appear to have early mover advantage and are likely to be quicker to gain critical mass.
There's a lot to like about MIPS. It's a perfectly usable RISC architecture that:
- is easy to implement
- is supported by Debian, gcc, etc..
- is virtualizable
- scales from embedded systems (e.g. compressed MIPS16 ISA) up to huge shared-memory multiprocessor systems with hundreds of CPUs
Like RISC-V, MIPS traces its lineage to the dawn of the RISC revolution in the 1980s, though on the Hennessy/Stanford side rather than the Patterson/Berkeley side.
There is no one, single MIPS ISA. They've been through a number of incompatible changes over the years. MIPS r6 added some things more like RISC-V. NanoMIPS looks even more like RISC-V, though with its own twist (and with some 48 bit instructions, which RISC-V doesn't have yet).
In many ways modern MIPS and RISC-V are pretty much just different binary encodings of the same ideas.
In 2010 MIPS wanted $2 million from Berkeley to allow them to use the MIPS instruction encodings for processor cores that Berkeley would design entirely themselves. So they made up their own encodings instead.
It's funny how this story repeats itself over and over again, yet companies never seem to learn.
The flipside of this is when companies like Apple, Microsoft and Adobe, like it or not, become part of the de facto tech learning arc for college students, who one day grow up to be senior engineers, lead engineers, architects and principal engineers.
I don't know if MIPS is the same, but I worked on another architecture where NOP is 0x0, and it had an interesting effect. If you called an uninitialized function pointer, and it happened to point into zeroed-out memory, the CPU would execute NOPs for a good while until it hit something else. If that something else was code, it would start executing some function from the start, but with garbage arguments. It would often get quite far in, and several function calls down, before something crashed. Made for fun stack traces and interesting debugging :-)
Everyone's right to celebrate the success of RISC-V, but part of me thinks it's a shame that there's relatively little architectural diversity (edit: I should have said ISA diversity) in modern CPUs. MIPS, Alpha, and SuperH have all but faded away. Power/PowerPC is still out there somewhere though. Apparently they're still working on SPARC, too. [0]
At least we'll always have the PS2. ...until the last one breaks, I guess.
I wish the barriers to using new architectures were lower.
For instance, suppose binaries were typically distributed in a platform-agnostic format, like LLVM intermediate representation or something equivalent. When you run your program the first time, it's compiled to native code for your architecture and cached for later use.
I realize I've sort of just re-invented Javascript. But what if we just did away with native binaries entirely, except as ephemeral objects that get cached and then thrown away and regenerated when needed? It seems like this would solve a lot of problems. You could deprecate CPU instructions without worrying about breaking backwards compatibility. If a particular instruction has security or data integrity issues, just patch the compiler not to emit that instruction. As new side-channel speculation vulnerabilities are discovered, we can add compiler workarounds whenever possible. If you're a CPU architect and want to add a new instruction for a particular weird use-case, you just have to add it to your design and patch the compiler, and everyone can start using your new instruction right away, even on old software. You'd be able to trust that your old software would at least be compatible with future instruction architectures. Processors would be able to compete directly with each other without regard to vendor-lock-in.
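For a rough feel of what the compile-and-cache step could look like, here's a minimal, hypothetical Rust sketch: it assumes a single-file program shipped as LLVM bitcode (`app.bc`) and that `clang` on the target machine can lower that bitcode to a native executable, which then gets cached by content hash. No real distribution works exactly like this; it's only meant to illustrate the compile-once-per-architecture idea.

    // Hypothetical sketch only: compile platform-agnostic bitcode to native
    // code on first run, then reuse the cached native binary afterwards.
    // Assumes `clang` is on PATH and can lower LLVM bitcode (.bc) directly.
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    use std::path::PathBuf;
    use std::process::Command;

    fn native_binary_for(bitcode_path: &str) -> std::io::Result<PathBuf> {
        let bitcode = std::fs::read(bitcode_path)?;

        // Key the cache on the bitcode contents, so a changed program recompiles.
        let mut h = DefaultHasher::new();
        bitcode.hash(&mut h);
        let cached = std::env::temp_dir().join(format!("app-native-{:016x}", h.finish()));

        if !cached.exists() {
            // First run on this machine/architecture: lower the IR to native code.
            let status = Command::new("clang")
                .args([bitcode_path, "-O2", "-o"])
                .arg(&cached)
                .status()?;
            assert!(status.success(), "bitcode -> native compilation failed");
        }
        Ok(cached)
    }

    fn main() -> std::io::Result<()> {
        let exe = native_binary_for("app.bc")?;
        Command::new(exe).status()?; // later runs skip straight to this line
        Ok(())
    }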
That's how IBM implemented the AS/400 platform. Everything compiled down to a processor-agnostic bytecode that was the "binary" format. That IR was translated to native code for the underlying processor architecture as the final step. And objects contained both the IR and the native code. If you moved a binary to another host CPU, it would be retranslated and run automatically. The migration to POWER as the underlying processor was almost entirely transparent to the user and programming environment.
> That's how IBM implemented the AS/400 platform. Everything compiled down to a processor-agnostic bytecode that was the "binary" format
Originally, AS/400 used its own bytecode called MI (or TIMI or OMI). A descendant of the System/38's bytecode. That was compiled to CISC IMPI machine code, and then after the RISC transition to POWER instructions.
However, around the same time as the CISC-to-RISC transition, IBM introduced a new virtual execution environment – ILE (Integrated Language Environment). The original virtual execution environment was called OPM (Original Program Model). ILE came with a new bytecode, W-code aka NMI. While IBM publicly documented the original OPM bytecode, the new W-code bytecode is only available under NDA. OPM programs have their OMI bytecode translated internally to NMI which then in turn is translated to POWER instructions.
The interesting thing about this, is while OMI was originally invented for the System/38, W-code has a quite different heritage. W-code is actually the intermediate representation used by IBM's compilers (VisualAge, XL C, etc). It is fundamentally the same as what IBM compilers use on other platforms such as AIX or Linux, and already existed on AIX before it was ever used on OS/400. There are some OS/400-specific extensions, and it plays a quite more central architectural role in OS/400 than in AIX. But W-code is conceptually equivalent to LLVM IR/bitcode. So here we may see something in common with what Apple does with asking for LLVM bitcode uploads for the App Store.
> And objects contained both the IR and the native code. If you moved a binary to another host CPU, it would be retranslated and run automatically
Not always true. The object contains two sections – the MI bytecode and the native machine code. It is possible to remove the MI bytecode section (that's called removing "observability") leaving only the native machine code section. If you do that, you lose the ability to migrate the software to a new architecture, unless you recompile from source. I think, most people kept observability intact for in-house software, but it was commonly removed in software shipped by IBM and ISVs.
App uploads to the iOS App Store in LLVM's own Bitcode format are a distant echo of the CPU-ISA-agnostic IR approach IBM employed at the time. Bitcode is transpiled down to the underlying CPU instructions via static binary translation and optimisation, and translation from Bitcode -> x86 or Bitcode -> ARM has been possible for some time: https://www.highcaffeinecontent.com/blog/20190518-Translatin...
Rosetta 2 AOT, whilst not being exactly the same thing as the ISA-agnostic IR solution, is another example of static binary translation. Theoretically, Apple could start requiring OS X app submissions to the app store in the Bitcode format as well, so they could be transpiled and optimised at app download time and perform efficiently on M3, M4, M5, etc. CPUs in the future. However, with their habit of obsoleting certain things fast, it is not clear whether they will choose to go down the Bitcode path for OS X apps.
This is really fascinating. Is there a reason why we haven't seen more of this approach? It seems like it was pretty successful for IBM. Is there a practical reason that would prevent an open source project from building something similar?
> For instance, suppose binaries were typically distributed in a platform-agnostic format, like LLVM intermediate representation or something equivalent.
The Mill does something like this, but only for their own chips. "Binaries" are bitcode that's not specialized to any particular Mill CPU, and get run through the "specializer" which knows the real belt width and other properties to make a final CPU-specific version.
> I realize I've sort of just re-invented Javascript.
Or one of several bytecodes that get JIT or AOT compiled.
WASM in particular has my interest these days, thanks to native browser support and being relatively lean and more friendly towards "native" code, whereas JVM and CLR are fairly heavyweight, and their bytecodes assume you're going to be using a garbage collector (something that e.g. wasmtime manages to avoid.)
Non-web use cases of WASM in practice seem more focused on isolation, sandboxing, and security rather than architecture independence - stuff like "edge computing" - and I haven't read about anyone using it for AOT compilation. But perhaps it has some potential there too?
I think the short answer is that the performance penalty is so significant that it doesn't make sense to use WASM unless you're running untrusted code.
Ah yes, it even rates a mention on that Wikipedia page. Thank you for pointing this out. 1966! Funny how all these supposedly 'new' concepts really go back all the way to the beginning.
Come to Nix and Nixpkgs, where we can cross compile most things in myriad ways. I think the barriers to new hardware ISAs on the software side have never been lower.
Even if we get an ARM RISC-V monoculture, at least we are getting diverse co-processors again, which present the same portability challenges/opportunities in a different guise.
> For instance, suppose binaries were typically distributed in a platform-agnostic format, like LLVM intermediate representation or something equivalent. When you run your program the first time, it's compiled to native code for your architecture and cached for later use.
IBM's OS/400 (originally for the AS/400 hardware, now branded as System i) did precisely this: Compile COBOL or RPG to a high-level bytecode, which gets compiled to machine code on first run, and save the machine code to disk; thereafter, the machine code is just run, until the bytecode on disk is changed, whereupon it's replaced with newer machine code. IBM was able to transition its customers to a new CPU architecture just by having them move their bytecode (and, possibly, source code) from one machine to another that way.
Using ANDF you could produce portable binaries that would run on any UNIX system, regardless of CPU architecture. It was never commercially released though. I think while it is cool technology the market demand was never really there. For a software vendor, recompiling to support another UNIX isn't that hard; the real hard bit is all the compatibility testing to make sure the product actually works on the new UNIX. ANDF solved the easy part but did nothing about the hard bit. It possibly would even make things worse, because then customers might have just tried running some app on some other UNIX the vendor has never tested, and then complain when it only half worked.
Standards are always going to have implementation bugs, corner cases, ambiguities, undefined behaviour, feature gaps which force you to rely on proprietary extensions, etc. That's where the "hard bit" of portability comes from.
You've just reinvented bytecode. JVM/ART, WebAssembly, ActionScript, some versions of .net... You know, all the stuff that supposedly runs on everything.
Surely the barrier to using a new architecture is being able to boot a kernel and run (say) the GNU toolchain, as demonstrated with RISC-V. Then you just compile your code, assuming it doesn't contain assembler, or something. Whether or not you'll have the same sort of board support issues with RISC-V as with Arm, I don't know.
I don't know much about Android specifically. Is it still heavily Java-based?
There are a lot of universal-binary candidates, both current and historical. JavaScript, Java, SPIR-V, and LLVM intermediate representation are some of the current ones. So, this isn't a new idea, it's just that most of the software I use regularly is compiled specifically for x86-64. Maybe it would be better if that were a rare exception rather than the norm.
> I wish the barriers to using new architectures were lower.
> For instance, suppose binaries were typically distributed in a platform-agnostic format, like LLVM intermediate representation or something equivalent.
We're doing a pretty good job on portability these days already. Well-written Unix applications in C/C++ will compile happily for any old ISA and run just the same. Safe high-level languages like JavaScript, Java, and Safe Rust are pretty much ISA-independent by definition, it's 'just' a matter of getting the compilers and runtimes ported across.
Adopting LLVM IR for portable distribution, probably isn't the way forward. I don't see that it adds much compared to compiling from source, and it's not what it's intended for. (LLVM may wish to change the representation in a subsequent major version, for instance.)
For programs which are architecture-sensitive by nature, such as certain parts of kernels, there are no shortcuts. Or, rather, I'm confident the major OSs already use all the practical shortcuts they can think up.
> When you run your program the first time, it's compiled to native code for your architecture and cached for later use.
Source-based package management systems already give us something a lot like this.
There are operating systems that take this approach, such as Inferno. [0] I like this HN comment on Inferno [1]: kernels are the wrong place for 'grand abstractions' of this sort.
> I realize I've sort of just re-invented Javascript
Don't be too harsh on yourself, JavaScript would be a terrible choice as a universal IR!
> [[ to the bulk of your second paragraph ]]
In the Free and Open Source world, we're already free to recompile the whole universe. The major distros do so as compiler technology improves.
> Processors would be able to compete directly with each other without regard to vendor-lock-in.
For most application-level code, we're already there. For example, your Java code will most likely run just as happily on one of Amazon's AArch64 instances as on an AMD64 machine. In the unlikely case you encounter a bug, well, that's pretty much always a risk, no matter which abstractions we use.
> "Adopting LLVM IR for portable distribution, probably isn't the way forward. I don't see that it adds much compared to compiling from source, and it's not what it's intended for. (LLVM may wish to change the representation in a subsequent major version, for instance.)"
Maybe, but PNaCl (unfortunately deprecated by Google) "defines a low-level stable portable intermediate representation (based on the IR used by the open-source LLVM compiler project) which is used as the wire format instead of x86 or ARM machine code"
https://www.chromium.org/nativeclient/pnacl/introduction-to-...
Sure. I'm not saying it's impossible to construct such an IR and get it to work, I'm saying I doubt it's the best way forward. See my other comment [0] where I mention Google Native Client.
It would be a poor fit for certain languages, there may be performance penalties depending on target platform, it would preclude legitimate platform-specific code such as SIMD assembly, it would preclude platform-specific build-time customization, etc.
The way toward painless portability is to move away from unsafe languages like C and C++, where you're never more than an expression away from undefined behaviour, and where programmers may be tempted to make silly mistakes like writing code sensitive to the endianness of the target architecture. [1] With C and C++, disciplined developers working carefully can write portable code. With Safe Rust, code can be pretty close to 'portable by construction', like Java. If you feed Windows-style path strings to Linux, or vice versa, then things might go wrong, but for the most part you'll be on solid ground.
> Well-written Unix applications in C/C++ will compile happily for any old ISA and run just the same. Safe high-level languages like JavaScript, Java, and Safe Rust are pretty much ISA-independent by definition, it's 'just' a matter of getting the compilers and runtimes ported across.
That sort of works, but duplicating the proper development environment in the end-user's computer would take a lot of space and would be complicated by the enormous variety of programming languages and environments. Linux distros manage this with a lot of effort. I'm imagining something like a universal intermediate representation that can be compiled quickly (because a lot of the early language-specific part of compilation will have already been done by whoever you get your packages from) and in a uniform way because there's a common intermediate representation format that all the compiled languages use.
Universal binaries might also be acceptable for commercial, closed-source applications where source distribution would not.
> duplicating the proper development environment in the end-user's computer would take a lot of space
Most distros offer precompiled binaries, there are relatively few that use source-based distribution and expect the user to have all the necessary compilers installed.
> would be complicated by the enormous variety of programming languages and environments
That problem isn't effectively addressed by a universal IR. You can't have a single IR that works well for all languages, precisely because of the variety of languages.
> Linux distros manage this with a lot of effort.
Hopefully that should improve if the trend toward languages like Safe Rust continues. C and C++ are infamously full of footguns.
> I'm imagining something like a universal intermediate representation that can be compiled quickly (because a lot of the early language-specific part of compilation will have already been done by whoever you get your packages from) and in a uniform way because there's a common intermediate representation format that all the compiled languages use.
Again this can't be done effectively. There are good technical reasons why Java, Haskell, and JavaScript, don't generally use LLVM as their backend. The differences between languages aren't just skin deep, they extend right through the compiler stack.
To be more precise: it could be done, but there would be an unacceptable performance cost. After all, you could start distributing binaries for the SuperH SH-4, and just use emulation everywhere. The question is whether it could be done effectively.
I mentioned before that LLVM IR is not intended to be used this way, although the Google Native Client project took LLVM and turned it into what you're suggesting.
C and C++ are quite different from Java. The size of the int type varies between platforms, for instance. They also have a preprocessor which allows the programmer to conditionally compile platform-specific code, e.g. intrinsics, fragments of assembly code, or workarounds. The program might use system-specific macros that expand before compilation.
Languages like Haskell are very different from the sorts of languages that LLVM is built for. Even Java prefers to use its own backend, with tight integration with its GC.
There's also a package-management question, although this issue wouldn't be as significant. The C/C++ way is to have the build system (autotools or CMake or whatever) detect what libraries are available on the system. If an optional library is missing, the C/C++ code is automatically adjusted by the build system, prior to compilation. It would be unusual to detect availability of libraries at runtime. This approach doesn't play nicely with a universal IR. This might not be an issue if the IR is treated as a surrogate for the native-code binary, but the IR wouldn't be a good surrogate for the source.
The C/C++ philosophy is to accommodate platform variations, in contrast with the JVM approach of mandating compliance to a virtual machine. With the JVM approach you forbid the sorts of variations that C and C++ permit (everything from int_fast32_t varying between platforms, to hand-written SIMD assembly).
Others have already mentioned WASM and Google Native Client, both of which are stable, but neither of which are going to become mainstream ways of distributing Unix application code.
This topic has turned up on HN before, but frustratingly I wasn't able to find the thread.
> Universal binaries might also be acceptable for commercial, closed-source applications where source distribution would not.
True, but I think modern Unix OSs do a pretty good job on ABI stability. If they want portability without releasing source (something I don't think GNU/Linux should aim to accommodate, incidentally) they already have other options, like Java. JavaFX doesn't get much attention but it pretty much 'just works' for portable GUI applications.
edit skissane has an interesting comment on ANDF, a solution I hadn't heard of before.
What RISC-V achieves is architectural diversity over the boring mov, add, mul instructions: the interesting part is in vector and matrix manipulation, and while RISC-V is working on a great solution, it allows for other accelerators to be added.
SPARC is well known to be different enough (big endian, register windowing of the stack, alignment, etc.) that it exposes a lot of bugs in code that would be missed in a little-endian, x86 derived monoculture.
I always found SPARC's stack handling to be very elegant, and I write enough low-level code that these architectural details do from time to time impact me, but isn't it largely irrelevant for the industry at large?
After all MIPS's original insight was that machine code was now overwhelmingly written by compilers and not handwritten assembly, so they made an ISA for compilers. I think history proved them absolutely right, actually these days there are often a couple of layers between the code people write and the instructions fed into the CPU.
I guess my point is that nowadays I'm sure that many competent devs don't know what little-endian means and probably wouldn't have any idea of what "register windowing of the stack" is, and they're completely unaffected by these minute low level details.
Making it a bit easier for OpenBSD to find subtle bugs is certainly nice, but that seems like a rather weak argument for the vast amount of work required to support a distinct ISA in a kernel.
Honestly I'm not convinced by the argument for diversity here, as long as the ISA of choice is open source and not patent encumbered or anything like that. Preventing an x86 or ARM monoculture is worth it because you don't want to put all your eggs in Intel or Nvidia's basket, but if anybody is free to do whatever with the ISA I don't really see how that really prevents innovation. It's just a shared framework people can work with.
Who knows, maybe somebody will make a fork of RISC-V with register windows!
I had a neat experience a long time ago when I wrote a Perl XS module in C, in my x86 monoculture mindset. When you deploy something to their package manager (CPAN), it's automatically tested on a lot of different platforms via a loose network of people that volunteer their equipment to test stuff...https://cpantesters.org.
So, I immediately saw it had issues on a variety of different platforms, including an endianess problem. Cpantesters.org lets you drill down and see what went wrong in pretty good detail, so I was able to fix the problems pretty quickly.
It used to have a ton of different platforms like HPUX/PA-RISC, Sun/Sparc, IRIX/MIPS and so on, but the diversity is down pretty far now. Still lots of OS's, but few different CPUs.
MIPS and Berkeley RISC started an entire revolution. They appear "not unique" only because other ISAs copied them so thoroughly. I think it's safe to say that Alpha, ARM, POWER, PA-RISC, etc wouldn't have been designed as they were without MIPS.
Even today, comparing modern MIPS64 and ARM aarch64, I find ARM's new ISA to be perhaps more similar to MIPS than to ARMv7.
> What problem did MIPS solve in a unique way that others didn't?
The MIPS R2000 was debatably the first commercial RISC chip. It solved whatever problem you needed a really fast CPU for in 1985. The alternatives on the market were the Intel 386 and the Motorola 68000. The Intel 386 at 16 MHz did about 2 MIPS (heh - millions of instructions per second) with 32 bit integer math. At 16 MHz, the R2000 did about 10 MIPS. Even accounting for RISC code bloat, that's 3 - 4x faster.
Note how there were only two competitors selling 32-bit designs in the market they entered. I think that's probably the biggest impact of MIPS. They actually sold the chip! They wanted companies to design their own computer systems around it. Use it in an embedded device. Whatever. That was not the norm c. 1985 - 1988 for high-end silicon.
There were machines faster than the 386 or 68020 at that time. You could buy one of the fast microprocessor-based VAXes recently introduced. Or if not too squeezed for office space and with a blank cheque, one of the super-minis like a real VAX or IBM's new "mini-mainframe". After '86, maybe you'd buy one of the other RISC options, like SPARC or PA-RISC.
Whatever you bought, it would be the whole system. Take it or leave it. DEC would not sell you something like a CVAX processor all by itself just so you can build it into a product that will compete against them. (Well, they would sell you one, just not at a price you could afford if you aren't a defence contractor.)
Both DEC and SGI would use MIPS processors in their workstations of the late 80s, as did some less well-known names. The embarrassment of having to use a competitor's processor to sell a decently fast and affordable UNIX workstation would inspire DEC to create the Alpha. In this vein of "we'll sell it to whoever wants to buy it!" MIPS was also doing ARM-style core IP licensing, before ARM did. That's probably part of why MIPS was so prominent as an embedded architecture in the late 90s and early 2000s, in everything from handhelds to routers to satellites.
As far as I understand, it's not that MIPS is the best at embedded, it's just that it's cheaper to sell as the license cost is non-existent and good support already exists in kernels and so on.
If MIPS offered adequate performance and features, good performance-per-watt, and a competitive licence fee, and if none of its competitors could beat it, doesn't that count as 'best'?
> it's not that MIPS is the best at embedded, it's just that it's cheaper to sell
That sounds a lot like MIPS being the best at embedded. Not high-end, sure, but a lot of embedded is "what is the cheapest processor that can run Linux?"
ISA diversity can surface bugs in code that only show up on ISAs with different approaches. For example, code that works on some arches (but is slow due to alignment issues) will simply fail to run on other arches.
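A tiny Rust illustration of the kind of latent bug an unfamiliar ISA flushes out (hypothetical code; the "broken" version is undefined behaviour everywhere, it just tends to pass tests on x86):

    // The classic latent bug: reinterpreting a byte buffer as a wider integer.
    // On x86 the misaligned load "works" (maybe slowly); on strict-alignment
    // ISAs the very same load can fault at runtime.
    fn parse_field_broken(buf: &[u8]) -> u32 {
        unsafe { *(buf[1..5].as_ptr() as *const u32) } // misaligned 4-byte load
    }

    // The portable version: an explicit byte-wise, endian-aware read.
    fn parse_field_portable(buf: &[u8]) -> u32 {
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(&buf[1..5]);
        u32::from_le_bytes(bytes)
    }

    fn main() {
        let packet = [0xFFu8, 0x01, 0x02, 0x03, 0x04, 0xFF];
        assert_eq!(parse_field_portable(&packet), 0x0403_0201);
        // parse_field_broken(&packet) may work, trap, or return a byte-swapped
        // value depending on the target ISA and its alignment rules.
        let _ = parse_field_broken(&packet);
    }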
I remember CS 61C at Berkeley used to use MIPS to teach assembly language programming and a bit about computer architecture, using the original MIPS version of Patterson and Hennessy's Computer Organization and Design. Now that book is available in both MIPS and RISC-V versions, with, I've assumed, much more effort going into the RISC-V version...
I do think the simplicity of MIPS was a big plus there, including simplicity of simulating it (http://spimsimulator.sourceforge.net/). I suppose a lot of students may appreciate being taught on something that is or is about to be very widely used, even if it's more complicated in various ways -- and the fact that one of the textbook authors was a main RISC-V designer makes me assume that educational aspects are not at all neglected in the RISC-V world.
More on topic, though, RISC-V seems to really be designed in a way that makes it easy to teach. This is partially why I have doubts that it can be made very performant, but the focus on a prettier design over a more practical one is probably going to help it be more accessible to students.
> part of me thinks it's a shame that there's relatively little architectural diversity
Perhaps CPU diversity is in decline but it seems to me that the industry as a whole is moving towards more diversity. It's gotten significantly cheaper to roll your own chips to the point that we've seen entirely new processors emerging (e.g. GPUs, TPUs, etc) and becoming commonplace if not essential.
Isn't the point of RISC-V that the CPU is simple and augmented or complemented by any number of custom co-processors? If this is the general trend in the industry then the CPU itself might become a commodity part to be easily swapped out as something better emerges. Particularly if it can be abstracted away from the ISA.
Academic RISC was designed by Patterson and Hennessy. Hennessy went off and was one of the founders of MIPS, Patterson is one of the instrumental leaders in the RISC-V space.
Patterson and Hennessy were in competition with each other, at different universities. It's only much later they wrote text books together.
Patterson says RISC-V is derived from RISC-I and RISC-II. I think this doesn't really hold water -- at least no more so than for any other RISC.
RISC-I and RISC-II had condition codes and register windows, like SPARC. RISC-V doesn't have either, like MIPS. The RISC-V assembly language is also very similar to MIPS.
RISC-II had both 16 and 32 bit opcodes (as did IBM 801, and Cray designs) and RISC-V has inherited this (but also following the great success of ARM Thumb2).
Unfortunately it isn't gaining any traction. From a long-term cost perspective it is actually cheaper to choose ARM even if OpenPOWER is free. And ARM is already inexpensive.
MIPSr6, aarch64 and riscv are siblings born from MIPSr5 plus the best of other RISC architectures.
ppc64le and 32-bit Arm are part of the wider family, but have some notable differences that make them slightly less RISC-like. Both include e.g. more complex condition code handling and instructions that operate on more than three registers.
Since you mentioned ppc64le, there's also aarch64eb (arm64 in big endian mode). I saw that NetBSD supports it. It seems like support for other operating systems is limited mostly because of issues around booting...not the actual kernel or userland itself.
There's a lot of diversity in software. Windows is quite different from GNU/Linux, for instance. There's quite a lot of diversity in the major programming languages.
If you want something really different, whether in operating systems or programming languages, you have it. KolibriOS and Haskell, say.
I just read the official statement[1] that's linked to in the article.
Just so I get this straight: Wave Computing, the company that bought the remains of MIPS, is now (after bankruptcy) spinning it off as a separate company that is going to work under the name MIPS and holds the rights to the MIPS architecture, but is doing RISC-V?
That's what I got from it, but I have to say I don't understand the decision to throw away such a storied architecture as MIPS. I mean come on, the N64 runs on it!
This article from 2015 ("The Death of Moore’s Law Will Spur Innovation: As transistors stop shrinking, open-source hardware will have its day") is getting better and better with age.
As the cost of continuing Moore's Law approaches infinity, yes, it will cease to continue. But not before all but one foundry has been driven out of business.
And then that one remaining foundry will dominate the entire chip industry.
Also: they'll treat startups like dogshit unless they use automatic place-and-route. So chip startups will be reduced to glorified FPGA jockeys. Hard to differentiate when all you're allowed to do is sling Verilog.
This is huge. It looks like the only architectures widely-deployed in ten years will be x86, ARM, Power, and RISC-V (maybe also SPARC64 in Japan, although that's rare in the US).
64-bit cores (aka: classic CPUs) are looking like a solved problem, and are becoming a commodity. SIMD compute, however, remains an open question. NVidia probably leads today, but there seems to be plenty of room for smaller players.
Heck, one major company (AMD) is pushing 32x32-bit on one market (6xxx series) and 64x32-bit in another market (MI100 / Supercomputers).
The V in SVE is for vector, the S isn't for SIMD, and it's length-agnostic; I don't know how similar it is to the RISC-V vector extension. Think CDC, Cray, NEC, not AMD/Intel. I guess the recent innovation in that space is actual matrix multiplication instructions in CPUs.
* Bpermute and permute. (Pshufb is like permute, a gather operation. Bpermute is the opposite, like a scatter. Bpermute doesn't exist on x86 yet)
* __shared__ memory crossbar: every SIMD unit can read or write shared memory in parallel per clock tick. The crossbar can also broadcast 1-to-all each clock tick.
* Butterfly permute: the fundamental pattern in permutations for a variety of operations, most noticeably for scan and FFTs. Butterfly networks are closely related to pext and pdep implementation (showing how common that particular permute is).
* 8+ way hyperthreads / SMT. GPUs have very bad latency, but very high SMT counteracts that problem well in practice.
* PCIe Atomics: perform those compare and swap negotiations over I/O, allowing tight CPU and GPU memory integration.
* Crazy RAM. 1000GBps on HBM2. 800GBps over GDDR6X thanks to 2 bits transferred per clock tick.
* Crazy networks. AMD Infinity Fabric pushes over 100GBps. NVLink is 600GBps. A GPU network link has more bandwidth than a typical CPU's DDR4 RAM bandwidth.
* NVidia SASS has the craziest instruction set: the compiler figures out read / write hazards and publishes them in the SASS assembly itself. NVidia's ISA decoder + assembler team is doing something crazy here, the likes of which I haven't seen in any other instruction set ever.
* "Ballot" instructions. It's... really hard to explain why these are useful. They just are, lol.
Just a few cool concepts I've seen in the GPU world recently. Sure, matrix multiplications get the headlines because of tensors / deep learning. But don't sleep on the obscure stuff.
SVE is looking like a general-purpose ARM instruction set in the future.
I believe the Neoverse V-cores (high-performance) will have access to the SVE instructions for example. So the SVE-SIMD is not necessarily locked to Fujitsu (though Fujitsu's particular implementation is probably crazy good. HBM2 + 512-bit wide and #1 supercomputer in the world and all...)
SIMD today is only really helpful with a few usecases. If you want to encode some video, decode some jpegs, or do a physics simulation quicker, it's really going to help. It won't boot Linux any quicker tho.
I suspect for consumer uses, SIMD is already used for nearly all the use cases it can be.
The original SIMD papers in the 1980s show how to compile a Regex into a highly-parallel state machine and then "reduce" it (aka: a Scan / Prefix operation: https://en.wikipedia.org/wiki/Prefix_sum).
A huge number of operations, such as XML whitespace removal (aka: SIMD Stream Compaction), Regular Expressions, and more, were proven ~30 to 40 years ago to benefit from SIMD compute. Yet such libraries still don't exist today.
SIMD compute is highly niche, and clearly today's population is overly focused on deep learning... without even seeing the easy opportunities of XML parsing or simple regex yet. Further: additional opportunities are being discovered in O(n^2) operations, such as inner-join operations on your typical database.
Citations.
* For Regular Expressions: Read the 1986 paper "DATA PARALLEL ALGORITHMS". It's an easy read. Hillis / Steele are great writers. They even have the "impossible Linked List" parallelism figured out in there (granted: the nodes are located in such a way that the SIMD-computer can work with the nodes. But... if you had a memory allocator that worked with their linked-list format, you could very well implement their pointer-jumping approach to SIMD linked-list traversal)
* For whitespace folding / removal, see http://www.cse.chalmers.se/~uffe/streamcompaction.pdf. They don't cite it as XML-whitespace removal, but it seems pretty obvious to me that it could be used for parallel whitespace removal in O(lg(n)) steps.
* Database SIMD: http://www.cs.columbia.edu/~kar/pubsk/simd.pdf . Various operations have been proven to be better on SIMD, including "mass binary search" (one binary search cannot be parallelized, but if you have 5000 binary searches operating in parallel, it's HIGHLY efficient to execute all 5000 in a weird parallel manner, far faster than you might originally imagine).
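If it helps make the stream-compaction claim concrete, here's a small sequential Rust sketch of that recipe (flag, exclusive scan, scatter). Names and structure are mine, not from any of those papers; the point is that every step is either element-wise or a prefix sum, which is what makes it SIMD/GPU friendly:

    // Sequential sketch of the data-parallel stream-compaction pattern.
    fn compact_whitespace(input: &[u8]) -> Vec<u8> {
        // Step 1 (element-wise): flag the elements we want to keep.
        let keep: Vec<u32> = input
            .iter()
            .map(|&c| if c.is_ascii_whitespace() { 0 } else { 1 })
            .collect();

        // Step 2 (exclusive prefix sum / scan): each kept element's output index.
        let mut offsets = vec![0u32; input.len()];
        let mut running = 0;
        for (i, &k) in keep.iter().enumerate() {
            offsets[i] = running;
            running += k;
        }

        // Step 3 (scatter): write kept elements to their computed slots.
        let mut out = vec![0u8; running as usize];
        for i in 0..input.len() {
            if keep[i] == 1 {
                out[offsets[i] as usize] = input[i];
            }
        }
        out
    }

    fn main() {
        let s = b"<a>  hello \n world </a>";
        assert_eq!(compact_whitespace(s), b"<a>helloworld</a>".to_vec());
    }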
----------
SIMD-cuckoo hashing, SIMD-skip lists, etc. etc. There's so many data-structures that haven't really been fleshed out on SIMD yet outside of research settings. They have been proven easy to implement and simple / clean to understand. They're just not widely known yet.
I'm very interested in this space! I've been hacking on some open-source libraries around these ideas: rsdict [1], a SIMD-accelerated rank/select bitmap data structure, and arbolito [2], a SIMD-accelerated tiny trie.
For rsdict, the main idea is to use `pshufb` to implement querying a lookup table on a vector of integers and then use `psadbw` to horizontally sum the vector.
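For anyone who hasn't seen the trick before, here's the classic SSSE3 popcount kernel built from exactly those two instructions. It's a generic illustration of the pshufb-LUT plus psadbw pattern, not rsdict's actual code:

    // Illustration of pshufb as a 16-entry lookup table plus psadbw as a
    // horizontal byte sum: a 128-bit popcount. Not taken from rsdict.
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "ssse3")]
    unsafe fn popcount128(v: __m128i) -> u64 {
        // popcount of each 4-bit value 0..=15, queried 16-at-a-time via pshufb
        let lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
        let low_mask = _mm_set1_epi8(0x0F);

        let lo = _mm_and_si128(v, low_mask);                      // low nibbles
        let hi = _mm_and_si128(_mm_srli_epi16::<4>(v), low_mask); // high nibbles
        let per_byte = _mm_add_epi8(_mm_shuffle_epi8(lut, lo),    // table lookups
                                    _mm_shuffle_epi8(lut, hi));

        // psadbw against zero sums each group of 8 bytes into a 64-bit lane
        let sums = _mm_sad_epu8(per_byte, _mm_setzero_si128());
        (_mm_cvtsi128_si64(sums) + _mm_cvtsi128_si64(_mm_srli_si128::<8>(sums))) as u64
    }

    #[cfg(target_arch = "x86_64")]
    fn main() {
        if is_x86_feature_detected!("ssse3") {
            unsafe {
                let v = _mm_set1_epi8(0b0101_0101u8 as i8); // 4 bits set per byte
                assert_eq!(popcount128(v), 64);
            }
        }
    }

    #[cfg(not(target_arch = "x86_64"))]
    fn main() {}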
The arbolito code is a lot less fleshed out, but the main idea is to take a small trie and encode it into SIMD vectors. Laying out the nodes into a linear order, we'd have one vector that maintains a parent pointer (4 bits for 16 node trees in 128-bit vectors) and another vector with the incoming edge label.
Then, following the Teddy algorithm[3] (very similar to the Hillis/Steele state transition ideas too!), we can implement traversing the tree as a state machine, where each node in the trie has a bitmask, and the state transition is a parallel bitshift + shuffle of parent state to children states + bitwise AND. We can even reduce the circuit depth of this algorithm to `O(log depth)` by using successive squaring of the transition, like Hillis/Steele describe too.
I've put it on the backburner, but my main goal for arbolito would be to find a way to stitch together these "tiny tries" into a general purpose trie adaptively and get query performance competitive with a hashmap for integer keys. The ART paper[4] does similar stuff but without the SIMD tricks.
A few years ago, I wrote AESRAND (https://github.com/dragontamer/AESRand). I managed to get some well-known programmers to look into it, and their advice helped me write some pretty neat SIMD tricks. EX: I SIMD-implemented a 32-bit integer -> floating point [0.0, 1.0] operator, to convert the bitstream into floats, as well as an integer-based, nearly bias-free, division/modulus-free conversion into [0, WhateverInt] (such as D20 rolls), for 16-bit, 32-bit, and 64-bit integers (with less bias the more bits you supplied).
Unfortunately, I ran out of time and some work-related stuff came up. So I never really finished the experiments.
----------
My current home project is bump-allocator + semi-space garbage collection in SIMD for GPUs. As far as I can tell, both bump-allocation and semi-space garbage collection are easily SIMDified in an obvious manner. And since cudamalloc is fully synchronous, I wanted a more scalable, parallel solution to the GPU memory allocation problem.
Very cool! Independent of the cool use of `aesenc` and `aesdec`, the features for skipping ahead in the random stream and forking a separate stream are awesome.
> My current home project is bump-allocator + semi-space garbage collection in SIMD for GPUs. As far as I can tell, both bump-allocation and semi-space garbage collection are easily SIMDified in an obvious manner. And since cudamalloc is fully synchronous, I wanted a more scalable, parallel solution to the GPU memory allocation problem.
This is a great idea. I wonder if we could speed up LuaJIT even more by SIMD accelerating the GC's mark and/or sweep phases...
If you're interested in more work in this area, a former coworker wrote a neat SPMD implementation of librsync [1]. And, if you haven't seen it, the talk on SwissTable [2] (Google's SIMD accelerated hash table) is excellent.
> Very cool! Independent of the cool use of `aesenc` and `aesdec`, the features for skipping ahead in the random stream and forking a separate stream are awesome.
Ah yeah, those features... I forgot about them until you mentioned them, lol.
I was thinking about 4x (512-bits per iteration) with enc(enc), enc(dec), dec(enc), and dec(dec) as the four 128-bit results (going from 256-bits per iteration to 512-bits per iteration, with only 3-more instructions). I don't think I ever tested that...
But honestly, the thing that really made me stop playing with AESRAND was discovering multiply-bitreverse-multiply random number generators (still unpublished... just sitting in a directory in my home computer).
Bit-reverse is single-cycle on GPUs (NVidia and AMD), and perfectly fixes the "multiplication only randomizes the top bits" problem.
Bit-reverse is unimplemented on x86 for some reason, but bswap64() is good enough. Since bswap64() and 64-bit multiply are both implemented really fast on x86-64, a multiply-bswap64-multiply generator is probably fastest for typical x86 code (since there are penalties for going between x86 64-bit registers and AVX 256-bit registers).
---------
The key is that multiplying by an odd number (bottom-bit == 1) results in a fully invertible (aka: no information loss) operation.
So multiply-bitreverse-multiply is a 1-to-1 bijection in the 64-bit integer space: all 64-bit integers have a singular, UNIQUE multiply-bitreverse-multiply analog. (With multiply-bitreverse-multiply(0) == 0 being the one edge case where things don't really work out. An XOR or ADD instruction might fix that problem...)
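A minimal Rust sketch of that shape, in case it's useful. The constants below are just placeholder odd values (any odd multiplier keeps the bijection); the tuned constants from the search are the unpublished part:

    // Sketch of a multiply-bswap-multiply mixer. swap_bytes() stands in for
    // bit-reverse on x86; Rust's u64::reverse_bits() would be the GPU-friendly
    // variant. The constants here are arbitrary odd placeholders, not tuned ones.
    const K1: u64 = 0x9E37_79B9_7F4A_7C15; // odd => multiplication is invertible
    const K2: u64 = 0xC2B2_AE3D_27D4_EB4F; // odd => multiplication is invertible

    fn mix(x: u64) -> u64 {
        // multiply randomizes the top bits, the byte swap moves them back down,
        // and the second multiply spreads them across the whole word again
        x.wrapping_mul(K1).swap_bytes().wrapping_mul(K2)
    }

    fn main() {
        // Used counter-style: hash successive counter values into outputs.
        for counter in 1u64..=4 { // start at 1 to dodge the mix(0) == 0 fixed point
            println!("{:016x}", mix(counter));
        }
    }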
---------
> This is a great idea. I wonder if we could speed up LuaJIT even more by SIMD accelerating the GC's mark and/or sweep phases...
Mark and Sweep looks hard to SIMD-accelerate in my opinion. At least, harder than a bump allocator. I'm not entirely sure what a SIMD-accelerated traversal of the heap is even supposed to look like (aka: simd-malloc() looks pretty hard).
If all allocs are prefix-sum'd across the SIMD units (ex: malloc({1, 4, 5, 1, 2, 3, 20, 10}) returns memory + {0, 1, 5, 10, 11, 13, 16, 36})... for a bump-allocator-like strategy... it's clear to me that such a mark/sweep allocator would have fragmentation issues. But I guess it would work...
Semispace collectors innately fix the fragmentation problem. So prefix-sum(size+header) allocators are just simple and obvious.
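As a sanity check of the arithmetic above, the prefix-sum bump allocation in plain sequential Rust (on a GPU this scan would run across the SIMD lanes instead of a loop):

    // Each lane's allocation offset is the exclusive prefix sum of the sizes.
    fn bump_offsets(base: usize, sizes: &[usize]) -> Vec<usize> {
        let mut offsets = Vec::with_capacity(sizes.len());
        let mut bump = base;
        for &size in sizes {
            offsets.push(bump); // this lane's block starts at the current bump pointer
            bump += size;       // advance the bump pointer past it
        }
        offsets
    }

    fn main() {
        // The malloc({1, 4, 5, 1, 2, 3, 20, 10}) example from above, base 0.
        assert_eq!(
            bump_offsets(0, &[1, 4, 5, 1, 2, 3, 20, 10]),
            vec![0, 1, 5, 10, 11, 13, 16, 36]
        );
    }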
--------
On the "free" side of Mark/sweep... I think the Mark-and-sweep itself can be implemented in GPU-SIMD thanks to easy gather/scatter on GPUs.
However, because gather/scatter is missing (scatter is missing from AVX2), or slow (AVX512 doesn't seem to implement a very efficient vgather or vscatter), I'm not sure if SIMD on CPU-based Mark/Sweep would be a big advantage.
------------
Yup yup. Semispace GC or bust, IMO anyway for the SIMD-world. Maybe mark-compact (since mark-compact would also fix the fragmentation issue).
The mark-phase is just breadth-first-search, which seems like a doable SIMD-pattern with the right data-structure (breadth-first is easier to parallelize than depth-first)
> Bit-reverse is unimplemented on x86 for some reason, but bswap64() is good enough.
You totally nerd-sniped me! I implemented a basic "reverse 128-bit SIMD register" routine with `packed_simd` in Rust. The idea is to process 4 bits at a time:
    let lo_nibbles = input & u8x16::splat(0x0F);
    let hi_nibbles = input >> 4;
Then, we can use `pshufb` to implement a lookup table for reversing each vector of nibbles.
    let lut = u8x16::new(
        0b0000, 0b1000, 0b0100, 0b1100, 0b0010, 0b1010, 0b0110, 0b1110,
        0b0001, 0b1001, 0b0101, 0b1101, 0b0011, 0b1011, 0b0111, 0b1111,
    );
    let lo_reversed = lut.shuffle1_dyn(lo_nibbles);
    let hi_reversed = lut.shuffle1_dyn(hi_nibbles);
Now that each nibble is reversed, we can flip the lo and hi nibbles within a byte when reassembling our u8x16.
    let bytes_reversed = (lo_reversed << 4) | hi_reversed;
Then, we can shuffle the bytes to get the final order. We could use a different permutation for simulating reversing f64s in a f64x2, too.
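For completeness, that last step might look something like this (same `packed_simd` API as above, untested):

    // reverse the byte order to finish reversing the whole 128-bit register
    let reversed = bytes_reversed.shuffle1_dyn(u8x16::new(
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
    ));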
Looking at the disassembly, if we assumed our LUT and shuffle vectors are already in registers, the core shuffle should be pretty fast. (I haven't actually benchmarked this or run it through llvm-mca, though :-p)
The goal is to find values of k1 and k2 that result in an evaluate(seed, k1, k2) close to 16 bits (aka: 50% of bits changing, the definition of the "avalanche condition"). There's probably some statistical test I could have done that'd be better, but GPUs have single-cycle popcount and single-cycle XOR.
I forgot exactly which search methods I used, but note that a Vega64 GPU easily reaches 10 Trillion-multiplies / second, so you can exhaustively search a 32-bit space in a ~millisecond, and a 40-bit space in just a couple of seconds.
You can therefore search the values of k1 and k2 ~8-bits at a time every few seconds. From there, plug-and-play your favorite search algorithm (genetic algorithms? Gradient descent? Random search? Simulated annealing?).
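For anyone curious what such an evaluate() could look like, here's a rough stand-in in Rust for a 32-bit multiply-bitreverse-multiply mixer. The real search ran on a GPU with its own constants; this just shows the metric being optimized:

    // Rough stand-in for the avalanche measurement: average number of output
    // bits that flip when a single input bit flips. Ideal for 32 bits is 16.
    // The mixer and constants here are illustrative, not the tuned ones.
    fn mix32(x: u32, k1: u32, k2: u32) -> u32 {
        x.wrapping_mul(k1 | 1).reverse_bits().wrapping_mul(k2 | 1) // force odd multipliers
    }

    fn avalanche_score(k1: u32, k2: u32, seeds: &[u32]) -> f64 {
        let (mut flipped, mut trials) = (0u64, 0u64);
        for &seed in seeds {
            let base = mix32(seed, k1, k2);
            for bit in 0..32 {
                let diff = base ^ mix32(seed ^ (1u32 << bit), k1, k2);
                flipped += diff.count_ones() as u64; // popcount of changed output bits
                trials += 1;
            }
        }
        flipped as f64 / trials as f64
    }

    fn main() {
        let seeds: Vec<u32> = (1u32..=1000).map(|i| i.wrapping_mul(2_654_435_761)).collect();
        // Want something close to 16.0; search k1/k2 to drive it there.
        println!("avg bits flipped: {:.2}", avalanche_score(0x9E37_79B9, 0x85EB_CA6B, &seeds));
    }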
--------
After that, I'd of course run it through PractRand or BigCrush (and other tests). In all honesty: random numbers (with bottom bit set to 1) from /dev/urandom are already really good.
---------
Exhausting the 64-bit space seems unreasonable however. I was researching FNV-hashes (another multiplication-based hash), trying to understand how they chose their constants.
It seems like there are a wide variety of ways to serialize and deserialize data, their performance sometimes varies by orders of magnitude, and the slow code persists because it doesn’t matter enough to optimize compared to other virtues like convenience and maintainability.
The key seems to be figuring out how to get good (not the best) performance when you mostly care about other things?
Machine learning itself seems like an example of throwing hardware at problems to try to improve the state of the art, to the point where it becomes so expensive that they have to think about performance more.
SIMD-compute is a totally different model of compute than what most programmers are familiar with.
That's the biggest problem. If you write optimal SIMD code, no one else on your team will understand it. Since we have so much compute these days (to the point where O(n^2) scanf parsers are all over the place), it's become increasingly obvious that few modern programmers care about performance at all.
Nonetheless, the more I study SIMD compute, the more I realize that these expert programmers figured out a ton of good and fast solutions to a wide variety of problems... decades ago, and then they were somehow forgotten until recently.
Seriously: that Data Parallel Algorithms paper is just WTF to me. Linked list traversal (scan-reduced sum from a linked list in SIMD-parallel), Regular Expressions and more.
--------
Then I look at the GPU-graphics guys, and they're doing like BVH tree traversal in parallel so that their raytracers work.
Its like "Yeah, Raytracing is clearly a parallel operation cause GPUs can do it". So I look it up and wtf? Its not easy. Someone really thought things through. Its non-obvious how they managed to get a recursive / sequential operation to operate in highly parallel SIMD operations while avoiding branch divergence issues.
Really: think about it: Raytracing is effectively:
If(ray hit object) recursively bounce ray.
How the hell did they make that parallel? A combination of stream-compaction and very intelligent data-structures, as well as a set of new SIMD-assembly instructions to cover some obscure cases.
There's some really intelligent stuff going on in the SIMD-compute world, that clearly applies beyond just the machine-learning crowd.
> A huge number of operations, such as XML whitespace removal (aka: SIMD stream compaction), regular expressions, and more, were shown ~30 to 40 years ago to benefit from SIMD compute. Yet such libraries still don't exist today.
That's exactly why I don't believe it's ever going to happen. If these things could actually be useful in practice, surely someone would have done it already.
This comment seems a bit shortsighted. GPUs and TPUs are SIMD, and ML models are increasingly being used in consumer hardware. Video cards are selling out so fast that there are months' worth of backorders.
SIMD processors are being put in self driving cars, robots with vision, doorbell cameras, drones, etc. We're only at the beginning of SIMD use-cases.
Depends where you look. Hard to see IBM's z/Architecture (the latest branding for S/360-derived mainframes) dying out in that timeframe, for example, and the embedded space is likely to remain an odd bestiary for quite some time.
Embedded is going to become way less of a bestiary, at least in the five digit gate count RISC space.
ARC, Xtensa, V850, arguably MIPS, etc. all worked in the "we're cheaper to license than ARM, and will let you modify the core more than ARM will" space. I'm not sure how they maintain that value-add compared to lower-end RISC-V cores. I expect half of them to fold, and half of them to turn into consulting shops for RISC-V and fabless design in general.
You probably mean "TIMI", which is the user-visible ISA of IBM's "midrange" systems (i.e. AS/400 or System/i) and which was from the start meant as a virtual-machine ISA that is mostly AOT-translated into whatever hardware ISA OS/400 or i5/OS actually runs on. z/Architecture (S/360, ESA/390, what have you...) is distinct from that and distinct from PowerPC. Modern POWER and z/Architecture CPUs and machines are somewhat similar when you look at the execution units and overall system design, but the ISAs are completely different, and even the performance profiles of the otherwise similar CPUs are different (z/Architecture is "uber-CISC", with instructions like "calculate the SHA-256 of this megabyte of memory").
I learned Z80 assembly in 1987 and x86 assembly somewhere in '91-92 (can't exactly remember), but it was in '94 when I met IBM Assembler (yes, they called it "Assembler language", which is also confusing) and I was like "what is this sorcery where assembly has an instruction to insert into a tree".
The "not ARM based" MCU market might be interesting to watch as well. Not just for RISC-V, but also as ARM MCUs continue to drop in power needs and unit cost.
The MIPS name was originally an acronym for "Microprocessor without Interlocked Pipeline Stages". The RISC-V docs that I've skimmed seem to show quite a lot more hardware pipeline interlocking than the last-gen MIPS processors, so the name is a bit funny now.
A few jobs ago, the company I worked for based their own custom processor on a MIPS core. Why MIPS? The answer I got when I asked was that it was the only affordable option. ARM in particular was called out as beyond reach financially. Years later, long after that company was gone, RISC-V came in at an even lower price point. AFAICT there's no need to look for other reasons behind this news.
My historical skepticism on the acceptance and proliferation of RISC-V looks more antiquated by the day. No real dog in the fight, but I would love to see this take off like ARM did.
It's a very good move. First, it's for embedded: if you're not designing a system-on-chip, it won't matter to you.
But for those who design SoCs and need an embedded CPU of intermediate power, it's very good news. ARM is expensive here (it's considered cheap compared to Intel, but the embedded world is different). The RISC-V newcomers are interesting but... new, and it always takes a bit of time to bring solutions to maturity, which matters in embedded. And while ARM owns the high end (where a design must be co-optimized for the latest advanced nodes to really shine, which is very labor-intensive and costly), the mid-range is much more open.
MIPS had a good mid-range design with the I-7200. Their problem was that the old MIPS ISA was not dense enough (larger I-cache, larger flash footprint) compared to the competition, and their compact encodings were not good enough. So for the I-7200 they designed a new ISA, nanoMIPS, which has "MIPS" in its name but is completely different. And guess what: nobody cared. It became another proprietary ISA, only supported by MIPS's own GCC version.
But still, the design was good, and in particular the LLC/coherency support is much more mature than what many newcomers offer today. Which shouldn't be a surprise, as it's the result of a long line of (good) mid-range CPUs.
By evolving their design to RISC-V, MIPS will soon have one of the best (if not the best) mid-range CPU/cluster IPs on the market for quality vs. price, depending on how fast the competition moves (they're definitely not sitting still!). And RISC-V will solve the toolchain and tooling support problem nanoMIPS has. If there had been such a RISC-V I-7200 equivalent a few years ago, I might have used it.
So very nice, and I look forward to more competition in the embedded mid-range CPU IP market soon.
There are a couple of vendors of multi-architecture CPUs where the native architecture is MIPSish and the other supported arches are MIPS, ARM, RISC-V & x86. Tachyum is one of them; here is another:
That is a very interesting article, both for how they aim to run existing software via binary translation to an ISA optimized for such translation, and for how much they dread possibly being cut off from foreign foundries and from ARM and x86 chips.
I consulted with MIPS and ARM many years ago - pre Y2K (background is microprocessor design and firmware). ARM was a little formal but easy to work with, cooperative and supportive (they opened doors and provided everything I asked for).
Working with MIPS was a nightmare, every step of the way. Weren't responsive, difficult to work with, engineers were stubborn, etc.
I realize these are generalizations and I'm a sample size of one. But I was plugged in at a pretty high technical level with both companies, and I remember telling my wife at the time that I thought ARM would skyrocket and MIPS would be unable to get out of its own way.
Glad I bought a lot of ARM stock before most people knew about them.
The next "loongarch" generation was already announced to move away from mips as the underlying ISA but instead allow running mips, arm64, risc-v, and x86 code in hardware assisted emulation.
"In this context, the “8th generation” refers to seven generations of the traditional MIPS architecture, followed by an upcoming RISC-V design. It sounds like the company is implying that this is a smooth transition with some level of compatibility between the old and the new. It isn’t. It’s a clean break as the company switches from the old CPU design, that it owned, to a new one that’s in the public domain."
MIPS R6 (the Warrior M62xx/I6400/I6500/P6600) was already somewhat incompatible with R5 and earlier, and the seventh-generation nanoMIPS I7200 was incompatible with that again.
It could have preempted RISC-V a long time ago by doing this. I hope it's not too late.
MIPS is still used in routers and set-top boxes, but the steam is running out quickly; nearly all new router and set-top-box designs are using ARM already. There is a last hope, though.
The context was not "Who isn't using RISC-V?" It was whether anyone is using SPARC. The reality: they're not mutually exclusive. They offer both, contrary to the implied subtext of your comment that there has been a transition from SPARC to RISC-V.
I spent a summer interning with MIPS hardware when they were owned by Imagination.
It's kinda odd to see something you helped make die (be killed?) like this.
Strong "not with a bang but a whimper" vibes in my office today.
Does this mean that design features from MIPS can now be adopted / adapted in RISC-V? Or is it just a no-longer-used ISA and chip designs that become freely usable?
meh. there's more to a CPU architecture than its instruction set. this isn't the "death of MIPS." far from it, it's the remaining MIPS people applying their own understanding of design to implement a CPU with the RV64 instruction set (plus extensions, presumably.)
that being said... MIPS is certainly smaller than it once was. i hope there are still people around who remember how to build CPUs.
If synthesisable MIPS cores from the R3k to the R20k entered the public domain, that would be something special. But based on how previous such "announcements" have turned out, it's not gonna happen this time either.
1) MIPS Strikes Back: 64-bit Warrior I6400 Arrives https://news.ycombinator.com/item?id=8258092
We are still in the game
2) Linux-running MIPS CPU available for free to universities – full Verilog code https://news.ycombinator.com/item?id=9444567
Okay, we are not doing so great, maybe we can get young kids hooked?
3) MIPS Goes Open Source https://news.ycombinator.com/item?id=18701145
Open Source is so hip and pop these days, let's do that!
4) Can MIPS Leapfrog RISC-V? https://news.ycombinator.com/item?id=19460470
Yeah, sure, that'll happen
5) Is MIPS Dead? Lawsuit, Bankruptcy, Maintainers Leaving and More https://news.ycombinator.com/item?id=22950848
Whoops
6) Loose Lips Sink MIPS https://news.ycombinator.com/item?id=24402107
And there is the answer to the question from the previous headline
And now we are here.