RISC-V has taken an approach where they have a relatively small core instruction set[1], and then a relatively large set of instruction set extensions.
Think of it like SSE and AVX on the x86 platform, turned up to 11.
This makes it more attractive to chip designers, as they can pick and choose what to implement, making fairly specialized chips at potentially reduced cost.
However this also makes it more difficult to target for programmers, as technically a RISC-V chip can have any combination of these extensions.
To make things a bit more predictable they've come up with these profiles, which are just the core plus a defined set of extensions.
So if you find a chip that says it follows the announced RVA23 profile, you know for example the vector instructions must be available.
You can distribute a binary that uses "core", and then also uses extensions at runtime if they are available.
That's completely fine and standard. It's already what people do on x86 too: if you want to ship, say, AVX-512 code, you have to deal with the fact that the latest Intel client CPUs don't support it (product segmentation issues), so at runtime you check whether the extension is supported.
The situation is not different with RISC-V. You compile your binary for the standard profile, and optionally if you want to use a specialized extension to optimize some hot loop, you check at runtime if it's available.
Right, in the x86 example there are very few optional instructions and there aren't new instructions (at least of consequence) on the horizon. So you can compile with switches if you really want, I guess. I think ARM is a better example of things being a mess, where people compile for older/core instructions and new instructions are underutilized. I can never remember the whole ARM thing with all the code sequences... it's highly confusing.
Looking at this again - the naming convention is a bit worrying... Does the 23 in RVA23 refer to 2023? Do you need annual recompilations of software..? Do you need codepaths for every year since .. whenever the first core was released?
> Do you need annual recompilations of software..?
The profiles are designed to provide guarantees of what's available. Just because RVA23 was released now does not mean that RVA22 CPUs, and the software that runs on them, stop working.
Also, at least for the foreseeable future, the profiles will be extensions of the previous ones[2]. Thus RVA22 should be a subset of RVA23, and any software targeting RVA22 should run fine on a CPU supporting RVA23.
How does this work long term? Say ten years down the line RVA34 comes out. But you will compile for RVA22 and then have a spaghetti of 10+ platform-specific codepaths in the binary?
I feel at some point you're either going to have penalties from all the platform checks and jumps, or you're going to only use these extensions in critical hotpaths where they get underutilized (what you see with SSE on x64 and the whole mess on ARM).
It's not going to be every year! And each one runs all software from all previous profiles.
Right now there is RVA20 (almost all shipping hardware), RVA22 (only Canaan K230 and SpacemiT K1/M1 shipping now, more coming next year e.g. SG2380), and RVA23 just ratified, hardware in probably 2-3 years.
Worrying about multiplication of RISC-V profiles is ignoring the existing reality of every other platform.
There are currently 17 different releases of ARMv8-A and ARMv9-A.
Have you seen the extensions list on any recent x86? My laptop reports:
Time will tell. As they note in the page I linked to, they've decided to tackle that one when they get there.
I'll agree it sounds like it could get messy. That said, I'm sort of imagining the core set of extensions not growing significantly larger. Perhaps they'll add some granularity to the various profile levels; embedded application CPUs like those in set-top boxes might not need all the features of a full-blown desktop or server CPU.
If you have a RISC-V machine with an operating system and a C compiler then it will already include all the extensions from RVA20 (older boards such as VisionFive 2, Milk-V Pioneer, LicheePi 4A, Milk-V Duo, HiFive Premier), or RVA22 (the very newest boards such as LicheePi 3A, Milk-V Jupiter, DC-Roma II, BananaPi BPI-F3) ... and in 2 or 3 years boards meeting this RVA23 spec.
The only things that have mix-and-match extensions are embedded devices where you choose exactly what chip you buy (or build), you know what extensions it has, and you compile for that set of extensions. There is no confusion because everything is under the control of one company.
Machines intended to run software distributed in binary form follow the RVA specs which are few in number, only added to every few years, and each one includes everything in all the previous ones.
There is again no confusion -- or at least no more than in any other ecosystem that is not permanently frozen.
RVA23 is pretty much equivalent to ARMv9 or the latest x86_64 spec with a variable length version of AVX-512.
I really wish RISC-V had been designed in such a way that every RISC-V core must run every RISC-V binary.
However, that wouldn't mean every core must implement every possible instruction - instead, unimplemented features would be emulated. The standard would require that hardware trap everything unimplemented, and that software always contain a fallback implementation.
Then, the fragmentation problem turns from a "it won't work" problem into a "it might not be quite as fast" problem - and for general users, that effectively means everything is 100% compatible.
This does not make much sense to me. First off, extensions are added all the time. That means an application would crash if it used a newer extension that the kernel/firmware didn't know about when it was compiled.
Secondly, extensions cover a lot of things. They can do much more than just introduce instructions, like introduce new hardware registers. The RVA23 profile that's under discussion here for example includes a hypervisor extension[1]. I don't see how you can emulate that in software after the fact in a meaningful way.
Third, RISC-V was designed to be used all the way from microcontrollers to application processors. Especially for the embedded use-cases, you're targeting a well-defined collection of processors, so which extensions you can use is well-known and not subject to changes willy-nilly.
The RVA23 profile that's just been ratified is aimed at more general processors, which will run more varied software. There it makes sense to agree on a well-defined subset of extensions, which is why they do exactly that through these profiles.
> The RVA23 profile that's under discussion here for example includes a hypervisor extension[1]. I don't see how you can emulate that in software
In fact the hypervisor extension was explicitly designed to be relatively efficient to implement in software emulation.
It even says so at the URL you failed to include:
"The hypervisor extension has been designed to be efficiently emulable on platforms that do not implement the extension, by running the hypervisor in S-mode and trapping into M-mode for hypervisor CSR accesses and to maintain shadow page tables. The majority of CSR accesses for type-2 hypervisors are valid S-mode accesses so need not be trapped. Hypervisors can support nested virtualization analogously."
> In fact the hypervisor extension was explicitly designed to be relatively efficient to implement in software emulation.
Fair point, I stand corrected on that one. Should have checked more thoroughly before posting.
> It even says so at the URL you failed to include
That site doesn't render right on my mobile, which I was on at the time, so couldn't read the contents.
> "[...] by running the hypervisor in S-mode [...]"
That doesn't sound very efficient if all you got is an M-mode-only chip though.
In any event, you also have extensions like Ztso[2], for which even the spec says to just let the binary crash in the event of non-presence, or Zam[3], which you technically could emulate but which would surely be dog slow.
> That doesn't sound very efficient if all you got is an M-mode-only chip though.
There is approximately a light-year of space between an M-mode-only chip and anything where you'd consider running a hypervisor.
Suggesting that you want a hypervisor is implicitly saying that you already have S and U modes, virtual memory, page tables, MMU etc.
Otherwise you're at much the same point as those news stories about how someone is running RISC-V Linux on an AVR or Pi Pico (original) or vim macros, by writing a RISC-V emulator on one of those.
This is really the whole problem that Profiles is aiming to solve. A core claiming to be RVA23 must implement certain extensions and may implement others. Your binary can therefore assume the required extensions and test for the optional ones.
This is no different from how x86 works today. Both Windows and Linux OSes and their binaries assume some minimum level of x86-64 support (eg. x86_64-v3 will be the base "profile" for RHEL 10 [1]). For other things like AVX-512, portable binaries test for the "extension" before using.
As the extension spaghetti in OpenGL and Vulkan has proven, along with the required software-rendering fallback in OpenGL, this doesn't really work in practice.
That would involve a 100x slowdown in most cases, without dynamic code generation.
It's much better to compile multiple versions of the code or multiple binaries.
Also, any OS kernel could do transparent emulation like that with no need for CPU assistance (beyond trapping on unsupported instructions, which all modern ISAs of course already do), so it's more of a Linux/Windows ABI issue. You can also write an LD_PRELOAD library that does the emulation.
Why though? We don't expect that of x86, ARM, or any of the hundreds of other ISAs that have come and gone. We don't ask because it's impossible without putting a freeze on adding new instructions. Once you add a new instruction, there's no way for binaries containing that instruction to have it run on an older processor. A CPU reads a value from memory; if that value doesn't map to an instruction it can follow, what then?
RISC-V is actually better than most at this. First of all, there is a core ISA that will run on everything, and the extensions can be set up to be ignored when running on a system without them, defaulting to some core-ISA alternative.
The only alternative is to get the entire ISA spec done before letting anyone make anything... and that's just not happening.
One would simply specify that for any instruction used by a binary outside some core set, a fallback must be offered. For many instructions, the fallback might be just a few bytes of code.
The hardware would provide some assistance for said fallback mechanism - for example in the form of a pointer to a "fallback vector". If you execute unknown instruction 0x80 then code at the fallback pointer+0x80 will be executed instead.
The binary doesn't have to use this fallback mechanism - for performance critical binaries which want to run fast on different hardware implementations, the compiler could use feature detection to make multiple implementations of some functions, the way many SSE/AVX extensions are dealt with in the x86 world.
Binaries that aren't intended to run on bare metal, but instead run on some OS (eg. windows/linux) could instead specify as part of their ABI which instructions are part of the core, and the OS will ensure that anything unsupported by the hardware but required by the binary will be provided by the OS.
Almost everything here is outside the ISA spec's scope though. Mandating that a fallback vector be specified for each compilation, and that compatible alternatives be available there, seems like going beyond an ISA spec and into how programs and data are stored. Sure, RISC-V could say "here's what you run instead of new instruction Y if Y isn't available", but I'm not sure it's in their remit to ensure those alternative instructions are always there.
Your comment about the compiler using feature detection has nothing to do with RISC-V; it's up to the compiler developers how to do that and whether they want to. This is how it works right now: the compiler inspects the hardware and works accordingly.
RISC-V doesn't get to decide how the OS works. It's up to the OS developers how a binary is handled. They can't stop me creating an OS that doesn't filter unsupported instructions, and therefore the guarantee cannot be kept.
You could have an interrupt-style mechanism for unsupported instructions, with suggestions on how software/the OS handles them, but whether or not they choose to handle them is not RISC-V's problem.
> One would simply specify that for any instruction used by a binary outside some core set, a fallback must be offered.
That is exactly what RVA23 does. It specifies what must work. It doesn't specify the performance of any instruction -- that is between a hardware vendor and their customers as to which ones are fast.
> If you execute unknown instruction 0x80 then code at the fallback pointer+0x80 will be executed instead.
RISC-V instructions are 4 bytes long (i.e. 2^32 of them), not 1 byte. Using that simple technique, with just a pointer to the actual handler included in the table, the table will be 32 GB in size.
> I really wish RISC-V had been designed in such a way that every RISC-V core must run every RISC-V binary.
Obviously impossible, given that commercially-available RISC-V chips start from 2 KB RAM, 16 registers, and 48 MHz.
> wouldn't mean every core must implement every possible instruction - instead, unimplemented features would be emulated
And that is exactly what RVA23 is.
Every instruction must work. Nothing requires them to be fast. Missing instructions or capabilities (e.g. misaligned load/store) are handled in M-mode software, transparent to both the User program and the operating system.
> Then, the fragmentation problem turns from a "it won't work" problem into a "it might not be quite as fast" problem - and for general users, that effectively means everything is 100% compatible.
I don't understand why this is something people bring up all the time. Downstream users can build the binaries for their arch on their own. Unless you want to distribute programs as binaries? But then why wouldn't you and your users be happy targeting ARM instead of RISC-V?
Yeah, the extensibility of RISC-V that allows implementations to scale up and down on features, including non-standard ones, is really cool, but compared to other ISAs I think the biggest improvement is it being open and royalty-free (except for the trademark and compliance certifications).
I think you want that, but in a reserved instruction space with mandatory fallbacks.
It's how things work anyway; things that are on the edge of what can be done are good for running a business and maybe not finalized enough to become standards.
It'd be cool for it to be copyleft, but at the same time I guess it could end up being too hard to get a business running, as the apparent risk of a competitor just copying your product would scare investors, and that would end up in lower availability of actual hardware you can buy.
> However, that wouldn't mean every core must implement every possible instruction - instead, unimplemented features would be emulated.
For (fixed-length-)vector extensions in particular, I’m not sure that would work well. For nontrivially vectorizable things (i.e. you’re not just doing a bunch of math in parallel), the usual tradeoff is that you use a couple times more compute, but because the vector unit can provide a couple dozen or more times the compute per clock the result is faster despite the apparent waste. Emulating such code, on the other hand, will likely be miserable.
But 99% of code does not use vector instructions. That last 1% will either get slower (emulating trapped instructions), or have a 2nd non-vectorized implementation, with the binary containing both.
Firstly, that's a high ask even for operating system software today. Something built for Mac OS 9 is not expected to run on macOS 15. Demanding that from a CPU is quite extreme.
Secondly, that's like asking the CPU to do the job of a compiler. The compiler has one advantage over a CPU: it runs in a completely different situation, where memory and time do not matter. The CPU doesn't have the luxury of trading off memory or time.
By all means, feel free to design your own ISA to fit every use case from the get-go, publish it in its final state, port toolchains and relevant software, and convince the world to use it.
RISC-V chose a different approach, because this one, quite frankly, would not have worked.
Profiles are rolled up collections of RISC-V extensions[1]. RVAxx defines a set of standard extensions you can expect to be present in "application processors" (which is fairly loosely defined but is basically non-embedded stuff that you might run a full Linux on.)
Oh nice, recently I had some questions about RISC-V extensions and iirc I found your article, and it was the one that explained it best. Thanks for the write-up! You helped a minimum of one person enjoy their journey through the ecosystem :-)
I believe I was reworking the instruction decoder on my own core and the specification sheet left some open questions
Thanks, it was a lot of fun writing it too. I learned a lot of unexpected things about extensions, including that at the instruction encoding level they are not as distinct as I expected, and that there are just so many of them now.
There's been a lot of debate in RVI about changing the naming scheme, because it's perceived (wrongly, I think) that putting a year in the name makes the profile seem obsolete already. Unfortunately the alternative suggestions so far have not been very well received.
A) Numbers in instruction set extensions tend to indicate the bit width, thus my confusion.
B) If you do later want to introduce wider instructions, it's going to be confusing.
C) Software moved away from whole-integer numbering for a reason. Is RVA25 a completely new instruction set, a bug fix, or a superset of RVA23? RVA 1.1 or RVA 2.0 gives you more of a clue as to what you're dealing with.
D) Come 2100 and RVA00, we are going to have numerous issues with software checking that 'RVA >= 23'. I would like this one to be humorous, but unfortunately, experience shows it probably won't be.
A,B) Yeah, that's actually unfortunate with 23, but it shouldn't be a problem in the future.
C,D) The current naming scheme is RVAxx, where xx is increased for every major profile update, i.e. one that adds new mandatory extensions. Minor releases are RVAxx.y (iirc), which don't change mandated features but may allow more optional extensions. The profiles are supposed to be backwards compatible and will have a slow release cadence. The increment isn't fixed, but it's still unlikely we'll run out of two-digit names. Regardless, if we ever are at RVA70, it's trivial to preserve ordering by going to something like RVA710 next.
RISC-V is moving forwards. The transition will be hard and costly: large RISC-V implementations on the latest silicon processes (for server/mobile/desktop).
I wish it to be successful. Worst case scenario, RISC-V is a nice byte code, much less toxic than any higher level computer language out there.
They have to be very careful about what they put in RVxx profiles: advanced application developers already know they will have to query the hardware in order to install the proper machine code (this must not be part of any file format, to avoid toxic complexity: it must stay in full control of the application).
Currently I personally code my own little "core" RISC-V applications, which I interpret on x86_64/Linux using an executable file format other than ELF (of my own design), but with transparent binary compatibility (aka no need to patch the kernel).
And guess what, not depending on those horrible compilers like gcc/clang or ultra-complex-syntax languages (all full of planned obsolescence even in the medium run), or those completely IP-locked hardware ISAs... is just a breath of fresh air.
Ofc, this is a compromise, as it is impossible to run a reasonable (interpretation may vary a LOT here) Linux desktop without some of the worst software or dependencies out there... but this is moving forwards, and it feels good.
> it was not designed for efficient execution by a software interpreter.
It's not bad at all for that. On my i9-13900HX laptop, if I take http://hoult.org/primes.txt and compile it for x86 and for RISC-V:
- 5.6 seconds for riscv64-linux-gnu-gcc -O primes.c -o prime run in QEMU [1]
- 3.8 seconds for gcc primes.c -o primes run natively
- 2.0 seconds for gcc -O primes.c -o primes run natively
Emulating RISC-V is worse than forgetting to compile with `-O`, but not much. It's far faster than Python or Ruby and comparable to Java, C#, JavaScript, or WebASM.
Also, QEMU is not the fastest emulator around, just the most flexible and complete. The experimental RV8 and the nearly-ready-for-prime-time RVVM [2] are much closer to native speeds.
Note that if I do exactly the same thing for arm64 (i.e. docker/QEMU) the primes program takes 14.4 seconds and the whole Ubuntu-in-docker experience just feels a lot more laggy. Running the x86_64 binary in qemu-x86_64 instead of natively takes 10.5 seconds.
RISC-V is a lot easier to emulate quickly than Aarch64 or x86.
[1] or if you have Docker Desktop (on Mac, Windows, or Linux) then you can just work like a native:
bruce@i9:~$ docker run --platform linux/riscv64 -it riscv64/ubuntu
Unable to find image 'riscv64/ubuntu:latest' locally
latest: Pulling from riscv64/ubuntu
53300d777b1a: Pull complete
Digest: sha256:6a392b2c64f4e0021bfcff62e58975ddce0f1eccc5a72708b114aeb50999ff22
Status: Downloaded newer image for riscv64/ubuntu:latest
root@5f2edc942403:/# apt update
:
root@5f2edc942403:/# apt install wget gcc
:
root@5f2edc942403:/# wget -q http://hoult.org/primes.txt
root@5f2edc942403:/# mv primes.txt primes.c
root@5f2edc942403:/# gcc -O primes.c -o primes
root@5f2edc942403:/# ./primes
Starting run
3713160 primes found in 5609 ms
216 bytes of code in countPrimes()
root@5f2edc942403:/#
I would agree that bytecode is different from machine code, but "efficient execution by a software interpreter" seems like a bad delineation considering that some virtual machines treat bytecode as simply an intermediate representation for JIT. WASM bytecode is very similar to machine code and meant to be translated to machine code rather than interpreted.
I would define bytecode as:
- Closer to machine operations than something like an AST, which is where I think the "efficient execution" part of the wikipedia definition came from
- Not directly executable on any physical CPU, or at least not typically. There are some CPUs that can execute Java/Python bytecode (sort of) but they're in the minority.
- While not strictly required, bytecode tends to have opcodes meant to be used with a virtual machine, such as dealing with objects instead of linear memory.
This is all to say, the proper terminology for RISC-V is an ISA, but the distinction is more about how it's being used than how it's designed.
I don't have an army of AI bots to support the many obviously incoherent posts that lack significant experience (i.e. that miss the point from light-years away).
[1]: https://riscv.org/technical/specifications/