Perhaps I'm jumping ahead of the series here, but one of the unique (I think?) features of ARM is that the various condition codes listed (NE, EQ, LT, etc.) can be applied to just about any instruction, not just a branch. So you could have an instruction sequence like:
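  SUBS  R1, R2, R3    ; R1 = R2 - R3, and update the condition flags
  ADDEQ R4, R4, #1    ; only takes effect if the Z flag is set (R2 == R3)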
whereas in other instruction sets you'd have to branch over the 'ADD' instruction if the result of the subtraction was non-zero.
(Not sure if this applies to ARM Thumb instructions or not; my ARM experience is very out of date!)
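To make the flag logic concrete, here is a minimal Python model of the two-instruction sequence. It is a sketch, not an emulator: the helper names (`subs`, `add_cond`, `CONDITIONS`, `example`) are hypothetical, only three condition codes are modelled, and registers are plain 32-bit integers. It shows SUBS setting N, Z, C and V, and the conditional ADD writing its result back only when its condition holds.

```python
MASK32 = 0xFFFFFFFF

def subs(rn, rm):
    """Model of ARM SUBS: 32-bit subtract that also sets the N, Z, C, V flags."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    flags = {
        "N": bool(result >> 31),                       # result negative as signed
        "Z": result == 0,                              # result is zero
        "C": rn >= rm,                                 # no unsigned borrow occurred
        "V": bool(((rn ^ rm) & (rn ^ result)) >> 31),  # signed overflow
    }
    return result, flags

# A few ARM condition codes, written as predicates over the flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "LT": lambda f: f["N"] != f["V"],  # signed less-than
}

def add_cond(cond, flags, rd, imm):
    """Conditional ADD: the result is only written back if the condition holds;
    otherwise the instruction behaves like a no-op."""
    if CONDITIONS[cond](flags):
        return (rd + imm) & MASK32
    return rd

def example(r2, r3, r4):
    r1, flags = subs(r2, r3)           # SUBS  R1, R2, R3
    r4 = add_cond("EQ", flags, r4, 1)  # ADDEQ R4, R4, #1
    return r1, r4

print(example(5, 5, 10))  # (0, 11): R2 == R3, so the ADD takes effect
print(example(7, 5, 10))  # (2, 10): Z is clear, so the ADD is skipped
```

The point of the model is that there is no branch in the instruction stream: the ADDEQ is always there, and the flags decide whether its result is kept.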
Sort of. Thumb-2 requires that you put an 'IT' instruction before a block of up to 4 conditional instructions. Think "if-then". Each instruction in the block is made a 'then' or an 'else' by the variant of IT you use (ITT, ITE, ITTE and so on). E.g. something like
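As a rough illustration of how an IT variant maps onto the instructions that follow it, here is a small Python sketch. The function name `it_conditions` and the `INVERSE` table are hypothetical; it assumes the usual rule that the first instruction is always the 'then' case and each extra T/E letter adds one more instruction, up to four in total, with 'else' instructions running under the inverse condition.

```python
# Inverse condition used for each "else" slot. AL has no inverse,
# so an 'E' slot with AL fails the lookup (KeyError).
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS", "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def it_conditions(mnemonic, cond):
    """Return the condition each instruction in an IT block executes under.

    mnemonic is e.g. "IT", "ITT", "ITE", "ITTE": the first instruction is always
    'then'; each following T/E letter marks one more instruction (at most 4 total).
    """
    mnemonic = mnemonic.upper()
    cond = cond.upper()
    if not mnemonic.startswith("IT"):
        raise ValueError("not an IT instruction: " + mnemonic)
    pattern = mnemonic[2:]
    if len(pattern) > 3 or any(c not in "TE" for c in pattern):
        raise ValueError("bad IT pattern: " + mnemonic)
    conds = [cond]  # the first instruction is always the 'then' case
    for c in pattern:
        conds.append(cond if c == "T" else INVERSE[cond])
    return conds

print(it_conditions("ITTE", "EQ"))  # ['EQ', 'EQ', 'NE']
```

In unified-syntax Thumb-2 assembly the instructions inside the block also carry their condition suffix, so an ITTE EQ would be followed by two ...EQ instructions and one ...NE instruction.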
One thing I remember from when RiscOS was still a bit of a thing was that a lot of its users praised ARM assembly. That, plus the short burst of 68k ASM when the Palm Pilot came out, was probably the last "big" hobbyist surge of assembly that I can think of right now.
I used to write games on RiscOS in ARM assembly. Sooo much nicer than any of the others I'd used before or since. One thing I vaguely remember (my memory is pretty poor) was being able to load 64 bytes' worth of data into the registers with one instruction (and store them with another). This made writing sprite blit routines a joy, and super fast. The conditional instructions were lovely too.
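The instructions being remembered here are ARM's load/store-multiple pair, LDM and STM, which move a whole list of registers to or from consecutive words of memory. Below is a minimal Python model of the increment-after, write-back form (LDMIA Rn!, {...}). It is a sketch under stated assumptions: memory is a list of 32-bit words, addresses are word-aligned, the base register is not in the list, and the choice of R0/R1 as pointers and R2-R9 as an 8-word (32-byte) buffer in `copy_words` is purely illustrative.

```python
def ldmia(memory, regs, rn, reglist):
    """Model of LDMIA Rn!, {reglist}: load consecutive words starting at the
    address in Rn, lowest-numbered register from the lowest address, then
    write the incremented address back to Rn."""
    addr = regs[rn]
    for r in sorted(reglist):
        regs[r] = memory[addr // 4]  # memory is a list of 32-bit words
        addr += 4
    regs[rn] = addr

def stmia(memory, regs, rn, reglist):
    """Model of STMIA Rn!, {reglist}: the store counterpart of ldmia."""
    addr = regs[rn]
    for r in sorted(reglist):
        memory[addr // 4] = regs[r]
        addr += 4
    regs[rn] = addr

def copy_words(memory, src, dst, nwords):
    """Hypothetical blit-style copy: R0 = source, R1 = destination, and R2-R9
    hold 8 words (32 bytes) per LDM/STM pair. nwords must be a multiple of 8."""
    assert nwords % 8 == 0
    regs = [0] * 16
    regs[0], regs[1] = src, dst
    block = range(2, 10)  # R2-R9
    for _ in range(nwords // 8):
        ldmia(memory, regs, 0, block)  # LDMIA R0!, {R2-R9}
        stmia(memory, regs, 1, block)  # STMIA R1!, {R2-R9}
```

Each loop iteration moves 32 bytes with two instructions, and the write-back means no separate pointer increments are needed, which is a large part of why such blit loops were compact and fast.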
A few years later, when I was writing R3000 (MIPS) assembly on the PS1, I remember thinking how poor it felt compared to ARM assembly (MIPS was the only other RISC assembly language I'd used at that point).
It was the only assembly language I genuinely enjoyed writing code in; most other assemblers usually felt like a fight. I give Sophie Wilson the credit for that, and probably for my entire career too, because of BBC BASIC on the BBC Micro before that!
> As you can see here the SIMD (NEON) extension adds sixteen 128-bit registers (q0-q15) that overlay the floating point registers. So if you reference Q0 it is the same as referencing D0-D1 or S0-S3.
Is this how SIMD is usually implemented in hardware? Have a set of circuitry that pushes one command to multiple registers?
SIMD does stand for Single Instruction Multiple Data [0]. So the hardware applies the same operation to every element of one or more vectors. Whether a vector is stored in dedicated registers, combined registers (as described above), or memory buffer(s) is up to the hardware designer. In fact, on x86 the MMX registers were aliases of the x87 floating point registers [1], whereas SSE added new registers [2].
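One way to see "single instruction, multiple data" is as a single add that treats a 128-bit value as several independent lanes. The Python sketch below is illustrative only (the helpers `pack`, `unpack` and `simd_add` are hypothetical, not any real intrinsic): each lane wraps around on its own, which is exactly what distinguishes it from a plain 128-bit add.

```python
def pack(lanes, lane_bits):
    """Pack a list of lane values into one integer, lane 0 in the lowest bits."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & ((1 << lane_bits) - 1)) << (i * lane_bits)
    return value

def unpack(value, lane_bits, count):
    """Split an integer back into `count` lanes of `lane_bits` each."""
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask for i in range(count)]

def simd_add(a, b, lane_bits, total_bits=128):
    """One 'instruction': add every lane of a to the matching lane of b.
    Each lane wraps around on its own; carries never cross into the next lane."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result

a = pack([0xFFFFFFFF, 1, 2, 3], 32)
b = pack([1, 1, 1, 1], 32)
print(unpack(simd_add(a, b, 32), 32, 4))          # [0, 2, 3, 4]
print(unpack((a + b) & ((1 << 128) - 1), 32, 4))  # [0, 3, 3, 4]: carry leaked into lane 1
```

The same 128 bits could equally be treated as sixteen 8-bit lanes or two 64-bit lanes just by changing `lane_bits`; how the hardware actually performs the per-lane adds is a separate question.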
> Have a set of circuitry that pushes one command to multiple registers?
A command is not "pushed" to a register; CPU registers are mostly passive, usually held in a very small, very fast block of memory called the "register file". Instead, values are read from the register file, processed, and written back to the register file (simplifying a bit).
How it's implemented is up to the hardware designer. For instance, adding a pair of 64-bit values in q0 to a pair of 64-bit values in q1 could be implemented by first reading d0 and d2 and adding them, then reading d1 and d3 and adding them (with pipelining, the second add can start while the first add is still executing). Or it could be implemented by reading d0 and d1 in parallel, reading d2 and d3 in parallel, and adding both pairs in parallel (using a pair of adders). Or it could be something even more complex.
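A toy Python model of the two options in the paragraph above: the architectural result is identical, and only the completion time changes with the number of adders. Everything about timing here is a hypothetical assumption (a fixed `latency` per add, fully pipelined adders, no other stalls), and `vadd_i64x2` is an illustrative name, not a real instruction model.

```python
MASK64 = (1 << 64) - 1

def vadd_i64x2(d, adders, latency):
    """Model of a 2 x 64-bit vector add, q2 = q0 + q1, where q0 = d0:d1,
    q1 = d2:d3 and q2 = d4:d5. Returns the cycle on which the last lane finishes.

    adders:  how many 64-bit adders the (hypothetical) hardware has
    latency: cycles per add; each adder is assumed fully pipelined, so it
             can accept a new add every cycle.
    """
    lanes = [(0, 2, 4), (1, 3, 5)]  # (source a, source b, destination) per lane
    done = 0
    for i, (a, b, dst) in enumerate(lanes):
        issue = i // adders               # one adder: lane 1 issues a cycle later
        d[dst] = (d[a] + d[b]) & MASK64   # identical result either way
        done = max(done, issue + latency)
    return done

d = [1, 2, 10, 20, 0, 0]
print(vadd_i64x2(d, adders=1, latency=3), d[4:6])  # 4 [11, 22]
d = [1, 2, 10, 20, 0, 0]
print(vadd_i64x2(d, adders=2, latency=3), d[4:6])  # 3 [11, 22]
```

With one pipelined adder the second lane finishes one cycle after the first; with two adders both finish together, at the cost of duplicating the adder hardware.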
One instruction is not being executed against multiple registers; it's that the same 128-bit register range can be used as q, d, or s registers depending on which instructions you're executing.
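The aliasing can be pictured as one block of raw bytes viewed at three widths. This Python sketch assumes the AArch32 layout with 32 D registers (so Q0-Q15), little-endian byte order, and S registers existing only for S0-S31; the class `NeonRegisters` and its method names are hypothetical.

```python
import struct

class NeonRegisters:
    """Model of the AArch32 VFP/NEON register bank as 256 raw bytes.

    q<n> covers bytes 16n..16n+15, d<n> covers 8n..8n+7 and s<n> covers
    4n..4n+3, so Q0 overlaps D0-D1 and S0-S3. Little-endian is assumed.
    Only S0-S31 exist (they alias D0-D15, i.e. Q0-Q7).
    """

    def __init__(self):
        self.raw = bytearray(256)  # 32 D registers x 8 bytes

    def write_s(self, n, value):
        assert 0 <= n < 32
        struct.pack_into("<f", self.raw, 4 * n, value)

    def read_s(self, n):
        assert 0 <= n < 32
        return struct.unpack_from("<f", self.raw, 4 * n)[0]

    def read_d_bytes(self, n):
        return bytes(self.raw[8 * n: 8 * n + 8])

    def write_q_floats(self, n, lanes):
        struct.pack_into("<4f", self.raw, 16 * n, *lanes)

    def read_q_floats(self, n):
        return list(struct.unpack_from("<4f", self.raw, 16 * n))

regs = NeonRegisters()
regs.write_q_floats(0, [1.0, 2.0, 3.0, 4.0])  # write all of Q0 at once
print(regs.read_s(2))                          # 3.0: S2 is the low half of D1
```

Nothing is copied between "different" registers here: a write through any of the three views simply changes the shared bytes the other views read.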
https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/ is a better intro to ARM asm.