Understanding ARM Assembly, Part 1 (msdn.com)
90 points by ingve on Aug 11, 2015 | 16 comments


I think this, while Raspberry Pi specific:

https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/

is a better intro to arm asm.


Perhaps I'm jumping ahead of the series here, but one of the unique (I think?) features of ARM is that all the various conditionals listed (NE, EQ, LT etc) can be applied to just about any instruction, and not just a branch. So, you could have an instruction sequence like:

  SUBS R1,R2,R3
  ADDEQ R4,R4,#1
whereas in other instruction sets you'd have to branch over the 'ADD' instruction if the previous comparison was non-zero.

(Not sure if this applies to ARM thumb style instructions or not; my ARM experience is very out of date!)
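
To make the semantics of that two-instruction sequence concrete, here is a minimal Python sketch. The helper names `subs` and `run` are made up for illustration, and only the Z flag is modelled (real hardware also updates N, C and V). The Python `if` stands in for the condition check; on the CPU the ADDEQ is fetched either way and simply has no effect when Z is clear, so no branch is involved.

```python
MASK32 = 0xFFFFFFFF

def subs(a, b):
    """Model SUBS: 32-bit subtract that also reports the Z (zero) flag."""
    result = (a - b) & MASK32
    return result, result == 0

def run(r2, r3, r4):
    """Model the sequence SUBS R1,R2,R3 / ADDEQ R4,R4,#1."""
    # SUBS R1,R2,R3: R1 = R2 - R3, and the flags are updated
    r1, z = subs(r2, r3)
    # ADDEQ R4,R4,#1: takes effect only when Z is set, otherwise acts as a no-op
    if z:
        r4 = (r4 + 1) & MASK32
    return r1, r4
```

With equal inputs, `run(5, 5, 10)` increments R4; with unequal inputs, `run(7, 5, 10)` leaves it alone.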


For ARM thumb2 style you need a third instruction between the two:

  IT EQ
The 'IT' is for 'if then'. There could also be more 'then's and 'else's for the conditional execution of multiple instructions:

  ITTE EQ
(If then then else) The following instructions could for example be:

  ADDEQ ...
  STREQ ...
  BNE ... (Branch - not equal because of the else)

Source: http://community.arm.com/groups/processors/blog/2010/09/30/c...


Sort of. Thumb-2 requires that you use an 'it' instruction before a block of up to 4 conditional instructions. Think "if-then". You make each following instruction a 'then' or an 'else' by appending 't' or 'e' letters to the 'it' mnemonic. E.g. something like

  ittee eq
  addeq ...
  addeq ...
  addne ...
  addne ...
I believe ARM v8 dropped all this.
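
The rule for which condition each instruction in the block has to carry can be sketched in a few lines of Python. This is an illustration only: `expand_it` and `INVERSE` are made-up names, and the AL condition is left out for simplicity.

```python
# Each ARM condition code paired with its logical opposite.
INVERSE = {
    "EQ": "NE", "NE": "EQ",
    "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI",
    "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def expand_it(mnemonic, cond):
    """Return the condition each instruction in the IT block must carry."""
    mnemonic, cond = mnemonic.upper(), cond.upper()
    suffix = mnemonic[2:]  # the extra T/E letters after "IT"
    if not mnemonic.startswith("IT") or len(suffix) > 3 or set(suffix) - {"T", "E"}:
        raise ValueError(f"not a valid IT mnemonic: {mnemonic}")
    if cond not in INVERSE:
        raise ValueError(f"unsupported condition: {cond}")
    conds = [cond]  # the first instruction is always a 'then'
    for letter in suffix:
        # 'T' repeats the base condition, 'E' uses its opposite
        conds.append(cond if letter == "T" else INVERSE[cond])
    return conds
```

For the block above, `expand_it("ittee", "eq")` gives `["EQ", "EQ", "NE", "NE"]`, which matches the addeq/addeq/addne/addne listing.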


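The original Thumb and Arm64 dropped support for general conditional execution; only branches (plus a few instructions such as conditional select on Arm64) still take a condition.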


You might be interested in Itanium and the IA-64 architecture, which has what is referred to as branch predication:

http://www.cs.umd.edu/class/fall2001/cmsc411/projects/IA64-2...

http://www.drdobbs.com/embedded-systems/predication-speculat...

Predication allows all of the code to be executed, and a subset of the results of the code — those from the failing test — discarded.


I believe this link came from Raymond Chen's blog post today: http://blogs.msdn.com/b/oldnewthing/archive/2015/08/11/10634...

He has just finished a 10-part series on IA-64, and probably posted that to provide some contrast.


This series ends abruptly at Part 2. The author said there would be a Part 3, but I couldn't find anything.

In any case, if you want another overview article on ARM Assembly, try this: http://www.coranac.com/tonc/text/asm.htm

If you want a tutorial: http://thinkingeek.com/2013/01/09/arm-assembler-raspberry-pi...

And finally, this is a pretty good book if you've never looked at Assembly before and you've got a RPi or RPi emulator: http://www.amazon.com/Raspberry-Assembly-Language-RASPBIAN-B...



One thing I remember when RiscOS was still a bit of a thing was that a lot of its users praised ARM assembly. That, plus the short burst of 68k ASM when the Palm Pilot came out, was probably the last "big" hobbyist surge of assembly that I can think of right now.


I used to write games on RiscOS in ARM assembly. Sooo much nicer than any of the others I'd used before or since. One thing I vaguely remember (my memory is pretty poor) was being able to load 64 bytes' worth of data into the registers with one instruction (and store them with another). This meant writing sprite blit routines was a joy and super fast. The conditional instructions were also lovely.

A few years later when I was writing R3000 assembly on the PS1 I remember thinking how poor it felt compared to writing ARM assembly (the R3000 was the only other RISC assembly language I'd used other than ARM at that point).

It was the only assembly language I genuinely enjoyed writing code in rather than what usually felt like a fight with most other assemblers. I give Sophie Wilson the credit for that, and probably the credit for my entire career, because of BBC BASIC on the BBC Micro before that!


> As you can see here the SIMD (NEON) extension adds 16 – 128 bit registers (q0-q15) onto the floating point registers. So if you reference Q0 it is the same as referencing D0-D1 or S0-S1-S2-S3.

Is this how SIMD is usually implemented in hardware? Have a set of circuitry that pushes one command to multiple registers?


SIMD stands for Single Instruction, Multiple Data [0]: the hardware applies the same operation to every element of one or more vectors. Whether a vector is stored in dedicated registers, combined registers (as described above), or memory buffers is up to the hardware designer. In fact, on x86, the MMX registers were aliases of the x87 floating-point registers [1], whereas SSE added new registers [2].

[0] https://en.wikipedia.org/wiki/SIMD

[1] https://en.wikipedia.org/wiki/MMX_(instruction_set)#Technica...

[2] https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions#Regi...


> Have a set of circuitry that pushes one command to multiple registers?

A command is not "pushed" to a register; CPU registers are mostly passive and usually live in a very small and very fast block of memory called the "register file". Instead, values are read from the register file, processed, and written back to the register file (simplifying a bit).

How it's implemented is up to the hardware designer. For instance, adding a pair of 64-bit values from q0 to a pair of 64-bit values in q1 could be implemented as reading first d0 and d2 and adding them, followed by reading d1 and d3 and adding them (with pipelining, the second add can start while the first add is still executing). Or it could be implemented as reading d0 and d1 in parallel, reading d2 and d3 in parallel, and adding both pairs in parallel (using a pair of adders). Or it could be something even more complex.
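
As a rough software analogy (not a description of any real core), the first two options can be modelled in Python. The function names are invented for illustration, and each q register is represented as a tuple of its two 64-bit halves. Whichever way the hardware schedules the work, the architectural result is the same.

```python
MASK64 = (1 << 64) - 1

def add_sequential(q0, q1):
    """One 64-bit adder used twice: d0 + d2 first, then d1 + d3."""
    result = []
    for lane in range(2):
        result.append((q0[lane] + q1[lane]) & MASK64)
    return tuple(result)

def add_parallel(q0, q1):
    """Two adders side by side, modelled here as a single expression."""
    (d0, d1), (d2, d3) = q0, q1
    return ((d0 + d2) & MASK64, (d1 + d3) & MASK64)
```

Each lane wraps around on its own, so an overflow in one 64-bit half never carries into the other.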


One instruction is not being executed against multiple registers; it's that the same 128-bit register range can be used as q, d, or s registers depending on which instructions you're executing.
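
A toy Python model of that overlap, assuming little-endian lane order so that s0 is the low word of d0 and d0 is the low half of q0. `NeonBank` and its methods are hypothetical names, and raw integer bit patterns stand in for float values.

```python
import struct

class NeonBank:
    """Toy model of the shared storage behind q0-q15 (16 x 128 bits)."""

    def __init__(self):
        self.raw = bytearray(16 * 16)

    def write_s(self, n, value):
        """Store a 32-bit pattern into s<n> (assumed range: s0-s31)."""
        if not 0 <= n < 32:
            raise IndexError("only s0-s31 are modelled")
        struct.pack_into("<I", self.raw, 4 * n, value)

    def read_s(self, n):
        return struct.unpack_from("<I", self.raw, 4 * n)[0]

    def read_d(self, n):
        """Read the same bytes as a 64-bit view, d<n>."""
        return struct.unpack_from("<Q", self.raw, 8 * n)[0]

    def read_q(self, n):
        """Read the same bytes as a 128-bit view, q<n>."""
        return int.from_bytes(self.raw[16 * n:16 * n + 16], "little")

bank = NeonBank()
for i, pattern in enumerate([0x11111111, 0x22222222, 0x33333333, 0x44444444]):
    bank.write_s(i, pattern)
```

After writing s0-s3, `bank.read_d(0)`, `bank.read_d(1)` and `bank.read_q(0)` all return views of the same 16 bytes; nothing is copied between registers.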


If you're interested in picking apart a small self-contained project, I wrote a simple IRC bot in ARM for GNU/Linux a while ago:

https://github.com/wyc/armbot

There are instructions to run/debug it on an x86_64 machine via qemu.



