ARM: Cortex-M3 Thumb-2 instruction set
The instruction set of the ARM Cortex-M3 CPU used in STM32 microcontrollers.
Contents
- 1 Hardware registers
- 2 Register names
- 3 Immediate constants
- 4 Parameters
- 5 Optional parameters
- 6 Condition flags
- 7 Shifts
- 8 Thumb-2 instruction set
- 9 The stack
- 10 C language calling convention
- 11 Thumb-2 variable instruction length
- 12 How to enumerate the legal immediate constants for <Operand2>
- 13 Example code
Hardware registers
- R0-R12 General purpose registers
- R13 Used as the stack pointer, also called SP (can be used as a general purpose register, with some restrictions)
- R14 Used as the link register, which holds the return address for function calls (BL, BLX), also called LR (can be used as a general purpose register)
- R15 The program counter, also called PC
Register names
- Rd Destination register
- Rn First operand register (the operation is performed on this register using the second operand, e.g. for SUB: Rd = Rn - Rm)
- Rm Second operand register
- SP Stack pointer (R13)
- LR Link register (R14)
- PC Program counter (R15)
- <reglist> means a list of registers like {R0, R3, R7-R10} (R7-R10 is the range R7, R8, R9, R10)
Immediate constants
- imm<n> means a constant of n bits (a value that is fixed at assembly time and cannot be changed during execution)
- # tells the assembler that the following is an immediate constant
Parameters
- <x> means always x
- <x|y> means either x or y
Optional parameters
- {x} means x or nothing
- {x|y} means either x or y or nothing
Condition flags
- Some instructions update the condition flags if S (set condition flags) is appended to the instruction name, e.g. ADDS
- N Negative: bit 31 of the result
- Z Zero: 1 if all bits of the result are 0
- C Carry: carry out of the ALU adder (for subtraction and CMP, C = 1 means no borrow); for shifts, the last bit shifted out of the barrel shifter
- V Overflow: signed overflow from the ALU adder. E.g. 0x7FFFFFFF + 0x7FFFFFFF adds two positive numbers but gives a negative result (0xFFFFFFFE), which sets the overflow flag
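The flag rules above can be checked with a small C model of ADDS/ADCS and SUBS/CMP. This is a sketch of the architectural definitions, not vendor code. `Flags`, `adds_flags` and `subs_flags` are hypothetical names. It reproduces the 0x7FFFFFFF + 0x7FFFFFFF overflow example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type holding the four condition flags. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Flags as ADDS (carry_in = false) or ADCS (carry_in = true) would set them. */
static Flags adds_flags(uint32_t a, uint32_t b, bool carry_in)
{
    uint64_t wide = (uint64_t)a + b + (carry_in ? 1u : 0u);
    uint32_t r = (uint32_t)wide;
    Flags f;
    f.n = (r >> 31) & 1u;                    /* N: bit 31 of the result */
    f.z = (r == 0);                          /* Z: all result bits are 0 */
    f.c = (wide >> 32) & 1u;                 /* C: carry out of bit 31 */
    /* V: both operands have the same sign and the result's sign differs */
    f.v = (((a ^ r) & (b ^ r)) >> 31) & 1u;
    return f;
}

/* SUBS/CMP compute a - b as a + NOT(b) + 1, so C = 1 means "no borrow". */
static Flags subs_flags(uint32_t a, uint32_t b)
{
    return adds_flags(a, ~b, true);
}
```

For example, `subs_flags(3, 5)` sets N and clears C (a borrow occurred), which is what CMP R0,R1 does with R0 = 3 and R1 = 5.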
<cond>   Flag state        Integer ALU / Shifter                  Vector Floating Point coprocessor
----------------------------------------------------------------------------------------------------
EQ       Z = 1             Equal (to zero)                        Equal
NE       Z = 0             Not equal                              Not equal, or unordered
CS / HS  C = 1             Carry set / Unsigned higher or same    Greater than or equal, or unordered
CC / LO  C = 0             Carry clear / Unsigned lower           Less than
MI       N = 1             Negative                               Less than
PL       N = 0             Positive or zero                       Greater than or equal, or unordered
VS       V = 1             Overflow                               Unordered (at least one NaN operand)
VC       V = 0             No overflow                            Not unordered
HI       C = 1 and Z = 0   Unsigned higher                        Greater than, or unordered
LS       C = 0 or Z = 1    Unsigned lower or same                 Less than or equal
GE       N = V             Signed greater than or equal           Greater than or equal
LT       N <> V            Signed less than                       Less than, or unordered
GT       Z = 0 and N = V   Signed greater than                    Greater than
LE       Z = 1 or N <> V   Signed less than or equal              Less than or equal, or unordered
AL       Any               Always (normally omitted)              Always (normally omitted)

The last column only applies to cores with a floating point unit; the Cortex-M3 has none.

* If a two-character condition code is appended to the instruction name, the assembler generates the required IT (If-Then) instruction. E.g. ADDEQ R0,R0,#1 (execute the instruction only if the Z flag is set) is converted by the assembler to:

    IT     EQ
    ADDEQ  R0,R0,#1

<Operand2> may be one of the following:
- #imm8<<imm5: one byte shifted left by a constant to form a 32-bit value
- #(imm8 imm8 imm8 imm8): the same byte copied 4 times to create a 32-bit value, e.g. 0xABABABAB
- #(0 imm8 0 imm8): the same, but with two bytes set to zero, e.g. 0x00AB00AB
- #(imm8 0 imm8 0): the same, with the other two bytes set to zero, e.g. 0xAB00AB00
- Rm: normal register operation
- Rm, <LSL|LSR|ASR|ROR> #imm5: register operation with a constant shift
- Rm, RRX: register operation with rotate right with extend

<S> Update the condition flags after the instruction has executed. When combined with a condition code, the S comes first: ADDSEQ R0,R1,R2
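The integer column of the <cond> table is a predicate on the four flags. The C sketch below writes it out. `Cond` and `cond_passed` are hypothetical names. The enum order follows the 4-bit ARM condition encoding (EQ = 0 up to AL = 14).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum; values follow the 4-bit ARM condition encoding. */
typedef enum {
    EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL
} Cond;

/* Returns true if an instruction with condition c would execute. */
static bool cond_passed(Cond c, bool n, bool z, bool carry, bool v)
{
    switch (c) {
    case EQ: return z;
    case NE: return !z;
    case CS: return carry;             /* HS: unsigned higher or same */
    case CC: return !carry;            /* LO: unsigned lower */
    case MI: return n;
    case PL: return !n;
    case VS: return v;
    case VC: return !v;
    case HI: return carry && !z;
    case LS: return !carry || z;
    case GE: return n == v;
    case LT: return n != v;
    case GT: return !z && n == v;
    case LE: return z || n != v;
    case AL: return true;
    }
    assert(!"invalid condition code");
    return false;
}
```

After CMP 0x80000000,#1 the flags are N = 0, Z = 0, C = 1, V = 1. In that case both LT (signed: INT_MIN < 1) and HS (unsigned: 0x80000000 >= 1) pass, which shows why the signed and unsigned conditions differ.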
Shifts
- LSL Logical shift left: 0xFFFFFF00 LSL #4 = 0xFFFFF000 (shifts in zeros at the bottom)
- LSR Logical shift right: 0xFFFFFF00 LSR #4 = 0x0FFFFFF0 (shifts in zeros at the top)
- ASR Arithmetic shift right: 0xFFFFFF00 ASR #4 = 0xFFFFFFF0 (shifts in copies of the original bit 31 at the top)
- ROR Rotate right: 0x12345678 ROR #4 = 0x81234567
- RRX Rotate right with extend: rotates the operand one bit to the right through the carry as a 33-bit value (Carry -> bit 31, bit 0 -> Carry)
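The shifts above can be modelled in C, including the carry-out used when the S suffix is present. This is a minimal sketch with hypothetical function names. It assumes shift amounts of 1 to 31 only (the encodings also allow LSR/ASR #32, which are not modelled). It reproduces the examples in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each helper returns the shifted value and stores the carry-out. */
static uint32_t lsl_c(uint32_t x, unsigned n, bool *carry)
{
    assert(n >= 1 && n <= 31);
    *carry = (x >> (32 - n)) & 1u;   /* last bit shifted out at the top */
    return x << n;
}

static uint32_t lsr_c(uint32_t x, unsigned n, bool *carry)
{
    assert(n >= 1 && n <= 31);
    *carry = (x >> (n - 1)) & 1u;    /* last bit shifted out at the bottom */
    return x >> n;
}

static uint32_t asr_c(uint32_t x, unsigned n, bool *carry)
{
    assert(n >= 1 && n <= 31);
    *carry = (x >> (n - 1)) & 1u;
    /* Fill the vacated top bits with copies of bit 31, avoiding the
       implementation-defined right shift of negative signed ints. */
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

static uint32_t ror_c(uint32_t x, unsigned n, bool *carry)
{
    assert(n >= 1 && n <= 31);
    uint32_t r = (x >> n) | (x << (32 - n));
    *carry = (r >> 31) & 1u;         /* carry = new bit 31 */
    return r;
}

/* RRX: rotate right by one through the carry (33-bit rotate). */
static uint32_t rrx_c(uint32_t x, bool *carry)
{
    bool old = *carry;
    *carry = x & 1u;                 /* bit 0 -> Carry */
    return (x >> 1) | ((uint32_t)old << 31);   /* Carry -> bit 31 */
}
```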
Thumb-2 instruction set
Instruction                     Description                         Operation
-----------------------------------------------------------------------------------------------------------
MOV{S} Rd, <Operand2>           Move                                Rd = Operand2
MVN{S} Rd, <Operand2>           Move not                            Rd = 0xFFFFFFFF EOR Operand2
MOV{W} Rd, #<imm16>             Move wide                           Rd = imm16
MOVT Rd, #<imm16>               Move top                            Rd[31:16] = imm16, the lower 16 bits of Rd are unaffected
ADD{S} Rd, Rn, <Operand2>       Add                                 Rd = Rn + Operand2
ADD{W} Rd, Rn, #<imm12>         Add wide                            Rd = Rn + imm12
ADC{S} Rd, Rn, <Operand2>       Add with carry                      Rd = Rn + Operand2 + Carry
SUB{S} Rd, Rn, <Operand2>       Subtract                            Rd = Rn - Operand2
SBC{S} Rd, Rn, <Operand2>       Subtract with carry                 Rd = Rn - Operand2 - (1 - Carry)
SUB{W} Rd, Rn, #<imm12>         Subtract wide                       Rd = Rn - imm12
RSB{S} Rd, Rn, <Operand2>       Reverse subtract                    Rd = Operand2 - Rn
RSC{S} Rd, Rn, <Operand2>       Reverse subtract with carry         Rd = Operand2 - Rn - (1 - Carry); ARM state only, not available in Thumb-2
MUL{S} Rd, Rm, Rs               Multiply                            Rd = Rm * Rs, least significant 32 bits
MLA Rd, Rm, Rs, Rn              Multiply and accumulate             Rd = Rn + (Rm * Rs), least significant 32 bits
MLS Rd, Rm, Rs, Rn              Multiply and subtract               Rd = Rn - (Rm * Rs), least significant 32 bits
UMULL RdLo, RdHi, Rm, Rs        Multiply unsigned long              RdHi:RdLo = Rm * Rs, 64-bit result
UMLAL RdLo, RdHi, Rm, Rs        Multiply unsigned accumulate long   RdHi:RdLo = RdHi:RdLo + Rm * Rs, 64-bit result
SDIV Rd, Rn, Rm                 Signed division                     Rd = Rn / Rm (rounded towards zero); 0x80000000 / 0xFFFFFFFF = 0x80000000; Rn / 0 = 0 unless the divide-by-zero trap (CCR.DIV_0_TRP) is enabled
UDIV Rd, Rn, Rm                 Unsigned division                   Rd = Rn / Rm; Rn / 0 = 0 unless the divide-by-zero trap is enabled
ASR{S} Rd, Rm, <Rs|#imm5>       Arithmetic shift right              Canonical form of MOV{S} Rd, Rm, ASR <Rs|#imm5>
LSL{S} Rd, Rm, <Rs|#imm5>       Logical shift left
LSR{S} Rd, Rm, <Rs|#imm5>       Logical shift right
ROR{S} Rd, Rm, <Rs|#imm5>       Rotate right
RRX{S} Rd, Rm                   Rotate right with extend            Uses Carry as a 33rd bit
CLZ Rd, Rm                      Count leading zeros
RBIT Rd, Rm                     Reverse bits                        Bit 0 becomes bit 31, bit 1 becomes bit 30, etc.
REV Rd, Rm                      Byte-Reverse Word                   Reverses the byte order in a 32-bit register
REV16 Rd, Rm                    Byte-Reverse Packed Halfword        Reverses the byte order in each 16-bit halfword of a 32-bit register
REVSH Rd, Rm                    Byte-Reverse Signed Halfword        Reverses the byte order in the lower 16 bits of a 32-bit register and sign extends to 32 bits
UXTB Rd, Rm{, ROR #<0|8|16|24>} Unsigned Extend Byte                Extracts an 8-bit value from a register and zero extends it to 32 bits
UXTH Rd, Rm{, ROR #<0|8|16|24>} Unsigned Extend Halfword            Extracts a 16-bit value from a register and zero extends it to 32 bits
CMP Rn, <Operand2>              Compare                             Same as SUBS Rd, Rn, <Operand2>, but the result is discarded; only the condition flags are updated
CMN Rn, <Operand2>              Compare negative                    Flags set from Rn + Operand2
TST Rn, <Operand2>              Test                                Flags set from Rn AND Operand2
TEQ Rn, <Operand2>              Test equivalence                    Flags set from Rn EOR Operand2
AND{S} Rd, Rn, <Operand2>       Bitwise AND                         Rd = Rn AND Operand2
ORR{S} Rd, Rn, <Operand2>       Bitwise OR                          Rd = Rn OR Operand2
EOR{S} Rd, Rn, <Operand2>       Bitwise Exclusive-OR                Rd = Rn EOR Operand2
ORN{S} Rd, Rn, <Operand2>       OR NOT                              Rd = Rn OR NOT Operand2
BIC{S} Rd, Rn, <Operand2>       Bit clear                           Rd = Rn AND NOT Operand2
BFC Rd, #<lsb>, #<width>        Bit field clear                     Clears width bits of Rd starting at bit lsb
BFI Rd, Rn, #<lsb>, #<width>    Bit field insert                    Copies the low width bits of Rn into Rd starting at bit lsb
SBFX Rd, Rn, #<lsb>, #<width>   Signed bit field extract            Extracts width bits of Rn starting at bit lsb, sign extended
UBFX Rd, Rn, #<lsb>, #<width>   Unsigned bit field extract          Extracts width bits of Rn starting at bit lsb, zero extended

<Address> can be one of the following:

Form                          Example                       Action
[Rn {, #<-imm8|+imm12>}]      LDR R0, [R1, #8]              R0 = [R1 + 8]
[Rn {, #<+-imm8>}]!           LDR R0, [R1, #8]!             R1 = R1 + 8, R0 = [R1]
[Rn], #<+-imm8>               LDR R0, [R1], #4              R0 = [R1], R1 = R1 + 4
[Rn, Rm {, LSL #<0-3>}]       STR R0, [R1, R2, LSL #2]      [R1 + (R2 * 4)] = R0
<label>                       PC-relative (loads only)

LDR Rd, <Address>     Load 32-bit word from memory
LDRH Rd, <Address>    Load 16-bit half-word from memory (zero extended)
LDRSH Rd, <Address>   Load signed 16-bit half-word from memory (sign extended)
LDRB Rd, <Address>    Load 8-bit byte from memory (zero extended)
LDRSB Rd, <Address>   Load signed 8-bit byte from memory (sign extended)
STR Rd, <Address>     Store 32-bit word to memory
STRH Rd, <Address>    Store the low 16 bits of Rd to memory
STRB Rd, <Address>    Store the low 8 bits of Rd to memory

<AddressDual> can be one of the following (the offset is a multiple of 4, up to 1020):
[<Rn>{, #+/-<imm8>}]
[<Rn>], #+/-<imm8>
[<Rn>, #+/-<imm8>]!
LDRD<c>
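The bit field instructions in the table map directly onto mask-and-shift code. The following sketch uses hypothetical helper names that mirror the mnemonics. It assumes 1 <= width and lsb + width <= 32, the same constraint the assembler enforces. SBFX returns the raw 32-bit register pattern rather than a signed C int.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set, width in 1..32. */
static uint32_t low_mask(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* UBFX Rd, Rn, #lsb, #width */
static uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);
    return (rn >> lsb) & low_mask(width);
}

/* SBFX Rd, Rn, #lsb, #width: extract, then sign extend from the field's top bit */
static uint32_t sbfx(uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t field = ubfx(rn, lsb, width);
    uint32_t sign = 1u << (width - 1);
    return (field ^ sign) - sign;   /* wraps to the two's complement pattern */
}

/* BFI Rd, Rn, #lsb, #width: insert the low `width` bits of Rn into Rd at lsb */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);
    uint32_t mask = low_mask(width) << lsb;
    return (rd & ~mask) | ((rn << lsb) & mask);
}

/* BFC Rd, #lsb, #width: clear the field (BFI with a zero source) */
static uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width)
{
    return bfi(rd, 0u, lsb, width);
}
```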
The stack
A stack is a last-in, first-out data structure used to store temporary variables and data. It grows from high to low memory addresses, and SP (R13) points to the last item written. When a set of registers is transferred, the lowest-numbered register is stored at the lowest address. Use the PUSH and POP instructions to transfer any set of registers from R0-R12, plus LR (PUSH) or LR or PC (POP).
If SP contains 0x8000 and we execute the instruction PUSH {R0,R1,R7} the result will be
    0x8000  ..   <- Original address in SP
    0x7ffc  R7
    0x7ff8  R1
    0x7ff4  R0   <- SP points here now
If we now execute POP {R10-R12}
    0x8000  ..          <- SP points here now
    0x7ffc  R7 -> R12
    0x7ff8  R1 -> R11
    0x7ff4  R0 -> R10   <- Original address in SP
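The PUSH/POP example can be replayed with a toy C model of a full descending stack. Everything here is a hypothetical illustration: the `Machine` struct, the small memory window starting at 0x7F00 and the `push`/`pop` helpers. Register values are passed in ascending register-number order, matching the rule that the lowest-numbered register goes to the lowest address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory window 0x7F00..0x7FFF, modelled as 64 words. */
#define MEM_BASE  0x7F00u
#define MEM_WORDS 64u

typedef struct {
    uint32_t sp;                 /* points at the last word written */
    uint32_t mem[MEM_WORDS];
} Machine;

static uint32_t *word_at(Machine *m, uint32_t addr)
{
    assert(addr >= MEM_BASE && addr < MEM_BASE + 4u * MEM_WORDS && addr % 4u == 0);
    return &m->mem[(addr - MEM_BASE) / 4u];
}

/* PUSH: SP drops by 4 * count, then the values are written upwards from SP. */
static void push(Machine *m, const uint32_t *vals, unsigned count)
{
    m->sp -= 4u * count;
    for (unsigned i = 0; i < count; i++)
        *word_at(m, m->sp + 4u * i) = vals[i];
}

/* POP: read upwards from SP, then SP rises by 4 * count. */
static void pop(Machine *m, uint32_t *vals, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        vals[i] = *word_at(m, m->sp + 4u * i);
    m->sp += 4u * count;
}
```

Pushing the values of {R0,R1,R7} from SP = 0x8000 leaves SP at 0x7FF4. Popping three words then returns them in the same order, which is why R0's value ends up in R10 and R7's in R12.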
C language calling convention
- The first four parameters are passed in R0-R3, and further parameters on the stack. A result is returned in R0 (R0-R1 for a 64-bit value).
- A double-word sized type is passed in two consecutive registers (an even/odd pair, R0:R1 or R2:R3).
- A 128-bit containerized vector is passed in four consecutive registers.
- The content of the registers is as if the value had been loaded from memory with a single LDM instruction.
- A subroutine must preserve the contents of registers R4-R8, R10, R11 and SP (and R9 in PCS variants that designate R9 as v6).
- Return by executing BX LR.
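Two C functions illustrate the rules above. The register assignments in the comments follow the AAPCS rules as summarised in the list. They assume the calls are not inlined, and C code cannot observe them directly, so a compiler's disassembly is the way to confirm them on a given toolchain. The function names are hypothetical.

```c
#include <stdint.h>

/* a -> R0, b -> R1, c -> R2, d -> R3, e -> stack (at [SP] on entry);
   the 32-bit result is returned in R0. */
int32_t five_args(int32_t a, int32_t b, int32_t c, int32_t d, int32_t e)
{
    return a + b + c + d + e;
}

/* x -> R0; the 64-bit y needs an even/odd pair, so R1 is skipped and
   y -> R2:R3 (low word in R2); the 64-bit result is returned in R0:R1
   (low word in R0). */
int64_t widen_add(int32_t x, int64_t y)
{
    return (int64_t)x + y;
}
```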
Thumb-2 variable instruction length
To get maximum performance from flash memory, aim to have at least half of the instructions encoded as 16 bit. An IT instruction can also be paired with a 16-bit instruction at no extra cost.
The general rules for generating the 16 bit form of the instructions
- Use registers in the range R0-R7
- Where possible, use the flag-setting (S) form outside IT blocks, and the non-flag-setting form inside IT blocks (the 16-bit encodings only set the flags outside an IT block)
- Use immediate constants in the range 0-7 or 0-255
Instructions encoded in 16 bit when using registers R0-R7
- ADR Rd, <label> (range 0 to 1020)
- <ADDS|SUBS|MOVS> Rd, #imm8
- <ADDS|SUBS> Rd, Rn, #imm3
- <ADDS|SUBS> Rd, Rn, Rm
- <ADCS|ANDS|EORS|BICS|MVNS|ORRS|SBCS|UXTB|UXTH|MULS> Rd, Rm (MULS may be slower than MUL on some CPUs)
- RSBS Rd, Rn, #0
- <REV|REV16|REVSH> Rd, Rm
- <ASRS|LSRS|LSLS> Rd, Rm, #imm5
- CMP Rn, #imm8
- <CMP|CMN|TST> Rn, Rm (for CMP, Rn and Rm can be any of R0-R15)
- <LDM|STM> Rn!, <registers>
- <LDR|STR>{H|B} Rt, [Rn{, #imm5}] (imm5 is scaled by the access size)
- <LDR|STR>{H|B} Rt, [Rn, Rm]
- LDRS<H|B> Rt, [Rn, Rm]
- LDR Rt, <label> (range 0 to 1020)
- <PUSH|POP> <registers>
- IT{x{y{z}}} <cond>
- CB{N}Z Rn, <label> (range 0 to 126)
- B<cond> <label> (range -256 to 254)
- B <label> (range -2048 to 2046)
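The rules for ADDS with an immediate can be expressed as a small decision function. `adds_imm_is_16bit` is a hypothetical helper written for illustration. It only covers the two 16-bit ADDS immediate forms in the list above (outside an IT block) and ignores the SP-relative ADD forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can "ADDS Rd, Rn, #imm" use a 16-bit encoding?
   - ADDS Rd, Rn, #imm3 : Rd and Rn in R0-R7, imm in 0-7
   - ADDS Rd, #imm8     : Rd == Rn in R0-R7, imm in 0-255 */
static bool adds_imm_is_16bit(unsigned rd, unsigned rn, uint32_t imm)
{
    assert(rd <= 15 && rn <= 15);
    if (rd > 7 || rn > 7)
        return false;            /* high registers force a 32-bit encoding */
    if (imm <= 7)
        return true;             /* three-operand form with imm3 */
    return rd == rn && imm <= 255;  /* two-operand form with imm8 */
}
```

So ADDS R0,R1,#8 needs a 32-bit encoding, while ADDS R2,R2,#255 fits in 16 bits.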
Instructions encoded in 16 bit using registers R0-R15
- MOV Rd, Rm
- ADD Rd, Rm
- BLX Rm
- BX Rm
How to enumerate the legal immediate constants for <Operand2>
'abcdnnnnnnnn' is a 12 bit bitfield to be expanded
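ThumbExpandImm()
    if 'ab' = '00'
        case 'cd'
            when '00'  imm32 = '00000000 00000000 00000000 nnnnnnnn'   (always encode 0 like this)
            when '01'  imm32 = '00000000 nnnnnnnn 00000000 nnnnnnnn'
            when '10'  imm32 = 'nnnnnnnn 00000000 nnnnnnnn 00000000'
            when '11'  imm32 = 'nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn'
    else
        imm32 = ROR('1nnnnnnn', 'abcdn')

In the else branch, 'abcdn' is the top 5 bits of the field, used as a rotation amount of 8-31. '1nnnnnnn' is a 1 followed by the lower 7 bits. For 'cd' = '01', '10' or '11', an all-zero 'nnnnnnnn' is not allowed.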
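A direct C transcription of the ThumbExpandImm pseudocode follows. It comes with a brute-force search over all 4096 encodings that decides whether a value is a legal <Operand2> constant. Both function names are hypothetical. Encodings with a zero byte in the replicated forms are rejected, since the architecture treats them as UNPREDICTABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expand the 12-bit field 'abcdnnnnnnnn'. Returns false for disallowed encodings. */
static bool thumb_expand_imm(uint32_t imm12, uint32_t *out)
{
    assert(imm12 < 0x1000u);
    uint32_t byte = imm12 & 0xFFu;               /* 'nnnnnnnn' */
    if ((imm12 >> 10) == 0) {                    /* 'ab' == '00' */
        switch ((imm12 >> 8) & 3u) {             /* 'cd' */
        case 0: *out = byte; return true;        /* 0x000000XY (also encodes 0) */
        case 1: *out = byte * 0x00010001u; break;  /* 0x00XY00XY */
        case 2: *out = byte * 0x01000100u; break;  /* 0xXY00XY00 */
        default: *out = byte * 0x01010101u; break; /* 0xXYXYXYXY */
        }
        return byte != 0;
    }
    uint32_t unrotated = 0x80u | (imm12 & 0x7Fu);  /* '1nnnnnnn' */
    unsigned rot = (imm12 >> 7) & 0x1Fu;            /* 'abcdn', 8..31 here */
    *out = (unrotated >> rot) | (unrotated << (32u - rot));
    return true;
}

/* True if `value` can be written as #constant in an <Operand2> instruction. */
static bool is_operand2_constant(uint32_t value)
{
    for (uint32_t imm12 = 0; imm12 < 0x1000u; imm12++) {
        uint32_t v;
        if (thumb_expand_imm(imm12, &v) && v == value)
            return true;
    }
    return false;
}
```

For example, 0xFF000000 is legal (0xFF rotated right by 8), while 0x00000101 is not: its set bits span 9 positions and it is not one of the replicated patterns. In that case the assembler must fall back to MOVW/MOVT or a literal load.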
Example code
Condition flags
It is important to make full use of the condition flags to write efficient code. This code sets R0 to 0 if R1 + R2 is 0, and to -1 otherwise:

    ADD    R0,R1,R2
    CMP    R0,#0
    BEQ    zero
    MOV    R0,#-1
zero
    ...
The optimised code using the condition flags is easier to read, more compact and faster:

    ADDS   R0,R1,R2
    MOVNE  R0,#-1      ; needs an IT NE, which the assembler can generate
    ...
A branch is better if more than a few instructions are to be skipped:

    ADDS   R0,R1,R2
    BEQ    zero
    ...                ; block of code to be skipped
zero
    ...
If-then
The IT instruction makes the following 1 to 4 instructions conditional. In the pattern, T specifies <cond> and E specifies the inverse of <cond>. The first letter of the pattern is always T, so the first conditional instruction always has the condition <cond>.

    IT     EQ          ; read this as: If EQual Then ADD R0,R0,#1
    ADDEQ  R0,R0,#1    ; executed only if the Z flag is 1
    ITE    EQ          ; read this as: If EQual Then ADD R0,R0,#1 Else ADD R1,R1,#1
    ADDEQ  R0,R0,#1    ; executed only if the Z flag is 1
    ADDNE  R1,R1,#1    ; executed only if the Z flag is 0
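The T/E rule above can be modelled by mapping an IT mnemonic to the condition that applies to each instruction in the block. `it_block_conditions` and the `COND_*` names are hypothetical. The sketch relies on the fact that each even/odd pair of 4-bit condition codes are inverses (EQ/NE, CS/CC, ..., GT/LE), so E flips bit 0. AL is excluded because an IT block with AL cannot use E.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 4-bit ARM condition encodings; each even/odd pair are inverses. */
enum { COND_EQ = 0, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS,
       COND_VC, COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE };

/* Given e.g. "ITTE" and its base condition, write the condition of each
   covered instruction into out[] and return how many there are (1-4). */
static size_t it_block_conditions(const char *mnemonic, unsigned cond, unsigned out[4])
{
    size_t len = strlen(mnemonic);
    assert(len >= 2 && len <= 5 && mnemonic[0] == 'I' && mnemonic[1] == 'T');
    assert(cond <= COND_LE);
    out[0] = cond;                          /* first slot is always T */
    for (size_t i = 2; i < len; i++) {
        assert(mnemonic[i] == 'T' || mnemonic[i] == 'E');
        out[i - 1] = (mnemonic[i] == 'T') ? cond : (cond ^ 1u);  /* E = inverse */
    }
    return len - 1;
}
```

For ITE EQ this yields EQ then NE, matching the ADDEQ/ADDNE pair above.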
It is easier to let the assembler generate IT instructions automatically: just append the condition to the end of the instruction name. The assembler requires this form for instructions inside an IT block anyway.

    ADDEQ  R0,R0,#1
    ADDNE  R1,R1,#1
Table branch
The table branch byte instruction (TBB) loads a byte from (Rn + Rm) and adds twice its value to the program counter. Here Rn is PC, which reads as the address of the TBB instruction plus 4, i.e. the start of the table that follows it. R0 must be range-checked before the TBB, otherwise an out-of-range index reads past the end of the table.

    TBB    [PC,R0]
table
    dcb    (case0 - table) >> 1   ; divide by 2 because the instruction multiplies by 2
    dcb    (case1 - table) >> 1
    dcb    (case2 - table) >> 1
    align                         ; instructions must start at an even address
case0
    nop                           ; arrive here if R0 = 0
case1
    nop                           ; arrive here if R0 = 1
case2
    nop                           ; arrive here if R0 = 2
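The TBB address arithmetic can be checked with a small C model. The addresses are hypothetical: TBB at 0x1000, so the table starts at 0x1004 and the three 16-bit NOPs land at 0x1008, 0x100A and 0x100C after alignment. `tbb_entry` mimics the assembler-side `(caseN - table) >> 1`, and `tbb_target` mimics what the CPU does at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assembler side: dcb (case_addr - table_addr) >> 1 */
static uint8_t tbb_entry(uint32_t case_addr, uint32_t table_addr)
{
    uint32_t delta = case_addr - table_addr;
    assert(delta % 2u == 0 && delta / 2u <= 0xFFu);  /* forward, even, reachable */
    return (uint8_t)(delta >> 1);
}

/* CPU side: TBB [PC, Rm] with the 32-bit TBB at tbb_addr, so PC reads
   as tbb_addr + 4, which is where the table starts. */
static uint32_t tbb_target(uint32_t tbb_addr, const uint8_t *table,
                           size_t entries, uint32_t index)
{
    assert(index < entries);     /* real code must range-check Rm first */
    uint32_t pc = tbb_addr + 4u;
    return pc + 2u * table[index];
}
```

Because each entry is an unsigned byte, TBB can only branch forwards, by up to 510 bytes from the table.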
Finding the span of the leftmost and rightmost ones
This sequence computes the number of bits from the leftmost 1 to the rightmost 1 (inclusive) in R0. R0 must be non-zero; for 0 the result is -32.

    CLZ    R1,R0       ; R1 = number of zeros to the left of the leftmost 1 in R0
    RBIT   R0,R0       ; R0 is now mirrored
    CLZ    R0,R0       ; R0 = number of zeros to the right of the rightmost 1 in the original value
    ADD    R0,R1       ; R0 = number of bits that are not part of the span
    RSB    R0,R0,#32   ; R0 = 32 - R0, the span
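The same steps in portable C make the sequence easy to test off-target. `clz32` and `rbit32` are hypothetical loop-based stand-ins for the CLZ and RBIT instructions. On the target, the instructions themselves or compiler intrinsics would be used instead.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for CLZ: returns 32 for x == 0, as the instruction does. */
static uint32_t clz32(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Stand-in for RBIT: bit 0 becomes bit 31 and so on. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Mirrors the assembly above; x must be non-zero. */
static uint32_t one_span(uint32_t x)
{
    assert(x != 0);
    uint32_t left = clz32(x);            /* CLZ  R1,R0 */
    uint32_t right = clz32(rbit32(x));   /* RBIT R0,R0 ; CLZ R0,R0 */
    return 32u - (left + right);         /* ADD R0,R1 ; RSB R0,R0,#32 */
}
```

For 0x000000F0 the leftmost 1 is bit 7 and the rightmost is bit 4, so the span is 4.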