ARM: Division by 10

From ScienceZero

Fast division by 10 on the ARM can be done with this formula: R0 = (R0 * 3277) / 32768. It is exact for numbers in the range 0 to 16 388 and for certain nice numbers like 100 000. The code below builds the multiplier 3277 from four shifted adds (giving 3, 13, 205 and finally 3277 times R0) and then shifts right by 15. The timing is 4 or 5 cycles, depending on whether the right shift by 15 can be folded into the code that follows. This code works on all ARM models. Inserting sub r0,r0,r0,lsr #14 at the top of the code costs one extra cycle and extends the range to 0 to 81 919.

add   r1,r0,r0,lsl #1
add   r0,r0,r1,lsl #2
add   r0,r0,r1,lsl #6
add   r0,r0,r1,lsl #10
mov   r0,r0,lsr #15
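
The arithmetic above can be checked off-target with a small C model. This is only a sketch, assuming that uint32_t arithmetic mirrors the ARM registers; the function names div10_small, div10_extended and first_mismatch are hypothetical and not part of the original code.

```c
#include <stdint.h>

/* Model of the five-instruction sequence: x * 3277 built from shifted adds,
   followed by a right shift by 15. */
uint32_t div10_small(uint32_t x)
{
    uint32_t t = x + (x << 1);   /* add r1,r0,r0,lsl #1   t = 3 * x     */
    x = x + (t << 2);            /* add r0,r0,r1,lsl #2   x = 13 * x    */
    x = x + (t << 6);            /* add r0,r0,r1,lsl #6   x = 205 * x   */
    x = x + (t << 10);           /* add r0,r0,r1,lsl #10  x = 3277 * x  */
    return x >> 15;              /* mov r0,r0,lsr #15                   */
}

/* Same sequence with the optional pre-correction placed in front. */
uint32_t div10_extended(uint32_t x)
{
    x = x - (x >> 14);           /* sub r0,r0,r0,lsr #14 */
    return div10_small(x);
}

/* Returns the first input in [0, limit] where f(x) differs from x / 10,
   or limit + 1 if there is none. Assumes limit < UINT32_MAX. */
uint32_t first_mismatch(uint32_t (*f)(uint32_t), uint32_t limit)
{
    for (uint32_t x = 0; x <= limit; x++) {
        if (f(x) != x / 10)
            return x;
    }
    return limit + 1;
}
```

Calling first_mismatch(div10_small, 100000) returns the first input where the plain sequence goes wrong, and the same call with div10_extended shows where the extended version stops being exact.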


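This code runs in 9 cycles and works on all 32-bit numbers. The middle block of 5 instructions computes an estimate of the quotient that is never wrong by more than 1: it is either exact or one too low. Correcting it takes an additional 4 cycles. The first instruction saves R0 - 10, and the last three subtract 10 times the estimate from that value and add 1 to the result if the difference is not negative.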

sub   r1,r0,#10

sub   r0,r0,r0,lsr #2
add   r0,r0,r0,lsr #4
add   r0,r0,r0,lsr #8
add   r0,r0,r0,lsr #16
mov   r0,r0,lsr #3

add   r2,r0,r0,lsl #2
subs  r1,r1,r2,lsl #1
addpl r0,r0,#1
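
The same routine can be modelled in C to test it on a PC. This is a sketch under the assumption that unsigned 32-bit wrap-around and a cast to int32_t stand in for the ARM registers and the N flag; the names div10_shift_add and div10_shift_add_mismatches are hypothetical.

```c
#include <stdint.h>

/* Model of the 9-instruction routine. */
uint32_t div10_shift_add(uint32_t x)
{
    uint32_t rem = x - 10;         /* sub r1,r0,#10 (wraps around for x < 10) */

    uint32_t q = x - (x >> 2);     /* about 0.75 * x                  */
    q += q >> 4;                   /* each step extends the binary    */
    q += q >> 8;                   /* expansion of 0.8 = 0.1100110... */
    q += q >> 16;
    q >>= 3;                       /* about 0.8 * x / 8 = x / 10      */

    rem -= (q + (q << 2)) << 1;    /* rem = x - 10 - 10 * q           */
    /* addpl: add 1 if the sign bit is clear. The cast relies on the usual
       two's complement conversion, which is an assumption about the compiler. */
    if ((int32_t)rem >= 0)
        q += 1;
    return q;
}

/* Counts mismatches against x / 10 on every step-th value of the 32-bit
   range. step must be non-zero. */
uint32_t div10_shift_add_mismatches(uint32_t step)
{
    uint32_t bad = 0;
    for (uint64_t x = 0; x <= UINT32_MAX; x += step) {
        if (div10_shift_add((uint32_t)x) != (uint32_t)x / 10)
            bad++;
    }
    return bad;
}
```

A step of 1 checks every 32-bit value but takes a while; a larger odd step gives a quick spot check.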


This formula works on all 32-bit numbers: R0 = ((R0 - (R0 >> 30)) * 429496730) >> 32. The constant 429496730 is 2^32 / 10 rounded up. It is very efficient on ARM chips with a full 32-bit multiplier capable of generating a 64-bit result. The low word of the product in R2 is discarded and the result is the high word in R0. The timing on ARM7 is 4 to 10 cycles depending on the value of R0 and on whether the constant can be loaded outside the loop.

ldr   r1,=429496730
sub   r0,r0,r0,lsr #30
umull r2,r0,r1,r0
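
In C the same formula can be written with a 64-bit intermediate product. This is a sketch, assuming uint64_t multiplication stands in for umull; the names div10_umull and div10_umull_mismatches are hypothetical.

```c
#include <stdint.h>

/* Model of the multiply-based routine. */
uint32_t div10_umull(uint32_t x)
{
    const uint32_t m = 429496730u;               /* 2^32 / 10 rounded up   */
    x -= x >> 30;                                /* sub r0,r0,r0,lsr #30   */
    return (uint32_t)(((uint64_t)x * m) >> 32);  /* umull, keep high word  */
}

/* Counts mismatches against x / 10 on every step-th value of the 32-bit
   range. step must be non-zero. */
uint32_t div10_umull_mismatches(uint32_t step)
{
    uint32_t bad = 0;
    for (uint64_t x = 0; x <= UINT32_MAX; x += step) {
        if (div10_umull((uint32_t)x) != (uint32_t)x / 10)
            bad++;
    }
    return bad;
}
```

The subtraction of R0 >> 30 compensates for the constant being slightly larger than 2^32 / 10, so the error does not accumulate enough to push large inputs over to the next quotient.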