ARM: Division by 10
Fast division by 10 on the ARM can be done with the formula R0 = (R0 * 3277) / 32768. The multiplier 3277 is 32768 / 10 rounded up, so the result is exact for numbers in the range 0 - 16 388 (the first wrong answer is at 16 389) and for certain nice numbers like 100 000. The code below builds the multiplication by 3277 from four shifted adds and takes 4 or 5 cycles, depending on whether the right shift by 15 can be embedded in the following code. It works on all ARM models. Inserting sub r0,r0,r0,lsr #14 at the top of the code costs one extra cycle and extends the range to 0 - 81 919.
    add r1,r0,r0,lsl #1
    add r0,r0,r1,lsl #2
    add r0,r0,r1,lsl #6
    add r0,r0,r1,lsl #10
    mov r0,r0,lsr #15
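To see why the four adds amount to a multiplication by 3277, and to check the quoted ranges, the sequence can be modelled in C. This is only a sketch for experimenting on a desktop machine, not part of the ARM code: the function names (div10_short, div10_extended, first_mismatch, check_ranges) are made up for illustration, and each C line mirrors one ARM instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the five-instruction sequence; r0 and r1 stand for the ARM registers. */
uint32_t div10_short(uint32_t r0)
{
    uint32_t r1 = r0 + (r0 << 1);   /* add r1,r0,r0,lsl #1    r1 = 3 * x     */
    r0 = r0 + (r1 << 2);            /* add r0,r0,r1,lsl #2    r0 = 13 * x    */
    r0 = r0 + (r1 << 6);            /* add r0,r0,r1,lsl #6    r0 = 205 * x   */
    r0 = r0 + (r1 << 10);           /* add r0,r0,r1,lsl #10   r0 = 3277 * x  */
    return r0 >> 15;                /* mov r0,r0,lsr #15      r0 / 32768     */
}

/* Same sequence with the optional range-extending instruction in front. */
uint32_t div10_extended(uint32_t r0)
{
    r0 = r0 - (r0 >> 14);           /* sub r0,r0,r0,lsr #14 */
    return div10_short(r0);
}

/* Smallest x in [0, limit] where f(x) differs from x / 10, or limit + 1 if none. */
uint32_t first_mismatch(uint32_t (*f)(uint32_t), uint32_t limit)
{
    for (uint32_t x = 0; x <= limit; x++) {
        if (f(x) != x / 10) {
            return x;
        }
    }
    return limit + 1;
}

/* Checks the ranges quoted in the text by brute force. */
void check_ranges(void)
{
    assert(first_mismatch(div10_short, 100000) == 16389);     /* exact for 0 - 16 388 */
    assert(div10_short(100000) == 10000);                     /* a "nice" number      */
    assert(first_mismatch(div10_extended, 100000) == 81920);  /* exact for 0 - 81 919 */
}
```

Calling check_ranges() from a test program confirms that the plain sequence first goes wrong at 16 389 and the version with the extra sub at 81 920. The sub works by shaving a small amount off larger inputs, which compensates for 3277 / 32768 being slightly more than 1 / 10.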
This formula works on all 32 bit numbers: R0 = ((R0 - (R0 >> 30)) * 429496730) >> 32. It needs an ARM chip whose multiplier can generate a full 64 bit result from two 32 bit operands (the UMULL instruction). The low word of the product goes to R2 and is discarded; the result is the high word, in R0. The timing is 4 to 10 cycles, depending on the value of R0 and on whether the constant can be loaded outside the loop.
    ldr r1,=429496730
    sub r0,r0,r0,lsr #30
    umull r2,r0,r1,r0
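The same idea can be tried out in C, where a 64 bit product stands in for the R2/R0 register pair written by UMULL. This is a sketch under that assumption; the names div10_full and check_samples are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the three-instruction sequence; valid for every 32 bit input. */
uint32_t div10_full(uint32_t r0)
{
    const uint32_t r1 = 429496730u;          /* ldr r1,=429496730  (2^32 / 10, rounded up) */
    r0 = r0 - (r0 >> 30);                     /* sub r0,r0,r0,lsr #30                      */
    uint64_t product = (uint64_t)r1 * r0;     /* umull r2,r0,r1,r0  (64 bit product)       */
    return (uint32_t)(product >> 32);         /* high word = R0; low word (R2) is dropped  */
}

/* Spot checks around the points where R0 >> 30 changes, plus a strided sweep. */
void check_samples(void)
{
    static const uint32_t samples[] = {
        0u, 9u, 10u, 99u,
        1073741823u, 1073741824u, 1073741829u, 1073741830u,
        2147483647u, 2147483648u, 2147483650u,
        3221225471u, 3221225472u, 3221225480u,
        4294967289u, 4294967290u, 4294967295u
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(div10_full(samples[i]) == samples[i] / 10);
    }
    /* 64 bit loop counter so the loop ends cleanly past the 32 bit maximum. */
    for (uint64_t x = 0; x <= UINT32_MAX; x += 65521) {
        assert(div10_full((uint32_t)x) == (uint32_t)x / 10);
    }
    /* Without the sub, the plain multiply is already one too high here. */
    assert((uint32_t)(((uint64_t)429496730u * 1073741829u) >> 32) == 107374183u);
}
```

The last assertion shows what the sub is for: multiplying by the rounded-up constant alone gives 107 374 183 for 1 073 741 829, one more than the correct quotient, while the corrected sequence stays exact up to 4 294 967 295.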