Fixed-point


A fixed-point number is an approximation of a real number. For the same word size it offers more bits of precision than a floating-point number, but a much smaller dynamic range. Fixed-point arithmetic is particularly useful on microcontrollers and CPUs that have no dedicated floating-point unit.


Example

We divide a byte into two fields: two bits for the integer part of the number and six bits for the fractional part.

nn.ffffff

The value of the fractional part can be calculated as

(1/(2^number of fraction bits)) * ffffff (with ffffff read as a binary integer)

The value 1 has the bit pattern

01000000 (64)

To show how the fractional part works, we will perform some arithmetic on the value 1.0 above.
If we divide by 2 using a right shift we get

00100000 (32)

The value we have now is 0 + (1/(2^6)) * 32 = 0.5
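The shift above can be sketched as a small C helper (the name fix26_half is ours, assuming the same 2.6 layout):

```c
/* Divide a 2.6-format value by two with a single right shift (sketch). */
unsigned char fix26_half(unsigned char x)
{
    return x >> 1;
}
```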

Another example:
If we multiply it by itself we get

 00100000 *  00100000 = 0000010000000000 (8 by 8 bit multiplication gives a 16 bit result)
       32 *        32 =             1024

Multiplication of two 2.6-format numbers gives a 4.12-format result (each field doubles in size after the multiplication). So to convert back to our original 2.6 format we chop off the least significant 6 bits and the most significant 2 bits. Remember that the main objective is not to get back to the 2.6 format but to keep as much resolution as the task requires, so a 4.4 format might be appropriate in some cases. Part of the art of fixed-point arithmetic is choosing the right format at every step.

 0000010000000000
 xx00010000xxxxxx
=
 00010000 (16)

The value we have now is 0 + (1/(2^6)) * 16 = 0.25
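The whole multiply-and-rescale step can be sketched as one C helper (the name fix26_mul is ours):

```c
#include <stdint.h>

/* Multiply two 2.6-format values.  The 16-bit intermediate is in
   4.12 format; shifting right by 6 drops the low 6 fraction bits,
   and the cast back to 8 bits drops the top two integer bits. */
uint8_t fix26_mul(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)a * b; /* 4.12 intermediate */
    return (uint8_t)(wide >> 6);     /* back to 2.6 */
}
```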


It is also possible to scale the numbers down before the multiplication to speed it up, at the cost of resolution. Addition and subtraction are straightforward and require no scaling, as long as both operands use the same format.
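Addition in a shared format is just an integer add; a C sketch (helper name ours), noting that a sum past the two integer bits wraps around:

```c
#include <stdint.h>

/* Add two 2.6-format values.  No scaling is needed because both
   operands place the binary point in the same position.  Sums past
   the 2.6 maximum of 3.984375 wrap around (overflow). */
uint8_t fix26_add(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```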

By ordering the operations carefully and scaling the numbers to keep maximum resolution without overflow, it is possible to write efficient code that completely avoids floating-point operations.

The location of the "decimal point" (really a binary point) is often specified by the letter Q when working out algorithms and documenting them. For instance, Q15 designates a number with 15 bits for the fractional part. Hence, these scaling processes are sometimes referred to as "Q math".
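For instance, a Q15 multiply on 16-bit signed values follows the same pattern as the 2.6 example above. A sketch (function name ours; plain truncation, no rounding or saturation):

```c
#include <stdint.h>

/* Multiply two Q15 values (1 sign bit, 15 fraction bits).  The
   32-bit product is in Q30; shifting right by 15 returns to Q15. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t wide = (int32_t)a * b; /* Q30 intermediate */
    return (int16_t)(wide >> 15);  /* truncate back to Q15 */
}
```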

Q math can be very economical of processing power, but DSP routines using Q math are harder to write and debug than floating-point ones. In these days of cheap silicon and high programmer salaries, hardware floating-point often makes more sense for smaller commercial projects.
