Floating-point
From ScienceZero
Almost all modern computers approximate real numbers using floating-point arithmetic as defined in the IEEE 754 standard.
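To make "approximate" concrete, here is a short Python sketch. It assumes Python's `float` is an IEEE 754 double, which is the case for CPython on common platforms. A decimal value such as 0.1 has no finite binary expansion, so the stored number is only the nearest representable value. `fractions.Fraction` reveals the exact stored value.

```python
from fractions import Fraction

# Assumes Python's float is an IEEE 754 double (true for CPython on common platforms).
total = 0.1 + 0.2
stored = Fraction(0.1)   # exact rational value of the double nearest to 0.1

print(total)          # 0.30000000000000004
print(total == 0.3)   # False
print(stored)         # 3602879701896397/36028797018963968
```

The denominator is 2**55. Every finite floating-point value is an integer multiple of a power of two, which is why 1/10 can only be approximated.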
Single Precision
The IEEE 754 single precision number requires 32 bits of storage.
Bit layout, with bits numbered from the most significant end (so the sign is bit 0):
0 1      8 9                    31
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
- S - Sign bit
- E - Exponent
- F - Fraction
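The three fields can be pulled out of a number with shifts and masks. Below is a minimal Python sketch using the standard `struct` module. The helper name `float32_fields` is hypothetical, chosen for illustration only. Packing with the `">f"` format rounds the Python float to single precision first.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Hypothetical helper: split x, rounded to single precision, into (S, E, F)."""
    # Pack as a big-endian IEEE 754 single, then read the 32 bits back as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31               # 1 sign bit (the leftmost bit, bit 0 in the layout)
    e = (bits >> 23) & 0xFF      # 8 exponent bits
    f = bits & 0x7FFFFF          # 23 fraction bits
    return s, e, f

print(float32_fields(-6.25))     # (1, 129, 4718592)
```

Here -6.25 = -1.1001 (binary) * 2**2. That gives S = 1 and E = 2 + 127 = 129. F holds the bits 1001 followed by 19 zeros.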
The value V of the 32-bit word is determined as follows:
- If E = 255 and F is nonzero, then V = NaN ("Not a number")
- If E = 255 and F is zero and S is 1, then V = -Infinity
- If E = 255 and F is zero and S is 0, then V = Infinity
- If 0<E<255 then V = (-1)**S * 2 ** (E-127) * (1.F) where "1.F" represents the binary number created by prefixing F with an implicit leading 1 and a binary point.
- If E = 0 and F is nonzero, then V = (-1)**S * 2 ** (-126) * (0.F). These are "denormalized" (also called "subnormal") values: there is no implicit leading 1.
- If E = 0 and F is zero and S is 1, then V = -0
- If E = 0 and F is zero and S is 0, then V = 0
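The rules above translate almost line for line into code. Below is a minimal Python sketch, not a reference implementation. The function name `decode_float32` is hypothetical, and the word is taken as a plain integer. The result is checked against `struct`, which performs the same conversion natively. Every arithmetic step involves powers of two that fit in a double, so the computed value is exact.

```python
import math
import struct

def decode_float32(bits: int) -> float:
    """Hypothetical helper: apply the single-precision rules to a 32-bit word."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        return math.nan if f else sign * math.inf    # NaN or +/-Infinity
    if e == 0:
        return sign * (f / 2**23) * 2.0**-126         # +/-0 or denormalized: 0.F * 2**-126
    return sign * (1 + f / 2**23) * 2.0**(e - 127)    # normalized: 1.F * 2**(E-127)

# Compare with the platform's own conversion for a few sample words.
for word in (0x3F800000, 0x40490FDB, 0xC0C80000, 0x00000001, 0x80000000):
    expected = struct.unpack(">f", word.to_bytes(4, "big"))[0]
    print(hex(word), decode_float32(word), expected)
```

For the word 0x80000000, both columns show -0.0. The sign of zero is preserved, as the rule for E = 0, F = 0, S = 1 requires.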
Double Precision
The IEEE 754 double precision number requires 64 bits of storage.
Bit layout, with bits numbered from the most significant end (so the sign is bit 0):
0 1        11 12                                                63
S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
- S - Sign bit
- E - Exponent
- F - Fraction
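The same shift-and-mask approach works for the 64-bit layout, with wider fields. This is a minimal Python sketch using the standard `struct` module. The helper name `float64_fields` is hypothetical. No rounding occurs here, because it assumes a Python float already is an IEEE 754 double.

```python
import struct

def float64_fields(x: float) -> tuple[int, int, int]:
    """Hypothetical helper: split a Python float (an IEEE 754 double) into (S, E, F)."""
    # Pack as a big-endian double, then read the 64 bits back as an unsigned int.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                   # 1 sign bit
    e = (bits >> 52) & 0x7FF         # 11 exponent bits
    f = bits & ((1 << 52) - 1)       # 52 fraction bits
    return s, e, f

for x in (1.0, -2.0, 0.1):
    s, e, f = float64_fields(x)
    print(x, s, e, hex(f))
```

For 1.0 the exponent field is 1023, the bias, because 1.0 = 1.0 * 2**0.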
The value V of the 64-bit word is determined as follows:
- If E = 2047 and F is nonzero, then V = NaN ("Not a number")
- If E = 2047 and F is zero and S is 1, then V = -Infinity
- If E = 2047 and F is zero and S is 0, then V = Infinity
- If 0<E<2047 then V = (-1)**S * 2 ** (E-1023) * (1.F) where "1.F" represents the binary number created by prefixing F with an implicit leading 1 and a binary point.
- If E = 0 and F is nonzero, then V = (-1)**S * 2 ** (-1022) * (0.F). These are "denormalized" (also called "subnormal") values: there is no implicit leading 1.
- If E = 0 and F is zero and S is 1, then V = -0
- If E = 0 and F is zero and S is 0, then V = 0
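Applying the double-precision rules directly also shows where the range of the format ends. Below is a minimal Python sketch, with the hypothetical function name `decode_float64`. It decodes the largest finite word and the smallest denormalized word, then compares them with `sys.float_info.max` and 2**-1074.

```python
import math
import struct
import sys

def decode_float64(bits: int) -> float:
    """Hypothetical helper: apply the double-precision rules to a 64-bit word."""
    s = (bits >> 63) & 1
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    sign = -1.0 if s else 1.0
    if e == 2047:
        return math.nan if f else sign * math.inf     # NaN or +/-Infinity
    if e == 0:
        return sign * (f / 2**52) * 2.0**-1022         # +/-0 or denormalized: 0.F * 2**-1022
    return sign * (1 + f / 2**52) * 2.0**(e - 1023)    # normalized: 1.F * 2**(E-1023)

largest = decode_float64(0x7FEFFFFFFFFFFFFF)    # E = 2046, F all ones
smallest = decode_float64(0x0000000000000001)   # E = 0, F = 1: smallest denormalized value
print(largest == sys.float_info.max, smallest == 2.0**-1074)   # True True
print(decode_float64(0x3FF0000000000000), struct.unpack(">d", bytes.fromhex("3FF0000000000000"))[0])
```

E = 2047 is reserved for Infinity and NaN, so the largest finite value uses E = 2046. Denormalized values extend the range below 2**-1022 at the cost of precision.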