Floating-point
From ScienceZero
Almost all modern computers approximate real numbers using floating-point arithmetic as defined in the IEEE 754 standard.
Single Precision
The IEEE 754 single precision number requires 32 bits of storage.
0 1      8 9                    31
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
- S - Sign bit
- E - Exponent
- F - Fraction
The value of the 32 bit word:
- If E = 255 and F is nonzero, then V = NaN ("Not a number")
- If E = 255 and F is zero and S is 1, then V = -Infinity
- If E = 255 and F is zero and S is 0, then V = Infinity
- If 0<E<255 then V = (-1)**S * 2 ** (E-127) * (1.F) where "1.F" represents the binary number created by prefixing F with an implicit leading 1 and a binary point.
- If E = 0 and F is nonzero, then V = (-1)**S * 2 ** (-126) * (0.F). These are "unnormalized" values, also known as denormalized or subnormal numbers.
- If E = 0 and F is zero and S is 1, then V = -0
- If E = 0 and F is zero and S is 0, then V = 0
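The rules above can be sketched in a few lines of Python. This is only an illustrative decoder, not a reference implementation: the function names `decode_single` and `float_to_bits` are made up for this example, and the input is assumed to be an unsigned 32-bit integer whose most significant bit is the sign bit (bit 0 in the layout above).

```python
import math
import struct

def decode_single(word: int) -> float:
    """Decode a 32-bit single-precision bit pattern using the rules listed above."""
    s = (word >> 31) & 0x1      # S: sign bit (bit 0 in the layout, the top bit of the integer)
    e = (word >> 23) & 0xFF     # E: 8-bit biased exponent
    f = word & 0x7FFFFF         # F: 23-bit fraction
    sign = -1.0 if s else 1.0

    if e == 255:
        # F nonzero -> NaN, F zero -> +/- Infinity
        return math.nan if f else sign * math.inf
    if e == 0:
        # 0.F * 2**(-126); F is scaled by 2**-23 to place the binary point.
        # When F is zero this yields +0.0 or -0.0.
        return sign * math.ldexp(f, -126 - 23)
    # 1.F * 2**(E-127); the implicit leading 1 is bit 23 of the significand.
    return sign * math.ldexp((1 << 23) | f, e - 127 - 23)

def float_to_bits(x: float) -> int:
    """Return the 32-bit pattern of x rounded to single precision (hypothetical helper)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(decode_single(0x3F800000))        # S=0, E=127, F=0
print(decode_single(0xC0000000))        # S=1, E=128, F=0
print(hex(float_to_bits(0.15625)))      # bit pattern of 0.15625
```

`math.ldexp(m, n)` computes m * 2**n exactly, so the decoder avoids rounding errors that repeated multiplication could introduce.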
Double Precision
The IEEE 754 double precision number requires 64 bits of storage.
0 1        11 12                                                63
S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
- S - Sign bit
- E - Exponent
- F - Fraction
The value of the 64 bit word:
- If E = 2047 and F is nonzero, then V = NaN ("Not a number")
- If E = 2047 and F is zero and S is 1, then V = -Infinity
- If E = 2047 and F is zero and S is 0, then V = Infinity
- If 0<E<2047 then V = (-1)**S * 2 ** (E-1023) * (1.F) where "1.F" represents the binary number created by prefixing F with an implicit leading 1 and a binary point.
- If E = 0 and F is nonzero, then V = (-1)**S * 2 ** (-1022) * (0.F). These are "unnormalized" values, also known as denormalized or subnormal numbers.
- If E = 0 and F is zero and S is 1, then V = -0
- If E = 0 and F is zero and S is 0, then V = 0
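As a rough check of the 64-bit rules, the sketch below decodes a bit pattern by hand and compares the result with Python's own `float`, which is an IEEE 754 double on virtually all platforms. The names `decode_double` and `double_to_bits` are illustrative only, and the input is assumed to be an unsigned 64-bit integer with the sign bit as its most significant bit.

```python
import math
import struct

def decode_double(word: int) -> float:
    """Decode a 64-bit double-precision bit pattern using the rules listed above."""
    s = (word >> 63) & 0x1          # S: sign bit
    e = (word >> 52) & 0x7FF        # E: 11-bit biased exponent
    f = word & ((1 << 52) - 1)      # F: 52-bit fraction
    sign = -1.0 if s else 1.0

    if e == 2047:
        # F nonzero -> NaN, F zero -> +/- Infinity
        return math.nan if f else sign * math.inf
    if e == 0:
        # 0.F * 2**(-1022); gives +0.0 or -0.0 when F is zero
        return sign * math.ldexp(f, -1022 - 52)
    # 1.F * 2**(E-1023), with the implicit leading 1 placed at bit 52
    return sign * math.ldexp((1 << 52) | f, e - 1023 - 52)

def double_to_bits(x: float) -> int:
    """Return the 64-bit pattern of a Python float (hypothetical helper)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# Round trip: decoding the bits of a float should give the same float back.
for x in (1.0, -2.5, 0.1, 5e-324):
    assert decode_double(double_to_bits(x)) == x
```

The round trip works for 0.1 even though 0.1 has no exact binary representation, because the decoder reconstructs exactly the value that is stored, not the decimal literal.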