Week 6
Representing real numbers: Fixed and Floating-point
COS10004 Computer Systems · Lecture 5.2 Representing real numbers: Fixed and
Floating-point
Number systems: fixed-point numbers
- Integers are much easier to work with than real numbers!
- What’s a real number?
- Numbers that consist of an integer part and fractional part:
- E.g., 3.1, 6.443442, 100.0 etc
- Major representational trade-offs:
- Space efficiency (and precision)
- Computational efficiency
Not all real numbers can be represented!
- Real numbers can be infinitely precise
- Consider numbers like Pi (3.14159265358979323846264...)
- Or 1/3 = 0.3333333333333333333333….
- Computers only have finite memory
- The number of bits available directly limits both the range of values and the precision
- How we choose to represent real numbers determines how those bits are split between range and precision.
- Firstly, recall how decimal works:
5123.124
Weight  10^3  10^2  10^1  10^0  .  10^-1  10^-2  10^-3
Digit   5     1     2     3     .  1      2      4
NOW THINK HOW BINARY WORKS
- The binary system represents numbers in base 2, so it is also known as the base-2 system, and it uses only two symbols: 0 and 1.
1011.101
Weight  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3
Digit   1    0    1    1    .  1     0     1
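The weight tables for decimal and binary follow the same rule: each digit is multiplied by the base raised to its position. A minimal sketch of that rule, assuming a hypothetical helper named positional_value (not part of the lecture) and using exact fractions to avoid rounding:

```python
from fractions import Fraction


def positional_value(digits: str, base: int) -> Fraction:
    """Sum digit * base**position for a string such as '1011.101'."""
    whole, _, frac = digits.partition(".")
    total = Fraction(0)
    # Integer digits carry weights base**0, base**1, ... from right to left.
    for position, digit in enumerate(reversed(whole)):
        total += int(digit, base) * Fraction(base) ** position
    # Fractional digits carry weights base**-1, base**-2, ... from left to right.
    for position, digit in enumerate(frac, start=1):
        total += int(digit, base) * Fraction(base) ** -position
    return total


print(float(positional_value("5123.124", 10)))  # 5123.124
print(float(positional_value("1011.101", 2)))   # 11.625
```

The binary example works out as 8 + 2 + 1 + 0.5 + 0.125 = 11.625.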
FIXED-POINT REPRESENTATIONS
- One approach for representing real number approximations is to dedicate some fixed number of bits for the integer and fractional parts.
- We call this fixed point representation because the binary point (i.e. the split between the two parts) is at a fixed location in the word.
Fixed binary point
Weight  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3
Value   16   8    4    2    1    .  0.5   0.25  0.125
Number systems: fixed-point numbers
Fixed binary point
Weight  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3
Value   16   8    4    2    1    .  0.5   0.25  0.125
Bit     0    1    0    1    0    .  1     0     1
- With the above numbering system 01010.101 = 8 + 2 + 0.5 + 0.125 = 10.625
- Can use all the standard integer arithmetic, and negative values can be represented in 2’s complement form.
- Just have to remain consistent as to which bits are the fractional part, i.e. must fix the binary point.
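One way to see why ordinary arithmetic still works is to treat a fixed-point word as an integer scaled by 2^(fractional bits). The sketch below assumes the 8-bit layout with 3 fractional bits used in the example; the constant and function names are illustrative, not from the lecture:

```python
WORD_BITS = 8   # total width of the word (assumed, matches the 8-bit example)
FRAC_BITS = 3   # bits to the right of the binary point


def encode(value: float) -> int:
    """Scale by 2**FRAC_BITS, round, and wrap into WORD_BITS (2's complement)."""
    return round(value * (1 << FRAC_BITS)) & ((1 << WORD_BITS) - 1)


def decode_unsigned(word: int) -> float:
    """Interpret the word as an unsigned fixed-point value."""
    return word / (1 << FRAC_BITS)


def decode_signed(word: int) -> float:
    """Interpret the word as a 2's complement fixed-point value."""
    if word & (1 << (WORD_BITS - 1)):   # top bit set means negative
        word -= 1 << WORD_BITS
    return word / (1 << FRAC_BITS)


a = 0b01010101                 # 01010.101
b = encode(1.5)                # 00001.100
total = (a + b) & 0xFF         # plain integer addition, kept to 8 bits

print(decode_unsigned(a))            # 10.625
print(decode_unsigned(total))        # 12.125
print(decode_signed(encode(-1.0)))   # -1.0
```

The adder never needs to know where the binary point is; only the encode and decode steps do, which is why the position must stay fixed.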
Fixed Point Example:
Using the fixed<16,7> binary point representation shown below, represent the number 25.6640625 (2 marks):
8 7 6 5 4 3 2 1 0 . -1 -2 -3 -4 -5 -6 -7
- Don’t panic. Start by converting 25 to binary: 25 = 16+8+1 = 000011001.
- Then the “decimal” point.
- Convert 0.6640625 to binary (starting with 0.5, 0.25, 0.125...): 0.5+0.125+0.03125+0.0078125 converts to .1010101;
- Concatenate the two numbers: 000011001.1010101.
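The steps above can be sketched in code: convert the integer part, then work down the fractional weights (0.5, 0.25, 0.125, ...) subtracting each one that fits. The function name to_fixed and its parameters are assumptions for illustration, and the sketch only handles non-negative values:

```python
def to_fixed(value: float, total_bits: int = 16, frac_bits: int = 7) -> str:
    """Return a non-negative value as 'integer_bits.fraction_bits'."""
    int_bits = total_bits - frac_bits
    whole = int(value)
    frac = value - whole
    if value < 0 or whole >= (1 << int_bits):
        raise OverflowError("value does not fit in the integer part")

    # Integer part: ordinary binary, zero-padded to the available width.
    int_part = format(whole, f"0{int_bits}b")

    # Fractional part: try each weight in turn, largest first.
    frac_part = ""
    weight = 0.5
    for _ in range(frac_bits):
        if frac >= weight:
            frac_part += "1"
            frac -= weight
        else:
            frac_part += "0"
        weight /= 2
    # Any remainder left in frac is simply dropped (truncation).
    return f"{int_part}.{frac_part}"


print(to_fixed(25.6640625))  # 000011001.1010101
```

A value whose fraction needs more than 7 bits would be truncated here, which is the precision limit of this format.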
Binary-coded decimal (BCD)
- Each digit of a decimal number represented by a nibble in the data word.
- For example: 7926.34 = 0111 1001 0010 0110.0011 0100
7 9 2 6 . 3 4
- Conceptually simple means to convert and represent large fractional decimal numbers in binary form on basic CPUs.
- Can make arithmetic algorithms simple.
- Used in basic calculators.
- Overhead is inefficient storage c.f. fixed point: 6 of the 16 possible nibble values are never used (37.5% wastage).
Binary-coded Decimal (BCD) Example:
Using BCD and the fixed point representation shown below, represent the number 29.95 (2 marks):
7 6 5 4 3 2 1 0 . -1 -2 -3 -4 -5 -6 -7 -8
- It’s a 4-digit number so we will need 4 nibbles (2 bytes). The decimal point is fixed, so (for instance) 265.5 would overflow: three digits before the point need 12 bits, but only 8 are available.
- 2 converts to 0010;
- 9 converts to 1001;
- Then the “decimal” point.
- Then 9 (1001);
- then 5 (0101).
Position  7  6  5  4  3  2  1  0  .  -1  -2  -3  -4  -5  -6  -7  -8
Bit       0  0  1  0  1  0  0  1  .  1   0   0   1   0   1   0   1
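Because BCD converts one decimal digit at a time, the encoding is a simple per-character lookup. A minimal sketch, assuming a hypothetical helper bcd_encode that takes the number as a decimal string:

```python
def bcd_encode(number: str) -> str:
    """Encode each decimal digit as a 4-bit nibble, keeping the decimal point."""
    groups = []
    for ch in number:
        if ch == ".":
            groups.append(".")
        elif ch in "0123456789":
            groups.append(format(int(ch), "04b"))  # digit -> 4-bit binary
        else:
            raise ValueError(f"not a decimal digit: {ch!r}")
    return " ".join(groups)


print(bcd_encode("29.95"))    # 0010 1001 . 1001 0101
print(bcd_encode("7926.34"))  # 0111 1001 0010 0110 . 0011 0100
```

No nibble ever exceeds 1001, which is where the unused codes (1010 to 1111) and the storage overhead come from.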
Floating Point Representation
- Issue with Fixed-point:
- Precision needs vary with numbers. Fixed point can be wasteful
- Floating point representations:
- Represent real numbers with variable bit allocations for the fractional component
- Supports a trade-off between value range and precision
- BUT – much more complex!
- Follows a similar idea to scientific notation:
Floating point
- Floats, doubles.
- Separate into mantissa and exponent.
- Mantissa:
- Add / subtract (2’s complement) with adders
- Multiply / divide with shift registers (+ counter and adder)
- Exponent:
- Multiply / divide: add / subtract the exponents (2’s complement) with adders
- Add / subtract: if the exponents are the same, just work on the mantissas; otherwise shift one mantissa to align the exponents first
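The split of work between mantissa and exponent can be illustrated in software. This is only a sketch of the idea, not how an FPU is built: it uses Python's math.frexp to split a float into a mantissa and a power-of-two exponent (Python normalises the mantissa to the range 0.5 to 1, whereas IEEE 754 uses 1 to 2), and the function names are illustrative.

```python
import math


def fp_multiply(a: float, b: float) -> float:
    """Multiply the mantissas and add the exponents."""
    ma, ea = math.frexp(a)   # a == ma * 2**ea
    mb, eb = math.frexp(b)
    return math.ldexp(ma * mb, ea + eb)


def fp_add(a: float, b: float) -> float:
    """Align both mantissas to the larger exponent, then add them."""
    ma, ea = math.frexp(a)
    mb, eb = math.frexp(b)
    e = max(ea, eb)
    ma = math.ldexp(ma, ea - e)   # shift right if this exponent was smaller
    mb = math.ldexp(mb, eb - e)
    return math.ldexp(ma + mb, e)


print(fp_multiply(6.0, 0.75))  # 4.5
print(fp_add(6.0, 0.75))       # 6.75
```

Addition needs the extra alignment step because mantissas can only be added directly when they share the same exponent.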
IEEE 754 Standard
- IEEE 754: a standardised specification for allocating bits
- Below is for 32 bit floating point, made up of three parts:
- Sign bit (1 bit)
- Exponent (8 bits)
- Fraction/Mantissa (23 bits)
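To inspect the three fields of a real 32-bit float, the bit pattern can be pulled apart with shifts and masks. A short sketch using Python's standard struct module; the helper name float32_fields is an assumption for illustration:

```python
import struct


def float32_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, exponent, fraction) of value stored as a 32-bit float."""
    # Pack as a big-endian float, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30..23
    fraction = bits & 0x7FFFFF        # bits 22..0
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(10.625)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
# 0 10000010 01010100000000000000000
```

Here 10.625 is 1010.101 in binary, i.e. 1.010101 x 2^3, so the stored exponent is 3 + 127 = 130.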
Example:
Using the IEEE 754 floating point standard (shown below), represent the number -273.5 as a 32-bit single precision floating point number (2 marks)
Notes: bit 23 = binary point (between the 2^0 and 0.5 positions); 2^8 = 256.
- This is –ve, so bit 31 will be 1
- Convert 273.5 to binary: 256+16+1+0.5 = 100010001.100...(trailing 0s)
- Shift the binary point left until it sits just to the right of bit 23, the leading 1 (count the shifts: 8), giving 1.00010001100...
- Pad with 0s (LSB) and remove bit 23 (always 1) to get the mantissa: .00010001100000000000000
- Add 2^7-1 (= 127) to the number of shifts (in binary) to get the exponent:
     8   00001000
  +127   01111111
  =135   10000111
- Concatenate sign bit, exponent and mantissa:
Bits    31     30-23      22-0
Field   sign   exponent   mantissa
Value   1      10000111   00010001100000000000000
17/8/20 15
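The worked example can be checked in code. The sketch below follows the same steps (sign, normalise by counting shifts, drop the leading 1, add 127) and compares the result with the pattern Python's struct module produces. The function name encode_float32 is illustrative, and the sketch assumes a finite, non-zero, normal value and truncates rather than rounds the fraction:

```python
import math
import struct


def encode_float32(value: float) -> str:
    """Return 'sign exponent fraction' bit strings for a normal float32 value."""
    if value == 0 or not math.isfinite(value):
        raise ValueError("sketch handles finite, non-zero values only")
    sign = 1 if value < 0 else 0
    magnitude = abs(value)

    # Normalise to 1.xxxx by moving the binary point, counting the shifts.
    shifts = 0
    while magnitude >= 2:
        magnitude /= 2
        shifts += 1
    while magnitude < 1:
        magnitude *= 2
        shifts -= 1

    exponent = shifts + 127                      # add the bias 2**7 - 1
    if not 1 <= exponent <= 254:
        raise ValueError("outside the normal float32 range")
    fraction = int((magnitude - 1) * (1 << 23))  # drop the leading 1, keep 23 bits
    return f"{sign} {exponent:08b} {fraction:023b}"


print(encode_float32(-273.5))
# 1 10000111 00010001100000000000000

# Cross-check against the machine representation.
(bits,) = struct.unpack(">I", struct.pack(">f", -273.5))
print(f"{bits:032b}" == encode_float32(-273.5).replace(" ", ""))  # True
```

In hexadecimal the same pattern is C388C000.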
Number systems: floating point representation
- Most significant bit of mantissa not included as it is always 1
- The exponent (a signed value, positive or negative) has +127 (b’0111 1111’) added to it before it is stored, so the stored field is never negative
- The term significand has tended to replace mantissa
- There are special patterns to represent: +/- infinity; +/- 0 (zero), NaN (not a number)
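The special patterns can be printed directly. A small sketch using Python's struct module (the helper name float32_bits is an assumption); the exact fraction bits of NaN can vary between platforms, so only its general shape is described:

```python
import struct


def float32_bits(value: float) -> str:
    """Return the 32-bit pattern of value stored as a single precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return f"{bits:032b}"


specials = [
    ("+0", 0.0),
    ("-0", -0.0),
    ("+inf", float("inf")),
    ("-inf", float("-inf")),
    ("NaN", float("nan")),
]
for name, value in specials:
    b = float32_bits(value)
    # Print as sign | exponent | fraction.
    print(f"{name:>5}: {b[0]} {b[1:9]} {b[9:]}")
```

Zero uses an all-zero exponent and fraction, infinity uses an all-ones exponent with a zero fraction, and NaN uses an all-ones exponent with a non-zero fraction.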
Floating-point Operations – Where?
- Floating-point numbers typically handled using dedicated circuits to perform arithmetic operations:
- Sometimes referred to as the math co-processor or Floating Point Unit
- One or more FPUs typically reside in the CPU
- Some simpler computers may not offer floating point hardware:
- Floating point may still be emulated using the ALU and a supporting floating-point library
Summary
- Real numbers pose a specific challenge for binary representation
- Fixed-point representations offer simplicity, but can be wasteful
- Floating point representations are standard in modern computers
- IEEE 754 standard
- Allows trade-offs of range and precision
- Usually relies on dedicated FP arithmetic hardware (can otherwise be emulated in software):
- FPU – floating point unit