Using Fixed point maths in microcontrollers

You can use Fixed Point mathematics to save Flash memory and increase performance.

Typically you will need to calculate a result as a 'real' number, and for this you would usually turn to the floating point library (because it's easy). Floating point has advantages, the main one being that it is simple to use, but behind the scenes it consumes a lot of memory. In addition, the processor must do more work.

Surprisingly, you can do many calculations without loss of accuracy, accepting that calculations are made to a specific number of decimal places, which is true in engineering anyway.

Warning: Floating point is not exact. You cannot represent every number as a float; e.g. 1/3 cannot be represented exactly in floating point because it is a recurring fraction. Rounding errors creep in, which is why larger floating point representations (e.g. double instead of float) are needed to get a more accurate result.

Instead of using a floating point representation you use an integer variable and choose where the fixed decimal point will be placed.

Theoretical operation of an 8 bit ADC

Let's say that you have an 8 bit ADC and you want to figure out the voltage that the ADC is seeing as a real number. To do that you would normally work out the floating point multiplier needed for each bit of the ADC:

bit_value = 5/pow(2,8) = 0.01953125

Where bit_value represents the voltage per bit of the ADC.

So if your ADC returned a value of 127 then the ADC voltage would be:

0.01953125 * 127 = 2.48V

For the maximum adc value you would have:

0.01953125 * 255 = 4.98V
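For comparison, the floating point approach might look like this in C. It is a minimal sketch assuming a 5V reference and an 8 bit reading; `adc_to_volts`, `bit_value` and the macro names are illustrative, not from any library. On a small microcontroller without floating point hardware, the float multiply is what pulls in the floating point library.

```c
#include <stdint.h>

#define ADC_VREF 5.0f  /* reference voltage in volts (assumed 5V) */
#define ADC_BITS 8     /* ADC resolution in bits */

/* Volts per bit: 5 / 2^8 = 0.01953125 (exactly representable as a float) */
static const float bit_value = ADC_VREF / (float)(1UL << ADC_BITS);

/* Convert a raw 8 bit reading to volts using floating point arithmetic. */
float adc_to_volts(uint8_t reading)
{
    return (float)reading * bit_value;
}
```

Here adc_to_volts(127) returns 2.48046875 and adc_to_volts(255) returns 4.98046875, the values worked out by hand above.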

Note: An ADC does not return 5V for the maximum voltage input, as it can only return codes 0 to 255 and not 256 (256 * 0.01953125 is 5V).

Do not try to force the value by changing the scaling factor to 5/(pow(2,8)-1); this introduces more error, even though the maximum output of the ADC will now show 5V (it is a fudge and the value shown is not truly 5V).

The question is how can you use an integer variable to store and manipulate this real operation?

Example of using Integer Fixed point

An example problem

Here is the problem to solve:

Using the minimum memory resources of a microcontroller, read an 8 bit ADC with a 5V reference and transmit the reading to a PC via the serial port once every second.

The most important decision is not to use floating point, which saves a lot of memory. Everything else (sending data to the serial port and timing) will not take up much memory: the soft serial port (TX) described on this site takes about 90 memory words, and the hardware internal serial module needs even less.

For this problem you will want to calculate an ADC voltage which needs a multiplier of 0.0195. 

Normal maths using a floating point variable would result in the following:

Results using float for ADC Example

ADC reading | Floating point calculation | Floating point result (V)
------------|----------------------------|--------------------------
0           | 0 * 0.0195                 | 0.0
127         | 127 * 0.0195               | 2.4765
255         | 255 * 0.0195               | 4.9725

With Fixed point mathematics you can do this quickly and efficiently using only integer type variables.

Note: Floating point variables use up large amounts of resources in a microcontroller, as most microcontrollers are only really good at integer operations. To do floating point, complex library functions are called. This is similar to the old 386 processor, which had no built-in floating point unit and was far too slow at floating point operations, so a separate floating point coprocessor was added.

How to set up for fixed point

So how do you do it? First of all, the step size of each ADC bit is:

5V/256 = 0.0195 (19.5mV, rounded from 0.01953125)

where 256 is 2 raised to the number of ADC bits: 2^8, or pow(2,8).

Next, work out the size of the integer variable that is big enough to hold the maximum expected value but is also the minimum size integer you can get away with. Factors such as the number of ADC bits and required calculation accuracy affect the variable size.

Ignoring the leading zeros of the step size, take its significant digits as an integer multiplier and check the sizes as follows:

In this example the multiplier is 195 (from 0.0195). Here's the maximum output value:

Maximum value : 255 * 195 = 49725

The maximum value that a 16 bit unsigned int can store is 2^16 - 1 = 65535.

Here you can use a 16 bit unsigned int since 49725 is smaller than 65535.

Note: For larger calculation results use a larger integer type, e.g. "long" or "long long". Also remember that integer sizes depend on the compiler and target (this is why types such as uint32_t are used to specify the bit length).
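The size check can also be left to the compiler. The sketch below assumes a C11 compiler (for `_Static_assert`); the macro names are hypothetical. If the scale factor is later changed so that the largest result no longer fits in 16 bits, the build fails instead of the value silently wrapping around.

```c
#include <stdint.h>

#define ADC8_MAX   255u  /* largest 8 bit ADC reading */
#define ADC8_SCALE 195u  /* 0.0195 V per bit with the leading zeros dropped */

/* Largest fixed point result: 255 * 195 = 49725 */
#define ADC8_FIXED_MAX ((uint32_t)ADC8_MAX * ADC8_SCALE)

/* Stop the build if the largest result does not fit a 16 bit unsigned variable. */
_Static_assert(ADC8_FIXED_MAX <= UINT16_MAX,
               "255 * scale must fit in uint16_t; use uint32_t instead");
```

The same check with the 10 bit figures (1023 * 488) would fail, which is the point at which a 32 bit type is needed.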

Results for fixed point ADC Example

Using fixed point, the results are as shown in the table below:

ADC reading | Integer calculation | Integer result
------------|---------------------|---------------
0           | 0 * 195             | 0
127         | 127 * 195           | 24765
255         | 255 * 195           | 49725

Note how every number above is an integer value, yet the result matches the floating point calculations in the previous table. All that changes is the location of the decimal point, which is fixed four places to the left (24765 means 2.4765). This decimal point is the part that you as the programmer must remember and take into account for further calculations.

This calculation uses the compiler's internal '16 bit multiply' routine, which is much simpler than a floating point multiply, so you save memory.
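In C the whole conversion is a single integer multiply. This is a sketch with a hypothetical function name, assuming an 8 bit reading, a 5V reference and the 195 multiplier chosen above:

```c
#include <stdint.h>

/* Convert a raw 8 bit ADC reading to volts in fixed point, assuming a
   5V reference. The multiplier 195 stands for 0.0195 V per bit, so the
   result has an implied decimal point 4 places to the left:
   24765 means 2.4765 V. */
uint16_t adc_to_fixed(uint8_t reading)
{
    return (uint16_t)(reading * 195u);
}
```

The `u` suffix keeps the arithmetic unsigned. On compilers where int is 16 bits (as on AVR), 255 * 195 = 49725 would overflow a signed int, but it fits in a 16 bit unsigned int.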

Displaying Output

Simple Algorithm

To display a fixed point value, convert the integer value to a string (easy to do), then test the value for size. For the example above, with 4 decimal places:

If the value is 10000 or more, print the leftmost digit, else print zero.

Print out a dot (the decimal point).

Then carry on in the same way:

if >= 1000 print the next digit, else print zero.

if >= 100 print the next digit, else print zero.

if >= 10 print the next digit, else print zero.

Finally, print the last digit.
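The digit rules above can be sketched in C as follows, assuming a uint16_t value with 4 implied decimal places; `format_fixed4` is a hypothetical name. Rather than converting to a string first, this version takes each digit directly with divide and remainder. Dividing by a place value gives 0 whenever the value is smaller than that place, which is exactly the 'else print zero' case, so the output is the same.

```c
#include <stdint.h>

/* Format a fixed point value with 4 implied decimal places as text,
   e.g. 49725 -> "4.9725", 523 -> "0.0523".
   buf must have room for at least 7 characters ("6.5535" plus '\0'). */
void format_fixed4(uint16_t value, char *buf)
{
    /* Place values of the five digits: 1 integer digit, 4 decimal digits */
    static const uint16_t place[5] = {10000u, 1000u, 100u, 10u, 1u};
    char *p = buf;

    for (int i = 0; i < 5; i++) {
        /* If value is smaller than this place, the digit comes out as 0. */
        *p++ = (char)('0' + value / place[i]);
        value %= place[i];
        if (i == 0) {
            *p++ = '.';  /* decimal point after the integer digit */
        }
    }
    *p = '\0';
}
```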

Alternative Generic print method

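Another, more generic, way is to use the integer divide (and remainder) routines and loop until the value is zero. The digits come out least significant first, so the string is reversed at the end. Example pseudo code (if the value is held in 'fixed'):

do {
    store fixed % 10 as a character in the string. Move to the next string position.
    fixed /= 10;
} while (fixed);
reverse the string.

You would use a string buffer and pointers to do the above job. To show the decimal point, store a dot after the fourth digit (for 4 decimal places), and keep storing zeros until the dot and one digit in front of it have been stored, so that small values such as 523 come out as 0.0523.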
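Here is a runnable C version of the generic method. It is a sketch assuming an unsigned 32 bit value with the number of decimal places `dp` passed in; `format_fixed` is a hypothetical name. It inserts the decimal point and the zeros needed for values below 1.0 as it goes.

```c
#include <stdint.h>

/* Convert an unsigned fixed point value with 'dp' implied decimal places
   to text using only divide and remainder, e.g. (49725, 4) -> "4.9725".
   char buf[13] is large enough for any uint32_t value with dp up to 10. */
void format_fixed(uint32_t value, uint8_t dp, char *buf)
{
    char *p = buf;
    uint8_t count = 0;

    /* Digits come out least significant first. Keep going until the value
       is used up AND every decimal place plus one integer digit is written,
       so values below 1.0 get their zeros, e.g. 523 -> "0.0523". */
    do {
        *p++ = (char)('0' + value % 10u);
        value /= 10u;
        count++;
        if (count == dp) {
            *p++ = '.';
        }
    } while (value != 0u || count <= dp);
    *p = '\0';

    /* Reverse the string in place so the most significant digit is first. */
    for (char *a = buf, *b = p - 1; a < b; a++, b--) {
        char tmp = *a;
        *a = *b;
        *b = tmp;
    }
}
```

For example format_fixed(49725, 4, buf) gives "4.9725" and format_fixed(1005, 4, buf) gives "0.1005"; with dp = 0 it prints a plain integer with no dot.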

Going Further

What about multiplying or dividing

When you start using fixed point you will probably need to multiply or divide fixed point numbers. The key to doing this is to know where the decimal point will be after the calculation.

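In the example above the numbers were chosen to put the decimal point 4 places to the left, so we had the maximum output from the ADC:

255

and a scaling factor:

195

If you normalize these numbers to scientific notation you have:

2.55*10^2 and 1.95*10^2

This is useful as it now tells you where the decimal point will be, i.e. adding powers when multiplying gives 2.55*10^2 * 1.95*10^2 = 4.9725*10^4, or 4 places to the left. The position does not change with the reading: because 195 stands for 0.0195 = 195*10^-4, every result from this calculation, not just the maximum, has its decimal point 4 places to the left.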

So for the maximum output number 49725 we have 4 [dp] 9725. The [dp] is imaginary - you have to remember where it is.
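A minimal C sketch of that bookkeeping follows. It assumes you carry the decimal point position alongside the integer; `fixed_t`, `fixed_mul` and `fixed_rescale` are hypothetical names, not a library API. Multiplying adds the decimal places, just as adding the powers of ten does above, and rescaling divides or multiplies by 10 to move the imaginary point.

```c
#include <stdint.h>

/* Hypothetical fixed point type: 'value' is the integer, 'dp' is how many
   places to the left the imaginary decimal point sits. */
typedef struct {
    uint32_t value;
    uint8_t  dp;
} fixed_t;

/* Multiply: the integers multiply and the decimal places add.
   No overflow check: the caller must make sure the product fits in 32 bits. */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    fixed_t r = { a.value * b.value, (uint8_t)(a.dp + b.dp) };
    return r;
}

/* Move the decimal point to 'dp' places, truncating extra digits
   when the number of decimal places is reduced. */
fixed_t fixed_rescale(fixed_t x, uint8_t dp)
{
    while (x.dp > dp) { x.value /= 10u; x.dp--; }
    while (x.dp < dp) { x.value *= 10u; x.dp++; }
    return x;
}
```

With these, 255 (0 places) times 195 (4 places) gives 49725 with 4 places, and rescaling to 3 places gives 4972, i.e. 4972mV. In a real microcontroller program you would normally fix dp at design time rather than store it at run time; it is carried here only to make the bookkeeping visible.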

This seems like a lot of work, but once you have figured out the size of variable required and the expected output value range, you just use the result and interpret it accordingly in the rest of the program.

Remember, compared with the floating point calculation you don't lose accuracy with this method, and you save a lot of Flash memory (probably around 1k words or more). In addition, the code will run much faster.

Example using 10 bit ADC

As a quick example, if you use the standard 10 bit ADC found in PIC and ATmega chips, the calculations become:

Voltage value per bit:

5V/1024 = 0.0048828125  (4.882mV)

The scale factor now becomes 488.

Note: Select the number of digits for the required precision e.g. you could have chosen scale factor 4882 - but remember it changes the decimal point location.

Maximum value : 1023*488 = 499224

Now you can see that this number is too big for a 16 bit integer, so you would choose a 32 bit type such as long (or uint32_t) to hold the number.

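The decimal point location is given by 1.023e3 * 4.88e2, so adding powers gives e5, i.e. 5 places to the left. You can see that this is about right: the true voltage for a reading of 1023 is one bit value below 5V (1023 * 0.0048828125 = 4.9951V), and 4.99224 is close to that. The small difference comes from truncating the scale factor to 488, so here the accuracy is to 2 decimal places.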
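For the 10 bit case the conversion needs a 32 bit result. A minimal sketch, assuming a 0 to 1023 reading, a 5V reference and the 488 scale factor; the names are hypothetical:

```c
#include <stdint.h>

#define ADC10_SCALE 488UL  /* 0.00488 V per bit, decimal point 5 places to the left */

/* Convert a 10 bit reading (0..1023) to volts in fixed point, e.g.
   499224 means 4.99224 V. The result can exceed 65535, so it is 32 bits. */
uint32_t adc10_to_fixed(uint16_t reading)
{
    return (uint32_t)reading * ADC10_SCALE;
}
```

The cast and the `UL` suffix make the compiler do the multiply in at least 32 bits; on a compiler with 16 bit int, `reading * 488u` would be computed in 16 bits and wrap around.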

Improving accuracy

By increasing the number of digits in the scaling value (voltage value per bit) to 4882 you would get the following maximum output:

1023*4882 = 4994286

Adding one bit value of 4882 gives: 4994286 + 4882 = 4999168

So the above is accurate to 3 decimal places, with the decimal point now located 6 places to the left. Ultimately, the number of digits to keep in the scale factor depends on the accuracy required at the output.
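To see the effect of the extra digit, the sketch below compares both scale factors against the exact value of reading * 5V / 1024, all in microvolts. It is only a desk check meant to run on a PC (it uses 64 bit arithmetic), and the function names are hypothetical.

```c
#include <stdint.h>

/* Exact voltage of a 10 bit reading in microvolts: reading * 5V / 1024,
   truncated. 64 bit arithmetic because 1023 * 5000000 overflows 32 bits. */
uint32_t exact_uv(uint16_t reading)
{
    return (uint32_t)((uint64_t)reading * 5000000u / 1024u);
}

/* The same reading converted with a truncated scale factor, expressed in
   microvolts. scale_dp is the number of decimal places the scale implies
   (488 -> 5, i.e. 0.00488 V; 4882 -> 6, i.e. 0.004882 V) and must be <= 6. */
uint32_t scaled_uv(uint16_t reading, uint32_t scale, uint8_t scale_dp)
{
    uint32_t fixed = (uint32_t)reading * scale;
    /* Move the decimal point to 6 places (microvolts). */
    while (scale_dp < 6) {
        fixed *= 10u;
        scale_dp++;
    }
    return fixed;
}
```

For a reading of 1023 the exact value is 4995117µV; the 488 scale factor gives 4992240µV (about 2.9mV low) and 4882 gives 4994286µV (about 0.8mV low).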

