Table of Contents
In this
blogpost, Fabien
Sanglard explains how floating point numbers are generally displayed in modern
computers (using the IEEE754
Norm). His alternative perspective using
buckets and offsets is brilliant and explains a fact that may be
overlooked when using the standard explanation: the accuracy of floating point
numbers declines with a factor of 2 by each bucket! That is because the same
'amount' of numbers is dispersed in any bucket, but the size of the bucket
doubles every time.
I wanted to empirically show this and wrote this short C program:
#include <stdint.h> | |
#include <stdio.h> | |
typedef union { | |
float f; | |
struct { | |
uint32_t mantissa : 23; | |
uint32_t exponent : 8; | |
uint32_t sign : 1; | |
} parts; | |
} float_parts; | |
/* Print any memory, given a start pointer and a size, as binary. | |
*/ | |
static void print_any(void *content, size_t bytes) { | |
if (content == NULL) { | |
return; | |
} | |
uint8_t *content_as_byte = content; | |
while (bytes > 0) { | |
--bytes; | |
/* divide byte_index by 2 each iteration */ | |
for (int byte_index = 128; byte_index; byte_index >>= 1) { | |
printf("%x", (content_as_byte[bytes] & byte_index) != 0 ? 1 : 0); | |
} | |
printf(" "); | |
} | |
printf("\n"); | |
} | |
/* Increase a float by exactly one tick. | |
* Size of increment is dependent on magnitude of float. | |
*/ | |
static void increase_float(float *content) { | |
float_parts fc = {.f = *content}; | |
if (fc.parts.mantissa == 0b11111111111111111111111) { | |
fc.parts.exponent += 1; | |
fc.parts.mantissa = 0; | |
} else { | |
fc.parts.mantissa += 1; | |
} | |
*content = fc.f; | |
} | |
/* Prints how the accuracy changes over an interval. | |
* The factor between the last and current step size is also printed, always 2. | |
* (Error is twice as big because all 2^23 values are dispersed over a twice as large interval) | |
*/ | |
static void print_accuracy(float low, float high) { | |
float last_val = low; | |
float last_difference = 0.0f; | |
float difference = 0.0f; | |
for (float val = low; val < high; increase_float(&val)) { | |
difference = val - last_val; | |
if (difference != last_difference) { | |
/* Filter first time to avoid division by zero. | |
* I am sure there is a better way that avoids having to do this. | |
*/ | |
if (last_difference == 0.0f) { | |
last_difference = difference; | |
last_val = val; | |
continue; | |
} | |
printf("value: %.24f\n", val); | |
printf("difference: %.24f\n", difference); | |
printf("factor: %.2f\n\n", difference / last_difference); | |
last_difference = difference; | |
} | |
last_val = val; | |
} | |
} | |
int main() { | |
puts("Accuracy decreases with larger magnitude:"); | |
print_accuracy(0.999999f, 2048.00001f); | |
puts("\nMantissa and exponent change when incrementing over a power-of-two:"); | |
for (float counter = 0.999999f; counter < 1.000001; | |
increase_float(&counter)) { | |
printf("%.24f: ", counter); | |
print_any(&counter, sizeof(counter)); | |
} | |
return 0; | |
} |
One reason Unions are useful
This program uses a C union. Unions allow you to use the same memory for
multiple members, and every time you write one member, it also changes the
others. This may seem useless, but it is not. In this case, it allows you to
write to memory using the memory representation of float
, while reading the
same content of memory as individual bits. The union has 2 members, one
allowing to write and read a float, the other resembling the bit pattern of a
float as defined in the IEEE754
Norm. After writing to the float, you can
read and manipulate the exponent and mantissa from the embedded struct.
Printing any memory content as binary [print_any(content pointer, size)]
Casting the pointer to the memory content to a byte pointer (uint8_t *
)
allows us to index individual bytes. To access individual bits, we need to
generate a bit mask for each of them. We simply start at 0b1000 0000
and
keep shifting left while using the index to access a single bit and print it.
Since the bit at position n
will still be interpreted as being of value 2^n
we need to check if it is set and print a 1
if this is the case, hence the
ternary operator.
We need this function to inspect a floating point number.
Floating point numbers are discrete, too... [increase_float(content)]
We imagine floats as being very fine-grained. That is exactly what they are - they do not allow for arbitrary accuracy, they are not what mathematicians would call dense.
That means, we can take a float and increase it by exactly one tiny amount,
yielding the next-larger float in that bucket! That is what the function
increase_float
does.
We construct a float_parts
union using the float as content. Then, we look
at the mantissa: if it is filled to the brim, we reset it to 0 and increase the
exponent by one. Otherwise, we simply increase the mantissa.
Putting it together: How accuracy declines with growing exponent [print_accuracy(low, high)]
Here we keep increasing a float, and when the 'step size' changes, we log it. We also print the factor between the last and the current step size.
This function runs way faster than I expected for some reason - it visits every
float there is between low
and high
.
There you have it! Empirical demonstration of how inaccurate floats (and doubles, just later) really are.