Floating point issue in mhd header file

fool2nd · May 19, 2024, 11:47pm

Hi,

I ran into a problem when writing an image to mhd/raw file. Here’s the code snippet.

import SimpleITK as sitk
image = sitk.ReadImage(‘image.mhd’)
print(image.GetSpacing())
(0.002, 0.002, 0.002)
image.SetSpacing((0.075015,0.075015,0.075015))
sitk.WriteImage(image, ‘result.mhd’)

The spacing in .mhd file is 0.075014999999999998 instead of 0.075015. Is there a way to output the spacing exactly to 0.075015?

Thanks,
Sam

zivy · May 20, 2024, 12:46pm

Hello @fool2nd,

The mhd file is a text file and you are writing a floating number, so there is a conversion to string that happens along the way. Generally speaking, this conversion has a limited precision and may be different from the actual binary value stored in the variable. To maintain precision it is better to use a binary format.

Having said this, the representation of the spacing when read back as an image is what you expect it to be:

import SimpleITK as sitk

dim=3
file_name = "result.mhd"

write_spacing =[0.075015]*dim
image = sitk.Image([512]*dim, sitk.sitkUInt8)
image.SetSpacing(write_spacing)
sitk.WriteImage(image, file_name)

print(write_spacing)

read_image = sitk.ReadImage(file_name)
print(read_image.GetSpacing())

fool2nd · May 21, 2024, 7:00am

Hi @zivy ,

Thanks a lot for the reply. I tried the code and the spacing read from the result image is 0.075015 but the spacing text in result.mhd is still 0.075014999999999998.

The reason I hope the spacing can be exactly same in .mhd file is, the user may open the mhd file in a text editor and complain that the spacing values are not exactly the same. You said “To maintain precision it is better to use a binary format.” Which format do you refer to?

Thanks again,
Sam

zivy · May 21, 2024, 8:44pm

Hello @fool2nd,

The scenario you are describing is dangerous. You don’t want to enable folks who are not very knowledgeable to easily make changes to the image metadata. Using a text editor anyone can make catastrophic changes to the image spacing, origin etc. As a rule of thumb, prefer data integrity over ease of data manipulation.

With respect to binary formats there are many. You can use the single file version of the meta-image mha or other common formats such as nrrd or nii.gz. See this page for the list of supported file types.

imikejackson · May 22, 2024, 2:37pm

The reason that “0.075015” will be printed as “0.075014999999999998” into the file is because of the conversions between the binary representation and the text representation of the number. If you want exact, then do as Sam has suggested and go with a binary format.

fool2nd · May 22, 2024, 10:47pm

Hi @zivy, @imikejackson,

Thanks for the reply. I will take a look at the binary format.

Sam

lassoan · May 24, 2024, 4:25pm

Well, I would recommend to not do that. It seems that two different meanings of “binary” is mixed here.

Binary (raw) vs. plain text (string): Advantage of storing metadata as plain text is that it is directly accessible to humans, without some custom parser implementation, and generally it is more flexible. Advantage of storing values in binary representation is that parsing can be simpler and faster.
Binary (base 2) vs. decimal (base 10): We use base 10 for displaying numbers, while base 2 is more efficient for computations. Most numbers in base 10 require infinite number of digits in base 2. For example, 1/10 in base 10 is finite (0.1), while in base 2 it is infinite (0.00011001100110011001100110011001100…). For the specific 0.075015 value that is discussed above, you cannot exactly represent as either a float or double. Closest float is 0.075015000998973846435546875, closest double is 0.0750149999999999983479881393578 (source). So, in general when we convert a displayed number (decimal string) into binary representation then information is lost. We don’t worry about this much, though, because this loss is insignificant and usually the only place where it comes up is display and some rookie programming errors (e.g., using exact comparison for floating-point numbers).

Practical consequences:

“Binary” (in any sense) is not “exact” if the goal is to exactly store a displayed number. If you want exact, go with base 10 (which can losslessly store any finite base 2 or base 10 number) and for additional convenience go with plain text. Note that many file formats, such as DICOM (that stores most numbers in “decimal string” representation), NRRD, MetaIO, etc. do exactly this. In contrast, you cannot store displayed numbers exactly if you use NIFTI format (although I admit that this is not a practically relevant limitation because we typically drop the exact spacing value much earlier: when we read the the image from DICOM; and the loss of accuracy is negligible anyway).
Since the lossy conversion to base 2 (IEEE754 single or double precision) is practically inevitable (because efficient implementation of floating-point operations is important and the loss during conversion is negligible), people who have access to raw data must be educated on limitations of base 2 floating point representations. For example, this Python tutorial page provides a really good summary.

imikejackson · May 25, 2024, 5:45pm

Not really the rabbit hole I wanted to go down this weekend, but here we go.

@lassoan I take great exception to this:

Closest float is 0.075015000998973846435546875

How? A 32 bit floating point number has 7.2 digits of precision. I’ll round up to 8 since that seems pretty common. So I’m not sure how you are expressing a 32 bit floating point number with 28 digits? Even a 64 bit float only has 16 digits of precision.

0.0750150                 // 32 bit float
0.075014999999999 // 64 bit float

are all the digits you are allowed. Whether or not IEEE-754 can actually display that number is the next question. Base 2 can only display certain floating point numbers as you point out. There are great articles on the internet that explain in detail why.

@lassoan points still stand. Essentially, there are tradeoffs to how you store floating point numbers and the methods that are employed to convert them between their various representations.

lassoan · May 25, 2024, 6:25pm

I included a link to the source in my post above. It is the exact value of the closest float. There may be numbers with less digits that are closest to the same float, but this is the exact value.