Get path of the mhd's data file

I use MetaImage files (mhd) a lot and sometimes need the path of the data file. Usually, I already have the file loaded via SimpleITK. Is there a way to retrieve the name of the data file from the Python interface?

>>> import SimpleITK as sitk
>>> img = sitk.ReadImage('myimage.mhd')
>>> img.????()
'myimage.zraw'

It would be easy if the file were always named filename.raw, but sometimes I have compressed files too…

Hello @reox,

This is not built into SimpleITK. You can easily add it via the metadata dictionary. The code below shows the approach for SimpleITK<2.1 and for SimpleITK>=2.1, which will be out shortly and supports more Pythonic access:

import SimpleITK as sitk
from pathlib import Path

def sitk_read_image(file_name):
  image = sitk.ReadImage(file_name)
  image.SetMetaData('file_path', str(Path(file_name).absolute()))
  return image
  
file_name = 'training_001_ct.mha'
image = sitk_read_image(file_name)
print(image.GetMetaData('file_path'))

Upcoming more Pythonic version:

import SimpleITK as sitk
from pathlib import Path

def sitk_read_image(file_name):
  image = sitk.ReadImage(file_name)
  image['file_path'] = str(Path(file_name).absolute())
  return image

file_name = 'training_001_ct.mha'
image = sitk_read_image(file_name)
print(image['file_path'])

So that means there is also no access to the data file? But SetMetaData sounds good. I can simply write a wrapper for ReadImage, just like in your example, and read the .mhd text file on my own to extract the path.

Thanks a lot!

Note that the metaimage format allows using many data files and there are other fields in the header that are essential for interpreting the data file (offset, compression method, etc), so just storing one data file name will not be sufficient for reliably accessing the data. It is better to let ITK take care of interpreting the header and reading and decoding the data files.
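To illustrate why a single stored file name can be insufficient, here is a minimal sketch that reads a .mhd header as text and collects every header field along with the data files it refers to. The function name `mhd_data_files` is hypothetical and not part of SimpleITK or ITK. The sketch assumes a plain-text .mhd header, handles a single file name, `LOCAL`, and the `LIST` form, and deliberately does not handle the printf-style pattern form.

```python
from pathlib import Path


def mhd_data_files(mhd_path):
    """Return (header_fields, data_files) for a plain-text .mhd header.

    Hypothetical helper for file management only; it does not decode data.
    """
    mhd_path = Path(mhd_path)
    lines = mhd_path.read_text().splitlines()
    fields = {}
    data_files = []
    for i, line in enumerate(lines):
        if '=' not in line:
            continue
        key, value = (part.strip() for part in line.split('=', 1))
        fields[key] = value
        if key != 'ElementDataFile':
            continue
        # ElementDataFile is expected to be the last header field.
        if value == 'LOCAL':
            pass  # data is stored in the header file itself
        elif value.split()[:1] == ['LIST']:
            # One data file name per following non-empty line.
            data_files = [mhd_path.parent / name.strip()
                          for name in lines[i + 1:] if name.strip()]
        elif '%' in value:
            raise NotImplementedError('pattern-style ElementDataFile not handled')
        else:
            data_files = [mhd_path.parent / value]
        break
    return fields, data_files
```

Fields such as `CompressedData` or `HeaderSize`, when present, end up in the returned dictionary, which shows how much context besides the file name is needed to interpret the data. For actually reading the pixels, letting ITK interpret the header remains the safer route.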

MetaImage is not a very good file format, by the way. It does not have a standard way to specify axis types (so you cannot tell if you have a 2D color image or a 3-slice volume, or robustly store time sequence or tensor data), there is ambiguity in the interpretation of axis directions, it is unnecessarily complex (allowing many data files), and it does not support measurement frames. I would recommend using the NRRD file format instead, which is very similar but does not have any of these limitations. The NIfTI format is popular, too, but it has many issues.

I agree, MetaImage is not the best format. However, it is used a lot at our institute and we have terabytes of data in that format… It has simply developed over time. Fortunately, 99.5% of our data is CT volume data, which makes things a bit easier :)

I won't use the data file name for reading, only for managing files. That is why I want it available alongside the Image object.

Right now, I have this piece of code:

import os
import SimpleITK as sitk

def sitk_read_image(path):
    img = sitk.ReadImage(path)
    img.SetMetaData('file_path', os.path.abspath(path))
    _, ext = os.path.splitext(path)
    if ext.lower() == '.mhd':
        # The .mhd header is plain text; ElementDataFile names the data file
        # relative to the header's directory (assumes a single data file).
        with open(path, 'r') as fp:
            for line in fp:
                if line.startswith('ElementDataFile'):
                    data_file = line.split('=', 1)[1].strip()
                    img.SetMetaData('data_path',
                                    os.path.join(os.path.dirname(path), data_file))
                    break
    return img

This is a bit ugly, but it works for my purposes.
With this I can, for example, get the size of the data file on disk:

>>> os.path.getsize(x.GetMetaData('data_path'))
744065076

But to avoid the XY problem, maybe I should have asked in the first place: Is there a way in SITK to retrieve the size on disk for metaimage files?

Hello @reox,

No, SimpleITK does not have a way of retrieving the file size on disk.

One last word of caution: there is a reason why most file formats do not contain the file name or file path as part of the header or internal information. You can readily change the path or file name without the “file” being aware of it, which makes that internal information wrong. We want our IO to be oblivious to file names, as these are just one way for humans to specify what data they want to work with and are not inherently part of the data. Think of a patient name in a database of patients as another way to specify which “files” you want to display or process.

Yes, that is clear.
To avoid falling into this trap accidentally, I’ll store the on-disk file size as metadata instead of the path.
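A minimal sketch of that idea, for illustration only: the helper `mhd_data_size` and the metadata key `size_on_disk` are hypothetical names, and the code assumes the header references a single external data file (not `LOCAL`, `LIST`, or a pattern). Only `sitk.ReadImage` and `Image.SetMetaData` are SimpleITK calls; everything else is the standard library.

```python
import os


def mhd_data_size(mhd_path):
    """Size in bytes of the single data file named by ElementDataFile.

    Assumes one external data file; for LOCAL or LIST headers the lookup
    below fails with an OSError because no such file exists.
    """
    with open(mhd_path, 'r') as fp:
        for line in fp:
            key, sep, value = line.partition('=')
            if sep and key.strip() == 'ElementDataFile':
                data_path = os.path.join(os.path.dirname(mhd_path), value.strip())
                return os.path.getsize(data_path)
    raise ValueError('no ElementDataFile entry found in ' + str(mhd_path))


def sitk_read_image(path):
    # Imported here so that mhd_data_size can be used without SimpleITK.
    import SimpleITK as sitk

    img = sitk.ReadImage(path)
    if os.path.splitext(path)[1].lower() == '.mhd':
        # Metadata values are strings, so the size is stored as text.
        img.SetMetaData('size_on_disk', str(mhd_data_size(path)))
    return img
```

The stored value is a snapshot taken at read time, so it does not go stale when the file is later moved or renamed, though it would no longer match if the data file itself were rewritten.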