Reading dicom series with SimpleITK and filesort


(Jonas Teuwen) #1

I am trying to read DICOM series using SimpleITK. I have a folder containing many DICOM series, and I only need a few of them. Luckily I know the prefix of the filenames, so I can select the proper files.

However, reader.SetFileNames(fns) does not sort the DICOM slices correctly. To get them sorted I would need to use sitk.ImageSeriesReader.GetGDCMSeriesFileNames(data_directory, series_ID), which would read through my data folders a couple of times and is quite inefficient.

So what is the appropriate way to tell the ImageSeriesReader to sort my dicom files?
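
For illustration, the prefix-based selection described above might look like the sketch below. The helper name files_with_prefix and the example prefix are hypothetical. The lexicographic order it returns is only there to make the result reproducible; it is not the slice order, which is the problem this thread is about.

```python
from pathlib import Path


def files_with_prefix(directory, prefix):
    """Return paths of regular files in `directory` whose names start with `prefix`.

    The result is sorted lexicographically, which is NOT necessarily
    the spatial order of the slices.
    """
    return sorted(
        str(p)
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

A call such as files_with_prefix(data_directory, 'series7_') would then give the list that is passed to reader.SetFileNames.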

(Ziv Yaniv) #2

Hi @jonasteuwen,

Not sure if just reading the meta-data for all files is that expensive. You can try the following two options:

  1. Standard SimpleITK.
  2. Python acrobatics + standard SimpleITK.

Option 1:

import SimpleITK as sitk

# A file name that belongs to the series we want to read
file_name = '1.dcm'
data_directory = '.'

# Read the file's meta-information without reading bulk pixel data
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(file_name)
file_reader.ReadImageInformation()

# Get the sorted file names. This opens all files in the directory and reads
# the meta-information without reading the bulk pixel data
series_ID = file_reader.GetMetaData('0020|000e')
sorted_file_names = sitk.ImageSeriesReader.GetGDCMSeriesFileNames(data_directory, series_ID)

# Read the bulk pixel data
img = sitk.ReadImage(sorted_file_names)

Option 2:

import tempfile
import os
from random import shuffle

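# Our list of known file names, just a shuffled version of the
# sorted tuple above
known_file_names = list(sorted_file_names)
shuffle(known_file_names)

# Create the file reader and get the series_ID from one of the known files
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(known_file_names[0])
file_reader.ReadImageInformation()
series_ID = file_reader.GetMetaData('0020|000e')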

# Create a temp directory and symlink all of the files, 
# then read from there.
with tempfile.TemporaryDirectory() as tmpdir_name:
    # Create symbolic links to the original files from our tmpdir
    for f in known_file_names:
        os.symlink(os.path.abspath(f), os.path.join(tmpdir_name, os.path.basename(f)))
    # Now get the sorted list of file names
    sorted_file_names = sitk.ImageSeriesReader.GetGDCMSeriesFileNames(tmpdir_name, series_ID)
    img = sitk.ReadImage(sorted_file_names)

Hope this helps. Please update the thread with which option you chose.

(Jonas Teuwen) #3

Dear Zivy,

Thanks for the nice solutions. I will go for Option 2, as these directories are really large. However, it is a little bit hacky. Would it be possible to add a sort_fns=True parameter to the GetGDCMSeriesFileNames function? That would be a SimpleITK-only solution.

(Ziv Yaniv) #4

Hi @jonasteuwen,

We will likely not modify this in SimpleITK as you have a somewhat unique setting (knowing the files belonging to the series without opening all of the files in a directory).
The previous solution with the tempdir may be too acrobatic, though it leaves the sorting to the underlying library.

Below is option 3, where we do the sorting ourselves:

from random import shuffle

# Our list of known file names, a shuffled version of the sorted tuple above
known_file_names = list(sorted_file_names)
shuffle(known_file_names)

file_reader = sitk.ImageFileReader()
file_names_and_image_position = []
for f in known_file_names:
    file_reader.SetFileName(f)
    file_reader.ReadImageInformation()
    # Get the z coordinate from Image Position (Patient), tag 0020|0032
    file_names_and_image_position.append((f, float(file_reader.GetMetaData('0020|0032').split('\\')[2])))
# Sort list according to z
file_names_and_image_position.sort(key=lambda x: x[1])
z_sorted_file_names, _ = zip(*file_names_and_image_position)

img = sitk.ReadImage(z_sorted_file_names)
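
Note that sorting on the z coordinate alone assumes the slices are roughly axial. A minimal sketch of a more general ordering is given below, assuming every file carries both Image Position (Patient), tag 0020|0032, and Image Orientation (Patient), tag 0020|0037. It projects each slice position onto the slice normal and sorts on that value. The helper names are hypothetical and not part of SimpleITK; the functions work on the raw tag strings so they can be tried without any DICOM files.

```python
def parse_ds(value):
    """Parse a backslash-separated DICOM decimal string into a tuple of floats."""
    return tuple(float(v) for v in value.split('\\'))


def slice_normal(orientation):
    """Cross product of the row and column direction cosines."""
    rx, ry, rz, cx, cy, cz = orientation
    return (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)


def sort_by_slice_location(entries):
    """Sort (file_name, position_str, orientation_str) tuples along the slice normal.

    Returns only the file names, in sorted order.
    """
    def key(entry):
        _, position, orientation = entry
        normal = slice_normal(parse_ds(orientation))
        # Distance of the slice origin along the normal direction
        return sum(p * n for p, n in zip(parse_ds(position), normal))

    return [name for name, _, _ in sorted(entries, key=key)]
```

With SimpleITK, the two strings for each file would come from file_reader.GetMetaData('0020|0032') and file_reader.GetMetaData('0020|0037') after calling ReadImageInformation(), and the returned names can be passed to sitk.ReadImage as before.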

(Jonas Teuwen) #5

Hi @zivy,

Thank you for the final solution and your efforts. This seems like the most elegant way to do it with the current library.