Using SimpleITK to Convert DICOM to NIfTI With Misleading Series Tags

Hi All,

I’m trying to convert a triple-phase CT abdomen DICOM data folder into 3D NIfTI volumes, one per series. The approach I’m using is below and works great, BUT for some of the data the slices are not in the correct order, probably because of something the scanner writes to the metadata. I’m hoping someone here has crossed this bridge before and can share advice.

For example, a few random slices from the chest can suddenly appear in the abdomen (probably the images that were used for bolus timing). Also, what should be the ends of the volume sometimes appear in the middle, like a frame shift.

import SimpleITK as sitk

# series: list of DICOM file paths sharing a common acquisition time and series description
series = [...]  # placeholder
reader = sitk.ImageSeriesReader()
reader.SetFileNames(series)
image = reader.Execute()

One workaround I tried was to group the .dcm files by “acquisition time” and “series description” and/or “series ID” before passing them to reader.SetFileNames(). This helped with some initial issues, but not the ones described above.

Given a list of .dcm files, how does “reader.SetFileNames()” know what order to write/stack them in? Are there specific DICOM tags I’m neglecting that can be leveraged to preprocess before sending them to “reader.SetFileNames()”?
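One quick way to check whether a file list is already out of order before handing it to SetFileNames() is to look at the slice positions in reading order and flag the jumps. This is a sketch, not a SimpleITK feature: `find_order_breaks` is a hypothetical helper, and it assumes you already have one position value per file (for example the Slice Location (0020,1041) value) in the same order as the file list.

```python
import statistics


def find_order_breaks(positions, rel_tol=0.1):
    """Return indices where the step between consecutive slice positions
    deviates from the typical (median) step by more than rel_tol.

    positions: one float per file, in the order the files will be read.
    """
    steps = [b - a for a, b in zip(positions, positions[1:])]
    if not steps:
        return []
    typical = statistics.median(steps)
    tolerance = rel_tol * abs(typical)
    # Index i + 1 is the first slice after an unexpected jump.
    return [i + 1 for i, step in enumerate(steps) if abs(step - typical) > tolerance]


# A couple of slices from another region inserted mid-stack give two breaks.
print(find_order_breaks([0.0, 1.0, 2.0, 3.0, 50.0, 51.0, 4.0, 5.0]))  # [4, 6]
```

An empty result only means the positions advance evenly in one direction; it does not prove the stack is geometrically correct.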

Any suggestions or guidance is greatly appreciated.

Thank you!

Brett

Have you looked at our dicom examples:
http://simpleitk.readthedocs.io/en/master/Examples/index.html

These are for the recently released 1.1rc1, which includes improved metadata support.

Hi Brad,

These examples are really great and actually helped a lot to get me started initially.

I’m able to convert 70% of my DICOM data folders into individual NIfTI files by series perfectly. However, there must be a scanner at my institution that either assigns metadata incorrectly or whose metadata is misinterpreted by SimpleITK’s reader/writer, because the slices in the resulting 3D volume are not in the correct order.

Given a list of .dcm file paths, do you know if there is a specific DICOM tag (e.g. Patient Position?) that SimpleITK uses to determine how a stack of DICOM files is ordered before it is written into a 3D volume? Or could you point me to that code?

Thanks so much for your help!

Can you please provide a minimal example of your code that is not reading the series properly?

I’m about to board a plane…

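##This is the first block of code; it builds the list of unique "series description + acquisition time" keys (no SimpleITK used yet)

import os

import pydicom  # older pydicom releases (< 1.0) were imported as `dicom`


def build_dicom_table(name):  # hypothetical name; the original signature was not posted
    dicom_table = []

    # Output Subject Dir
    in_dir = os.listdir(name)
    if len(in_dir) >= 4:

        # Group DICOM Files By Acquisition Time + Series Description
        for j, i in enumerate(in_dir):
            info = pydicom.dcmread(os.path.join(name, i))
            try:
                acc = info[0x0008, 0x0050].value               # Accession Number
                acquisition_time = info[0x0008, 0x0032].value  # Acquisition Time
                slice_position = info[0x0020, 0x1041].value    # Slice Location
                series_num = info[0x0020, 0x0011].value        # Series Number
                series_desc = info[0x0008, 0x103E].value       # Series Description
                series_uid = info[0x0020, 0x000E].value        # Series Instance UID

                dicom_table.append([j, acc, slice_position,
                                    series_desc + "_" + acquisition_time,
                                    os.path.join(name, i)])
            except KeyError:
                pass  # skip files missing any of the tags above

    # Rows stay in os.listdir() order, which is arbitrary
    unique_times = [row[3] for row in dicom_table]
    return set(unique_times), dicom_table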
##Then I use the unique list generated in the first block to inform SimpleITK which groups of .dcm file paths should be made into a series-specific 3D data set

import os

import SimpleITK as sitk


def nifti_by_series(unique_column_values, column_number, dicom_table, output_folder):

    # Accession number of the study, read from the first file
    acc = sitk.ReadImage(dicom_table[0][-1]).GetMetaData("0008|0050").strip()
    if len(acc) == 0:
        acc = 'blank_acc_tag'  # must be set before the folder is created

    # Check and Make Study Folder if Does Not Exist
    study_dir = os.path.join(output_folder, acc)
    if os.path.exists(study_dir):
        return
    os.mkdir(study_dir)

    series_names = []

    for series_num, value in enumerate(unique_column_values):

        # Group Common Series File Paths By Time Acquisition.
        # The paths keep dicom_table (os.listdir) order, and
        # ImageSeriesReader reads them in exactly this order.
        series = [row[-1] for row in dicom_table if row[column_number] == value]

        series_desc = sitk.ReadImage(series[0]).GetMetaData("0008|103e")

        print(acc)
        print(series_desc, value)

        series_names.append([acc, series_desc, value, series_num])

        reader = sitk.ImageSeriesReader()
        reader.SetFileNames(series)

        try:
            image = reader.Execute()
        except RuntimeError:
            print("--> Fundamental error in image layer, skipping...")
            continue

        print("--> Writing image:", series_desc + "_" + value)

        sitk.WriteImage(image, os.path.join(study_dir, series_desc + "_" + value + ".nii.gz"))

    return series_names

You should be using ImageSeriesReader::GetGDCMSeriesIDs and GetGDCMSeriesFileNames.

That is what I tried initially and ended up with even more issues.

I may not be explaining the issue well. Some scanners assign identical series IDs to multiple 3D images (e.g. small images used for bolus timing, or reformats from axial to sagittal). This prevents efficient use of SimpleITK to convert a clinically derived DICOM study folder.

The above code is an attempt to work around this. I’m still having issues, but if there’s a way to find out how reader.SetFileNames() or reader.Execute() above chooses the sort order, I think I could improve this further.

Thank you!

The SimpleITK ImageSeriesReader is the class that handles a list of files when one is passed to sitk.ReadImage. The series reader does not sort the list of files; the files are read in the order they are specified. The method ImageSeriesReader::GetGDCMSeriesFileNames returns the files in the proper order. That is why I asked if you were using it.

Have you looked at or tried all the options to GetGDCMSeriesFileNames:

Parameters
directory: Set the directory that contains the DICOM data set.
recursive: Recursively parse the input directory.
seriesID: Set the name that identifies a particular series. Default value is an empty string which will return the file names associated with the first series found in the directory.
useSeriesDetails: Use additional series information such as ProtocolName and SeriesName to identify when a single SeriesUID contains multiple 3D volumes - as can occur with perfusion and DTI imaging.
loadSequences: Parse any sequences in the DICOM data set. Loading DICOM files is faster when sequences are not needed.

This static member function is a wrapper for itk::GDCMSeriesFileNames; perhaps its implementation will shed some light.

I am unfortunately not a DICOM expert, so I can’t tell you how the tags are supposed to be interpreted.
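One likely pitfall with useSeriesDetails: when it is enabled, GDCM builds its series identifiers from the SeriesInstanceUID plus additional tag values, so a plain UID obtained without that option will probably not match and GetGDCMSeriesFileNames can return an empty list. A minimal sketch, assuming a SimpleITK release in which GetGDCMSeriesIDs also accepts a useSeriesDetails argument (check your version; 1.1rc1 may not have it), is to request both with the same setting. `list_volume_files` and its `series_reader` parameter are hypothetical; the parameter only exists so the logic can be exercised without DICOM data.

```python
def list_volume_files(data_directory, use_series_details=True, series_reader=None):
    """Map each series identifier to its (GDCM-sorted) list of file names.

    The same use_series_details flag is passed to both calls, so the IDs
    returned by GetGDCMSeriesIDs match what GetGDCMSeriesFileNames expects.
    """
    if series_reader is None:
        import SimpleITK as sitk
        series_reader = sitk.ImageSeriesReader

    volumes = {}
    # Assumption: GetGDCMSeriesIDs(directory, useSeriesDetails) is available.
    for series_id in series_reader.GetGDCMSeriesIDs(data_directory, use_series_details):
        # Positional arguments: directory, seriesID, useSeriesDetails.
        file_names = series_reader.GetGDCMSeriesFileNames(
            data_directory, series_id, use_series_details)
        volumes[series_id] = list(file_names)
    return volumes
```

Each file list can then be passed to an ImageSeriesReader as usual. Which extra tags GDCM compares is up to GDCM, so this may still not separate volumes that differ only in acquisition time.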

My 2 cents,
Usually, the set of images that belong to the same volume is identified by having the same Study Instance UID (0020,000D) and Series Instance UID (0020,000E), nothing more. In cases that include a temporal dimension you may need to look at other tags, such as Frame Content Sequence (0020,9111) or Stack ID (0020,9056); see the DICOM standard.
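A minimal sketch of that grouping rule, assuming the header values have already been read (for example with pydicom) into plain dictionaries keyed by DICOM keyword, plus a "path" entry for the file. `volume_key` and `group_by_volume` are hypothetical helpers; the optional `extra` keywords (such as StackID or AcquisitionTime) split a series further only when you know they distinguish volumes in your data.

```python
def volume_key(header, extra=()):
    """Build a grouping key from Study/Series Instance UIDs plus optional tags.

    header: dict mapping DICOM keywords to values.
    extra:  additional keywords to append; missing ones become None.
    """
    key = (header["StudyInstanceUID"], header["SeriesInstanceUID"])
    return key + tuple(header.get(name) for name in extra)


def group_by_volume(headers, extra=()):
    """Group file paths (header["path"]) by volume_key, preserving input order."""
    groups = {}
    for header in headers:
        groups.setdefault(volume_key(header, extra), []).append(header["path"])
    return groups
```

Note that grouping alone does not order the files: each list keeps the order of the input.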

I believe the ‘useSeriesDetails’ option in gdcm (which is what SimpleITK wraps), should work. If not, for us to debug the issue we will need to get a minimal set of slices from you (anonymized).


Bradley and Ziv,

Thanks so much for these suggestions. I’m going to leverage the
useSeriesDetails option and dig further into what dicom tags aren’t
behaving as expected for this particular scanner.

Do you suggest any lightweight anonymizers so that I can provide everything needed
for debugging if it’s still not working?

I would recommend taking a look at David Clunie’s “DICOM stuff”. While we are knowledgeable on the subject, he is an expert. The tools of interest:

  1. dciodvfy, dcentvfy: DICOM verification tools. Though the site says they are “NOT officially recognized or supported tools for certifying DICOM compliance”, they are more robust than anything I’ve written.
  2. DicomCleaner - modify the header so you can anonymize.
  3. Another option is to use OsiriX’s anonymization functionality, though that assumes you are on a Mac; see this YouTube video.

I’ve tried to break this down into two main issues and have provided code and de-identified data as well. For reference, the machine is a GE LightSpeed VCT and the study is a triple-phase CT AP. Nothing turned up in a Google search of DICOM issues with that scanner… but that probably doesn’t mean much.

https://www.dropbox.com/sh/tcnq9bdso7s046a/AACWmSvv5jaCLKJ33_WsZFFZa?dl=0 (CLEAN DICOM DATA)

First Problem: Multiple Volumes Per Series Using GetGDCMSeriesFileNames.

When I use GetGDCMSeriesFileNames with useSeriesDetails, it returns an empty list. With plain GetGDCMSeriesFileNames, as in the code below, the STANDARD RECON series includes multiple volumes, specifically multiple contrast-phase acquisitions in one series.

import os
import sys

import SimpleITK as sitk

data_directory = "1.2.840.113619.2.334.3.2831163449.135.1463999344.729"
out_dir = "./useSeriesDetailsTest"
os.makedirs(out_dir, exist_ok=True)

series_IDs = sitk.ImageSeriesReader.GetGDCMSeriesIDs(data_directory)

if not series_IDs:
    print("ERROR: given directory \"" + data_directory + "\" does not contain a DICOM series.")
    sys.exit(1)

for i, series_ID in enumerate(series_IDs):

    # GDCM returns the file names sorted; with useSeriesDetails=True this returned an empty list
    series_file_names = sitk.ImageSeriesReader.GetGDCMSeriesFileNames(
        data_directory, series_ID, useSeriesDetails=False)
    series_reader = sitk.ImageSeriesReader()
    series_reader.SetFileNames(series_file_names)

    try:
        img = series_reader.Execute()
    except RuntimeError:
        print("--> Fundamental error in image layer, skipping...")
        continue

    # Read the first slice on its own to get the Series Description (0008,103e)
    series_desc = sitk.ReadImage(series_file_names[0]).GetMetaData("0008|103e")
    print(series_desc + " is being processed")

    sitk.WriteImage(img, os.path.join(out_dir, str(i) + "_" + series_desc + ".nii.gz"))

Printing the DICOM tags for slice position, series description, acquisition time and series UID (in that order) shows why: a single series UID is associated with data from different time points, which clinically are two different contrast phases.
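A series UID that mixes time points like this can be flagged before any volume is built. A sketch, assuming the per-file tags have been collected (for example with pydicom) as (SeriesInstanceUID, AcquisitionTime) pairs; `find_mixed_series` is a hypothetical name. On some scanners Acquisition Time can vary slice by slice, in which case every series would be flagged, so inspect the values first.

```python
from collections import defaultdict


def find_mixed_series(pairs):
    """Return {series_uid: sorted acquisition times} for every series UID
    that is associated with more than one acquisition time."""
    times_by_series = defaultdict(set)
    for series_uid, acquisition_time in pairs:
        times_by_series[series_uid].add(acquisition_time)
    return {uid: sorted(times) for uid, times in times_by_series.items() if len(times) > 1}
```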


Problem Two: Out of Order Slices When Applying Work-around That Feeds Only DCMs From Single Volume

To combat problem one I wrote a workaround, included earlier in this thread. Essentially, it makes a list of all dcms as shown above, then groups the file paths of dcms with the same acquisition time AND series UID. This isolates the dcms associated with a single volume, but when these lists are fed back into SimpleITK to generate a volume, some come out of order (…and some work). The out-of-order ones are not randomly out of order…but not consistently out of order in the same way either. See an example here:

And when I drag only those dcms associated with single volume into Slicer or Osirix they magically appear in the right order!! So there may be something unique to SimpleITK causing this…

In the meantime, any suggestions on an alternate dicom to nifti conversion tool would be greatly appreciated as I already have this “sorting dcms by series” solution above.

Apologies for the insanely long post. SimpleITK has been a cornerstone for tons of my work and greatly appreciate all that you guys do. I hope this might help your efforts in some way.

-Brett


Thanks for providing the data. This will allow us to better investigate what is going on.

We are using gdcm through ITK wrapping and Slicer is using DCMTK so that is probably the explanation for the different behavior.

A “hacked” simpleitk solution that should work (famous last words) is that you separate the files into different directories using your code and then do a GetGDCMSeriesFileNames pointing to the correct directory. This will allow gdcm to sort the files in that directory according to the patient position and should return the list of files in the correct order for reading.
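A sketch of that workaround, assuming the files have already been grouped into a dict mapping a sortable key (for example a tuple of series UID and acquisition time) to a list of file paths. `split_into_directories` and `read_directory` are hypothetical helpers: the first only copies files with the standard library, the second lets GetGDCMSeriesFileNames order the files of one directory before reading. This is not guaranteed to fix every ordering problem.

```python
import os
import shutil


def split_into_directories(groups, out_root):
    """Copy each group of DICOM files into its own sub-directory.

    groups: dict mapping a group key to a list of file paths.
    Returns a dict mapping the same keys to the new directory paths.
    """
    directories = {}
    for index, key in enumerate(sorted(groups)):
        target = os.path.join(out_root, "volume_{:03d}".format(index))
        os.makedirs(target, exist_ok=True)
        for path in groups[key]:
            shutil.copy2(path, target)  # keeps the original file name
        directories[key] = target
    return directories


def read_directory(directory):
    """Let GDCM order the files in one directory, then read them as a volume."""
    import SimpleITK as sitk

    file_names = sitk.ImageSeriesReader.GetGDCMSeriesFileNames(directory)
    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(file_names)
    return reader.Execute()
```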


Hi Ziv,

Thanks so much for taking a look.

I did separate one of the problematic series (with regard to image
ordering) into its own directory but kept getting the same "out of order"
conversion problem. It would be interesting to know whether others
consistently encounter that problem.

Probably just going to go with nipype using the initial "series separating"
code I already wrote. Still curious to find out what might have been
throwing SimpleITK for a loop, but not essential as there’s a work-around
now.

Best,

Brett

There are many command-line tools for reconstructing a 3D volume from a DICOM image series. I am not sure about SimpleITK, but last time I tried, ITK did not sort the instances geometrically; that should be done before passing the instance list to ITK. I would try some of the established tools linked above, such as plastimatch or dcm2niix, for reconstructing individual contrast phases.


Slicer has a setting that allows you to choose between GDCM and DCMTK for reading DICOM. I believe the default is GDCM, and also by default DCMTK will be used as fallback when GDCM fails.

Also note that Slicer sorts the instances geometrically before passing the list to either GDCM or DCMTK.

@bsmarine
Hi Brett,
Bringing this post back:
did you solve this problem completely?
If so, do you mind sharing your code?
I’m facing a similar problem,
Thanks
Hadas.

I ended up sorting all images into folders by series using pydicom then using Nipype at link below.

https://nipype.readthedocs.io/en/latest/interfaces/generated/nipype.interfaces.dcm2nii.html