(Simple)ITK not checking DICOM integrity?

I have a corrupted DICOM series: https://www.dropbox.com/s/i4wdj7ehkivijh5/corrupted_dicom.tar.bz2?dl=0 . A couple of slices (z=193-194) have incorrect pixel values. When I try to read the image with pydicom using the GDCM backend, I get the following error:

ValueError: The length of the pixel data in the dataset (517118 bytes) doesn't match the expected length (524288 bytes). The dataset may be corrupted or there may be an issue with the pixel data handler.

I expected similar behaviour from SimpleITK, but instead it reads the series without complaining, and the result contains “weird” slices when viewed in ITK-SNAP.

I used the following to read in the series and save it as a 3D image:

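import SimpleITK as sitk

reader = sitk.ImageSeriesReader()
filenames = reader.GetGDCMSeriesFileNames('corrupted_dicom/')
# the reader must be told which files to load before Execute()
reader.SetFileNames(filenames)
im = reader.Execute()
# for easier viewing, offset and clamp
im += 1000
# note: no bounds are set here, so the filter's default bounds apply
clamp_filter = sitk.ClampImageFilter()
im = clamp_filter.Execute(im)
im = sitk.Cast(im, sitk.sitkInt16)
sitk.WriteImage(im, 'corrupted_dicom.gipl')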

Hello @alkamid,

This appears to be an issue in GDCM. The following GDCM code reads the corrupted image without an error:

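import gdcm
corrupt_file_name = 'IM-0001-0044.dcm'
reader = gdcm.ImageReader()
reader.SetFileName(corrupt_file_name)
# Read() returns False on failure; for this file it reports success
if not reader.Read():
    raise RuntimeError('GDCM failed to read ' + corrupt_file_name)
gdcm_image = reader.GetImage()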

Please inquire on that project’s mailing list: https://sourceforge.net/p/gdcm/mailman/

p.s. You only need to provide the single corrupted slice, not the whole series.

Thanks for your reply @zivy. Looking at the code, it seems that GDCM is compiled with the GDCM_SUPPORT_BROKEN_IMPLEMENTATION flag ON by default. Turning that flag OFF doesn’t change the behaviour, and GDCM still fails silently, so I’m slightly confused. Maybe it’s worth investigating and adjusting the flags for the GDCM version bundled with ITK.

Hello @alkamid, based on your analysis this really points to an issue with GDCM. Resolving it will require changes in GDCM (unless there is some setting we are missing). Please post your issue on their mailing list (referenced above).

Thanks @zivy, I posted to that mailing list a week ago, but there has been no reply so far.
Anyway, I found this “feature” when experimenting with pydicom, which raises an exception on this DICOM slice. The exception does not come from catching a GDCM warning, but from post-processing of the series. pydicom compares the expected length of the pixel data with its actual length and raises an exception if they differ, which looks like a sensible thing to do (see https://github.com/pydicom/pydicom/blob/master/pydicom/pixel_data_handlers/numpy_handler.py#L252 ). Maybe SimpleITK could do something similar?
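
To make the idea concrete, here is a minimal sketch of that kind of length check in plain Python. It is not pydicom's actual implementation: the function names expected_pixel_bytes and check_pixel_length are hypothetical, and it assumes uncompressed pixel data whose size follows from the usual DICOM attributes (Rows, Columns, Bits Allocated, Samples per Pixel, Number of Frames), allowing for one trailing padding byte when the expected length is odd. The 512 x 512, 16-bit dimensions in the example are also an assumption, chosen only because they are consistent with the 524288 bytes in the error message above.

```python
def expected_pixel_bytes(rows, columns, bits_allocated,
                         samples_per_pixel=1, number_of_frames=1):
    """Byte length that uncompressed pixel data should have (no padding)."""
    total_bits = (rows * columns * samples_per_pixel
                  * number_of_frames * bits_allocated)
    # Round up to whole bytes (relevant for 1-bit images).
    return (total_bits + 7) // 8


def check_pixel_length(actual_bytes, rows, columns, bits_allocated,
                       samples_per_pixel=1, number_of_frames=1):
    """Raise ValueError if the stored pixel data has an unexpected length.

    Hypothetical helper: accepts either the exact expected length or the
    expected length padded to an even number of bytes.
    """
    expected = expected_pixel_bytes(rows, columns, bits_allocated,
                                    samples_per_pixel, number_of_frames)
    padded = expected + expected % 2
    if actual_bytes not in (expected, padded):
        raise ValueError(
            "pixel data is %d bytes, expected %d bytes; "
            "the dataset may be corrupted" % (actual_bytes, expected)
        )
    return expected


if __name__ == "__main__":
    # Assumed dimensions: 512 x 512 pixels, 16 bits allocated.
    print(expected_pixel_bytes(512, 512, 16))
    try:
        check_pixel_length(517118, 512, 512, 16)
    except ValueError as err:
        print(err)
```

With pydicom, the inputs would presumably come from ds.Rows, ds.Columns, ds.BitsAllocated, ds.SamplesPerPixel and len(ds.PixelData); the comparison is only meaningful for uncompressed transfer syntaxes, since compressed pixel data has no fixed length.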

Hi @alkamid,

Saw the message you posted on the GDCM mailing list.

Unfortunately, I don’t think we can directly address this in SimpleITK, primarily because we are wrapping the ITK components that use GDCM. When the underlying layers generate no error and we get data of the expected size (corrupted, but the size is what it should be), we cannot recover: we have no way of knowing that something unexpected happened, so as far as we can tell our data “is valid”.

I also tried compiling SimpleITK with DCMTK (add the boolean CMake variables Module_ITKDCMTK and Module_ITKIODCMTK and set them to ON). I then forced reading with this image IO using file_reader.SetImageIO('DCMTKImageIO'). Unfortunately, the result was a segfault on this image.

So at this time we do not have a good solution.