Parallelize itk::ImageSeriesReader (ITK 4.12)

Hi
I am currently working on a project that imports multiple DICOM files in one sitting. This results in some rather lengthy read times. My problem is that I have been unsuccessful in using multiple threads to read multiple files at the same time. The problem seems to be related to GDCM::Reader, which introduces errors when running in parallel.
Here is a code snippet where you can see that I use itk::ImageSeriesReader with GDCMImageIO and a mutex guard to prevent the error:

typedef itk::ImageSeriesReader< FloatImageType4D > ReaderType;
ReaderType::Pointer reader = ReaderType::New();

typedef itk::GDCMImageIO       ImageIOType;
ImageIOType::Pointer dicomIO = ImageIOType::New();

dicomIO->SetLoadPrivateTags(true);
dicomIO->SetKeepOriginalUID(true);

reader->SetImageIO( dicomIO );

std::vector<std::string> fileNames;
fileNames.push_back(FileTools::filePathSanitisation(sourceFilePath).toStdString());

reader->SetFileNames( fileNames );

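try {
    // Serialize all reads behind one global lock (this removes any parallelism).
    QMutexLocker locker(GlobalGDCMLock::lock());
    reader->Update();
}
catch (const itk::ExceptionObject & e) {
    // ITK reports read failures by throwing itk::ExceptionObject.
    std::cerr << e << std::endl;
}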

Am I holding it wrong or is this a limitation of the itk::ImageSeriesReader?

Welcome to ITK Simon!

The ultimate authority on this question is @mathieu.malaterre.

Are you trying to divide your input series into chunks, read each chunk in a separate thread, and then merge them using TileImageFilter? What is the error message you get, or the problem you observe?
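
A minimal sketch of the chunking step described above, assuming the series is just an ordered list of file names. The helper name `splitIntoChunks` is hypothetical and not part of ITK; each resulting chunk would then be handed to its own reader in its own thread, and the partial volumes merged afterwards (for example with TileImageFilter).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not ITK API): split an ordered list of file names into
// at most `numChunks` contiguous chunks of near-equal size, preserving order.
std::vector<std::vector<std::string>>
splitIntoChunks(const std::vector<std::string>& fileNames, std::size_t numChunks)
{
    std::vector<std::vector<std::string>> chunks;
    if (fileNames.empty() || numChunks == 0) {
        return chunks;
    }
    // Never create more chunks than there are files.
    numChunks = std::min(numChunks, fileNames.size());
    const std::size_t base = fileNames.size() / numChunks;
    const std::size_t extra = fileNames.size() % numChunks; // first `extra` chunks get one more file
    std::size_t first = 0;
    for (std::size_t i = 0; i < numChunks; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        chunks.emplace_back(fileNames.begin() + first, fileNames.begin() + first + len);
        first += len;
    }
    return chunks;
}
```

Keeping the chunks contiguous and in order means the slices can be concatenated again in their original sequence.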

@Simon @dzenanz I did not realize I was the ultimate authority here :slight_smile: Anyway, GDCM is thread safe, so I would guess the issue (if any) is rather on the ITK side. I suspect ITK is itself already threaded internally, so I do not believe user code should be doing the threading, especially for I/O (unless you have a special NAS with parallel access).

@mathieu.malaterre You are the authority on GDCM-related questions!

Sadly, most (maybe even all) compression/decompression algorithms we use are single threaded. If the input is compressed in any way, reading it in multiple threads is super-helpful. I guess that’s what he is trying to do?

Thank you for the fast response :slight_smile:
Let me clarify a bit: My case is a bit unusual, because multiple 3D/4D images are read so that the headers and pixel data can be parsed and then written to a “datastore”. This is needed because our viewer needs to be able to get one slice (file) at a time. In the case where ITK processing is needed, the 3D/4D image is read once again, but this time from the “datastore”.

The error is random segfaults or incomplete files, which is why I think it has something to do with unsafe threading.

Note: I am amazed by your response times :smiley:

I have also found it helpful in some cases to read images in parallel from network file systems, when latency is more the bottleneck than bandwidth. My first choice is generally a local SSD drive, with uncompressed data in a nearly raw file format, read single threaded.

I wrote a test for SimpleITK for testing parallel IO:

Quickly adding DICOM seems to work there for reading, but I only tested on OSX.

If I am correct, this only tests multiple processes and not multi-threading (https://docs.python.org/2/library/multiprocessing.html). When spawning multiple processes in Python, static objects, for example, are not shared.

Currently my setup is written in C++ using a single process with multiple threads.

The ThreadPool uses light weight threads:
https://uwpce-pythoncert.github.io/SystemDevelopment/threading-multiprocessing.html#threadpool

@blowekamp could you elaborate? I am not sure that I understand your logic. I am using a threadpool and not a processing pool.

My code uses threads too.

I think the conclusion here is that there is probably a bug in your code, @svg. There seem to be no blockers for what you want to do.

Well okay then. I will be spending more time on it. I am grateful for all the help :slight_smile:.

Just to finish up the unit test:
“from multiprocessing.pool import ThreadPool” is used and then individual processes are spawned using p.map().
I still read the documentation as running individual processes in parallel (https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.pool.Pool), meaning that the test doesn't test for thread safety…

As far as I know, Python’s multiprocessing module is just that - a multi processing module. The ThreadPool uses processes (not light weight threads) in order to side-step the problem of the Global Interpreter Lock, so that you may actually utilize multiple cores efficiently in Python. It’s all done using a “thread like” API, to make it easier to understand for developers. The GIL is only released on I/O operations, so without using multiple processes, Python is in effect single threaded and thread-safe in itself - and horrible at performing CPU bound work :slight_smile:

@svg’s problem is in C++, where multiple threads can in fact run at the same time (no GIL), so it may very well be a problem within GDCM or ITK it seems to me :thinking:

Welcome to ITK, Simon!

Whether reading the images in parallel is implemented in C++ or Python, it is important that each thread gets its own instance of the itk::ImageSeriesReader. There should be independent sets of pipeline / image / data state per series / thread. It is not necessary to lock on reader->Update().
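
A minimal sketch of the "one reader per thread" pattern, using only the C++ standard library so it stays self-contained. The function name `readEachSeriesInOwnThread` is hypothetical, and the ITK usage shown in the trailing comment is an assumption about how the callable might be written (with `ImageType` standing in for the image type used in this thread), not tested code.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: runs `readSeries` once per series, each call on its own
// std::thread. The callable is expected to construct its own reader and ImageIO
// inside the call, so no pipeline object is shared between threads.
// Note: an exception escaping `readSeries` would terminate the program, so the
// callable should catch its own errors.
template <typename ReadFn>
auto readEachSeriesInOwnThread(
    const std::vector<std::vector<std::string>>& seriesFileNames,
    ReadFn readSeries)
    -> std::vector<decltype(readSeries(seriesFileNames.front()))>
{
    using Result = decltype(readSeries(seriesFileNames.front()));
    std::vector<Result> results(seriesFileNames.size());
    std::vector<std::thread> workers;
    workers.reserve(seriesFileNames.size());
    for (std::size_t i = 0; i < seriesFileNames.size(); ++i) {
        // Each thread writes only to its own slot, so no lock is needed here.
        workers.emplace_back([&, i] { results[i] = readSeries(seriesFileNames[i]); });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return results;
}

// With ITK, the callable could look roughly like this (assumption, untested):
//
//   auto readSeries = [](const std::vector<std::string>& names) {
//       auto reader  = itk::ImageSeriesReader<ImageType>::New(); // new per call
//       auto dicomIO = itk::GDCMImageIO::New();                  // new per call
//       reader->SetImageIO(dicomIO);
//       reader->SetFileNames(names);
//       reader->Update();
//       ImageType::Pointer image = reader->GetOutput(); // smart pointer keeps the image alive
//       return image;
//   };
```

One thread per series is the simplest form; with many series, a fixed-size pool of worker threads would likely be a better fit.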

I hope this helps.


If I read your response correctly, @matt.mccormick, you're saying that the code snippet included in the first comment is correct and should work with multiple threads?

Create instance of reader:

typedef itk::ImageSeriesReader< FloatImageType4D > ReaderType;
ReaderType::Pointer reader = ReaderType::New();

Create instance of ImageIO:

typedef itk::GDCMImageIO ImageIOType;
ImageIOType::Pointer dicomIO = ImageIOType::New();

One more question :slight_smile: regarding your comment seen below. You're simply saying that each reader must read its own file (they are not allowed to read the same file), right?

Have a great day and thank you for replying :slight_smile:

Hello,

Thank you for clarifying that :+1:

Seems I’ve learned something new about Python’s standard library today, so that’s a good day :slight_smile: From reading their own documentation, it sounds like processes are used.


It is not necessarily correct or incorrect – it depends on how it is called. It should be fine if a new reader is created in every thread.

Multiple readers could read the same file.

There might be something different about your files. Perhaps they use some type of compression which is supported by a third-party library that isn’t fully thread safe? Perhaps try your code with some other DICOM files?


Recap from my findings:
I am currently way past the time limit on this problem, meaning that I sadly don’t have any more time to spend on this issue, so I will briefly sum up my findings.
After not being able to make multi-threaded reading work in my project, I made a smaller test where I read 1 file and then wrote it to another location (I needed this in order to compare the results) 10000+ times in parallel. However, because the “itk::ImageSeriesWriter” turned out to be far from thread safe, I had no luck using this approach.
With regard to reading in parallel, I had random segfaults and no luck validating that the files were correctly read in my small test, leading me to the conclusion that I am unfortunately unable to parallelize this.
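
One way such a stress test might sidestep the writer is to compare each parallel read against a single-threaded reference in memory. The following is only a sketch under that assumption: `countMismatchesUnderLoad` is a hypothetical name, and `readOnce` stands for any callable that creates its own reader, reads the file, and returns something comparable with `==` (for example a copy of the pixel buffer as a `std::vector<float>`).

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical harness: calls `readOnce` `repetitions` times, spread over
// `numThreads` threads, and counts how many results differ from a reference
// obtained single-threaded before any worker starts.
// `readOnce` must be safe to call concurrently (e.g. it builds a fresh reader
// on every call and shares no state).
template <typename ReadFn>
std::size_t countMismatchesUnderLoad(ReadFn readOnce, std::size_t repetitions, unsigned numThreads)
{
    const auto reference = readOnce(); // single-threaded baseline
    std::atomic<std::size_t> next{0};
    std::atomic<std::size_t> mismatches{0};
    std::vector<std::thread> workers;
    workers.reserve(numThreads);
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&] {
            // Threads claim work items from a shared atomic counter.
            while (next.fetch_add(1) < repetitions) {
                if (!(readOnce() == reference)) {
                    ++mismatches;
                }
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return mismatches.load();
}
```

A non-zero return value would point at the read path itself rather than at the writer, although a segfault would of course still abort the whole run.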

If someone is reading this thread and has been successful in using C++ and a multi-threaded ITK reader (and possibly writer), I am interested in hearing your solution :grinning:.

To all that have tried to help, I say thank you :+1: :slightly_smiling_face: