Hi
I am currently working on a project that imports multiple DICOM files in one sitting. This results in some rather lengthy read times. My problem is that I have been unsuccessful in using multiple threads to read multiple files at the same time. The problem seems to be related to gdcm::Reader, which introduces errors when running in parallel.
Here is a code snippet where you can see that I use itk::ImageSeriesReader with GDCMImageIO and a mutex as a guard to prevent the error:
Are you trying to divide your input series into chunks, read each chunk in a separate thread, and then merge them using TileImageFilter? What is the error message you get, or the problem you observe?
@Simon @dzenanz I did not realize I was the ultimate authority here. Anyway, GDCM is thread-safe, so I would guess the issue (if any) is rather on the ITK side. I suspect ITK is itself already threaded internally, so I do not believe user code should be doing the threading, especially for I/O (unless you have a special NAS with parallel access).
Sadly, most (maybe even all) compression/decompression algorithms we use are single-threaded. If the input is compressed in any way, reading it in multiple threads is super helpful. I guess that’s what he is trying to do?
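To illustrate why threads still help when each decoder is single-threaded, here is a minimal Python sketch using only the standard library. zlib is an assumed stand-in for whatever codec a compressed DICOM transfer syntax actually uses. Each blob is decoded by one single-threaded call, but independent blobs are handed to a thread pool; CPython’s zlib releases the GIL while it decompresses, so the calls can overlap on several cores.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_all(blobs, workers=4):
    """Decompress independent zlib blobs concurrently, preserving input order.

    Each blob is decoded by a single-threaded zlib call; the parallelism
    comes only from running several independent calls at the same time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the inputs
        return list(pool.map(zlib.decompress, blobs))


if __name__ == "__main__":
    # Hypothetical stand-ins for the compressed pixel data of separate files
    originals = [bytes([i % 256]) * 1_000_000 for i in range(8)]
    blobs = [zlib.compress(data) for data in originals]
    restored = decompress_all(blobs)
    print(all(a == b for a, b in zip(originals, restored)))
```

The same shape applies in C++: one decoder instance per task, many tasks in flight.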
Thank you for the fast response.
Let me clarify a bit: my case is a bit abnormal, because multiple 3D/4D images are read in order for the headers and pixel data to be parsed and then written to a “datastore”. This is needed because our viewer needs to be able to get one slice (file) at a time. In the case where ITK processing is needed, the 3D/4D image is read once again, but now from the “datastore”.
The errors are random segfaults or incomplete files, which is why I think it has something to do with unsafe threading.
I have also found it helpful in some cases to read images in parallel from network file systems, when latency is more of a bottleneck than bandwidth. My first choice is generally a local SSD drive, with an uncompressed, nearly raw file format read single-threaded.
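A minimal sketch of that latency-hiding idea, standard library only: several reads are kept in flight at once so that per-request round trips on a network file system overlap instead of adding up. The file names and worker count are illustrative assumptions, and nothing here is DICOM-specific.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_bytes(path):
    # Each call opens, reads and closes its own file handle; no shared state.
    return Path(path).read_bytes()


def read_many(paths, workers=8):
    """Read many files with several requests in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_bytes, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(20):
            p = Path(tmp) / f"slice_{i:03d}.bin"  # hypothetical file names
            p.write_bytes(i.to_bytes(2, "little") * 512)
            paths.append(p)
        data = read_many(paths)
        print(len(data), len(data[0]))
```

On a high-latency mount, raising `workers` increases the number of outstanding requests; on a local SSD the single-threaded loop is often just as good.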
I wrote a test for SimpleITK for testing parallel IO:
I quickly added DICOM, and reading seems to work there, but I have only tested on OSX.
Well, okay then. I will spend more time on it. I am grateful for all the help.
Just to follow up on the unit test:
“from multiprocessing.pool import ThreadPool” is used, and then individual processes are spawned using p.map().
I still read the documentation (https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.pool.Pool) as running individual processes in parallel, meaning that the test doesn’t test for thread safety…
As far as I know, Python’s multiprocessing module is just that: a multiprocessing module. The ThreadPool uses processes (not lightweight threads) in order to sidestep the Global Interpreter Lock, so that you can actually utilize multiple cores efficiently in Python. It’s all done through a “thread-like” API to make it easier for developers to understand. The GIL is only released on I/O operations, so without multiple processes, Python is in effect single-threaded and thread-safe in itself, and horrible at CPU-bound work.
@svg’s problem is in C++, where multiple threads can in fact run at the same time (no GIL), so it seems to me it may very well be a problem within GDCM or ITK.
Whether reading the images in parallel is implemented in C++ or Python, it is important that each thread gets its own instance of the itk::ImageSeriesReader. There should be independent sets of pipeline / image / data state per series / thread. It is not necessary to lock on reader->Update().
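To make the “one reader per thread” rule concrete, here is a minimal Python sketch. `ToyReader` is a hypothetical stand-in for itk::ImageSeriesReader (or SimpleITK’s ImageSeriesReader), and its snake_case methods are invented for illustration, not a real API. Like a real reader, it keeps mutable state between setting the file names and updating, so sharing one instance across threads would race. Constructing the reader inside each task gives every series its own pipeline state, and no lock around the update step is needed.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyReader:
    """Hypothetical stand-in for a series reader with per-instance pipeline state."""

    def __init__(self):
        self._file_names = []
        self._output = None

    def set_file_names(self, names):
        self._file_names = list(names)

    def update(self):
        # A real reader would parse headers and decode pixels here;
        # the toy version just records which files it "read".
        self._output = tuple(self._file_names)

    def get_output(self):
        return self._output


def read_series(file_names):
    # A fresh reader per task: nothing is shared between threads,
    # so update() needs no mutex.
    reader = ToyReader()
    reader.set_file_names(file_names)
    reader.update()
    return reader.get_output()


def read_all_series(series_list, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_series, series_list))


if __name__ == "__main__":
    series = [[f"series{s}/img{i}.dcm" for i in range(3)] for s in range(5)]
    for volume in read_all_series(series):
        print(volume[0])
```

The anti-pattern would be a single shared reader whose set/update calls interleave between threads; in C++ the equivalent is one itk::ImageSeriesReader::Pointer used from several std::threads.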
If I read your response correctly, @matt.mccormick, you are saying that the code snippet included in the first comment is correct and should work with multiple threads?
One more question regarding your comment below: you are simply saying that each reader must read its own file (they are not allowed to read the same file), right?
SimpleITK unlocks the GIL when it enters C++ ITK code. So only one thread can be running Python code at a time, but multiple threads can run SimpleITK code. This works very well when you have a stack of images to process concurrently.
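A quick standard-library check of what multiprocessing.pool.ThreadPool does: every task reports the process ID and thread identifier it ran on. All tasks report the parent’s process ID, because ThreadPool runs its workers as threads inside the current process (unlike multiprocessing.Pool). A test built on ThreadPool therefore does exercise several threads in one interpreter, which is what lets GIL-releasing SimpleITK calls run concurrently.

```python
import os
import threading
import time
from multiprocessing.pool import ThreadPool


def where_am_i(_):
    # Sleep briefly so several workers are busy at the same time.
    time.sleep(0.05)
    return os.getpid(), threading.get_ident()


def probe(workers=4, tasks=16):
    """Return the set of process IDs and the set of thread IDs the tasks ran on."""
    with ThreadPool(workers) as pool:
        results = pool.map(where_am_i, range(tasks))
    pids = {pid for pid, _ in results}
    threads = {tid for _, tid in results}
    return pids, threads


if __name__ == "__main__":
    pids, threads = probe()
    print("process IDs:", pids, "parent:", os.getpid())
    print("distinct worker threads:", len(threads))
```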
Seems I’ve learned something new about Python’s standard library today, so that’s a good day. From reading their own documentation, it sounds like processes are used.
There might be something different about your files. Perhaps they use some type of compression that is supported by a third-party library that isn’t fully thread-safe? Perhaps try your code with some other DICOM files?
Recap of my findings:
I am currently way past the time limit on this problem, so sadly I don’t have any more time to spend on this issue; I will briefly sum up my findings.
After not being able to make multi-threaded reading work in my project, I made a smaller test where I read 1 file and then wrote it to another location (I needed this in order to compare the results) 10000+ times in parallel. However, because “itk::ImageSeriesWriter” turned out to be far from thread-safe, I had no luck with this approach.
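For anyone repeating this kind of stress test, here is a minimal harness sketch using only the standard library. `copy_image` is a hypothetical placeholder for the read-then-write step (for example an ITK reader feeding a writer, both constructed inside the task); here it is just shutil.copyfile so the harness runs anywhere. It assumes two things: every task writes to its own output path, so concurrent writers never target the same file, and each result is validated by comparing a SHA-256 digest against the source. Byte-for-byte comparison only makes sense for this placeholder. A real DICOM round trip would compare decoded pixel buffers instead, since the writer may change header fields.

```python
import hashlib
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copy_image(src, dst):
    # Hypothetical placeholder for "read with a reader, write with a writer";
    # a real version would construct its own reader/writer pair here.
    shutil.copyfile(src, dst)


def stress_round_trip(src, out_dir, n=100, workers=8):
    """Copy src to n distinct paths in parallel; return indices whose output differs."""
    expected = sha256_of(src)
    out_dir = Path(out_dir)

    def one(i):
        dst = out_dir / f"copy_{i:05d}.dcm"  # unique output path per task
        copy_image(src, dst)
        return i, sha256_of(dst) == expected

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [i for i, ok in pool.map(one, range(n)) if not ok]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.dcm"  # hypothetical input file
        src.write_bytes(bytes(range(256)) * 64)
        failures = stress_round_trip(src, tmp, n=200)
        print("failed copies:", failures)
```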
Regarding reading in parallel, I had random segfaults and no luck validating that the files were read correctly in my small test, which led me to the conclusion that, unfortunately, I am unable to parallelize this.
If anyone reading this thread has been successful in using C++ and a multi-threaded ITK reader (and possibly writer), I am interested in hearing your solution.