Parallelize itk::ImageSeriesReader (ITK 4.12)

@mathieu.malaterre You are the authority on GDCM-related questions!

Sadly, most (maybe even all) compression/decompression algorithms we use are single-threaded. If the input is compressed in any way, reading it in multiple threads is super-helpful. I guess that’s what he is trying to do?

Thank you for the fast response :slight_smile:
Let me clarify a bit: My case is a bit abnormal, because multiple 3D/4D images are read so that the headers and pixel data can be parsed and then written to a “datastore”. This is needed because our viewer needs to be able to get one slice (file) at a time. When ITK processing is needed, the 3D/4D image is read once again, but now from the “datastore”.

The error is random segfaults or incomplete files, which is why I think it has something to do with unsafe threading.

Note: I am amazed by your response times :smiley:

I have also found it helpful in some cases to read images in parallel from network file systems, when latency is more the bottleneck than bandwidth. My first choice is generally a local SSD drive, uncompressed, with a nearly raw file format, read single-threaded.

I wrote a test for SimpleITK for testing parallel IO:

Quickly adding DICOM seems to work there for reading, but I only tested on OSX.

If I am correct, this only tests multiple processes and not multi-threading (https://docs.python.org/2/library/multiprocessing.html). When spawning multiple processes in Python, static objects, for example, are not shared.

Currently my setup is written in C++ using a single process with multiple threads.

The ThreadPool uses lightweight threads:
https://uwpce-pythoncert.github.io/SystemDevelopment/threading-multiprocessing.html#threadpool

@blowekamp could you elaborate? I am not sure that I understand your logic. I am using a threadpool and not a processing pool.

My code uses threads too.

I think the conclusion here is that there is probably a bug in your code, @svg. There seem to be no blockers for what you want to do.

Well okay then. I will be spending more time on it. I am grateful for all the help :slight_smile:.

Just to finish up the unit test:
“from multiprocessing.pool import ThreadPool” is used, and then individual processes are spawned using p.map().
I still read the documentation as running individual processes in parallel (https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.pool.Pool), meaning that the test doesn’t test for thread safety…

As far as I know, Python’s multiprocessing module is just that - a multi processing module. The ThreadPool uses processes (not light weight threads) in order to side-step the problem of the Global Interpreter Lock, so that you may actually utilize multiple cores efficiently in Python. It’s all done using a “thread like” API, to make it easier to understand for developers. The GIL is only released on I/O operations, so without using multiple processes, Python is in effect single threaded and thread-safe in itself - and horrible at performing CPU bound work :slight_smile:

@svg’s problem is in C++, where multiple threads can in fact run at the same time (no GIL), so it may very well be a problem within GDCM or ITK it seems to me :thinking:

Welcome to ITK, Simon!

Whether reading the images in parallel is implemented in C++ or Python, it is important that each thread gets its own instance of the itk::ImageSeriesReader. There should be independent sets of pipeline / image / data state per series / thread. It is not necessary to lock on reader->Update().

I hope this helps.

If I read your response correctly, @matt.mccormick, you’re saying that the code snippet included in the first comment is correct and should work with multiple threads?

Create instance of reader:

typedef itk::ImageSeriesReader< FloatImageType4D > ReaderType;
ReaderType::Pointer reader = ReaderType::New();

Create instance of ImageIO:

typedef itk::GDCMImageIO ImageIOType;
ImageIOType::Pointer dicomIO = ImageIOType::New();

One more question :slight_smile: regarding your comment seen below. You’re simply saying that each reader must read its own file (they are not allowed to read the same file), right?

Have a great day and thank you for replying :slight_smile:

Hello,

Thank you for clarifying that :+1:

Seems I’ve learned something new about Python’s standard library today, so that’s a good day :slight_smile: From reading their own documentation, it sounds like processes are used.

It is not necessarily correct or incorrect – it depends on how it is called. It should be fine if a new reader is created in every thread.

Multiple readers could read the same file.
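To illustrate the idea without ITK, here is a minimal standard-library sketch of several threads reading the same file, each through its own stream object. The names `countBytes` and `readSameFileConcurrently` are made up for this example; a `std::ifstream` only stands in for a per-thread reader here and says nothing about ITK or GDCM internals.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: every call opens its own stream, so no reader state
// is shared between threads. Returns 0 if the file cannot be opened.
std::size_t countBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return static_cast<std::size_t>(
        std::distance(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>()));
}

// Hypothetical helper: reads the same file from threadCount threads at once.
// Each thread writes only to its own slot in the result vector.
std::vector<std::size_t> readSameFileConcurrently(const std::string& path,
                                                  unsigned threadCount)
{
    assert(threadCount > 0);
    std::vector<std::size_t> sizes(threadCount, 0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&sizes, &path, t] { sizes[t] = countBytes(path); });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return sizes;
}
```

If every thread reports the same size, the per-thread streams did not interfere with each other. The same layout, with one reader object created inside each thread, is what is meant above.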

There might be something different about your files. Perhaps they use some type of compression which is supported by a third-party library that isn’t fully thread-safe? Perhaps try your code with some other DICOM files?

Recap from my findings:
I am currently way past the time limit on this problem, meaning that I sadly don’t have any more time to spend on this issue, so I will briefly sum up my findings.
After not being able to make multi-threaded reading work in my project, I made a smaller test where I read 1 file and then wrote it to another location (I needed this in order to compare the results) 10000+ times in parallel. However, because the “itk::ImageSeriesWriter” turned out to be far from thread-safe, I had no luck using this approach.
With regard to reading in parallel, I had random segfaults and no luck validating that the files were correctly read in my small test, leading me to the conclusion that I am unfortunately unable to parallelize this.

If someone is reading this thread and has been successful in using C++ and a multi-threaded ITK reader (and possibly writer), I am interested in hearing your solution :grinning: .

To all that have tried to help, I say thank you :+1: :slightly_smiling_face:

Recipe for concurrent disaster:

ReaderType::Pointer reader = ReaderType::New(); // global reader instance
void threadFunction()
{
  unsigned i = getMyIndexSomehow();
  reader->SetFileName(filenames[i]);
  reader->Update();
  reader->GetOutput();
}

but this should work:

void threadFunction()
{
  unsigned i = getMyIndexSomehow();
  ReaderType::Pointer reader = ReaderType::New(); //thread-local reader instance
  reader->SetFileName(filenames[i]);
  reader->Update();
  reader->GetOutput();
}

If you took the approach that should work, please share the code of your small example so we can diagnose the problem.

Of course I did not create the reader globally.

I created a worker class, then spawned workers and gave them to a thread pool. Here is an example of how I did it in Qt:

class Worker : public QRunnable
{
private:
    QString mFilePath;

public:
    Worker(QString filePath)
    {
        mFilePath = filePath;

    }

    void run()
    {

            ReaderType::Pointer reader = ReaderType::New();

            typedef itk::GDCMImageIO       ImageIOType;

            ImageIOType::Pointer dicomIO = ImageIOType::New();

            dicomIO->SetLoadPrivateTags(true);
            dicomIO->SetKeepOriginalUID(true);

            reader->SetImageIO( dicomIO );

            std::vector<std::string> fileNames;
            fileNames.push_back(mFilePath.toStdString());

            reader->SetFileNames( fileNames );

            try {
                reader->Update();
            }
            catch(itk::ExceptionObject& err) {
                qDebug() << mFilePath;
                qDebug() << err.GetDescription();
                return;
            }


            // Write file
            typedef itk::ImageSeriesWriter<FloatImageType4D, FloatImageType2D>  SeriesWriterType;

            SeriesWriterType::Pointer seriesWriter = SeriesWriterType::New();

            seriesWriter->SetImageIO( reader->GetImageIO() );

            QString writePath = "/mnt/e357c8aa-9d7c-4668-bac6-fc6b286ff9f6/dicomfiles/output/";
            QString fileName = QFileInfo(QFile(mFilePath)).fileName();

            std::vector<std::string> outputFileNames;
            outputFileNames.push_back(QString(writePath + fileName).toStdString());
            seriesWriter->SetFileNames(outputFileNames);

            seriesWriter->SetInput(reader->GetOutput());

            try
            {
                seriesWriter->SetDebug(true);
                seriesWriter->Update();
            }
            catch( itk::ExceptionObject & excp )
            {
                qDebug() << writePath + fileName;
                qDebug() << excp.GetDescription();
                return;
            }
    }
};

Please note that this code has not been run, I just made it from my test code that had several modifications from my testing…

Since you are reading/writing just one file at a time, why do you use Series reader/writer instead of the plain reader/writer? And if you are just copying, why not use fread/fwrite or fstream read/write?

Still, I don’t quite see a problem with this code. To check whether it is a bug in the rest of the code, can you replace ITK-specific code by fread/fwrite or fstream read/write? If it is still problematic, the problem is in your code.
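A minimal sketch of what such a replacement could look like, using only the standard library. The function names `copyFile` and `copyAllInParallel`, the fixed worker count and the atomic index are assumptions made for this example; they are not taken from the code above, and `std::thread` stands in for the Qt thread pool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Copy one file byte for byte using streams local to the calling thread.
// Returns false if the source cannot be opened or the write fails.
bool copyFile(const std::string& source, const std::string& destination)
{
    std::ifstream in(source, std::ios::binary);
    if (!in) {
        return false;
    }
    std::ofstream out(destination, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    char buffer[4096];
    // The last read may be partial: it fails but still reports gcount() > 0.
    while (in.read(buffer, sizeof(buffer)) || in.gcount() > 0) {
        out.write(buffer, in.gcount());
    }
    out.flush();
    return out.good();
}

// Hypothetical helper: copies sources[i] to destinations[i] on workerCount
// threads and returns the number of copies that failed.
std::size_t copyAllInParallel(const std::vector<std::string>& sources,
                              const std::vector<std::string>& destinations,
                              unsigned workerCount)
{
    assert(sources.size() == destinations.size());
    std::atomic<std::size_t> next{0};      // next file index to hand out
    std::atomic<std::size_t> failures{0};  // shared failure counter

    auto work = [&] {
        for (std::size_t i = next.fetch_add(1); i < sources.size();
             i = next.fetch_add(1)) {
            if (!copyFile(sources[i], destinations[i])) {
                ++failures;
            }
        }
    };

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) {
        workers.emplace_back(work);
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return failures.load();
}
```

If a plain copy like this runs cleanly 10000+ times in the same harness, the thread pool and file handling are probably fine and the ITK/GDCM path is the more likely suspect. If it also crashes or produces incomplete files, the problem is outside ITK.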