Image Registration with Pseudo-Random Sampling throws exception randomly (maybe bug)

registration

#1

Hi all,

It will take a lot of words to describe the issue. But I think it might be a bug.
Thank you for your time reading this.

Let me first describe the functions I use.
I’m doing image registration: translation first, then B-spline.
Without sampling, everything works fine.
To speed up the process, I switched to stochastic gradient descent
by adding the following lines

    registration_->SetMetricSamplingPercentage(0.1);
    registration_->SetMetricSamplingStrategy(RegistrationType::RANDOM);
    registration_->MetricSamplingReinitializeSeed(1); // with no argument the seed is random (taken from the time)
    registration_->InPlaceOn();

between SetInitialTransform(transform_); and Update();

If there’s any problem with the code, please let me know.
If not, here’s the long story.

What I’m working on is pairwise registration of all image pairs in one video (~30000 frames).
I would like the results to be reproducible, so I call “MetricSamplingReinitializeSeed(1)”.
In my tests the results are reproducible, so I believe the code is correct.
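
To illustrate why a fixed seed should make the sampled points repeat from run to run, here is a minimal sketch in plain C++. It assumes nothing about ITK internals: SampleIndices is a hypothetical helper, and std::mt19937 with a Bernoulli draw is only a stand-in for whatever generator and sampling scheme the registration method actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: keep each index in [0, numPixels) with probability
// `percentage`, using a generator seeded with `seed`.
// std::mt19937 is a stand-in; it is not claimed to match ITK's sampler.
std::vector<std::size_t> SampleIndices(std::size_t numPixels,
                                       double percentage,
                                       unsigned int seed)
{
  assert(percentage >= 0.0 && percentage <= 1.0);
  std::mt19937 generator(seed);
  std::bernoulli_distribution keep(percentage);

  std::vector<std::size_t> sampled;
  for (std::size_t i = 0; i < numPixels; ++i)
  {
    if (keep(generator))
    {
      sampled.push_back(i);
    }
  }
  return sampled;
}
```

Calling SampleIndices(1000, 0.1, 1) twice in the same build returns the same index list, which is the behavior expected from reseeding with a constant before every registration.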

However, the strange thing is that when I process the same video 8 times in a single execution of the program,
it always throws exceptions in the 4th and 7th trials, but at different frame indices. (This “semi-reproducible” behavior really confuses me.)
From tracing the dump file and reading the exception messages, the exceptions occur at 2 lines of code.

  1. itkObjectToObjectMetric.hxx, line 425, which means ObjectToObjectMetric::m_VirtualImage is null.
    If I understand correctly, this line (or this function) is for dense sampling, so in principle it should not be called.
  2. itkImageToImageMetricv4.hxx, line 532, which means ImageToImageMetricv4::m_FixedSampledPointSet is null.
    This function is part of the normal call path, so m_FixedSampledPointSet should not be null.

Since the result is reproducible most of the time, the only explanation I can think of for these kinds of exceptions (wrong function being called, pointer becoming null) is that some part of the code writes where it should not (for example, past the end of an array).

Things are more complicated because I use 44 threads. Each thread runs its own ITK registration (for example, thread 0 processes frames 1~800, thread 1 processes frames 801~1600, etc.).
Inside each thread, I have set
itk::MultiThreader::SetGlobalMaximumNumberOfThreads(1);
so there should be no multi-threading inside the ITK registration object.
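
The frame split described above can be sketched as follows. This is only an illustration under stated assumptions: PartitionFrames is a hypothetical helper, frames are numbered from 1, and each thread gets one contiguous, nearly equal range. The actual program may divide the work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split frames [1, numFrames] into numThreads contiguous
// ranges. Each pair is (first, last), both inclusive. If there are more
// threads than frames, the surplus threads get an empty range (last < first).
std::vector<std::pair<std::size_t, std::size_t>>
PartitionFrames(std::size_t numFrames, std::size_t numThreads)
{
  assert(numThreads > 0);
  const std::size_t base = numFrames / numThreads;
  const std::size_t remainder = numFrames % numThreads;

  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  std::size_t first = 1;
  for (std::size_t t = 0; t < numThreads; ++t)
  {
    // The first `remainder` threads take one extra frame each.
    const std::size_t count = base + (t < remainder ? 1 : 0);
    ranges.emplace_back(first, first + count - 1);
    first += count;
  }
  return ranges;
}
```

With 1600 frames and 2 threads this yields the ranges 1~800 and 801~1600.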

Since everything was fine before I enabled sampling,
either I am using these functions incorrectly, or there is a bug in them
(or the bug is in my other code, but it had no visible effect before sampling was used).

I’m also reading the source code and will report back if I find any clues.


(Dženan Zukić) #2

Which version of ITK are you using, 4.13.1 or some recent master? @blowekamp made some changes to the virtual image part of the registration framework, so he might know more. See the discussion.

Since you are dividing your overall video by frames, it is possible that a crash can occur in each segment of frames, and which one is encountered first depends on the operating system’s scheduling of threads. For example, if the crash is on index 5, it means thread 0 reached its crashing point first. If the crash is on index 1605, it means thread 2 reached its crashing point first.
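
The mapping from a failing frame index to the thread that owns it can be sketched in a few lines. OwningThread is a hypothetical helper, and it assumes 1-based frame indices and equal contiguous blocks per thread (800 frames in the example sizes from the first post).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: zero-based index of the thread that processes `frame`,
// assuming frames start at 1 and each thread owns `framesPerThread`
// consecutive frames.
std::size_t OwningThread(std::size_t frame, std::size_t framesPerThread)
{
  assert(frame >= 1 && framesPerThread >= 1);
  return (frame - 1) / framesPerThread;
}
```

With 800 frames per thread, frame 5 maps to thread 0 and frame 1605 maps to thread 2.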


#3

Thanks @dzenanz!
I’m using 4.12.2. I tried updating to 4.13.1, but it didn’t solve the problem.
I’ll look into the discussion you mentioned.

About your comment on threads, I don’t think that is the case.
Here are some more details.
I catch the exception, so the program doesn’t crash and continues with the registration of the next pair.
This means I can (and do) record the frames at which the exceptions occurred.
They differ on each run (roughly 5~10 exceptions in one video, but again, only in the 4th and 7th trials).
The exceptions may or may not be triggered in the same thread.
Even when they are in the same thread, the frame indices differ.
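
For readers who want to reproduce this kind of bookkeeping, here is a minimal sketch of catching per-frame exceptions in worker threads and recording where they happened. Everything in it is an assumption for illustration: FrameFailure and RunAndRecordFailures are hypothetical names, registerFrame stands in for one pairwise registration, and frames are interleaved across threads only to keep the code short. Catching std::exception is enough for ITK errors because itk::ExceptionObject derives from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical record of one failed registration.
struct FrameFailure
{
  std::size_t frame;
  std::size_t thread;
  std::string message;
};

// Run registerFrame on frames [1, numFrames] using numThreads workers.
// A throwing frame is logged and skipped; the worker moves on to its next frame.
std::vector<FrameFailure>
RunAndRecordFailures(std::size_t numFrames,
                     std::size_t numThreads,
                     const std::function<void(std::size_t)>& registerFrame)
{
  assert(numThreads > 0);
  std::vector<FrameFailure> failures;
  std::mutex failuresMutex; // protects `failures`, which all workers share

  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < numThreads; ++t)
  {
    workers.emplace_back([&, t]() {
      for (std::size_t frame = t + 1; frame <= numFrames; frame += numThreads)
      {
        try
        {
          registerFrame(frame);
        }
        catch (const std::exception& e)
        {
          std::lock_guard<std::mutex> lock(failuresMutex);
          failures.push_back({frame, t, e.what()});
        }
      }
    });
  }
  for (std::thread& worker : workers)
  {
    worker.join();
  }

  // Sort by frame so logs from different runs can be compared directly.
  std::sort(failures.begin(), failures.end(),
            [](const FrameFailure& a, const FrameFailure& b) { return a.frame < b.frame; });
  return failures;
}
```

Comparing the sorted failure lists from two trials shows directly whether the failing frames and threads coincide.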


(Dženan Zukić) #4

Some work has been done to improve reproducibility in ITK, so it makes sense to try ITK 5.0 beta 3, which is the most recent pre-release. If the problem is still not deterministic, that would be a bug in either the library or your code. If it is deterministic, you can set conditional breakpoints based on frame index, etc., which makes debugging easier.


(Bradley Lowekamp) #5

I have not looked deeply into this, but it seems like a complicated registration system is needed to trigger this issue. I have a couple of thoughts:

  • Are you doing all the pairwise registrations in one process, or do you create a new process for each registration?
  • Are you reusing any registration objects between registrations? I suggest creating all new ones for each registration.
  • Creating a minimal, shareable, and reproducible example will enable others to track down the bug.

#6

Thanks @dzenanz, @blowekamp!

dzenanz,
I did try updating to ITK 5.0 beta 3. The exception is still present (in trial 6 instead of trials 4 and 7, though).

blowekamp,
Answering your questions

  • I’m doing all the pairwise registrations in one process.
  • I create new registration objects for each image pair.
  • I will try to create one.

#7

Hi,

I’ve finally put together a minimal, shareable, and reproducible example.
I put it on GitHub.


Please let me know if there’s any problem or if you need any other information.


(Bradley Lowekamp) #8

Great! Thanks for taking the time to create that!

It says that you are using a system with 44 cores. Any idea if it is reproducible with fewer?

Have you tried or been able to reproduce it on Linux/Mac?

I don’t think I can get a Windows system with that many cores, so I am looking for alternative ways to reproduce it.


#9

I haven’t tried it on Linux/Mac.
I have an 8-core Windows machine.
I will test on that one.
(By the way, I know the single-threaded run completes without exceptions.)

Is there any other thing I can do?


#10

It’s reproducible on my Windows 10 PC with 8 cores (using 8 threads),
but the occurrence rate may be lower.
In one test, the exceptions occurred at 52xxx and 79xxx.


(Bradley Lowekamp) #11

This is great info! I think with the information you provided anyone can try to tackle this problem!

I am not sure when I’ll have time and the proper system, but this issue is very interesting to me.


#12

Thanks! What about the code itself? Am I using the ITK library properly?