Difference between ShrinkFactorPerLevel and Shrink

reox · June 16, 2020, 6:24am

I want to register some images, but their resolution is finer than I need, and lower resolutions require less memory and registration time. Downsampling them to a tenth of the size should still work for my purposes.

I have now tried two approaches: shrinking the images manually with Shrink, and using SetShrinkFactorsPerLevel (with a single level and sigmas set to 0) on the ImageRegistrationMethod.
I know that the latter has a slightly different meaning, but I expected both to end up with the same results.
Instead, the Shrink version runs in almost no time, while SetShrinkFactorsPerLevel takes roughly 30x longer.

I noticed that while SetShrinkFactorsPerLevel runs, it seems to use only a single core, whereas Shrink seems to run multithreaded.
I assumed that SetShrinkFactorsPerLevel was a convenience wrapper around the same operation, but it looks like it does something different?

I’m using SimpleITK 2.0.0rc1.post224 with ITK 5.2.

blowekamp · June 16, 2020, 1:38pm

When you say you are reducing by 10x, is that 10^2 or 10^3? Either way, you are reducing the size of the image significantly. I'm wondering if you are running out of memory during the scaling process?

You are using a sigma of 0, so there is no Gaussian filtering. Note that ITKv4 used the DiscreteGaussian filter, but ITKv5 switched to the SmoothingRecursiveGaussian for significant performance improvements.

Shrink is a fast filter, but it applies no low-pass or band-pass filtering. I generally prefer BinShrink, as it at least averages over a square kernel.

The ImageRegistrationMethod uses the ResampleImageFilter, which is also multi-threaded (and slower than the shrink filters, without much benefit if no Gaussian smoothing is used). Since you observe that it appears single-threaded, I suspect there may be some memory limitations here.

If further assistance is needed, please create a minimal reproducible example with synthetic/generated images so we can test whether your observations are reproducible.

reox · June 16, 2020, 4:51pm

Thank you for all the information; that was already helpful!
By 10x I meant that I use either SetShrinkFactorsPerLevel([10]) or Shrink([10, 10, 10]). I assumed both do the same thing?

Here is a simple test program, run on 32 cores with enough RAM:

    loaded and casted images...
    Runtime Shrink: 2.800898551940918
    ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 33.
    (10.000000002169225, -6.000000002288125, -4.999999997720458)
    WARNING: In /tmp/SimpleITK-build/ITK-prefix/include/ITK-5.2/itkCorrelationImageToImageMetricv4HelperThreader.hxx, line 85
    CorrelationImageToImageMetricv4HelperThreader (0x55e29f4fa610): collected only zero points

    WARNING: In /tmp/SimpleITK-build/ITK-prefix/include/ITK-5.2/itkObjectToObjectMetric.hxx, line 579
    CorrelationImageToImageMetricv4 (0x55e29f0e6780): No valid points were found during metric evaluation. For image metrics, verify that the images overlap appropriately. For instance, you can align the image centers by translation. For point-set metrics, verify that the fixed points, once transformed into the virtual domain space, actually lie within the virtual domain.

    Runtime Full Resolution: 2042.098801612854
    ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 35.
    (9.99999974948408, -6.000008512885042, -4.999999966672995)
    Runtime Laplace: 21.282774209976196
    ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 31.
    (-6.421247795372761e-05, 0.00034781758850254953, 0.00033390488498277537)

The full-resolution run is there as a reference. Interestingly, that run emits this warning, yet it produced the same result.
Even more interesting: my naive usage did not produce any reasonable result in the last test case!

So Shrink is ~10 times faster than SetShrinkFactorsPerLevel, which in turn is about 100 times faster than using the full image.

benchmark_laplace.py (2.4 KB)

I played around with the parameters, for example setting the sigma to 1:

    Runtime Laplace: 27.285879135131836
    ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 34.
    (10.000000001971904, -6.000000701954836, -5.000000026585222)

Okay, so smoothing is definitely required! However, the runtime is still roughly the same.

Regarding the multi-threading: I found that it might not actually be single-threaded, but I'm not quite sure.
What I do see is that after sitkStartEvent there is a long period where the cores are not fully used, sometimes with only a single core busy for extended stretches. However, some multi-threading does seem to happen as well.
Then, once sitkIterationEvent is reached, I see full core usage.

blowekamp · June 19, 2020, 1:09pm

Thank you for providing the example code.

The shrink factors only affect the virtual domain, effectively reducing the number of samples for metric evaluation. The Gaussian smoothing and derivative are still performed at full resolution. Setting the smoothing sigma to 0 disables the Gaussian smoothing filter.
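A back-of-the-envelope sketch of what this means for cost, using a hypothetical 1000^3 volume and a hypothetical `level_workload` helper (the floor-division grid size is an approximation of how a shrunk grid is sized). A shrink factor of 10 per axis cuts the metric samples by 10^3, but the filtering work stays at the full voxel count:

```python
from math import prod


def level_workload(size, shrink_factor):
    """Hypothetical helper: rough voxel counts for one registration level.

    Following the explanation above, the smoothing and gradient filters
    still touch every full-resolution voxel, while only the metric is
    evaluated on the shrunk virtual domain.
    """
    filtered = prod(size)  # voxels processed by full-resolution filters
    # Approximate virtual-domain size: floor division per axis, at least 1
    sampled = prod(max(s // shrink_factor, 1) for s in size)
    return filtered, sampled


filtered, sampled = level_workload((1000, 1000, 1000), 10)
print(filtered, sampled)  # 1000000000 1000000
```

Shrinking the images up front instead makes every later step, including any smoothing applied afterwards, scale with the smaller count.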

The gradient is still computed at full resolution on the fixed and moving images. The gradient filtering can be turned off with:

    # R is the sitk.ImageRegistrationMethod being configured
    R.MetricUseFixedImageGradientFilterOff()
    R.MetricUseMovingImageGradientFilterOff()

However, the gradient is still computed on a per-sample basis with a neighborhood calculator. Please see the ImageToImageMetricv4 class for more details. Disabling the gradient filter is generally done in conjunction with setting an aggressively small sampling percentage.