Thank you for all the information; that was already helpful!
By 10x I meant that I either use SetShrinkFactorsPerLevel([10]) or Shrink([10, 10, 10]). I assumed that both do the same thing?
Here is a simple test program, measured on a machine with 32 cores and enough RAM.
loaded and casted images...
Runtime Shrink: 2.800898551940918
ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 33.
(10.000000002169225, -6.000000002288125, -4.999999997720458)
WARNING: In /tmp/SimpleITK-build/ITK-prefix/include/ITK-5.2/itkCorrelationImageToImageMetricv4HelperThreader.hxx, line 85
CorrelationImageToImageMetricv4HelperThreader (0x55e29f4fa610): collected only zero points
WARNING: In /tmp/SimpleITK-build/ITK-prefix/include/ITK-5.2/itkObjectToObjectMetric.hxx, line 579
CorrelationImageToImageMetricv4 (0x55e29f0e6780): No valid points were found during metric evaluation. For image metrics, verify that the images overlap appropriately. For instance, you can align the image centers by translation. For point-set metrics, verify that the fixed points, once transformed into the virtual domain space, actually lie within the virtual domain.
Runtime Full Resolution: 2042.098801612854
ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 35.
(9.99999974948408, -6.000008512885042, -4.999999966672995)
Runtime Laplace: 21.282774209976196
ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 31.
(-6.421247795372761e-05, 0.00034781758850254953, 0.00033390488498277537)
The full-resolution run is there as a reference. Interestingly, that run triggered the warning above, yet it produced the same result.
Even more interesting: my naive usage did not produce any reasonable result for the last test case!
So shrinking is ~10 times faster than using the levels, which in turn is about 100 times faster than using the full image.
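To double-check these ratios, here is a minimal sketch that recomputes them from the runtimes printed above. It assumes that the run labelled "Laplace" in the output is the one using the levels; the `runtimes` dictionary and the `speedup` helper are only names made up for this illustration.

```python
# Wall-clock runtimes in seconds, copied from the benchmark output above.
runtimes = {
    "Shrink": 2.800898551940918,
    "Laplace": 21.282774209976196,  # assumed to be the run using the levels
    "Full Resolution": 2042.098801612854,
}


def speedup(slow, fast):
    """Return how many times faster the `fast` run was than the `slow` run."""
    return runtimes[slow] / runtimes[fast]


print(f"Shrink vs. Laplace:          {speedup('Laplace', 'Shrink'):.2f}x")
print(f"Laplace vs. Full Resolution: {speedup('Full Resolution', 'Laplace'):.2f}x")
print(f"Shrink vs. Full Resolution:  {speedup('Full Resolution', 'Shrink'):.2f}x")
```

The first two ratios come out somewhat below 10 and 100, so the round figures are only approximate. These are also single measurements, so they would need repeated runs before reading much into the exact values.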
benchmark_laplace.py (2.4 KB)
I played around with the parameters, for example setting the sigma to 1:
Runtime Laplace: 27.285879135131836
ConjugateGradientLineSearchOptimizerv4Template: Convergence checker passed at iteration 34.
(10.000000001971904, -6.000000701954836, -5.000000026585222)
Okay, so smoothing is definitely required! However, the runtime is still about the same.
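To make the comparison of the results less of an eyeball exercise, here is a small sketch that checks each printed parameter tuple against a reference. The reference (10, -6, -5) is an assumption, inferred only from the runs that agree with each other. The tolerance and the `matches` helper are likewise just illustrative choices.

```python
import math

# Assumed reference parameters, inferred from the runs that agree with each other.
expected = (10.0, -6.0, -5.0)

# Final transform parameters, copied from the output above.
results = {
    "Shrink": (10.000000002169225, -6.000000002288125, -4.999999997720458),
    "Full Resolution": (9.99999974948408, -6.000008512885042, -4.999999966672995),
    "Laplace": (-6.421247795372761e-05, 0.00034781758850254953, 0.00033390488498277537),
    "Laplace, sigma 1": (10.000000001971904, -6.000000701954836, -5.000000026585222),
}


def matches(params, reference, tol=1e-3):
    """True if every parameter is within `tol` (absolute) of the reference."""
    return all(math.isclose(p, r, abs_tol=tol) for p, r in zip(params, reference))


status = {name: matches(params, expected) for name, params in results.items()}
for name, ok in status.items():
    print(f"{name:18s} {'ok' if ok else 'FAILED'}")
```

With this tolerance only the first "Laplace" run fails, which matches what I saw by eye.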
Regarding the multi-threading: I found that this might not actually be the case, but I’m not quite sure.
What I do see is that after sitkStartEvent there is a long period in which the cores are not fully used, sometimes with only a single core busy for longer stretches. However, some multi-threading does seem to happen there as well.
Then, once sitkIterationEvent is reached, I see full core usage.
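To put a number on that initial phase, something like the following sketch could measure the time between sitkStartEvent and the first sitkIterationEvent. `PhaseTimer` and `attach` are hypothetical helpers, not part of SimpleITK. The only SimpleITK pieces used are `AddCommand` and the two event constants, and the module is passed in as an argument so the helper itself does not import it.

```python
import time


class PhaseTimer:
    """Records when the registration starts and when the first iteration fires."""

    def __init__(self):
        self.t_start = None
        self.t_first_iteration = None
        self.iterations = 0

    def on_start(self):
        self.t_start = time.perf_counter()

    def on_iteration(self):
        # Only the first iteration event marks the end of the setup phase.
        if self.t_first_iteration is None:
            self.t_first_iteration = time.perf_counter()
        self.iterations += 1

    def setup_seconds(self):
        """Seconds between the start event and the first iteration event."""
        if self.t_start is None or self.t_first_iteration is None:
            return None
        return self.t_first_iteration - self.t_start


def attach(registration, sitk, timer):
    """Register the timer callbacks on a registration method.

    `sitk` is the imported SimpleITK module (passed in, not imported here).
    """
    registration.AddCommand(sitk.sitkStartEvent, timer.on_start)
    registration.AddCommand(sitk.sitkIterationEvent, timer.on_iteration)


# Intended usage (not executed here; fixed and moving are the loaded images):
#   import SimpleITK as sitk
#   R = sitk.ImageRegistrationMethod()
#   ... set metric, optimizer, shrink factors, smoothing sigmas ...
#   timer = PhaseTimer()
#   attach(R, sitk, timer)
#   R.Execute(fixed, moving)
#   print("setup phase [s]:", timer.setup_seconds())
```

Comparing `setup_seconds()` with the total runtime would show how much of the time is spent before the optimizer starts iterating, which is the part where I see low core usage.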