Does SimpleITK return final transformation corresponding to the best similarity metric value or it return the final transformation obtained the last iteration?

dada · September 3, 2020, 11:34am

Hello, I am doing B-spline registration with SimpleITK, and I set MeanSquares as similarity metric. Can I ask: does SimpleITK return final transformation corresponding to the best similarity metric value or it return the final transformation obtained the last iteration?

I check the final metric value using registration_method.GetMetricValue(), and I get 94.98, however, the minimum similarity should be 58.14, as shown in the plot. How can I get the final transformation or final registration parameters corresponding to lowest similarity? Many thanks!!

zivy · September 3, 2020, 1:00pm

Hello @dada,

The registration returns the final metric value which is also the minimum value for the lowest pyramid level (highest image resolution).

Based on your graph I see that you are running a multi-resolution/multi-scale registration (2 scales). In the first phase you are using a smaller image than in the second phase, additionally that image is likely blurred compared to the second phase (smoothing_sigmas in the registration framework). This is why at higher levels of the pyramid (initial phases/lower resolution image) the error metric is most often lower than at the lower levels of the pyramid.

dada · September 3, 2020, 6:09pm

Hello zivy,

Thank you for the clear answer, can I ask 4 more questions about this topic?

I learn from the SimpleITK B-spline registration tutorial, I use the following code in my experiment:

registration_method.SetInitialTransformAsBSpline(initial_transform,
inPlace=False,
scaleFactors=[1,2,4])

registration_method.SetMetricAsMeanSquares()
registration_method.SetMetricSamplingStrategy(registration_method.RANDOM)
registration_method.SetMetricSamplingPercentage(0.01)

registration_method.SetShrinkFactorsPerLevel(shrinkFactors = [4,2,1])
registration_method.SetSmoothingSigmasPerLevel(smoothingSigmas=[2,1,0])
registration_method.SmoothingSigmasAreSpecifiedInPhysicalUnitsOn()

registration_method.SetInterpolator(sitk.sitkLinear)
registration_method.SetOptimizerAsGradientDescent(learningRate=0.001, numberOfIterations=100,
convergenceMinimumValue=1e-8, convergenceWindowSize=10)
registration_method.SetOptimizerScalesFromPhysicalShift()

Q1: I am using multi-resolution/multi-scale registration with 3 scales (b-spline grid scaleFactors=[1,2,4] together with image shrinkFactors = [4,2,1] respectively, I mean one grid style with one image resolution. ), however, the metric value plot of my last post shows the registration only performed on 2 scales?

Q2: I redo the registration experiment on the same fixed_image and moving_image using exactly the same script. And I got this:
It seems the result SSD plot is quite different compared to my last post… how can I have more reproducible result?

Q3: I record all the similarity metric values for all iterations that the registration runs, I only got 30 SSD values, but I set numberOfIterations=100, so is there an early stopping here? What does “Convergence checker passed at iteration 12” means?

Q4: It seems the registration stop at iteration 30th, does iteration 30th corresponding to obtaining minimum similarity metric value for the lowest pyramid level? If so, should the final metric value be 191? Instead, the final metric value shown as 165, which is closest to 164.8 that iteration 26th obtained. Does it means it should be iteration 26th corresponding to minimum value for the highest image resolution pyramid level? Will the final transformation take the registration result of iteration 26th?

Sorry that if my description was not clear enough. Many thanks.

zivy · September 3, 2020, 6:42pm

Hello @dada,

Q1. The graph is likely showing three levels of registration as that is what your code is specifying. Identifying the jumps between resolution levels is based on visual inspection so I was mistaken in my original assessment of number of levels. If you look at the Jupyter notebooks there is a marker plotted for each change in resolution, so it is clearer when those happen.

Q2. To get reproducible results you need to set the seed for the random number generator used in SetMetricSamplingPercentage to a constant value. Default is to use wall-clock and that is different for every run.

Q3. The numberOfIterations is an upper bound (per pyramid level). If set to 100 and you have 3 pyramid levels then you will have at most 300 iterations. If the optimizer converges earlier, then you will have less iterations. This is what you want. If it stops due to the iteration limit it means the optimizer could continue improving and was terminated too early. “Convergence checker passed at iteration 12”, means just that. The criteria for convergence were satisfied and the optimizer stops. In the case of SetOptimizerAsGradientDescent the criteria depend on the convergenceMinimumValue, and convergenceWindowSize parameters. For additional details see the original ITK documentation.

Q4. Didn’t really understand this one. The last iteration is at the lowest pyramid resolution and that has the minimal metric value for that resolution. The iteration number will not correspond to the total iteration number as the optimizer is restarted per level. So if we have two levels and in the first it ran for 50 iterations and in the second for 20 iterations, the optimizer’s reported stopping condition will be “stopped after 20 iterations” even though it did 70 iterations. Hopefully this clarifies the discrepancy between the graphs and the optimizer reporting (the graphs keep track of all iterations).

dada · September 7, 2020, 3:23pm

Hi @zivy ,
big thanks for letting me understand more. This time, I use only one grid style and one image resolution (no pyramid). The optimizer reports it converge at 50, and correspondingly I got 50 error matric values in total. The metric (SSD) value at the last iteration is 67. 57 which is shown in the plot, while the final metric value is 63.98, I was expecting the final metric value would be the same as SSD at last iteration. Is this difference because the last iteration is still under the process of optimization so it calculate the SSD with SamplingPercentage=0.01, while the final metric value calculate the SSD based on all pixels in the image? Many thanks.

zivy · September 8, 2020, 2:35pm

Hello @dada,

No, different samplings are not the reason for the difference you are seeing. The reason for the difference is that when the registration ends it does not generate an sitk.sitkIterationEvent it generates a sitk.sitkEndEvent, so the last value printed from the command observing the sitkIterationEvent is the value prior to the final value. If you also print the value when the sitkEndEvent happens you will see that it matches the value you obtain after the registration is done.

dada · September 9, 2020, 2:47pm

Hello @zivy,

yeah, I got matching metric value at last iteration and printed final one as you advise !!

When setting “registration_method.SetMetricSamplingPercentage(percentage=0.01, seed=1)” , will the same set of pixels be used for similarity metric estimation at every iteration? Thanks!

zivy · September 9, 2020, 3:54pm

Hello @dada,

Yes, the same set of points is used for each iteration at a specific resolution, the SetMetricSamplePoints method called from the InitializeRegistrationAtEachLevel method. They do change per image pyramid resolution.

dada · September 10, 2020, 2:45pm

Hello @zivy,

Thank you. Sorry that I’ve asked many because couldn’t find the answer in the itk documents or websites.
Does optimizer’s learning rate use unit of a pixel/voxel? does it possible to use learning rate decay with iterations if certain criteria meets?

when use LBFGS2 as optimizer, I got “Optimizer’s stopping condition, A rounding error occurred” , what does rounding error means? thanks.

zivy · September 10, 2020, 3:51pm

Hello @dada,

Learning rate is unit-less, it scales the gradient direction. With respect to decay or other strategies for using an “appropriate” learning rate, all gradient based optimizers have a estimateLearningRate parameter which is an EstimateLearningRateType. This can be never, once, or for each iteration. That is the level of control SimpleITK exposes for the learning rate.

If you feel you need more manual control you can obviously run multiple registrations one after the other setting the learning rate per registration and initializing with the prior registration result (not recommended).

Rounding error means just that, computation issues. As a side note, the LBFGS2 is primarily intended for optimizing the BSplineTransform, so if you are working with any of the other transform types we recommend that you use one of the gradient based optimizers.

dada · September 25, 2020, 5:12pm

Hello @zivy,

Thanks a lot for helping me to move on with my experiment.