Unexpected Behaviour in MultiStartOptimizerv4

Hello, I’m trying to use the MultiStartOptimizerv4. My current code is here: riesling/cxx/riesling/merlin.cpp at merlin2 · spinicist/riesling · GitHub; apologies, it is a little messy at the moment.

The results of the local optimizer within MultiStartOptimizer appear to depend on the order of the parameter search space, to the extent that if I pass the same parameters twice in the search list, I get different results. Here is some logging output to demonstrate the problem:

[11:49:09] [MERLIN] Optimizer start
[11:49:09] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[11:49:09] [LOCAL ] 00 -1.269 [0.000066,-0.000003,0.000003,-0.006091,-0.063948,-0.001232]
[11:49:09] [LOCAL ] 01 -1.275 [0.000190,-0.000009,0.000007,-0.017514,-0.183217,-0.002952]
[11:49:09] [LOCAL ] 02 -1.286 [0.000418,-0.000018,0.000010,-0.038572,-0.402470,-0.004482]
[11:49:09] [LOCAL ] 03 -1.305 [0.000825,-0.000025,0.000000,-0.074768,-0.790295,-0.003310]
[11:49:09] [LOCAL ] 04 -1.333 [0.001465,-0.000007,-0.000091,-0.124831,-1.384617,-0.001261]
[11:49:09] [LOCAL ] 05 -1.356 [0.001700,0.000034,-0.000292,-0.144228,-1.582003,-0.024013]
[11:49:09] [LOCAL ] 06 -1.358 [0.001789,0.000073,-0.000493,-0.157121,-1.642256,-0.035922]
[11:49:09] [LOCAL ] 07 -1.358 [0.001894,0.000152,-0.000882,-0.177471,-1.695161,-0.051484]
[11:49:09] [LOCAL ] 08 -1.359 [0.001993,0.000311,-0.001624,-0.203723,-1.695321,-0.064578]
[11:49:09] [LOCAL ] 09 -1.360 [0.002191,0.000623,-0.002946,-0.225666,-1.700344,-0.062979]
[11:49:09] [LOCAL ] 10 -1.363 [0.003400,0.002378,-0.010487,-0.360796,-1.741483,-0.113547]
[11:49:09] [LOCAL ] 11 -1.376 [0.003477,0.002400,-0.010555,-0.343647,-1.753917,-0.123579]
[11:49:09] [LOCAL ] 12 -1.376 [0.003477,0.002400,-0.010556,-0.343623,-1.753913,-0.123591]
[11:49:09] [LOCAL ] 13 -1.376 [0.003477,0.002400,-0.010556,-0.343623,-1.753913,-0.123591]
[11:49:09] [LOCAL ] 14 -1.376 [0.003477,0.002400,-0.010556,-0.343623,-1.753913,-0.123591]
[11:49:09] [LOCAL ] Local Optimizer finished [0.003,0.002,-0.011,-0.344,-1.754,-0.124]
[11:49:09] [MERLIN] 00 -1.376 [0.003477,0.002400,-0.010555,-0.343647,-1.753917,-0.123579]
[11:49:09] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[11:49:09] [LOCAL ] 00 -1.269 [0.000000,-0.000000,0.000000,-0.000001,-0.000013,-0.000000]
[11:49:09] [LOCAL ] 01 -1.269 [0.000000,-0.000000,0.000000,-0.000004,-0.000037,-0.000001]
[11:49:09] [LOCAL ] 02 -1.269 [0.000000,-0.000000,0.000000,-0.000008,-0.000083,-0.000002]
[11:49:09] [LOCAL ] 03 -1.269 [0.000000,-0.000000,0.000000,-0.000016,-0.000169,-0.000003]
[11:49:09] [LOCAL ] 04 -1.269 [0.000000,-0.000000,0.000000,-0.000032,-0.000331,-0.000006]
[11:49:09] [LOCAL ] 05 -1.269 [0.000001,-0.000000,0.000000,-0.000061,-0.000636,-0.000012]
[11:49:09] [LOCAL ] 06 -1.269 [0.000001,-0.000000,0.000000,-0.000115,-0.001210,-0.000023]
[11:49:09] [LOCAL ] 07 -1.270 [0.000002,-0.000000,0.000000,-0.000218,-0.002291,-0.000044]
[11:49:09] [LOCAL ] 08 -1.270 [0.000004,-0.000000,0.000000,-0.000412,-0.004323,-0.000083]
[11:49:09] [LOCAL ] 09 -1.270 [0.000008,-0.000000,0.000000,-0.000776,-0.008148,-0.000155]
[11:49:09] [LOCAL ] 10 -1.270 [0.000016,-0.000001,0.000001,-0.001462,-0.015343,-0.000289]
[11:49:09] [LOCAL ] 11 -1.271 [0.000030,-0.000001,0.000001,-0.002753,-0.028871,-0.000534]
[11:49:09] [LOCAL ] 12 -1.272 [0.000056,-0.000003,0.000002,-0.005180,-0.054284,-0.000967]
[11:49:09] [LOCAL ] 13 -1.274 [0.000106,-0.000005,0.000004,-0.009740,-0.101931,-0.001689]
[11:49:09] [LOCAL ] 14 -1.279 [0.000198,-0.000009,0.000006,-0.018275,-0.190863,-0.002748]
[11:49:09] [LOCAL ] 15 -1.287 [0.000369,-0.000015,0.000009,-0.034050,-0.355163,-0.003857]
[11:49:09] [LOCAL ] 16 -1.301 [0.000679,-0.000023,0.000004,-0.061933,-0.650904,-0.003659]
[11:49:09] [LOCAL ] 17 -1.324 [0.001200,-0.000018,-0.000043,-0.104737,-1.139962,-0.000017]
[11:49:09] [LOCAL ] 18 -1.350 [0.001662,0.000022,-0.000221,-0.138398,-1.553935,-0.018904]
[11:49:10] [LOCAL ] 19 -1.358 [0.001757,0.000056,-0.000400,-0.151244,-1.622584,-0.031289]
[11:49:10] [LOCAL ] 20 -1.358 [0.001871,0.000126,-0.000753,-0.171701,-1.688802,-0.047879]
[11:49:10] [LOCAL ] 21 -1.358 [0.001972,0.000268,-0.001432,-0.198932,-1.699726,-0.063372]
[11:49:10] [LOCAL ] 22 -1.360 [0.002133,0.000555,-0.002661,-0.221931,-1.687781,-0.063059]
[11:49:10] [LOCAL ] 23 -1.363 [0.003957,0.003487,-0.015235,-0.455348,-1.673783,-0.097525]
[11:49:10] [LOCAL ] 24 -1.380 [0.003983,0.003487,-0.015224,-0.448354,-1.690245,-0.103937]
[11:49:10] [LOCAL ] 25 -1.380 [0.003993,0.003487,-0.015219,-0.445746,-1.696203,-0.106309]
[11:49:10] [LOCAL ] 26 -1.380 [0.003994,0.003487,-0.015219,-0.445702,-1.696301,-0.106348]
[11:49:10] [LOCAL ] 27 -1.380 [0.003994,0.003487,-0.015219,-0.445702,-1.696301,-0.106349]
[11:49:10] [LOCAL ] Local Optimizer finished [0.004,0.003,-0.015,-0.446,-1.696,-0.106]

Note the local optimizer starts from [0, 0, 0, 0, 0, 0] both times (as I have instructed it to), but I get wildly different results between the two runs. Is this expected? It seems like a bug to me. Is the MultiStartOptimizer not resetting some state correctly between local optimizations?

To get different results, I assume that different random initializations (“starts”) are used. I don’t remember using that class, so I don’t know whether that is the expected behavior.

There’s no randomness that I can see. Looks like it should be deterministic: https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Numerics/Optimizersv4/include/itkMultiStartOptimizerv4.hxx#L173

Furthermore: I added the repeated starting point (of 0) as an extreme example to demonstrate the problem.

I actually noticed this when I increased the scale of my translation search grid from (-5, 0, 5) mm to (-50, 0, 50) mm: the local optimizer then failed to improve the metric for all starting points, even the (0) point.

I am sure there must be some stale state that is not being updated between starting points; the question is where.

This might not be the only problem, but are you using the Mattes metric with multiple threads? If so, there is a non-deterministic component to the metric.

That is not the problem. I re-ran with Mean Squares and see the same issue. Note that both these runs start in the same place but end up somewhere different:

[14:59:39] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[14:59:39] [LOCAL ] 00 0.097 [0.000,-0.000,-0.000,-0.002,-0.028,-0.001]
[14:59:39] [LOCAL ] 01 0.096 [0.000,-0.000,-0.000,-0.005,-0.079,-0.003]
[14:59:39] [LOCAL ] 02 0.094 [0.000,-0.000,-0.000,-0.012,-0.174,-0.007]
[14:59:39] [LOCAL ] 03 0.092 [0.000,-0.000,-0.000,-0.023,-0.342,-0.015]
[14:59:39] [LOCAL ] 04 0.087 [0.001,-0.000,-0.000,-0.045,-0.629,-0.027]
[14:59:39] [LOCAL ] 05 0.081 [0.001,-0.000,-0.000,-0.083,-1.072,-0.048]
[14:59:39] [LOCAL ] 06 0.075 [0.001,0.000,-0.000,-0.146,-1.623,-0.075]
[14:59:39] [LOCAL ] 07 0.072 [0.001,0.000,-0.000,-0.153,-1.657,-0.077]
[14:59:39] [LOCAL ] 08 0.072 [0.001,0.000,-0.000,-0.155,-1.664,-0.078]
[14:59:39] [LOCAL ] 09 0.072 [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [LOCAL ] 10 0.072 [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [LOCAL ] Local Optimizer finished [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [MERLIN] 00 0.072 [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[14:59:39] [LOCAL ] 00 0.097 [0.000,-0.000,-0.000,-0.000,-0.002,-0.000]
[14:59:39] [LOCAL ] 01 0.097 [0.000,-0.000,-0.000,-0.000,-0.005,-0.000]
[14:59:39] [LOCAL ] 02 0.097 [0.000,-0.000,-0.000,-0.001,-0.012,-0.000]
[14:59:39] [LOCAL ] 03 0.096 [0.000,-0.000,-0.000,-0.002,-0.024,-0.001]
[14:59:39] [LOCAL ] 04 0.096 [0.000,-0.000,-0.000,-0.003,-0.046,-0.002]
[14:59:39] [LOCAL ] 05 0.095 [0.000,-0.000,-0.000,-0.006,-0.088,-0.004]
[14:59:39] [LOCAL ] 06 0.094 [0.000,-0.000,-0.000,-0.011,-0.165,-0.007]
[14:59:39] [LOCAL ] 07 0.092 [0.000,-0.000,-0.000,-0.021,-0.304,-0.013]
[14:59:39] [LOCAL ] 08 0.088 [0.000,-0.000,-0.000,-0.038,-0.544,-0.023]
[14:59:39] [LOCAL ] 09 0.083 [0.001,-0.000,-0.000,-0.070,-0.930,-0.041]
[14:59:39] [LOCAL ] 10 0.076 [0.001,0.000,-0.000,-0.124,-1.455,-0.067]
[14:59:39] [LOCAL ] 11 0.072 [0.001,0.000,-0.000,-0.155,-1.636,-0.076]
[14:59:39] [LOCAL ] 12 0.072 [0.001,0.000,-0.001,-0.160,-1.659,-0.078]
[14:59:39] [LOCAL ] 13 0.072 [0.001,0.000,-0.001,-0.161,-1.663,-0.078]
[14:59:39] [LOCAL ] 14 0.072 [0.001,0.000,-0.001,-0.161,-1.664,-0.078]
[14:59:39] [LOCAL ] 15 0.072 [0.001,0.000,-0.001,-0.161,-1.664,-0.078]
[14:59:39] [LOCAL ] Local Optimizer finished [0.001,0.000,-0.001,-0.161,-1.664,-0.078]
[14:59:39] [MERLIN] 01 0.072 [0.001,0.000,-0.001,-0.161,-1.664,-0.078]

Edit: there was a missing “.” in some format specifiers; all the parameters are now printed with the same rounding.

I think I have found the problem: MultiStartOptimizer is not resetting the learning rate before running the local optimizer for each start point.

I edited my code to run the local optimizer only, and observed the same behaviour: repeated calls with the same (0) start point produced different trajectories and final points. I added logging for the learning rate and observed that each subsequent run started with the learning rate from the final iteration of the previous run. Finally, I added a call to SetLearningRate() before each StartOptimization() call, after which the results were identical for each run.
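To illustrate the mechanism, here is a minimal, self-contained C++ sketch (a toy quadratic objective with a made-up learning-rate decay schedule, not the actual ITK classes or their real update rule): if the optimizer's working learning rate survives between StartOptimization() calls, a second run from the same start point behaves differently, and resetting the rate first restores reproducibility.

```cpp
// Toy stand-in for a v4-style local optimizer (NOT the ITK class):
// the learning rate it decays during a run is retained for the next
// StartOptimization() call, mimicking the suspected stale state.
struct ToyOptimizer {
  double learningRate = 0.4;  // persists across runs, as in the bug

  void SetLearningRate(double lr) { learningRate = lr; }

  // Minimize f(x) = x^2 from x0 with a halving learning-rate schedule.
  double StartOptimization(double x0) {
    double x = x0;
    for (int i = 0; i < 20; ++i) {
      x -= learningRate * 2.0 * x;  // gradient step, f'(x) = 2x
      learningRate *= 0.5;          // decay is never undone between runs
    }
    return x;
  }
};
```

A first run from x0 = 1.0 converges normally, while a second run from the same start barely moves because the rate is now tiny, exactly the symptom seen in the logs above; calling SetLearningRate() before the second run makes the two runs identical.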

The simple fix would be to insert similar calls in MultiStartOptimizer. However, I think there is a more fundamental fix here: make the local optimizers store both a StartLearningRate and a CurrentLearningRate.
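As a sketch of that second idea (plain C++ with hypothetical member names, not the actual ITK interface): SetLearningRate() stores a start value, and StartOptimization() re-initializes the working rate from it, so each run is independent of whatever the previous run adapted the rate to.

```cpp
// Hypothetical design sketch: separate the user-set start rate from
// the rate the optimizer adapts internally during a run.
struct SketchOptimizer {
  void SetLearningRate(double lr) { m_StartLearningRate = lr; }

  double StartOptimization(double x0) {
    m_CurrentLearningRate = m_StartLearningRate;  // the proposed reset
    double x = x0;
    for (int i = 0; i < 20; ++i) {
      x -= m_CurrentLearningRate * 2.0 * x;  // minimize toy f(x) = x^2
      m_CurrentLearningRate *= 0.5;          // adaptation is per-run only
    }
    return x;
  }

  double m_StartLearningRate = 0.4;
  double m_CurrentLearningRate = 0.4;
};
```

With this split, a multi-start driver can call StartOptimization() repeatedly without any external reset and still get reproducible results for repeated start points.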

@cookpa I’m wondering if this explains my memory of antsAI never working as well as I hoped :sweat_smile: It is a very old memory (2016, maybe?), but I am borrowing the MultiStartOptimizer approach from antsAI, and I suspect this bug has been around for a very long time.

It’s possible. I’ll print out the optimizer values and take a look.

I tried shuffling the inputs to the tests in itkMultiStartOptimizerv4Test.cxx and it doesn’t seem to make a difference to the results, so maybe it’s something in the metric?

I will look more.

One problem with antsAI from circa 2016 was that it used joint histogram MI instead of Mattes, while antsRegistration used Mattes for either “-m Mattes” or “-m MI”. After switching antsAI to Mattes we found that the optimization was more likely to succeed.

I can’t immediately tell how those tests work (i.e. what the actual test condition is). If one of the start points is sufficiently close to the true answer, that start point will still be returned regardless of the order of the start points, as its metric value will still be the best.

The test for whether this bug occurs is if the local optimizer actually moves from the start location.

I found this problem by deliberately trying a very coarse grid of start points: the first one was so far from the optimum that by the end of its run the local optimizer had a degenerate learning rate. That rate was then carried into all subsequent points, and the local optimizer stopped doing anything.

I confirmed in antsAI that if you duplicate the start points, it produces a different result.

I cannot seem to make it fully consistent. I got it closer by disabling scales estimation and setting a fixed learning rate in the local optimizer, but repeated runs from the same starting point are still slightly different.

In the ITK test, I tried duplicating the starting parameters as well as spacing them out a bit more. The local optimizations are consistent, repeated runs with the same parameters give the same result.

I think something in either the image metric, or the local optimizer (itkConjugateGradientLineSearchOptimizerv4) is retaining state.

The parameters are consistent run to run, using a fixed random seed. That is, if I call antsAI twice, I get the same results. But internally, two identical optimization start points yield different results.

Yes, I concur that retained state is involved. The local optimizer definitely retains the learning rate: in my limited tests, once I set it before each run the results were consistent. I didn’t see any further issues with the metric, but I wouldn’t rule them out.

I agree with your conclusions. I believe the ITK tests don’t catch this because the multi-start optimizer uses itk::GradientDescentOptimizerv4Template and not a conjugate gradient as its local optimizer.

Is there a PR to be made following this discussion?

@dzenanz I would like to modify ConjugateGradientLineSearchOptimizerv4Template, because that class changes its learning rate internally even when no scales estimator is present. Other optimizers might not be affected, because they have options to estimate the learning rate only at the first iteration or to use a constant rate.
