Unexpected Behaviour in MultiStartOptimizerv4

Hello, I’m trying to use the MultiStartOptimizerv4. My current code is here: riesling/cxx/riesling/merlin.cpp at merlin2 · spinicist/riesling · GitHub; apologies, it is a little messy at the moment.

The results of the local optimizer within MultiStartOptimizer appear to depend on the order of the parameter search space, to the extent that if I pass the same parameters twice in the search list, I get different results. Here is some logging output to demonstrate the problem:

[11:49:09] [MERLIN] Optimizer start
[11:49:09] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[11:49:09] [LOCAL ] 00 -1.269 [0.000066,-0.000003,0.000003,-0.006091,-0.063948,-0.001232]
[11:49:09] [LOCAL ] 01 -1.275 [0.000190,-0.000009,0.000007,-0.017514,-0.183217,-0.002952]
[11:49:09] [LOCAL ] 02 -1.286 [0.000418,-0.000018,0.000010,-0.038572,-0.402470,-0.004482]
[11:49:09] [LOCAL ] 03 -1.305 [0.000825,-0.000025,0.000000,-0.074768,-0.790295,-0.003310]
[11:49:09] [LOCAL ] 04 -1.333 [0.001465,-0.000007,-0.000091,-0.124831,-1.384617,-0.001261]
[11:49:09] [LOCAL ] 05 -1.356 [0.001700,0.000034,-0.000292,-0.144228,-1.582003,-0.024013]
[11:49:09] [LOCAL ] 06 -1.358 [0.001789,0.000073,-0.000493,-0.157121,-1.642256,-0.035922]
[11:49:09] [LOCAL ] 07 -1.358 [0.001894,0.000152,-0.000882,-0.177471,-1.695161,-0.051484]
[11:49:09] [LOCAL ] 08 -1.359 [0.001993,0.000311,-0.001624,-0.203723,-1.695321,-0.064578]
[11:49:09] [LOCAL ] 09 -1.360 [0.002191,0.000623,-0.002946,-0.225666,-1.700344,-0.062979]
[11:49:09] [LOCAL ] 10 -1.363 [0.003400,0.002378,-0.010487,-0.360796,-1.741483,-0.113547]
[11:49:09] [LOCAL ] 11 -1.376 [0.003477,0.002400,-0.010555,-0.343647,-1.753917,-0.123579]
[11:49:09] [LOCAL ] 12 -1.376 [0.003477,0.002400,-0.010556,-0.343623,-1.753913,-0.123591]
[11:49:09] [LOCAL ] 13 -1.376 [0.003477,0.002400,-0.010556,-0.343623,-1.753913,-0.123591]
[11:49:09] [LOCAL ] 14 -1.376 [0.003477,0.002400,-0.010556,-0.343623,-1.753913,-0.123591]
[11:49:09] [LOCAL ] Local Optimizer finished [0.003,0.002,-0.011,-0.344,-1.754,-0.124]
[11:49:09] [MERLIN] 00 -1.376 [0.003477,0.002400,-0.010555,-0.343647,-1.753917,-0.123579]
[11:49:09] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[11:49:09] [LOCAL ] 00 -1.269 [0.000000,-0.000000,0.000000,-0.000001,-0.000013,-0.000000]
[11:49:09] [LOCAL ] 01 -1.269 [0.000000,-0.000000,0.000000,-0.000004,-0.000037,-0.000001]
[11:49:09] [LOCAL ] 02 -1.269 [0.000000,-0.000000,0.000000,-0.000008,-0.000083,-0.000002]
[11:49:09] [LOCAL ] 03 -1.269 [0.000000,-0.000000,0.000000,-0.000016,-0.000169,-0.000003]
[11:49:09] [LOCAL ] 04 -1.269 [0.000000,-0.000000,0.000000,-0.000032,-0.000331,-0.000006]
[11:49:09] [LOCAL ] 05 -1.269 [0.000001,-0.000000,0.000000,-0.000061,-0.000636,-0.000012]
[11:49:09] [LOCAL ] 06 -1.269 [0.000001,-0.000000,0.000000,-0.000115,-0.001210,-0.000023]
[11:49:09] [LOCAL ] 07 -1.270 [0.000002,-0.000000,0.000000,-0.000218,-0.002291,-0.000044]
[11:49:09] [LOCAL ] 08 -1.270 [0.000004,-0.000000,0.000000,-0.000412,-0.004323,-0.000083]
[11:49:09] [LOCAL ] 09 -1.270 [0.000008,-0.000000,0.000000,-0.000776,-0.008148,-0.000155]
[11:49:09] [LOCAL ] 10 -1.270 [0.000016,-0.000001,0.000001,-0.001462,-0.015343,-0.000289]
[11:49:09] [LOCAL ] 11 -1.271 [0.000030,-0.000001,0.000001,-0.002753,-0.028871,-0.000534]
[11:49:09] [LOCAL ] 12 -1.272 [0.000056,-0.000003,0.000002,-0.005180,-0.054284,-0.000967]
[11:49:09] [LOCAL ] 13 -1.274 [0.000106,-0.000005,0.000004,-0.009740,-0.101931,-0.001689]
[11:49:09] [LOCAL ] 14 -1.279 [0.000198,-0.000009,0.000006,-0.018275,-0.190863,-0.002748]
[11:49:09] [LOCAL ] 15 -1.287 [0.000369,-0.000015,0.000009,-0.034050,-0.355163,-0.003857]
[11:49:09] [LOCAL ] 16 -1.301 [0.000679,-0.000023,0.000004,-0.061933,-0.650904,-0.003659]
[11:49:09] [LOCAL ] 17 -1.324 [0.001200,-0.000018,-0.000043,-0.104737,-1.139962,-0.000017]
[11:49:09] [LOCAL ] 18 -1.350 [0.001662,0.000022,-0.000221,-0.138398,-1.553935,-0.018904]
[11:49:10] [LOCAL ] 19 -1.358 [0.001757,0.000056,-0.000400,-0.151244,-1.622584,-0.031289]
[11:49:10] [LOCAL ] 20 -1.358 [0.001871,0.000126,-0.000753,-0.171701,-1.688802,-0.047879]
[11:49:10] [LOCAL ] 21 -1.358 [0.001972,0.000268,-0.001432,-0.198932,-1.699726,-0.063372]
[11:49:10] [LOCAL ] 22 -1.360 [0.002133,0.000555,-0.002661,-0.221931,-1.687781,-0.063059]
[11:49:10] [LOCAL ] 23 -1.363 [0.003957,0.003487,-0.015235,-0.455348,-1.673783,-0.097525]
[11:49:10] [LOCAL ] 24 -1.380 [0.003983,0.003487,-0.015224,-0.448354,-1.690245,-0.103937]
[11:49:10] [LOCAL ] 25 -1.380 [0.003993,0.003487,-0.015219,-0.445746,-1.696203,-0.106309]
[11:49:10] [LOCAL ] 26 -1.380 [0.003994,0.003487,-0.015219,-0.445702,-1.696301,-0.106348]
[11:49:10] [LOCAL ] 27 -1.380 [0.003994,0.003487,-0.015219,-0.445702,-1.696301,-0.106349]
[11:49:10] [LOCAL ] Local Optimizer finished [0.004,0.003,-0.015,-0.446,-1.696,-0.106]

Note the local optimizer starts from [0, 0, 0, 0, 0, 0] both times (as I have instructed it to), but I get wildly different results between the two runs. Is this expected? It seems like a bug to me. Is the MultiStartOptimizer not resetting some state correctly between local optimizations?

To get different results, I assume that different random initializations (“starts”) are used. I don’t remember using that class, so I don’t know whether that is the expected behavior.

There’s no randomness that I can see. Looks like it should be deterministic: https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Numerics/Optimizersv4/include/itkMultiStartOptimizerv4.hxx#L173

Furthermore: I added the repeated starting point (of 0) as an extreme example to demonstrate the problem.

I actually noticed this when I increased the scale of my translation search grid from (-5, 0, 5) mm to (-50, 0, 50) mm: the local optimizer then failed to improve the metric for all starting points, even the (0) point.

I am sure there must be some stale state that is not being updated between starting points; the question is where.

This might not be the only problem, but are you using the Mattes metric with multiple threads? If so, there is a non-deterministic component to the metric.

That is not the problem. I re-ran with Mean Squares and see the same issue. Note that both these runs start in the same place but end up somewhere different:

[14:59:39] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[14:59:39] [LOCAL ] 00 0.097 [0.000,-0.000,-0.000,-0.002,-0.028,-0.001]
[14:59:39] [LOCAL ] 01 0.096 [0.000,-0.000,-0.000,-0.005,-0.079,-0.003]
[14:59:39] [LOCAL ] 02 0.094 [0.000,-0.000,-0.000,-0.012,-0.174,-0.007]
[14:59:39] [LOCAL ] 03 0.092 [0.000,-0.000,-0.000,-0.023,-0.342,-0.015]
[14:59:39] [LOCAL ] 04 0.087 [0.001,-0.000,-0.000,-0.045,-0.629,-0.027]
[14:59:39] [LOCAL ] 05 0.081 [0.001,-0.000,-0.000,-0.083,-1.072,-0.048]
[14:59:39] [LOCAL ] 06 0.075 [0.001,0.000,-0.000,-0.146,-1.623,-0.075]
[14:59:39] [LOCAL ] 07 0.072 [0.001,0.000,-0.000,-0.153,-1.657,-0.077]
[14:59:39] [LOCAL ] 08 0.072 [0.001,0.000,-0.000,-0.155,-1.664,-0.078]
[14:59:39] [LOCAL ] 09 0.072 [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [LOCAL ] 10 0.072 [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [LOCAL ] Local Optimizer finished [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [MERLIN] 00 0.072 [0.001,0.000,-0.000,-0.155,-1.665,-0.078]
[14:59:39] [LOCAL ] Local optimizer start [0.000,0.000,0.000,0.000,0.000,0.000]
[14:59:39] [LOCAL ] 00 0.097 [0.000,-0.000,-0.000,-0.000,-0.002,-0.000]
[14:59:39] [LOCAL ] 01 0.097 [0.000,-0.000,-0.000,-0.000,-0.005,-0.000]
[14:59:39] [LOCAL ] 02 0.097 [0.000,-0.000,-0.000,-0.001,-0.012,-0.000]
[14:59:39] [LOCAL ] 03 0.096 [0.000,-0.000,-0.000,-0.002,-0.024,-0.001]
[14:59:39] [LOCAL ] 04 0.096 [0.000,-0.000,-0.000,-0.003,-0.046,-0.002]
[14:59:39] [LOCAL ] 05 0.095 [0.000,-0.000,-0.000,-0.006,-0.088,-0.004]
[14:59:39] [LOCAL ] 06 0.094 [0.000,-0.000,-0.000,-0.011,-0.165,-0.007]
[14:59:39] [LOCAL ] 07 0.092 [0.000,-0.000,-0.000,-0.021,-0.304,-0.013]
[14:59:39] [LOCAL ] 08 0.088 [0.000,-0.000,-0.000,-0.038,-0.544,-0.023]
[14:59:39] [LOCAL ] 09 0.083 [0.001,-0.000,-0.000,-0.070,-0.930,-0.041]
[14:59:39] [LOCAL ] 10 0.076 [0.001,0.000,-0.000,-0.124,-1.455,-0.067]
[14:59:39] [LOCAL ] 11 0.072 [0.001,0.000,-0.000,-0.155,-1.636,-0.076]
[14:59:39] [LOCAL ] 12 0.072 [0.001,0.000,-0.001,-0.160,-1.659,-0.078]
[14:59:39] [LOCAL ] 13 0.072 [0.001,0.000,-0.001,-0.161,-1.663,-0.078]
[14:59:39] [LOCAL ] 14 0.072 [0.001,0.000,-0.001,-0.161,-1.664,-0.078]
[14:59:39] [LOCAL ] 15 0.072 [0.001,0.000,-0.001,-0.161,-1.664,-0.078]
[14:59:39] [LOCAL ] Local Optimizer finished [0.001,0.000,-0.001,-0.161,-1.664,-0.078]
[14:59:39] [MERLIN] 01 0.072 [0.001,0.000,-0.001,-0.161,-1.664,-0.078]

Edit: there was a missing “.” in some format specifiers; all the parameters are now printed with the same rounding.

I think I have found the problem: MultiStartOptimizer is not resetting the learning rate before running the local optimizer for each start point.

I edited my code to run the local optimizer only, and observed the same behaviour: repeated calls with the same (0) start point produced different trajectories and final points. I added logging for the learning rate and observed that each subsequent run started with the learning rate from the final iteration of the previous run. Finally, I added a call to SetLearningRate() before each StartOptimization() call, after which the results were identical for each run.
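To illustrate the mechanism, here is a minimal, self-contained C++ sketch (a toy quadratic objective with a made-up learning-rate decay schedule, not the actual ITK classes or their real update rule): if the optimizer's working learning rate survives between StartOptimization() calls, a second run from the same start point behaves differently, and resetting the rate first restores reproducibility.

```cpp
// Toy stand-in for a v4-style local optimizer (NOT the ITK class):
// the learning rate it decays during a run is retained for the next
// StartOptimization() call, mimicking the suspected stale state.
struct ToyOptimizer {
  double learningRate = 0.4;  // persists across runs, as in the bug

  void SetLearningRate(double lr) { learningRate = lr; }

  // Minimize f(x) = x^2 from x0 with a halving learning-rate schedule.
  double StartOptimization(double x0) {
    double x = x0;
    for (int i = 0; i < 20; ++i) {
      x -= learningRate * 2.0 * x;  // gradient step, f'(x) = 2x
      learningRate *= 0.5;          // decay is never undone between runs
    }
    return x;
  }
};
```

A first run from x0 = 1.0 converges normally, while a second run from the same start barely moves because the rate is now tiny, exactly the symptom seen in the logs above; calling SetLearningRate() before the second run makes the two runs identical.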

The simple fix would be to insert similar calls in MultiStartOptimizer. However, I think there is a more fundamental fix here: make the local optimizers store both a StartLearningRate and a CurrentLearningRate.
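As a sketch of that second idea (plain C++ with hypothetical member names, not the actual ITK interface): SetLearningRate() stores a start value, and StartOptimization() re-initializes the working rate from it, so each run is independent of whatever the previous run adapted the rate to.

```cpp
// Hypothetical design sketch: separate the user-set start rate from
// the rate the optimizer adapts internally during a run.
struct SketchOptimizer {
  void SetLearningRate(double lr) { m_StartLearningRate = lr; }

  double StartOptimization(double x0) {
    m_CurrentLearningRate = m_StartLearningRate;  // the proposed reset
    double x = x0;
    for (int i = 0; i < 20; ++i) {
      x -= m_CurrentLearningRate * 2.0 * x;  // minimize toy f(x) = x^2
      m_CurrentLearningRate *= 0.5;          // adaptation is per-run only
    }
    return x;
  }

  double m_StartLearningRate = 0.4;
  double m_CurrentLearningRate = 0.4;
};
```

With this split, a multi-start driver can call StartOptimization() repeatedly without any external reset and still get reproducible results for repeated start points.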

@cookpa I’m wondering if this explains my memory of antsAI never working as well as I hoped :sweat_smile: It is a very old memory (2016, maybe?), but I am borrowing the MultiStartOptimizer approach from antsAI, and I suspect this bug has been around for a very long time.

It’s possible. I’ll print out the optimizer values and take a look.

I tried shuffling the inputs to the tests in itkMultiStartOptimizerv4Test.cxx and it doesn’t seem to make a difference to the results, so maybe it’s something in the metric?

I will look more.

One problem with antsAI from circa 2016 was that it used joint histogram MI instead of Mattes, while antsRegistration used Mattes for either “-m Mattes” or “-m MI”. After switching antsAI to Mattes we found that the optimization was more likely to succeed.

I can’t immediately tell how those tests work (i.e. what the actual test condition is). If one of the start points is sufficiently close to the true answer, that start point will still be returned regardless of the order of the start points, as its metric value will still be the best.

The test for whether this bug occurs is if the local optimizer actually moves from the start location.

I found this problem by deliberately trying a very coarse grid of start points: the first one was so far from the optimum that by the end of its run the local optimizer had a degenerate learning rate. That rate was then carried into all subsequent points, and the local optimizer stopped doing anything.

I confirmed in antsAI that if you duplicate the start points, it produces a different result.

I cannot seem to make it fully consistent. I got it closer by disabling scales estimation and setting a fixed learning rate in the local optimizer, but repeated runs from the same starting point are still slightly different.

In the ITK test, I tried duplicating the starting parameters as well as spacing them out a bit more. The local optimizations are consistent, repeated runs with the same parameters give the same result.

I think something in either the image metric, or the local optimizer (itkConjugateGradientLineSearchOptimizerv4) is retaining state.

The parameters are consistent run to run, using a fixed random seed. That is, if I call antsAI twice, I get the same results. But internally, two identical optimization start points yield different results.

Yes, I concur that retained state is involved. The local optimizer definitely retains the learning rate: in my limited tests, once I set it before each run the results were consistent. I didn’t see any further issues with the metric, but I wouldn’t rule them out.

I agree with your conclusions. I believe the ITK tests don’t catch this because the multi-start optimizer uses itk::GradientDescentOptimizerv4Template and not a conjugate gradient as its local optimizer.

Is there a PR to be made following this discussion?

@dzenanz I would like to modify ConjugateGradientLineSearchOptimizerv4Template, because that class changes its learning rate internally even when no scales estimator is present. Other optimizers might not be affected, because they have options to estimate the learning rate only at the first iteration or to use a constant rate.
