Poor affine registration results using LandmarkBasedTransformInitializer

Hi @zivy ,

Thanks very much for your detailed response and code. I ran the code you provided, and it seems that the only difference is that while I set up the optimizer with:

regMethod.SetOptimizerAsGradientDescent(
    learningRate=1.0,
    numberOfIterations=500,
    estimateLearningRate=regMethod.Once
)

you set it up with:

regMethod.SetOptimizerAsGradientDescent(
    learningRate=1.0, 
    numberOfIterations=100, 
    convergenceMinimumValue=1e-6, 
    convergenceWindowSize=10
)

(I see that I had not defined the values of my variables learningRate and numIters in my original post, but I've included them explicitly in the snippet above.)

The values you set for convergenceMinimumValue and convergenceWindowSize are the defaults, so my method effectively used the same values; likewise, my estimateLearningRate value, regMethod.Once, is the default. So as far as I can tell our methods are identical, apart from the number of iterations.

I cropped the moving (CT) image rather than the fixed (MR) image; perhaps that is what you meant as well?

I did so by eye, selecting min/max indices that roughly crop movIm to fixIm's extent:

movImCropped = movIm[25:290, 115:390, 198:250]
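
(In case it's useful to anyone, here is a rough sketch of how such indices could be derived programmatically rather than by eye, assuming the two images already roughly overlap in physical space. This is not the code I ran, and the function name is mine.)

import itertools

import SimpleITK as sitk

def cropToReference(movIm, fixIm):
    # Map all 8 physical corners of fixIm into movIm's index space.
    cornerIndices = itertools.product(*[(0, s - 1) for s in fixIm.GetSize()])
    movIndices = [
        movIm.TransformPhysicalPointToIndex(
            fixIm.TransformIndexToPhysicalPoint(c)
        )
        for c in cornerIndices
    ]
    # Take the bounding box of those indices, clamped to movIm's extent.
    lo = [max(0, min(idx[d] for idx in movIndices)) for d in range(3)]
    hi = [
        min(movIm.GetSize()[d] - 1, max(idx[d] for idx in movIndices))
        for d in range(3)
    ]
    return movIm[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1, lo[2]:hi[2] + 1]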

Here is a plot of two frames from fixIm and movImCropped, as well as fixIm and movIm for comparison.

It depends on how we subjectively define a good/poor/bad/terrible registration, but based on 10 repeated runs each of registering movIm to fixIm and movImCropped to fixIm, I would summarise my results as follows:

movIm to fixIm => 1 decent result, 9 poor/bad results

movImCropped to fixIm => 4 decent, 4 poor/bad, 2 terrible

The results are here, if anyone is interested.

I fully accept that 10 runs are not enough to draw any firm conclusions, but I find it surprising that the registrations of movImCropped were just as bad as, and in some cases much worse than, those of movIm.

I also tried implementing the ITKv4 framework:

optimizedTx = sitk.AffineTransform(3)

regMethod = sitk.ImageRegistrationMethod()
# The landmark-based transform is held fixed during optimization...
regMethod.SetMovingInitialTransform(initialTx)
# ...and only the affine transform is optimized.
regMethod.SetInitialTransform(optimizedTx, inPlace=False)

# ... (same metric, optimizer, multi-resolution and interpolator settings as before)

optimizedTx = regMethod.Execute(fixIm, movIm)

# Compose the two; the transform added last (initialTx) is applied to a
# point first, matching the moving-initial-transform semantics.
finalTx = sitk.CompositeTransform(optimizedTx)
finalTx.AddTransform(initialTx)

but those results were no better (I seem to recall getting successful results using the ITKv4 framework for landmark-initialized registrations, but perhaps the datasets were different).
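
For completeness, applying the composed transform amounts to resampling movIm onto fixIm's grid, along these lines (a minimal sketch, not necessarily my exact call):

movResampled = sitk.Resample(
    movIm,            # image to transform
    fixIm,            # reference grid (size, spacing, origin, direction)
    finalTx,          # composite transform from above
    sitk.sitkLinear,  # interpolator
    0.0,              # default pixel value outside movIm
    movIm.GetPixelID()
)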

I played around with samplingPercentage and got some surprising results (10 repeated runs each):

samplingPercentage = 0.5 => about 50% terrible, 40% bad, 10% decent

samplingPercentage = 0.05 => 55% decent, 45% bad (none terrible)

samplingPercentage = 0.01 => 80% decent, 20% poor (none terrible)

I accept that there's vagueness in “decent”, “poor”, “bad” and “terrible”: not only have I not defined how I assign such scores, I also didn't assess them in a rigorous or consistent way, instead making very quick subjective assessments. Nonetheless, there seemed to be an inverse relationship between the sampling percentage and the quality of the registration.

So I decided to repeat the 1% runs, this time with 50 of them: 66% were decent, 24% poor and 10% bad; none were as bad as the “terrible” results from the runs using 50% sampling.

These results seem counter-intuitive to me: with more (randomly selected) samples, I would expect a greater likelihood of samples falling within structure as well as noise, rather than noise alone. But perhaps that assumption is wrong.
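
For what it's worth, the sampling-related settings in my runs look like the following sketch (the histogram bin count is illustrative, and the seed argument is one I haven't used myself; if it's available in your SimpleITK version, fixing it should make repeated runs deterministic and help separate sampling noise from genuine differences):

regMethod.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
regMethod.SetMetricSamplingStrategy(regMethod.RANDOM)
# With the default seed (sitk.sitkWallClock) every run draws a different
# random sample; a fixed seed makes runs reproducible.
regMethod.SetMetricSamplingPercentage(0.01, seed=121212)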

Can anyone comment on my findings and explain why more samples might result in worse registrations?

A final point:

I got much better (and more consistent) registrations using a BSpline transform with LandmarkBasedTransformInitializer than an affine transform with the same initializer.

The BSpline method uses SetOptimizerAsLBFGS2 (when using scale factors) or SetOptimizerAsLBFGSB (when not), as opposed to SetOptimizerAsGradientDescent for the affine registration.

But they both use the same metric, SetMetricAsMattesMutualInformation. (I found that for some datasets the BSpline registrations worked well enough with a 5% samplingPercentage, whereas the affine registrations needed 50%.)
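
To make the comparison concrete, the BSpline path looks roughly like the following (a sketch modelled on the SimpleITK FFD examples, leaving out the landmark initialization for brevity; the mesh size, scale factors and optimizer settings are illustrative rather than my exact values):

# Coarse control-point grid defined over the fixed image domain.
bsplineTx = sitk.BSplineTransformInitializer(
    fixIm, transformDomainMeshSize=[8, 8, 8]
)

regMethod = sitk.ImageRegistrationMethod()
# scaleFactors refine the control-point grid across resolution levels.
regMethod.SetInitialTransformAsBSpline(
    bsplineTx, inPlace=True, scaleFactors=[1, 2, 4]
)
regMethod.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
regMethod.SetMetricSamplingStrategy(regMethod.RANDOM)
regMethod.SetMetricSamplingPercentage(0.05)
regMethod.SetInterpolator(sitk.sitkLinear)
regMethod.SetShrinkFactorsPerLevel([4, 2, 1])
regMethod.SetSmoothingSigmasPerLevel([2, 1, 0])
# LBFGS2 is the optimizer used when scale factors are in play.
regMethod.SetOptimizerAsLBFGS2(
    solutionAccuracy=1e-2,
    numberOfIterations=100,
    deltaConvergenceTolerance=0.01,
)

bsplineOutTx = regMethod.Execute(fixIm, movIm)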

Given that the affine and BSpline registrations used the same metric, why should the BSpline perform so much better? Might it be down to the optimizer rather than the metric?