Poor affine registration results using LandmarkBasedTransformInitializer

Hi all,

I’m getting bad affine registration results for certain combinations of moving and fixed images.

To rule out the possibility of poor initial alignment, I considered affine and bspline CT-to-MR registrations using sitk.LandmarkBasedTransformInitializer (for affine) or sitk.BSplineTransformInitializer (for bspline). While I get good bspline registrations, I get very poor affine registrations, despite using the same fiducial points for both.

Following is the method used to perform the affine registration:

affineTx = sitk.AffineTransform(fixIm.GetDimension())

initialTx = sitk.LandmarkBasedTransformInitializer(
    transform=affineTx, fixedLandmarks=fixPts,
    movingLandmarks=movPts, referenceImage=fixIm
)

initialTx = sitk.AffineTransform(initialTx)

regMethod = sitk.ImageRegistrationMethod()

regMethod.SetInitialTransform(initialTx, inPlace=False)

# Similarity metric settings:
regMethod.SetMetricAsMattesMutualInformation()
samplingPercentage = 0.5
regMethod.SetMetricSamplingStrategy(regMethod.RANDOM) # 08/09
regMethod.SetMetricSamplingPercentage(samplingPercentage)

# Setup for the multi-resolution framework:
shrinkFactors = [4,2,1]
smoothingSigmas = [2,1,1]
regMethod.SetShrinkFactorsPerLevel(shrinkFactors)
regMethod.SetSmoothingSigmasPerLevel(smoothingSigmas)

# Interpolator settings:

# Optimizer settings:

finalTx = regMethod.Execute(fixIm, movIm)

regIm = sitk.Resample(
    movIm, fixIm, finalTx, sitk.sitkLinear, 0.0, movIm.GetPixelID()
)
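Incidentally, the least-squares fit that LandmarkBasedTransformInitializer performs for an affine transform can be reproduced in a few lines of NumPy as a sanity check on the landmarks themselves (the point lists below are made up for illustration; a large residual for a real point pair would flag a mis-matched fiducial):

```python
import numpy as np

# Hypothetical fixed/moving landmark coordinates (N x 3, physical units).
fix_pts = np.array([[10., 20., 5.], [40., 22., 6.], [25., 50., 7.], [12., 48., 30.]])
mov_pts = fix_pts + np.array([2., -1., 1.])  # moving points offset by a known shift

# Solve mov ~ A @ fix + t in the least-squares sense by augmenting the fixed
# points with a column of ones; M stacks A^T on top of t.
X = np.hstack([fix_pts, np.ones((len(fix_pts), 1))])  # N x 4
M, *_ = np.linalg.lstsq(X, mov_pts, rcond=None)       # 4 x 3

# Fiducial registration error: per-landmark residual of the fitted affine.
fre = np.linalg.norm(X @ M - mov_pts, axis=1)
print('FRE per landmark:', np.round(fre, 6))  # ~0 for this pure-translation example
```

If the residuals are small like here, a bad affine result is unlikely to be the initializer's fault.
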
I’ve left out the method I use for the deformable registration since that one isn’t causing me problems, but I’d be happy to share it if that would help.

Plots of Metric v Iterations and the difference between fixed and registered image at a mid-stack slice for 7 repeated bspline runs can be found here, and for 5 repeated affine runs here.

When comparing the Metric v Iteration curves of the bspline and affine results, it’s evident that they look very different. Interestingly, 1 out of 5 repeated affine runs produced a decent result, and although its metric plot looks very different from the bspline plots, it also looks very different from the other 4 (bad) affine plots.

Interestingly, changing the MetricSamplingStrategy to regMethod.NONE (in the affine method above) produced a much worse result than the one reasonably good result obtained using regMethod.RANDOM with samplingPercentage=0.5.

Note that for the bspline registration I only need samplingPercentage=0.05 to get quite decent results.

My questions are:

  1. Why would the affine registrations fail using the same fiducials that were used for the bspline registrations, especially given that I use 50% sampling for affine vs 5% for bspline?

  2. Why would the affine registration result using all voxels produce a worse result than one obtained using 50% samples?

  3. Are there any suggested changes to the affine registration method to help improve the results?

Following are links to HDF files of the moving and fixed images.

Thank you.

Update 08/09/21:

Since I used sitk.BSplineTransformInitializer for the bspline registration and sitk.LandmarkBasedTransformInitializer for the affine, I wondered if something went wrong with the latter, resulting in a poor initial alignment for the affine registration.

I resampled the moving image to the fixed image using initialTx and can confirm that the aligned image looks good - in fact much better than the affine registered results.

So it seems that initial alignment can be ruled out as a possible culprit. I guess the metric settings should be fine (MattesMutualInformation is needed for multi-modal images) - besides, the same metric settings work well for the bspline registration.

For the bspline registration I use the LBFGS2 optimizer with scale factors, and while the shrinkFactors were as above, smoothingSigmas = [4,2,1] (I tried this value for smoothingSigmas for the affine registration but it didn’t improve the result).

I also tried playing around with SetOptimizerAsRegularStepGradientDescent and SetOptimizerAsGradientDescentLineSearch for the affine registration, but was unable to improve it.

I’m out of ideas and would greatly appreciate some advice.

Hello @ctorti,

You are trying to register two “cylinders”, so I suspect this is very sensitive to translations along the z direction (the cranial-caudal direction). A related 2D task is shown in this jupyter notebook; notice the multi-initialization approach taken there.

Possibly perform the affine registration in two steps:

  1. Limit it to only translation in z (regMethod.SetOptimizerWeights([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]), parameter order is the affine matrix in row order and then t_x, t_y, t_z).
  2. After obtaining the results change the weights back to 1.0 for all parameters and run a registration from this new starting point.

In step 1 you may want to allow translation along all axes; set the last three weights to 1.0.


Hi @zivy

Thanks for the suggestions. I tried limiting the affine registration to translations in z only, but ran into the error:

itk::ERROR: MattesMutualInformationImageToImageMetricv4(000002376B655EA0):
All samples map outside moving image buffer. The images do not sufficiently overlap.
They need to be initialized to have more overlap before this metric will work.
For instance, you can align the image centers by translation.

When I tried limiting it to x, y and z translations only, I ended up with a result far worse than the initially aligned image using LandmarkBasedTransformInitializer.

Interestingly I stumbled across an older thread that Andras Lasso commented on:

His suggestion to crop the larger image down to the smaller one is reasonable but it seems that I really ought to be able to crack the affine registration since I managed to get it working for the bspline.

Here is a collection of plots for 6 different fixed-moving image sets (“SR10”, “SR11”, … “SR15”), with 3 plots per set:

  1. Initial alignment (using geometry)
  2. Metric v iterations
  3. Difference between the fixed and registered image for a middle slice

These are all MR-to-MR registrations (no fiducials required), so I would expect the initial alignment using geometry or moments to provide decent starting points.

For some datasets I get good results using geometry or moments, for some I get better results using geometry, while for others I get poor results irrespective of the method of initialization.

I’ve not managed to find any patterns in good/bad results (e.g. relative voxel spacings, relative directions, relative T1/T2 contrast, etc.). Summary table here.

I wonder if someone experienced in registrations might be able to see any clues as to why they do so poorly for some datasets and what I might be able to do to make the registration method more robust. Thanks!

You could try using elastix, either directly or via ITKElastix. It tends to be somewhat slower, but more robust.


Hi @ctorti,

Could you also share the initial transformation (sitk.WriteTransform(initialTx, 'initialRigid.tfm'))? I am curious to see what is going on here. The error from the mutual information metric indicates that the registration went completely haywire (drifted far away from the initial values, which seem to have decent overlap).

Without your initialization it is hard to figure things out. I tried looking for corresponding points with a gui, but can’t identify any (two cylinders :frowning:).

Thanks @dzenanz !

I was unaware of ITKElastix. I assumed it was new, but it seems that it’s been around for at least 2 years. I don’t know how I missed it.

I’ve found SimpleElastix to be more robust than SimpleITK, and it requires a lot less parameter tuning (less of a black art). In fact I was using SimpleElastix (“Selx”) until I encountered limitations of SimpleITK v1.2 (which Selx is based on), causing me to abandon all the work I had done in Selx in favour of SimpleITK. I’ll see what I can do with ITKElastix.

Hi @zivy

Thanks for your help. Here’s a link to an initialTx.

Hello @ctorti,

I ran the registration multiple times and it is variable, but not the catastrophic failure you experienced
(note: your original code didn’t include values for the SetOptimizerAsGradientDescent parameters, so those may be different from what you used).

An observation: the objects you are registering are articulated (two legs), and the spatial relationship between them in the two scans is arbitrary (close, but still arbitrary). If I use only one object (the fixed image cropped to a single leg), the variability is reduced.

The code below runs the registration 10 times with the original image and 10 times with the cropped one. The transformation results and alpha blended images are saved. A nicer alpha blending is possible using masks; if that is of interest, see details in the visualization notebook.

import SimpleITK as sitk
import numpy as np

def mask_image_multiply(mask, image):
    components_per_pixel = image.GetNumberOfComponentsPerPixel()
    if components_per_pixel == 1:
        return mask*image
    return sitk.Compose([mask*sitk.VectorIndexSelectionCast(image, channel)
                         for channel in range(components_per_pixel)])

def alpha_blend(image1, image2, alpha=0.5, mask1=None, mask2=None):
    '''
    Alpha blend two images, pixels can be scalars or vectors.
    The alpha blending factor can be either a scalar or an image whose
    pixel type is sitkFloat32 and values are in [0,1].
    The region that is alpha blended is controlled by the given masks.
    '''
    if not mask1:
        mask1 = sitk.Image(image1.GetSize(), sitk.sitkFloat32) + 1.0
    if not mask2:
        mask2 = sitk.Image(image2.GetSize(), sitk.sitkFloat32) + 1.0
    # if we received a scalar, convert it to an image
    if type(alpha) != sitk.SimpleITK.Image:
        alpha = sitk.Image(image1.GetSize(), sitk.sitkFloat32) + alpha
    components_per_pixel = image1.GetNumberOfComponentsPerPixel()
    if components_per_pixel > 1:
        img1 = sitk.Cast(image1, sitk.sitkVectorFloat32)
        img2 = sitk.Cast(image2, sitk.sitkVectorFloat32)
    else:
        img1 = sitk.Cast(image1, sitk.sitkFloat32)
        img2 = sitk.Cast(image2, sitk.sitkFloat32)
    intersection_mask = mask1*mask2
    intersection_image = mask_image_multiply(alpha*intersection_mask, img1) + \
                         mask_image_multiply((1-alpha)*intersection_mask, img2)
    return intersection_image + mask_image_multiply(mask2-intersection_mask, img2) + \
           mask_image_multiply(mask1-intersection_mask, img1)

def register_affine(fixed_image, moving_image, initialTx):
    regMethod = sitk.ImageRegistrationMethod()
    regMethod.SetInitialTransform(initialTx, inPlace=False)

    # Similarity metric settings:
    regMethod.SetMetricAsMattesMutualInformation()
    samplingPercentage = 0.05
    regMethod.SetMetricSamplingStrategy(regMethod.RANDOM) # 08/09
    regMethod.SetMetricSamplingPercentage(samplingPercentage)

    # Setup for the multi-resolution framework:
    shrinkFactors = [4,2,1]
    smoothingSigmas = [2,1,1]
    regMethod.SetShrinkFactorsPerLevel(shrinkFactors)
    regMethod.SetSmoothingSigmasPerLevel(smoothingSigmas)

    # Interpolator settings:
    regMethod.SetInterpolator(sitk.sitkLinear)

    # Optimizer settings:
    #regMethod.SetOptimizerAsGradientDescent(learningRate=learningRate, numberOfIterations=numIters, estimateLearningRate=regMethod.Once)
    regMethod.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100, convergenceMinimumValue=1e-6, convergenceWindowSize=10)

    # Use the function arguments, not the global fixIm/movIm (otherwise the
    # cropped-image runs below would silently register the full images).
    finalTx = regMethod.Execute(fixed_image, moving_image)
    print(f'Final metric value: {regMethod.GetMetricValue()}')
    print(f"Optimizer's stopping condition: {regMethod.GetOptimizerStopConditionDescription()}")
    return sitk.CompositeTransform(finalTx).GetBackTransform()

# read the images and initial transformation
fixIm = sitk.ReadImage('Soft_tissue_seg_aff_trgIm.hdf')
movIm = sitk.ReadImage('Soft_tissue_seg_aff_srcIm.hdf')
initialTx = sitk.AffineTransform(sitk.ReadTransform('Soft_tissue_seg_aff_initRegTx_20210907_111022.tfm'))

print('Using original images:')
results_original = []
for i in range(10):
    results_original.append(register_affine(fixIm, movIm, initialTx))
    sitk.WriteTransform(results_original[-1], f'tx_original_{i}.tfm')
    sitk.WriteImage(alpha_blend(sitk.Resample(movIm, fixIm, results_original[-1]), fixIm), f'alpha_blended_original_{i}.nrrd')

translation_arr = np.array([tx.GetParameters()[-3:] for tx in results_original])
print(f'translation mean: {translation_arr.mean(axis=0)}')
print(f'translation std: {translation_arr.std(axis=0)}')

fixImCropped = fixIm[0:200,...]
print('\nUsing cropped fixed image (one leg):')
results_cropped = []
for i in range(10):
    results_cropped.append(register_affine(fixImCropped, movIm, initialTx))
    sitk.WriteTransform(results_cropped[-1], f'tx_cropped_{i}.tfm')
    sitk.WriteImage(alpha_blend(sitk.Resample(movIm, fixImCropped, results_cropped[-1]), fixImCropped), f'alpha_blended_cropped_{i}.nrrd')
translation_arr = np.array([tx.GetParameters()[-3:] for tx in results_cropped])
print(f'translation mean: {translation_arr.mean(axis=0)}')
print(f'translation std: {translation_arr.std(axis=0)}')

Hi @zivy ,

Thanks very much for your detailed response and code. I tried running the code you provided and it seems that the only difference is that while I set up the optimizer with:

regMethod.SetOptimizerAsGradientDescent(learningRate=learningRate, numberOfIterations=numIters, estimateLearningRate=regMethod.Once)

you set it up with:

regMethod.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100, convergenceMinimumValue=1e-6, convergenceWindowSize=10)

(I can see that I had not defined the values for my variables learningRate and numIters in my original post, but I’ve included them explicitly in the above snippet.)

The values you set for convergenceMinimumValue and convergenceWindowSize are defaults, so my method effectively used the same values, and likewise my value for estimateLearningRate is .Once (default). So as far as I can tell our methods are identical.

I cropped the moving (CT) image rather than the fixed (MR) image - perhaps that’s what you meant?

I did so by “eye”, having selected min/max indices that roughly crop movIm to fixIm's extent:

movImCropped = movIm[25:290, 115:390, 198:250]

Here is a plot of two frames from fixIm and movImCropped, as well as fixIm and movIm by comparison.

It depends on how we subjectively define a good/poor/bad/terrible registration, but based on 10 repeated runs of registering movIm to fixIm, and of movImCropped to fixIm I would summarise my results as follows:

movIm to fixIm => 1 decent result, 9 poor/bad results

movImCropped to fixIm => 4 decent, 4 poor/bad, 2 terrible

The results are here if interested.

I fully accept that 10 runs are not enough to draw firm conclusions, but I find it surprising that the registrations using movImCropped were overall no better, and in some cases much worse, than those using movIm.

I also tried implementing the ITKv4 framework:

optimizedTx = sitk.AffineTransform(3)
regMethod = sitk.ImageRegistrationMethod()
regMethod.SetInitialTransform(optimizedTx, inPlace=False)

... (same metric, optimizer, multi-res, interpolator settings as before)

optimizedTx = regMethod.Execute(fixIm, movIm)

finalTx = sitk.CompositeTransform(optimizedTx)

but those results were no better (I seem to recall getting successful results using the ITKv4 framework for landmark-initialized registrations, but perhaps the datasets were different).

I played around with samplingPercentage and found some surprising results. Using samplingPercentage = 0.5 (10 repeated runs), about 50% of the runs had terrible results, 40% were “bad”, and 10% “decent”.

Using samplingPercentage = 0.05 (10 repeated runs), 55% were decent, 45% were bad (none were terrible).

And using samplingPercentage = 0.01 (10 repeated runs), 80% were decent, 20% were poor (none were terrible).

I accept that there’s vagueness in “decent”, “poor”, “bad” and “terrible”: not only have I not defined how I assign such scores, I didn’t use a rigorous or consistent way of assessing them, but rather made very quick subjective assessments. Nonetheless, there seemed to be an inverse relationship between the sampling percentage and the quality of the registration.

So I decided to repeat the 1% runs, this time with 50 repeated runs: 66% were decent, 24% were poor, 10% were bad, none were as bad as the “terrible” results from the runs using 50% sampling.

These results seem counter-intuitive to me: with more (randomly selected) samples I would expect a greater likelihood that samples fall within structure as well as noise, rather than just noise. But perhaps that assumption is wrong.

Can anyone comment on my findings and explain why I might have found that more samples result in worse registrations?

A final point:

I got much better (and more consistent) registrations using BSplines with LandmarkBasedTransformInitializer than affine with LandmarkBasedTransformInitializer.

The BSpline method uses SetOptimizerAsLBFGS2 (when using scale factors) or SetOptimizerAsLBFGSB (when not), as opposed to SetOptimizerAsGradientDescent for the affine registration.

But they both use the same metric: SetMetricAsMattesMutualInformation. (I found that for some datasets the BSpline registrations worked well enough with 5% samplingPercentage, whereas the affine registrations needed 50%.)

Given that the metric used for the affine and bspline registrations was the same, why should the bspline perform so much better? Might it be down to the optimizer rather than the metric?