ResampleImageFilter non-determinism with threading

I’m seeing some non-deterministic results in ResampleImageFilter on with different thread counts.

I’m assuming this is unexpected behaviour, but thought I’d ask here first if there are known issues it might relate to before I dig into it further.

I’ve reproduces with 4.10 and 4.13, and it seems to depend on particular changes in image geometry.

Hi Simon,

Correct – ResampleImageFilter should not change its result based on the thread count.

Thanks,
Matt

Here is a somewhat simple test case that fails for me on visual studio 2015 builds. You can see from the slicer screen grab the patterned differences where interpolator results are different by (typically) 1.

If I come up with a better test case I’ll add it here. The one from the actual code is a bit more complicated and also larger difference, this is the simplest trigger I’ve come up with so far. The geometry numbers come from that example though and seem to be significant.

Any thoughts?

test.cxx (3.5 KB)

Forgot to mention, to trigger on my machine I run threadcount from 1 to 10 (i7 with 8 virtua cores). I get something like this:

$ for i in {1..10}; do build/Release/Test.exe $i.nrrd $i; CompareImages.exe 1.nrrd  $i.nrrd;done
PSNR = inf
PSNR = inf
PSNR = inf
PSNR = inf
PSNR = 90.0928
PSNR = inf
PSNR = 95.1087
PSNR = 97.5865
PSNR = inf
PSNR = 90.6642

Also, I hope to have a proper look at why this is happening this week, but can’t for a couple of days. So if anyone has ideas or similar experiences, I’d love to hear them. I’m also not very familiar with that codebase so it may be slow going.

1 Like

Do you have a measure of the magnitude of the per pixel error?

The algorithm used for “linear transforms” is done in an optimized scan line basis:

This methods just compute the “delta” in the scanline direction and accumulates it as it move along the scanline. This would result is subtle numeric differences than if the full transform from the fixed to the moving was done.

Now if a different “ImageRegionsSplitter” was used the results could be slightly different becausethe default way to split the images (currently) should ensure that the fast scan line is never split.

@dzenanz

In this test case, the magnitude is usually 1.

However in the problem that caused this, i was using NearestNeighbor, not linear. Unfortunately I haven’t found as simple a test case that generates it with NN. I will try to autogen an image that trips that up too if it helps. Or perhaps find one I can share.

In the NN case I would expect direction to be irrelevant, no?

The direction matters for which pixels get interpolated, and interpolation matters for coming up with values. So image direction does affect nearest neighbor interpolation.

I tried the example, and I have consistently been getting this kind of image: stripes with value 254 and 253. I didn’t notice whether this is thread-dependent. I used ITK-4.12.0-vs2017.

1 Like

Sorry, I should have been more careful with my choice of words. Clearly the image direction matrix matters in choosing which pixels are interpolated. However, while accumulation order could change the linear result a little bit, I can’s see how the choices of ImageRegionsSplitter could affect a nearest neighbors result. Am I missing something?

So there are two things to consider here:

  1. What is the numeric cause of the difference?
    The resampling is does from continuous physical space to continuous physical space, so it is done with double precision. With the limitation of finite bits, the computations for coordinates to sample from is imprecise which will require some interpolation. The pixel type is of “unsigned char” the result of the interpolator is then implicitly rounded via truncation. So a slight difference in location can result in 254 vs 255. Consider using “float” for the output pixel type followed by the “RoundImageFilter”

  2. Why is it different with thread numbers?
    This is tougher as I suggested I could see a different Region Splitter could the cause. Could you be using a different one? The other is some funny SSE instructions making the scan line dependent on each other. OR other bad optimization done.

Bradley, good points.

My intuition is the numerics are not problematic, but will keep in mind.

I’m not convinced my simple test captures enough of it, I’ll try and construct a more representative one. Once I have that I can try with all optimizations off to see if it’s optimizer related.

The Splitter could most definitely have an effect even with NearestNeighbor interpolator. If the result of the accumulation is .499999 vs .500000001 you could land on a different pixel.

Ah, of course, I was thinking of the value accumulation and should have read your snippit more carefully.

I have to leave this for a bit now, but I have verified behaviour is the same with all optimizations turned off and SIMD disabled.

1 Like

@blowekamp : ata first pass I think LinearThreadedGenerateData would be more stable with a step count rather than a rolling accumulation.

I’ll pull a branch and try that.

It might be, but multiplication is more computationally expensive than addition. That might not matter much on leading edge modern processors though :smiley:

Yes, we would have try it. Regardless it would be nice to know if that is the root of the issue.

Update. The striped image in above example made it more difficult to see what was going on, so I reverted to using a mask image with similar geometry.

As Bradley surmised, replacing the line

      inputIndex += delta;

with

outputPtr->TransformIndexToPhysicalPoint(outIt.GetIndex(), outputPoint);
inputPtr->TransformPhysicalPointToContinuousIndex(transformPtr->TransformPoint(outputPoint),inputIndex);

removed all of the thread variability.

Unsurprisingly, the accumulation version doesn’t match recomputing position at every point. Attached snapshot shows difference between reference (position recomputed always) and 1 and 8 thread resamplers using the acumulation menthod (i.e. code in 4.13).

bottom is reference image, top is the deltas between reference and 1-thread (bright) 8-thread (dim). You can see how feeding this mask through additional morphological operations can result in quite different outputs for different threads. I should point out that the screenshot is centered on one of the slices that have differences, of course much of the volume is identical.

The key issue here I think is the delta, as computed it is {0.99999999999999756, 6.9111383282915995e-15, 3.5527136788005009e-15} }

My preferred method for resampling masks has be to create a 0-level set from the mask via a distance field filter. Then resample the distance field. This approach is free from many aliasing issue, and more numerically stable with the usage of a linear or higher order interpolator.

@blowekamp : true, I use that approach in other places, and this is an oversimplified test. However this isn’t about aliasing in the result - I was surprised by the different behaviour based on thread count. This will affect any resampling of similar geometry, of course, just be more difficult to see. In the particular case I have I can make it more stable, but I’m still looking to avoid regressions between machines with different core counts…

Fundamentally this isn’t a big impact but I’m not sure the speed tradeoff is always worth the non-determinism of the scanline approach. Any correction factors for the accumulation errors, etc. are probably going to get close to cost of recomputing the position each time.

For what it’s worth, I’ve already resolved the original regression a different way.