CurvatureAnisotropicDiffusionImageFilterTest failures

Hi all,

This test fails on 2 of my bots, but passes on others:

https://open.cdash.org/testSummary.php?project=2&name=CurvatureAnisotropicDiffusionImageFilterTest&date=2019-12-09

Oddly, on Rogue7 it passes in Release but not Debug.

Many pixels are 1 shade of grey off, but on the failing submissions, one pixel is 2 shades of grey off.

Can anyone explain what the fields here are:
https://open.cdash.org/testDetails.php?test=807935646&build=6243663

ImageError 1
ImageError Minimum 4
ImageError Maximum 4
ImageError Mean 4

Is this test failing because only 1 pixel is wrong but a minimum of 4 pixels must be wrong?!

Thanks,

Sean

ImageError = how many pixel-wise differences are above the threshold (2 by default, I think). The rest are just regular statistics of those differences; in this case they are all equal because the sample size is 1.
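In rough pseudo-C++, the logic is something like this (an illustrative sketch I wrote, not the actual test driver source):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Sketch (not the actual ITK test driver code) of how a pixel-wise
// regression comparison can produce the CDash fields: count the pixels
// whose difference exceeds the intensity tolerance, then report
// min/max/mean over just those failing differences.
int main()
{
  std::vector<double> baseline = {10, 20, 30, 40, 50};
  std::vector<double> test     = {11, 20, 30, 44, 50}; // one pixel is 4 off
  const double intensityTolerance = 2.0;               // differences <= 2 are ignored

  int failingPixels = 0;
  double minDiff = 0.0, maxDiff = 0.0, sumDiff = 0.0;
  for (std::size_t i = 0; i < baseline.size(); ++i)
  {
    const double diff = std::fabs(test[i] - baseline[i]);
    if (diff > intensityTolerance)
    {
      minDiff = (failingPixels == 0) ? diff : std::min(minDiff, diff);
      maxDiff = std::max(maxDiff, diff);
      sumDiff += diff;
      ++failingPixels;
    }
  }

  // With a single failing pixel, min == max == mean, matching the
  // "ImageError Minimum/Maximum/Mean 4" lines on CDash.
  std::printf("ImageError %d  min %g  max %g  mean %g\n",
              failingPixels, minDiff, maxDiff,
              failingPixels ? sumDiff / failingPixels : 0.0);
}
```

The test then fails when that count exceeds the number-of-pixels tolerance, which I believe defaults to 0.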

Hi all,

Aside from the difference between building as Debug vs Release, the only other difference between Rogue7’s two submissions is/was:

  • the ‘debug’ build had Module_ITKVtkGlue = 1
  • the ‘release’ build had Module_ITKVtkGlue = 0

Yesterday I made the ‘release’ build also use ITKVtkGlue=1 and now 3 new tests fail:

https://open.cdash.org/viewTest.php?onlyfailed&buildid=6245460

  • CurvatureAnisotropicDiffusionImageFilterTest
  • LaplacianSharpeningImageFilterTest
  • ResampleImageFilter9Test

Does this give us a clue?!

Sean

Sadly, the most likely reason for failure is minor numerical instability. And that is quite hard to trace and fix in a cross-platform CPU-model-independent manner. You are welcome to give it a try yourself!

@seanm

The CurvatureAnisotropicDiffusionImageFilterTest and LaplacianSharpeningImageFilterTest differences do not look significant to me.

I think I have seen similar resampling differences with that test. As I recall, this is with the nearest neighbor interpolator, which is numerically problematic when the expected value is close to a rounding boundary. This discourse thread may shed some light:

You could increase the tolerance, possibly by setting --compareNumberOfPixelsTolerance to 1 or 10.
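Here is a small illustration of what I mean by a rounding boundary (plain C++, not the interpolator's actual code): when the computed continuous index lands almost exactly halfway between two pixels, a last-bit difference upstream flips which neighbour gets picked.

```cpp
#include <cmath>
#include <cstdio>

// Illustration (not the ITK interpolator code): when the continuous index
// computed by the resample filter lands almost exactly halfway between two
// pixels, a one-ULP difference in the upstream arithmetic changes which
// neighbour "nearest" selects.
int main()
{
  const double onBoundary = 2.5;                      // exactly on the rounding boundary
  const double justBelow  = std::nextafter(2.5, 0.0); // one ULP below it

  // A typical round-to-nearest-index step (here std::lround).
  std::printf("index(onBoundary) = %ld\n", std::lround(onBoundary)); // 3
  std::printf("index(justBelow)  = %ld\n", std::lround(justBelow));  // 2
}
```

A pixel that snaps to the other neighbour changes by a full intensity step, which is enough to trip the comparison.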

Bradley, I’ll look more closely at that discussion thread, but maybe the test results indeed depend on the number of threads. My two bots where the tests fail have 12 to 16 cores, whereas the ones where they pass have 2 to 4 cores.
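If I understand the thread-count angle correctly, it could matter because per-thread partial results are combined in a different grouping, and floating-point addition is not associative. A toy illustration (nothing ITK-specific):

```cpp
#include <cstdio>

// Toy illustration: floating-point addition is not associative, so combining
// the same values in a different grouping (e.g. per-thread partial sums that
// get merged at the end) can change the low bits of the result.
int main()
{
  const double big = 1e16, small = 1.0;

  const double sequential = (big + small) + small; // small terms absorbed one at a time
  const double grouped    = big + (small + small); // "per-thread" partial sum merged once

  std::printf("%.1f vs %.1f (equal: %s)\n",
              sequential, grouped, sequential == grouped ? "yes" : "no");
}
```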

Tonight, I’ve set Module_ITKVtkGlue=0 on Rogue7 Debug to see if that makes the test pass.

I still find it odd that VtkGlue changes things… is there code that conditionally uses VTK vs some other ITK code?

Sean

That issue was addressed, but please dig into the numerics of the revised implementation.

As I recall, this change was made to improve the consistency of the results:

But despite the commit message, I believe it is theoretically not as numerically stable as the previous implementation.

So the reverse test of turning off VtkGlue indeed made the tests pass.

So, in summary, on both Rogue7 and Rogue17:

  • setting VtkGlue ON seems to make those 3 tests reliably fail.
  • setting VtkGlue OFF seems to make those 3 tests reliably pass.
  • my bots use VTK master, built just before building ITK master.
  • building as debug vs release doesn’t matter.
  • building with AppleClang vs regular Clang doesn’t matter.

If it’s a numerical stability issue, I don’t see what VtkGlue has to do with it. I only looked quickly, but the test doesn’t even seem to use any VTK.

As I don’t use those classes, and work on ITK on my employer’s time/dime, I won’t be diving into this myself, but could try any suggestions anyone has…

Sean

@seanm Thank you for diving into it that far; you tried quite a few options. :clap:

Based on your experiments, I suspect that VTK’s compiler flags are propagating into the build when those other files are compiled.
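One plausible mechanism (just a guess on my part) is floating-point contraction: if the propagated flags let the compiler fuse a multiply and an add into a single FMA, intermediate roundings change. For example:

```cpp
#include <cmath>
#include <cstdio>

// Illustration: with floating-point contraction enabled, a*b + c may be
// compiled as a single fused multiply-add (one rounding) instead of a
// multiply followed by an add (two roundings), and the results can differ.
int main()
{
  const double eps = std::ldexp(1.0, -27); // 2^-27
  const double a = 1.0 + eps;
  const double b = 1.0 - eps;
  const double c = -1.0;

  const double twoRoundings = a * b + c;        // may itself be contracted, depending on flags
  const double oneRounding  = std::fma(a, b, c); // always a single rounding

  std::printf("a*b + c      = %.17g\n", twoRoundings);
  std::printf("fma(a, b, c) = %.17g\n", oneRounding);
}
```

Whether a*b + c gets contracted depends on flags such as -ffp-contract, so flags leaking from one module's build into another could plausibly perturb the last bits of the filter output.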