LabelStatisticsImageFilter floating point exception

Hello,

Line #170 in the LabelStatisticsImageFilter implementation here:

is causing a floating point exception and crashing my application. I think the reason is a numerically unstable calculation of the variance, which results in a negative value being passed to std::sqrt(). In my case, I have a signed 16-bit image with a uniform intensity of -100 over a large background region, whose variance the filter computes as approximately -2e-12.

Can anyone else confirm this is a bug? If so, maybe the direct formula (based on the definition of variance) could be adopted, as discussed in the link below?
https://www.johndcook.com/blog/2008/09/28/theoretical-explanation-for-numerical-results/

Is there another filter that can provide a workaround for calculating the standard deviation while this issue is being resolved?

–Fijoy

This filter was recently updated to use Kahan Summation which should improve the accuracy of the computation.

What is the size of your data? How big is your image, and how many pixels are -100?

Also, are you using the current(-ish) ITKv5 master or the prior implementation of the filter from ITKv4?

I'm using ITKv4, 4.10.1 to be specific. My image is 512x512x236. Most of it (60838487 pixels) is background. This means the code is computing the variance as (608384870000 - (-6083848700 * -6083848700) / 60838487) / (60838487 - 1.0).

Kahan Summation will not resolve this issue, will it? I think the use of the formula above is the issue.
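To illustrate the cancellation, here is a standalone sketch with plain doubles (not the filter code) that plugs in the numbers above: the one-pass sum-of-squares formula lands just below zero, while the definition-based two-pass formula stays at exactly zero for constant data.

#include <cmath>
#include <cstdio>

int main()
{
  const long long n = 60838487;  // background pixel count from above
  const double    v = -100.0;    // constant background intensity

  // Accumulate the sum and sum-of-squares as the filter does. For this
  // constant input both accumulations happen to be exact in double precision.
  double sum = 0.0;
  double sumOfSquares = 0.0;
  for (long long i = 0; i < n; ++i)
  {
    sum += v;
    sumOfSquares += v * v;
  }

  // One-pass formula: var = (sumSq - sum*sum/n) / (n - 1).
  // sum*sum (~3.7e19) cannot be represented exactly, and the cancellation
  // leaves a tiny negative residue (about -2e-12 here).
  const double varOnePass = (sumOfSquares - sum * sum / n) / (n - 1.0);

  // Two-pass (definition-based) formula: never negative in exact arithmetic,
  // and exactly zero for constant data.
  const double mean = sum / n;
  double       acc = 0.0;
  for (long long i = 0; i < n; ++i)
  {
    acc += (v - mean) * (v - mean);
  }
  const double varTwoPass = acc / (n - 1.0);

  std::printf("one-pass variance: %g\n", varOnePass);            // slightly negative
  std::printf("two-pass variance: %g\n", varTwoPass);            // 0
  std::printf("sqrt(one-pass):    %g\n", std::sqrt(varOnePass)); // NaN; a crash if FP trapping is enabled
}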


You are right, that squaring and dividing is brutal for numerical stability. I implemented the Kahan summation to improve the accuracy of the mean. Let me look at the Welford method…
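For reference, Welford's online update looks roughly like the following (a minimal standalone sketch with illustrative names, not the ITK implementation):

struct RunningStats
{
  long long count = 0;
  double    mean = 0.0;
  double    m2 = 0.0; // running sum of squared deviations from the current mean

  void Add(double x)
  {
    ++count;
    const double delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean); // note: uses the *updated* mean
  }

  double SampleVariance() const
  {
    return (count > 1) ? m2 / (count - 1) : 0.0;
  }
};

For constant input m2 stays exactly zero (each contribution delta * (x - mean) vanishes), so the variance can never go negative.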


Can you please create an issue on GitHub with this information?

Thanks.

Please find a fix for this sort of negative value here:

Thank you for the link to the Welford method of computing variance. However, ITK uses a parallel divide-and-conquer approach to compute the variance across multiple threads. It is not clear to me how this recurrence-based method can be adapted to chunked data.
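For what it's worth, the standard way to combine per-chunk partial statistics pairwise (as in Chan et al.) keeps (count, mean, M2) per chunk; the following is a sketch of the idea only, not the ITK code:

struct ChunkStats
{
  long long count = 0;
  double    mean = 0.0;
  double    m2 = 0.0; // sum of squared deviations within the chunk
};

// Combine two partial results computed independently (e.g. by two threads).
ChunkStats Merge(const ChunkStats & a, const ChunkStats & b)
{
  ChunkStats out;
  out.count = a.count + b.count;
  if (out.count == 0)
  {
    return out;
  }
  const double delta = b.mean - a.mean;
  out.mean = a.mean + delta * static_cast<double>(b.count) / static_cast<double>(out.count);
  out.m2 = a.m2 + b.m2 +
           delta * delta * static_cast<double>(a.count) * static_cast<double>(b.count) /
             static_cast<double>(out.count);
  return out;
}

Each thread can run the Welford recurrence on its own chunk, the partial results can then be merged in any order, and the sample variance at the end is m2 / (count - 1).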

Thank you, Bradley. Sorry I couldn't get back right away and create a GitHub issue due to some deadlines at work.

I hope I addressed your issue.

Would you be able to share the data set which produced the issue? It would be good to be able to test that the issue is addressed.

The data set is not mine (or even my company's) to share, unfortunately. It should be very easy to reproduce, though, by creating a large image with a constant intensity value of -100 (or some such number) and running the LabelStatisticsImageFilter on it.

The filter requires two inputs: an intensity image and a label or segmentation image. What should the label image be?

And what do you mean by large…

Create a 512x512x250 image with mostly (at least 60 million) background pixels of constant intensity value -100, and a small foreground region. Run it through itk::LabelMap to create a label map of the image, and pass the image and the label map to the statistics filter.
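Something along these lines should reproduce it; as a simplification (my assumption, not exactly the steps above) it fills a label image with a single label value of 1 everywhere instead of going through itk::LabelMap:

#include "itkImage.h"
#include "itkLabelStatisticsImageFilter.h"
#include <iostream>

int main()
{
  using IntensityImage = itk::Image<short, 3>;
  using LabelImage = itk::Image<unsigned char, 3>;

  IntensityImage::SizeType size = {{512, 512, 250}};

  auto intensity = IntensityImage::New();
  intensity->SetRegions(size);
  intensity->Allocate();
  intensity->FillBuffer(-100); // large constant-valued background

  auto labels = LabelImage::New();
  labels->SetRegions(size);
  labels->Allocate();
  labels->FillBuffer(1); // one label covering the whole image

  using StatsFilter = itk::LabelStatisticsImageFilter<IntensityImage, LabelImage>;
  auto stats = StatsFilter::New();
  stats->SetInput(intensity);
  stats->SetLabelInput(labels);
  stats->Update();

  std::cout << "variance: " << stats->GetVariance(1) << std::endl; // expected 0, may come out tiny and negative
  std::cout << "sigma:    " << stats->GetSigma(1) << std::endl;    // sqrt of that variance
  return 0;
}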

I was curious about this statement, so I had a look at the file: ITK/Modules/Filtering/ImageStatistics/include/itkLabelStatisticsImageFilter.hxx at master · InsightSoftwareConsortium/ITK · GitHub

Unless I very much misunderstand things, I'm not seeing Kahan summation. Am I misreading this?

It looks like it was the StatisticsImageFilter which had the improved summation algorithm:

Perhaps the LabelStatisticsImageFilter could benefit from it too?
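In rough terms that change amounts to swapping the plain double accumulator for itk::CompensatedSummation and reading the result back with GetSum(); a sketch of the idea only, not the actual filter code:

#include "itkCompensatedSummation.h"
#include <vector>

double CompensatedSum(const std::vector<double> & values)
{
  // Plain version, prone to absorbing small contributions:
  //   double sum = 0.0;
  //   for (double v : values) { sum += v; }

  itk::CompensatedSummation<double> sum; // starts at zero
  for (const double v : values)
  {
    sum += v; // carries a running correction term alongside the sum
  }
  return sum.GetSum();
}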

Hello @blowekamp! I thought I'd have a go at adding CompensatedSummation to LabelStatisticsImageFilter. I haven't quite succeeded yet, so I could use some advice.

The first thing I did was to find test data that would break a naive double sum but work with compensated summation. Using the Hacker News post "Here is a complete simplified Kahan summation test and indeed it works with -O3 ..." as inspiration, I created the following test. As the comment notes, it fails when using a plain "double sum" (each 1e-20 increment is below half an ulp of 1.0, so it gets absorbed and the naive sum stays at 1.0), but it succeeds 100% of the time (on my machine) using CompensatedSummation as written.

// Demonstration: fails with a plain double accumulator, passes with CompensatedSummation
#include <gtest/gtest.h>

#include "itkCompensatedSummation.h"

TEST(StatisticsImageFilter, test0)
{
  // double sum = 1.0;                    // naive accumulator: every 1e-20 is absorbed
  itk::CompensatedSummation<double> sum;  // starts at zero
  sum += 1.0;
  for (int i = 0; i < 1000000; ++i)
  {
    sum += 1e-20;
  }

  // Exact sum is 1 + 1,000,000 * 1e-20 = 1 + 1e-14
  EXPECT_EQ(sum.GetSum(), 1 + 1e-14);
}

Next, I used the same data in a one megapixel image and wrote a StatisticsImageFilter test [1] – which also works. Third, I created a corresponding test using LabelStatisticsImageFilter [2] – which, pleasingly, does fail.

Finally I changed LabelStatisticsImageFilter to use CompensatedSummation – but my new test still fails. I'm not sure whether I've made a boo-boo in the code or in the test. I like to think that the test is sufficiently simple to be bulletproof, but maddeningly: of the two tests in [2], the sum always fails but "mean" occasionally passes. So I may be expecting too much precision?

Full changes here: Comparing InsightSoftwareConsortium:master...smr99:master · InsightSoftwareConsortium/ITK · GitHub

-Steve

[1] ITK/itkStatisticsImageFilterGTest.cxx at master · smr99/ITK · GitHub
[2] ITK/itkLabelStatisticsImageFilterGTest.cxx at master · smr99/ITK · GitHub

Hello,

Thank you for working on this :+1:

The code changes to the filter look correct. Does the precision of the results change with the changes? One line that I am suspicious of is the following:
m_sumtest_sum = 1 + 1e-14 - 1e-20; // 1 + 999,999 * 1e-20
I believe addition and subtraction are applied left to right here, so it would be "(1 + 1e-14) - 1e-20", which is right at the limit of double precision. You may get a more accurate expected value by computing "1 + (1e-14 - 1e-20)".

Please consider making a PR to ITK with the improvement.


Adding the parentheses does not change the outcome on my machine.

Also, note that the same test is running "successfully" for the StatisticsImageFilter. This suggests to me that there is some difference in how the two filters compute the sum – but I cannot see what could make the difference.

One thing I have noticed is that the result does vary across runs. When the test fails, the output is something like

Expected equality of these values:
  filter->GetSum(1)
    Which is: 1.0000000000000122
  m_sumtest_sum
    Which is: 1.00000000000001

However, the least significant figures in the "GetSum" result (122, above) do vary across runs; e.g. 131, 138, etc. I don't understand what might account for variability in the test results. There's nothing deliberately random in the test code. Can you think of something that would bring randomness? Does the pipeline randomize the image decomposition during streaming? I guess even if not, the order of merging the sub-problems might vary. But even if that were the case – why doesn't the same phenomenon appear in the StatisticsImageFilter test case?

-Steve

P.S. My intention is indeed to create a PR once I can get a working set of tests.

If you have built ITK with the TBB multithreader, that will likely cause different parallel decompositions from run to run. I don't know why it doesn't affect StatisticsImageFilter the same way.

The way the data is merged between threads is not deterministic:

This will be true for TBB or the default threader.

You may want to set the number of threads to 1 to remove this variability. With a single thread the "merge" code is also not used, which will help narrow down where the difference from the expected results is located.
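For example, assuming the ITKv5 API (ITKv4 used SetNumberOfThreads instead of SetNumberOfWorkUnits):

#include "itkMultiThreaderBase.h"

// Per filter: restrict to a single work unit so no merge step runs.
// (filter is the LabelStatisticsImageFilter instance under test.)
filter->SetNumberOfWorkUnits(1);

// Or globally, before any filters run:
itk::MultiThreaderBase::SetGlobalDefaultNumberOfThreads(1);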


Using 1 thread does make the problem go away. So there must be some issue with the merging.

To investigate whether a simple re-ordering of the terms in the sum might play a role, I changed "test0" to put the '1' term last:

TEST(StatisticsImageFilter, test0)
{
  itk::CompensatedSummation<double> sum;  // starts at zero
  for (int i = 0; i < 1000000; ++i)
  {
    sum += 1e-20;
    if (i == 999999)
    {
      sum += 1.0;  // add the '1' term last
    }
  }

  EXPECT_EQ(sum.GetSum(), 1 + 1e-14);
}

The test still passes. So whatever the problem with the merge, it shouldn't be simply because of re-ordering the terms in the sum.

The other weird thing that bothers me: the computed sum is larger than expected; e.g.

Expected equality of these values:
  filter->GetSum(1)
    Which is: 1.0000000000000124
  m_sumtest_sum
    Which is: 1.00000000000001

If there were some issue in the summation, I'd expect the error to be truncation towards zero and thus a smaller value, not a larger one? I don't think this is a rounding issue: one ulp at this magnitude is 2**-52, i.e. 1 + 2**-52 = 1.0000000000000002, so the result (above) of ...124 is off by roughly a dozen ulps.
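A quick standalone check of those ulp figures:

#include <cmath>
#include <cstdio>
#include <limits>

int main()
{
  const double eps = std::numeric_limits<double>::epsilon();      // 2^-52, one ulp at 1.0
  std::printf("1 + ulp(1) = %.17g\n", std::nextafter(1.0, 2.0));  // 1.0000000000000002
  std::printf("error ~ %.1f ulp\n",
              (1.0000000000000124 - 1.00000000000001) / eps);     // about a dozen ulps
}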

Another data point: I removed the '1' from the sum test, so that all the million entries are 1e-20. The sum is then consistently 1e-14 as expected (or close enough; it is something like 9.9999999999999905e-15). Even with merging.

Just in case someone goes looking for my code: I moved it to a new branch GitHub - smr99/ITK at topic-label-stats-compensated