Line #170 in the LabelStatisticsImageFilter implementation here:
is causing a floating point exception and crashing my application. I think the reason is a numerically unstable calculation of variance, resulting in a negative value passed to std::sqrt(). In my case, I have a signed 16-bit image with a uniform intensity of -100 for a large background region, the variance of which is computed as approximately -2e-12 by the filter code.
I'm using ITKv4, 4.10.1 to be specific. My image is 512x512x236. Most of it (60838487 pixels) is background. This means the code is computing the variance as (608384870000 - (-6083848700 * -6083848700) / 60838487) / (60838487 - 1.0).
Kahan Summation will not resolve this issue, will it? I think the use of the formula above is the issue.
You are right, that squared-sum-and-division formula is very prone to numeric instability. I implemented Kahan summation to improve the accuracy of the mean. Let me look at the Welford method…
Please find a fix for this sort of negative value here:
Thank you for the link to the Welford method of computing variance. However, ITK uses a parallel divide-and-conquer method to compute the variance with multiple threads. It is not clear to me how this recurrence can be adapted to chunked data.
The data set is not mine (or even my company's) to share, unfortunately. It should be very easy to reproduce though by creating a large image with constant intensity value of -100 (or some such number) and running the LabelStatisticsImageFilter on it.
Create a 512x512x250 image with mostly (at least 60 million) background pixels of constant intensity value -100, and a small foreground region. Run it through itk::LabelMap to create a label map of the image, and pass the image and the label map to the statistics filter.
Hello @blowekamp! I thought I'd have a go at adding CompensatedSummation to LabelStatisticsImageFilter. I haven't quite succeeded yet, so I could use some advice.
First thing I did was to find test data that would break a naive double sum but work with compensated summation. Using "Here is a complete simplified Kahan summation test and indeed it works with -O3 ... | Hacker News" as inspiration, I created the following test. As the comment notes, it fails when using "double sum" but succeeds 100% of the time (on my machine) using CompensatedSummation as written.
// Demonstration code: test fails using double, but succeeds using CompensatedSummation
TEST(StatisticsImageFilter, test0)
{
  // double sum = 1.0;
  itk::CompensatedSummation<double> sum = 1;
  for (int i = 0; i < 1000000; ++i)
  {
    sum += 1e-20;
  }
  EXPECT_EQ((double)sum, 1 + 1e-14);
}
Next, I used the same data in a one megapixel image and wrote a StatisticsImageFilter test [1], which also works. Third, I created a corresponding test using LabelStatisticsImageFilter [2], which, pleasingly, does fail.
Finally, I changed LabelStatisticsImageFilter to use CompensatedSummation, but my new test still fails. I'm not sure whether I've made a boo-boo in the code or in the test. I like to think that the test is sufficiently simple to be bulletproof, but maddeningly: of the two tests in [2], the "sum" always fails but the "mean" occasionally passes. So I may be expecting too much precision?
The code changes to the filter look correct. Does the precision of the results change with the changes? One line that I am suspicious of is the following: m_sumtest_sum = 1 + 1e-14 - 1e-20; // 1 + 999,999 * 1e-20
I believe addition and subtraction are applied left to right here, so it would be "(1 + 1e-14) - 1e-20", which is at the limit of double precision. You may get a more accurate expected value by doing "1 + (1e-14 - 1e-20)".
Please consider making a PR to ITK with the improvement.
Adding the parentheses does not change the outcome on my machine.
Also, note that the same test is running "successfully" for the StatisticsImageFilter. This suggests to me that there is some difference in how the two filters compute the sum, but I cannot see what could make the difference.
One thing I have noticed is that the result does vary across runs. When the test fails, the output is something like
Expected equality of these values:
filter->GetSum(1)
Which is: 1.0000000000000122
m_sumtest_sum
Which is: 1.00000000000001
However, the least significant figures in the "GetSum" result (122, above) do vary across runs; e.g. 131, 138, etc. I don't understand what might account for variability in the test results. There's nothing deliberately random in the test code. Can you think of something that would bring randomness? Does the pipeline randomize the image decomposition during streaming? I guess even if not, the order of merging the sub-problems might vary. But even if that were the case, why doesn't the same phenomenon appear in the StatisticsImageFilter test case?
-Steve
P.S. My intention is indeed to create a PR once I can get a working set of tests.
If you have built ITK with the TBB multi-threader, that will likely cause different parallel decompositions from run to run. I don't know why it doesn't affect StatisticsImageFilter the same way.
The way the data is merged between threads is not deterministic:
This will be true for TBB or the default threader.
You may want to set the number of threads to 1 to remove this variability. With a single thread the "merge" code is also not used, which should help narrow down where the difference from the expected results is located.
Using 1 thread does make the problem go away. So there must be some issue with the merging.
To investigate whether a simple re-ordering of the terms in the sum might play a role, I changed "test0" to put the "1" term last:
TEST(StatisticsImageFilter, test0)
{
  itk::CompensatedSummation<double> sum = 0;
  for (int i = 0; i < 1000000; ++i)
  {
    sum += 1e-20;
    if (i == 999999) sum += 1;
  }
  EXPECT_EQ((double)sum, 1 + 1e-14);
}
The test still passes. So whatever the problem with the merge, it shouldn't be simply because of re-ordering the terms in the sum.
The other weird thing that bothers me: the computed sum is larger than expected; e.g.
Expected equality of these values:
filter->GetSum(1)
Which is: 1.0000000000000124
m_sumtest_sum
Which is: 1.00000000000001
If there were some issue in the summation, I'd expect the error to be truncation to zero and thus a smaller value, not a larger one? I don't think this is a rounding issue, since one ulp above 1 is 1 + 2**-52 = 1.0000000000000002, so the result (above) of …124 represents an error on the order of 10 ulp.
Another data point: I removed the "1" from the sum test, so that all the million entries are 1e-20. The sum is then consistently 1e-14 as expected (or close enough; it is something like 9.9999999999999905e-15). Even with merging.