LabelStatisticsImageFilter floating point exception


(Fijoy Vadakkumpadan) #1

Hello,

Line #170 in LabelStatisticsImageFilter implementation here:

is causing a floating point exception and crashing my application. I think the reason is a numerically unstable calculation of variance, resulting in a negative value passed to std::sqrt(). In my case, I have a signed 16-bit image with a uniform intensity of -100 for a large background region, the variance of which is computed as approximately -2e-12 by the filter code.

Can anyone else confirm this is a bug? If yes, maybe the direct formula (based on the definition of variance) could be adopted as discussed in below?
https://www.johndcook.com/blog/2008/09/28/theoretical-explanation-for-numerical-results/

Is there another filter that can provide a workaround for calculating standard deviatation while this issue is being resolved?

–Fijoy


(Bradley Lowekamp) #2

This filter was recently updated to use Kahan Summation which should improve the accuracy of the computation.

What is the size of your data? How big is your image, and how many pixels are -100?

Also, are you using the current (-ish) ITKv5 master or the the prior implementation of the filter from ITKv4?


(Fijoy Vadakkumpadan) #3

I’m using ITKv4, 4.10.1 to be specific. My image is 512x512x236. Most of it (60838487 pixels) is background. This means the code is computing variance as (608384870000 - -6083848700*-6083848700/60838487) / ( 60838487 - 1.0 ).

Kahan Summation will not resolve this issue, will it? I think the use of the formula above is the issue.


(Bradley Lowekamp) #4

You are right, that square and division is very brutal for numeric instability. I implemented the Kahan summation to improve the accuracy of the mean. Let me look at the Welford method…


(Bradley Lowekamp) #5

Can you please create an issue on Github with this information?

Thanks.


(Bradley Lowekamp) #6

Please find a fix for the sort of negative value here:

Thank you for the link to the Walford method of computing variance. However, ITK uses a parallel divide and conquer method to compute the variance with multiple threads. It is not clear to me how this recurrent method can be adapted for chunking of that data.


(Fijoy Vadakkumpadan) #7

Thank you Bradley. Sorry I couldn’t get back right away and create a github issue due to some deadlines at work.


(Bradley Lowekamp) #8

I hope I addressed your issue.

Would you be able to share the data set which produced the issue? It would be good to be able to test that the issue is addressed.


(Fijoy Vadakkumpadan) #9

The data set is not mine (or even my company’s) to share unfortunately. It should be very easy to reproduce though by creating a large image with constant intensity value of -100 (or some such number) and running the LabelStatisticsImageFilter on it.


(Bradley Lowekamp) #10

The filter requires two inputs an intensity and a label or segmentation image. What should the label image be?

And what do you mean by large…


(Fijoy Vadakkumpadan) #11

Create a 512x512x250 image with mostly (at least 60 million) background pixels of constant intensity value -100, and a small foreground region. Run it through itk::LabelMap to create a label map of the image, and pass the image and the label map to the statistics filter.