Bug in itkHistogram.hxx leading to an endless loop

Hello,

I have identified a bug in the binary search in itkHistogram.hxx that make a loop run forever. The lines in question are these:

// Binary search for the bin where this measurement could be
mid = ( end + 1 ) / 2;
median = m_Min[dim][mid];

while ( true )
  {
  if ( tempMeasurement < median )
    {
    end = mid - 1;
    }
  else if ( tempMeasurement > median )
    {
    // test whether it is inside the current bin by comparing to the max of
    // this bin.
    if ( tempMeasurement <  m_Max[dim][mid]
         && tempMeasurement >= m_Min[dim][mid] )
      {
      index[dim] = mid;
      break;
      }
    // otherwise, continue binary search
    begin = mid + 1;
    }
  else
    {
    index[dim] = mid;
    break;
    }
  mid = begin + ( end - begin ) / 2;
  median = m_Min[dim][mid];
  } // end of while

With a few of the data sets I’m currently testing with, I never reach one of the two break statements. What happens is this:

  1. tempMeasurement is less than median, and we thus enter the first if statement.
  2. mid is at the moment end + 1, so the update within the if statement (end = mid - 1) doesn’t change anything.
  3. In the bottom of the while loop mid and median are updated, but since begin is end + 1, the (end - begin)/2 part becomes zero, and mid is then set to begin - which it already is. So nothing happens here either.
  4. The median isn’t changed because mid isn’t changed.

So my question is this: has anyone else encountered this bug, and has it perhaps been fixed in later versions? I’m using 4.12 here, but I have checked the source of 4.13 and these lines seem to be the same.

Let me know if I can provide more details about the bug. Sadly I cannot easily share the input data.

A dump of my debugger’s state could be:

begin           = 17344842
end             = 17344841
mid             = 17344842
median          = 224.9340057373047
tempMeasurement = 224.93399047851562

Best regards,
Mads

1 Like

I’ve begun looking into this bug myself, but before I do dive in, I thought I’d share my test setup with you, so that you may spot any obvious mistakes. I’ve just added the following to the bottom of itkHistogramTest.cxx:

// Test of the binary search within the histogram code.
{
  std::ifstream in("testdata.bin", std::ios_base::binary);
  if (!in.good()) {
    std::cerr << "Opening test file failed" << std::endl;
    return EXIT_FAILURE;
  }

  typedef itk::Image<float, 3> ImageType;
  ImageType::RegionType region;
  ImageType::IndexType start;
  start.Fill(0);
  region.SetIndex(start);
  ImageType::SizeType imageSize;
  imageSize[0] = 512;
  imageSize[1] = 512;
  imageSize[2] = 100;
  region.SetSize(imageSize);
  ImageType::Pointer image = ImageType::New();
  image->SetRegions(region);
  image->Allocate();
  image->FillBuffer(itk::NumericTraits<float>::Zero);

  ImageType::IndexType pixelIndex;
  for (ImageType::IndexValueType x = 0; x < 512; x++) {
    for (ImageType::IndexValueType y = 0; y < 512; y++) {
      for (ImageType::IndexValueType z = 0; z < 100; z++) {
        pixelIndex[0] = x;
        pixelIndex[1] = y;
        pixelIndex[2] = z;
        float f;
        in.read((char *)&f,sizeof(float));
        image->SetPixel(pixelIndex, f);
      }
    }
  }
  in.close();

  typedef itk::Statistics::Histogram<typename ImageType::PixelType> HistogramType2;
  typedef itk::Statistics::ImageToListSampleAdaptor<ImageType> AdaptorType;
  typename AdaptorType::Pointer adaptor = AdaptorType::New();
  typedef itk::Statistics::SampleToHistogramFilter<AdaptorType,HistogramType2> FilterType;
  typename FilterType::Pointer filter = FilterType::New();

  adaptor->SetImage(image);

  typename HistogramType2::SizeType size2(1);
  typename ImageType::SizeType im_size=image->GetLargestPossibleRegion().GetSize();
  unsigned int total_size=1;
  for(unsigned int i=0;i<im_size.GetSizeDimension();++i) {
      total_size*=im_size[i];
  }

  size2.Fill( total_size );
  filter->SetInput( adaptor );
  filter->SetHistogramSize( size2 );
  filter->SetMarginalScale( 10 );
  typename HistogramType2::MeasurementVectorType min( total_size );
  typename HistogramType2::MeasurementVectorType max( total_size );
  min.Fill( 0.0f );
  max.Fill( total_size );
  filter->SetHistogramBinMinimum( min );
  filter->SetHistogramBinMaximum( max );
  filter->Update();
}

This code reads a (rather large) test input file and it hangs indefinitely while trying to generate the histogram. I can share the test data now, since it is in fully anonymized form :slight_smile: It’s 100 MB though, so I won’t attach it here - you can contact me here if you’re interested.

Best regards,
Mads

1 Like

Looking good @madsdk :+1:

A few tips:

ITK has an itk::RawImageIO, which could simplify the code.

To share the 100 MB file, one possibilty to sign up for an account on http://data.kitware.com and put it your Public account folder.

2 Likes

Turns out that the binary search fails because of the use of a min and a max list, instead
of just using a min list and then using the next bin’s min as the “max” value. When there
are many bins and a sufficiently small range of values, floating point precision leads to
fun error such as the small snippet shown below:

m_Min[0][19319936] = 250.5477142333984375, m_Max[0][19319936] = 250.5477142333984375
m_Min[0][19319937] = 250.5477142333984375, m_Max[0][19319937] = 250.5477142333984375
m_Min[0][19319938] = 250.5477447509765625, m_Max[0][19319938] = 250.5477752685546875
m_Min[0][19319939] = 250.5477752685546875, m_Max[0][19319939] = 250.5477752685546875

Notice that m_Max[0][19319937] != m_Min[0][19319938] which breaks the invariant of binary
search, and which means that my current value of 250.5477294921875 has nowhere to stay :slight_smile:
This error occurs in my code because I had set a ridiculously high number of bins by
accident.

As I see it, there are a number of solutions:

  1. Rewrite the code to stop using the m_Max variable in favour of looking ahead to the
    next index of m_Min instead (and then adding special case handling of the final
    index in the list.). This means rewriting code that works, not just the faulty
    binary search, so it’s probably not feasible right now.
  2. Rewrite just the binary search method to use the m_Min[dim][index+1] value instead
    of m_Max. This would lead to bins such as m_Min[0][19319937] in my example above
    containing value that are above their reported max though…
  3. Make a sanity check when initializing the itkHistogram that throws an exception if
    the range of values divided by the number of bins is below a certain threshold. This
    would make sure that the error could not occur, but it incurs a limitation on the
    functionality I guess.

I guess the third options is the best, but I’ll let you guys decide :wink:

Best regards,
Mads

2 Likes

3 seems like the easiest “solution”, while 1 looks like the best. But I suppose you know your priorities the best, so you should decide which one to do :smiley:

1 Like