Something slows down itkMaskNeighborhoodOperatorImageFilter

Hi, ITK developers,

I am working on a fast convolution implementation and found something that looks wrong in itkMaskNeighborhoodOperatorImageFilter. I built a MaskConvolutionImageFilter on top of itkMaskNeighborhoodOperatorImageFilter so that convolution can be sped up by passing a small mask as an input. However, in my tests the filter still needs 8 to 11 seconds to finish, even when I pass a completely blank mask.

After tracing the source of MaskNeighborhoodOperatorImageFilter, its behavior confuses me: itk::ConstNeighborhoodIterator checks for boundary hits in every dimension each time the iterator moves. This check seems very expensive and slows down the whole convolution, because it is performed even when the target pixel does not belong to the mask. I have made a small modification to improve the performance. Both the original code and the modified code are shown below:

Original code:

    ConstNeighborhoodIterator< InputImageType > bit;
    typename FaceListType::iterator fit;
    ImageRegionIterator< OutputImageType > it;
    ImageRegionConstIterator< MaskImageType > mit;

    for ( fit = faceList.begin(); fit != faceList.end(); ++fit )
    {
      bit = ConstNeighborhoodIterator< InputImageType >(noperator.GetRadius(), input, *fit);
      bit.OverrideBoundaryCondition( this->GetBoundaryCondition() );
      it = ImageRegionIterator< OutputImageType >(output, *fit);
      mit = ImageRegionConstIterator< MaskImageType >(mask, *fit);
      while ( !bit.IsAtEnd() )
      {
        if ( mit.Get() )
        {
          // Compute the inner product at this pixel
          it.Value() = static_cast< typename OutputImageType::PixelType >( smartInnerProduct(bit, noperator) );
        }
        else
        {
          // Use the default value or the input value
          it.Value() = m_UseDefaultValue ? m_DefaultValue : bit.GetCenterPixel();
        }
        ++bit; // This movement causes the performance issue
        ++it;
        ++mit;
      }
    }

The small modification:

    itk::ConstNeighborhoodIterator< InputImageType > bit;
    itk::ImageRegionConstIterator< InputImageType > light_bit;
    typename FaceListType::iterator fit;
    itk::ImageRegionIterator< OutputImageType > it;
    itk::ImageRegionConstIteratorWithIndex< MaskImageType > mit;

    for ( fit = faceList.begin(); fit != faceList.end(); ++fit )
    {
      bit = itk::ConstNeighborhoodIterator< InputImageType >(noperator.GetRadius(), input, *fit);
      light_bit = itk::ImageRegionConstIterator< InputImageType >(input, *fit);
      it = itk::ImageRegionIterator< OutputImageType >(output, *fit);
      mit = itk::ImageRegionConstIteratorWithIndex< MaskImageType >(mask, *fit);

      // Advance the three lightweight iterators together; bit is only repositioned on demand
      for ( mit.GoToBegin(); !mit.IsAtEnd(); ++mit, ++it, ++light_bit )
      {
        if ( mit.Get() )
        {
          // Compute the inner product at this pixel
          bit.SetLocation( mit.GetIndex() ); // Only check for boundary hits and wrap offsets when the pixel belongs to the mask
          it.Value() = static_cast< typename OutputImageType::PixelType >( smartInnerProduct(bit, noperator) );
        }
        else
        {
          // Use the default value or the input value
          it.Value() = Superclass::GetUseDefaultValue() ? Superclass::GetDefaultValue() : light_bit.Value();
        }
      }
    }
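
To show the idea in isolation, here is a minimal, self-contained sketch that does not use ITK at all. The names `Neighborhood1D` and `MaskedConvolve` are hypothetical and only stand in for the neighborhood iterator and the filter loop. The sketch assumes a 1-D image, a mask of the same length, and zero padding outside the image. The `relocations` counter records how often the (potentially expensive) neighborhood positioning runs, which should equal the number of masked pixels rather than the number of image pixels.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a neighborhood iterator (not an ITK class).
// It counts how often the neighborhood is repositioned.
struct Neighborhood1D
{
  const std::vector<float>& image;
  std::size_t radius;
  std::size_t center;
  std::size_t relocations;

  Neighborhood1D(const std::vector<float>& img, std::size_t r)
    : image(img), radius(r), center(0), relocations(0) {}

  // Loosely analogous to SetLocation(): reposition only when asked to.
  void SetLocation(std::size_t index)
  {
    center = index;
    ++relocations;
  }

  // Inner product with the kernel; neighbors outside the image count as zero
  // (an assumption made for this sketch).
  float InnerProduct(const std::vector<float>& kernel) const
  {
    float sum = 0.0f;
    for (std::size_t k = 0; k < kernel.size(); ++k)
    {
      const long pos = static_cast<long>(center) + static_cast<long>(k)
                       - static_cast<long>(radius);
      if (pos >= 0 && pos < static_cast<long>(image.size()))
      {
        sum += kernel[k] * image[static_cast<std::size_t>(pos)];
      }
    }
    return sum;
  }
};

// Convolve only where mask[i] is non-zero; elsewhere copy the input pixel.
// Assumes mask.size() == image.size().
std::vector<float> MaskedConvolve(const std::vector<float>& image,
                                  const std::vector<unsigned char>& mask,
                                  const std::vector<float>& kernel,
                                  std::size_t& relocations)
{
  Neighborhood1D nb(image, kernel.size() / 2);
  std::vector<float> output(image.size());
  for (std::size_t i = 0; i < image.size(); ++i)
  {
    if (mask[i])
    {
      nb.SetLocation(i); // pay the positioning cost only inside the mask
      output[i] = nb.InnerProduct(kernel);
    }
    else
    {
      output[i] = image[i]; // cheap path: plain pixel copy
    }
  }
  relocations = nb.relocations;
  return output;
}
```

With a blank mask, `relocations` stays at zero and the loop reduces to a plain copy, which is the behavior I would expect from the filter as well.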

FYI, coincidentally, I'm also currently looking into ways to improve the performance of ConstNeighborhoodIterator. I proposed the following improvement: “PERF: Improved speed of copying and resizing NeighborhoodAllocator”. I hope to see it merged into master very soon, so that I can continue and submit some more performance patches in this area :)


Thanks for sharing your code, @Rock_Lin! If you submit your code to Gerrit, following these instructions, we can review it more easily and probably integrate it into ITK after version 4.13 is released (scheduled for Wednesday this week).


Thank you for letting me know. That sounds great.

@Rock_Lin After you started this topic, two “neighborhood related” performance improvements that I proposed were merged to the master branch of ITK:

PERF: Improved speed of copying and resizing NeighborhoodAllocator (commit b53983fab4a8fdb434717e13351875a6db9d1b26)

PERF: NeighborhoodOperatorImageFunction avoids copy ConstNeighborhoodIterator (commit c37bd4e74fa2390309502cfc1eedf59dcc4c4f0e)

Do these two improvements actually fix your problem? Can you please have a look at them?
