It appears that the current ITK master branch is, in some cases, more than 40 times faster (yes, more than forty times!) at running HoughTransform2DCirclesImageFilter::Update(), compiled with MSVC 2015 64-bit Release. The source of my little test is at item 4 of the topic "Removal of `virtual` keywords from ConstNeighborhoodIterator - #4 by blowekamp".
The specific filter->Update() call in that test takes me more than 10 minutes when using ITK 4.13, and less than 15 seconds when using the ITK master branch! I observed this with the commit "Merge topic 'Remove-virtual-NeighborhoodAccessorFunctor-destructor'" (Kitware/ITK@fa68b53 on GitHub).
This performance gain was achieved by a number of independent improvements, including:
A rigorous code cleanup of GaussianDerivativeImageFunction, especially this commit (committed 12:21PM - 26 Jan 18 UTC):
Removed Gaussian blurring from GaussianDerivativeImageFunction::EvaluateAtIndex,
as the blurring usually took a lot of time (possibly 50 % or more), whereas its
result, as stored in 'value', would be multiplied by 'centerval', which would
always be zero.
It appears a bug that the blurring had zero effect, and this bug appeared to
be there already with the very first commit, 2003-05-29, SHA-1:
27973617ab1b7164ae98407bd3aefffa62c18ab0 (when the code was still located at
Code/Common/itkGaussianDerivativeImageFunction.txx).
Removed creation of Gaussian blurring kernel from RecomputeGaussianKernel(),
as it is no longer used by now. Also removed GaussianFunctionType,
GaussianFunctionPointer, RecomputeContinuousGaussianKernel(const double *),
m_ContinuousOperatorArray, m_GaussianFunction, as they were not used (anymore).
Note: A more effective blurring is still to be implemented, to be added to
a future version of ITK.
See also the discussion at
https://discourse.itk.org/t/hough-transform-2d-circles-image-filter-getcircles-patch/350/39
Change-Id: I51b0976c67d5be320c009b31d98e6f3f014fc1af 
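To illustrate why a center value could always be zero, here is a minimal standalone sketch. It assumes that 'centerval' was the middle coefficient of a sampled first-derivative-of-Gaussian kernel, which the commit message does not spell out. SampleGaussianDerivative and CenterValue are hypothetical helpers, not ITK code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not ITK code): samples the first derivative of an
// unnormalized Gaussian, d/dx exp(-x^2 / (2 sigma^2)), which equals
// (-x / sigma^2) * exp(-x^2 / (2 sigma^2)), at integer offsets -radius..radius.
std::vector<double> SampleGaussianDerivative(const int radius, const double sigma)
{
  assert(radius >= 0 && sigma > 0.0);
  std::vector<double> kernel;
  kernel.reserve(static_cast<std::size_t>(2 * radius + 1));
  for (int x = -radius; x <= radius; ++x)
  {
    const double gaussian = std::exp(-(x * x) / (2.0 * sigma * sigma));
    kernel.push_back(-x / (sigma * sigma) * gaussian);
  }
  return kernel;
}

// Returns the middle element of an odd-sized kernel.
double CenterValue(const std::vector<double>& kernel)
{
  assert(kernel.size() % 2 == 1);
  return kernel[kernel.size() / 2];
}
```

Under that assumption, the center coefficient is zero for any sigma, so any (expensive) blurred value multiplied by it contributes nothing to the result.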
Reducing the number of memory allocations during neighborhood operations, for example (committed 11:27AM - 05 Dec 17 UTC):
 * Replaced for-loops by std::copy calls.
 * Removed redundant m_ElementCount assignments.
 * Avoided reallocation in set_size when size does not change.
Change-Id: I50bca9756989def871266e280616f876dc111c1e 
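The ideas from this commit message can be sketched roughly as follows. This is a simplified, hypothetical SimpleBuffer class, not ITK's actual allocator code; the allocation counter exists only to make the effect visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal buffer (not ITK's actual class).
class SimpleBuffer
{
public:
  // Resizes the buffer, but keeps the existing allocation when n is unchanged.
  void set_size(const std::size_t n)
  {
    if (n != m_ElementCount)
    {
      m_Data.reset(n > 0 ? new double[n]() : nullptr);
      m_ElementCount = n;
      ++m_AllocationCount;
    }
  }

  // Copies n values from source, using std::copy instead of a hand-written loop.
  void assign(const double* source, const std::size_t n)
  {
    set_size(n);
    std::copy(source, source + n, m_Data.get());
  }

  std::size_t size() const { return m_ElementCount; }

  double operator[](const std::size_t i) const
  {
    assert(i < m_ElementCount);
    return m_Data[i];
  }

  // Number of set_size calls that actually changed the size (illustration only).
  std::size_t allocation_count() const { return m_AllocationCount; }

private:
  std::unique_ptr<double[]> m_Data;
  std::size_t m_ElementCount = 0;
  std::size_t m_AllocationCount = 0;
};
```

Calling assign twice with the same size only allocates once in this sketch, which is the kind of saving that adds up when it happens for every pixel neighborhood.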
Removing virtual function calls during neighborhood iteration (committed 07:07PM - 10 Apr 18 UTC):
Replace virtual specifiers in NeighborhoodIterators with
ITK_ITERATOR_VIRTUAL macro defined to "".
Virtual table lookup appeared to cause a significant performance
penalty. ConstNeighborhoodIterator member functions are often called
with a high frequency, for example within the inner loop of an image
filter update, so removing the 'virtual' keywords here can yield a
significant performance gain.
Co-authored-by: Niels Dekker <N.Dekker@lumc.nl>
Change-Id: If2d4a4cbc7c4ff9cffd3af84b2621fe9e4be583c 
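A minimal sketch of the macro technique described in this commit message, using hypothetical names (SKETCH_ITERATOR_VIRTUAL, SketchIterator, SumPixels) rather than the real ITK declarations:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical macro, mirroring the idea behind ITK_ITERATOR_VIRTUAL.
// Defined to nothing here, so the member function below is non-virtual.
// Defining it as `virtual` instead would restore the old behavior.
#define SKETCH_ITERATOR_VIRTUAL

class SketchIterator
{
public:
  explicit SketchIterator(const int* data) : m_Data(data)
  {
    assert(data != nullptr);
  }

  // Expands to a plain (non-virtual) member function declaration.
  SKETCH_ITERATOR_VIRTUAL int GetPixel(const std::size_t i) const
  {
    return m_Data[i];
  }

private:
  const int* m_Data;
};

// Sums n pixels. Without a virtual table lookup, the compiler is free to
// inline GetPixel into this inner loop.
int SumPixels(const SketchIterator& it, const std::size_t n)
{
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    sum += it.GetPixel(i);
  }
  return sum;
}

// Compile-time check that the class has no virtual functions.
static_assert(!std::is_polymorphic<SketchIterator>::value,
              "SketchIterator should not have virtual functions");
```

The macro keeps the change reversible in one place, which seems to be the point of not simply deleting the keyword.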
Introduction of a new ShapedImageNeighborhoodRange class (committed 08:50PM - 25 Apr 18 UTC):
ShapedImageNeighborhoodRange allows iteration over a neighborhood of
pixels, very much like NeighborhoodIterator and ShapedNeighborhoodIterator.
But it has a new design and a new interface, offering iterators similar to
those from the C++ Standard Library.
New features:
 * Can be used in C++11 range-based for loops
 * Can be used to easily construct std containers (e.g., std::vector)
 * Can be passed directly to std algorithms (std::for_each, std::copy, etc.)
 * Can also be used with std C++ <numeric>, e.g., std::inner_product
 * Supports bidirectional iteration (both ++it and --it)
Performance related properties:
 * No dynamic memory allocation (not even during construction)
 * No virtual functions (even its destructors are non-virtual)
 * No 'mutable' data members, making it easier to write thread-safe code.
 * All member functions 'noexcept', each class 'final'.
Adapted GaussianDerivativeImageFunction to use the new
ShapedImageNeighborhoodRange + range-based for loop, instead of
ConstNeighborhoodIterator + NeighborhoodInnerProduct, to calculate the
inner product. A significant performance improvement was observed
(reduction of run-time duration with more than 25%). Documented that
GaussianDerivativeImageFunction is now thread-safe.
Change-Id: I39115c957b997277c0d5c9b48903284a87254d0d 
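To give an impression of what a standard-library-style range buys you, here is a heavily simplified, hypothetical stand-in. SketchNeighborhoodRange is not the real ShapedImageNeighborhoodRange interface; it only views a 1D pixel buffer at `center + offset` for each given offset. Both a range-based for loop and std::inner_product work on it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical sketch (not ITK code). No dynamic allocation, no virtual
// functions; the caller must keep the offsets vector alive.
class SketchNeighborhoodRange final
{
public:
  class const_iterator final
  {
  public:
    using iterator_category = std::input_iterator_tag;
    using value_type = double;
    using difference_type = std::ptrdiff_t;
    using pointer = const double*;
    using reference = double;

    const_iterator(const double* center, const std::ptrdiff_t* offset) noexcept
      : m_Center(center), m_Offset(offset) {}

    double operator*() const noexcept { return m_Center[*m_Offset]; }
    const_iterator& operator++() noexcept { ++m_Offset; return *this; }
    const_iterator operator++(int) noexcept { const auto old = *this; ++m_Offset; return old; }
    bool operator==(const const_iterator& rhs) const noexcept { return m_Offset == rhs.m_Offset; }
    bool operator!=(const const_iterator& rhs) const noexcept { return m_Offset != rhs.m_Offset; }

  private:
    const double* m_Center;
    const std::ptrdiff_t* m_Offset;
  };

  SketchNeighborhoodRange(const double* center,
                          const std::vector<std::ptrdiff_t>& offsets) noexcept
    : m_Center(center), m_Offsets(&offsets) {}

  const_iterator begin() const noexcept { return {m_Center, m_Offsets->data()}; }
  const_iterator end() const noexcept
  {
    return {m_Center, m_Offsets->data() + m_Offsets->size()};
  }

private:
  const double* m_Center;
  const std::vector<std::ptrdiff_t>* m_Offsets;
};

// Inner product of neighborhood pixels and kernel, via a range-based for loop.
double InnerProductWithLoop(const SketchNeighborhoodRange& range,
                            const std::vector<double>& kernel)
{
  double sum = 0.0;
  std::size_t i = 0;
  for (const double pixel : range)
  {
    assert(i < kernel.size());
    sum += pixel * kernel[i];
    ++i;
  }
  return sum;
}

// The same inner product, passing the range's iterators to <numeric>.
double InnerProductWithAlgorithm(const SketchNeighborhoodRange& range,
                                 const std::vector<double>& kernel)
{
  return std::inner_product(range.begin(), range.end(), kernel.begin(), 0.0);
}
```

For example, with pixels {1, 2, 3, 4, 5}, the center at the third pixel, offsets {-1, 0, 1} and kernel {0.5, 0.0, -0.5}, both functions compute 2 * 0.5 + 3 * 0.0 + 4 * (-0.5).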
Hope that helps, Niels