Why is ITK running slower in C++ than in python?

LuOSS · January 26, 2025, 6:06pm

I have written a small program in python where a combination of morphology, masking and gaussian filters are applied to a 3D image. It is made with numpy arrays and scipy functions but I thought to use a dedicated image processing library. I started using itk in python but moved over to C++. However, my C++ implementation takes significantly longer to apply than same filters compared to itk in python and the numpy-scipy version. It takes the entire program ~80 ms to run using the numpy-scipy version, in C++ it currently takes >160ms and not everything is implemented yet.

Itk was compiled in release mode and my program was compiled with the -O3 flag. Is there anything I might have missed that can hurt the performance of the program?

Thank you very much for any tips!

dzenanz · January 27, 2025, 3:00pm

You might be comparing apples and oranges. Apples to apples comparison might be timing myProgram.exe vs python myProgram.py, or timing just the C++/Python function which does the processing (after the input is already in memory). Also, sharing your code would be helpful.

LuOSS · January 28, 2025, 4:51pm

I profiled the code with cProfile for python and valgrind for C++ and compared each function call. The most time consuming calls in the code bellow was the erode filter with a cumulative time of ~20ms in python and ~40ms in C++. The most expensive function in the numpy-scipy code is the Gaussian filter at 7ms. This is a smaller difference than when I initially published the question so I might have messed up. The code is going to process 100 000 images so any speedup helps. The rotation transform, not yet implemented in the C++ code, is much faster with itk in python than scipy.ndimage.

Thank you very much for the response!

C++ code

using PixelType = double;
constexpr unsigned int Dimension = 3;

using ImageType = itk::Image<PixelType, 3>;
using IteratorType = itk::ImageRegionIteratorWithIndex<ImageType>;

int main(int argc, char* argv[])
{

    auto image = ImageType::New();
    auto mask = ImageType::New();
    auto bkg = ImageType::New();

    ImageType::IndexType start;
    start[0] = 0; // first index on X
    start[1] = 0; // first index on Y
    start[2] = 0; // first index on Z

    ImageType::SizeType size;
    size[0] = 64; // size along X
    size[1] = 64; // size along Y
    size[2] = 64; // size along Z

    itk::ImageRegion<3> region(start, size);


    image->SetRegions(region);
    image->Allocate();

    mask->SetRegions(region);
    mask->Allocate();

    bkg->SetRegions(region);
    bkg->Allocate();
 
    // Add an object to the image
    addEllipse(bkg, 1, 1, 1, 20);

    // Binary erosion
    using StructuringElementType = itk::FlatStructuringElement<3>;
    StructuringElementType::RadiusType elementRadius;
    elementRadius.Fill(2);

    StructuringElementType structuringElement = StructuringElementType::Ball(elementRadius);

    using BinaryErodeImageFilterType = itk::BinaryErodeImageFilter<ImageType, ImageType, StructuringElementType>;

    auto erodeFilter = BinaryErodeImageFilterType::New();
    erodeFilter->SetInput(bkg);
    erodeFilter->SetKernel(structuringElement);
    erodeFilter->SetForegroundValue(100);
    erodeFilter->SetBackgroundValue(0);
    auto eroded_image = erodeFilter->GetOutput();

    // Invert the object to get a mask for the background
    using InversionFilterType = itk::InvertIntensityImageFilter<ImageType>;
    auto inversionFilter = InversionFilterType::New();
    inversionFilter->SetInput(eroded_image);
    inversionFilter->SetMaximum(100);


    // Apply mask
    using MaskFilterType = itk::MaskImageFilter<ImageType, ImageType>;
    auto maskFilter = MaskFilterType::New();
    maskFilter->SetInput(bkg);
    maskFilter->SetMaskImage(inversionFilter->GetOutput());

    // Gaussian filtering
    using GaussianFilterType = itk::DiscreteGaussianImageFilter<ImageType, ImageType>;
    auto gaussianFilter = GaussianFilterType::New();
    gaussianFilter->SetVariance(2.5);
    gaussianFilter->SetInput(maskFilter->GetOutput());

    // Execute pipeline
    gaussianFilter->Update();

Python with itk. I could not get the mask filter to work. The error said that itk.Image[itk.F, 3] was not wrapped and then suggested I use the same type, itk.Image[itk.F, 3], instead.

import numpy as np
import itk

def f():
    mesh = np.meshgrid(np.arange(64, dtype=np.float64), np.arange(64, dtype=np.float64), np.arange(64, dtype=np.float64))


    np_array = make_ellipsoid(mesh, 2, 1, 1, 10)

    # Convert the numpy array to an ITK image
    ImageType = itk.Image[itk.F, 3]  # 'itk.F' denotes float, and '3' is for 3D
    image = itk.image_from_array(np_array.astype(itk.F))

    # Set up the structuring element and its radius
    element_radius = [2, 2, 2]  # equivalent to elementRadius.Fill(2) in C++
    structuring_element = itk.FlatStructuringElement[3].Box(element_radius)

    # Binary Erosion
    BinaryErodeImageFilterType = itk.BinaryErodeImageFilter[ ImageType, ImageType, itk.FlatStructuringElement[3] ]
    erode_filter = BinaryErodeImageFilterType.New()
    erode_filter.SetInput(image)
    erode_filter.SetKernel(structuring_element)
    erode_filter.SetForegroundValue(100)
    erode_filter.SetBackgroundValue(0)
    eroded_image = erode_filter.GetOutput()

    # Image Inversion
    InversionFilterType = itk.InvertIntensityImageFilter[itk.Image[itk.F,3], itk.Image[itk.F,3]]
    inversion_filter = InversionFilterType.New(eroded_image)
    inversion_filter.SetMaximum(100)
    inverted_image = inversion_filter.GetOutput()

    # Masking
    # MaskFilterType = itk.MaskImageFilter[ImageType,ImageType,ImageType]
    # mask_filter = MaskFilterType.New(image)
    # mask_filter.SetMaskImage(inverted_image)
    # mask_filter.Update()
    # masked_image = mask_filter.GetOutput()

    # Gaussian Smoothing
    GaussianFilterType = itk.DiscreteGaussianImageFilter[itk.Image[itk.F,3], itk.Image[itk.F,3]]
    gaussian_filter = GaussianFilterType.New(inverted_image)
    gaussian_filter.SetVariance(2.5)
    gaussian_filter.Update()
    smoothed_image = gaussian_filter.GetOutput()

The numpy-scipy based equivalent

import numpy as np
import scipy.ndimage as ndi

def g():
    mesh = np.meshgrid(np.arange(64, dtype=np.float64), np.arange(64, dtype=np.float64), np.arange(64, dtype=np.float64))
    
    # Add an object to the image
    np_array = make_ellipsoid(mesh, 2, 1, 1, 10)
    
    eroded_image = ndi.binary_erosion(np_array, iterations=2)
    
    inverted_image = eroded_image != True
    
    masked_image = np_array * inverted_image.astype(np.float64)
    
    blurred_image = ndi.gaussian_filter(masked_image, 2.5)
    
    return np.sum(blurred_image)

dzenanz · January 28, 2025, 6:47pm

For Python syntax, MaskFilterType = itk.MaskImageFilter[ImageType,ImageType,ImageType] will not work due to wrapping specification:

github.com/InsightSoftwareConsortium/ITK

Modules/Filtering/ImageIntensity/wrapping/itkMaskImageFilter.wrap

v5.4.2


      
          itk_wrap_class("itk::MaskImageFilter" POINTER)
          foreach(t ${WRAP_ITK_ALL_TYPES})
            itk_wrap_image_filter_combinations("${t}" "${WRAP_ITK_INT}" "${t}")
          endforeach()
          itk_end_wrap_class()

You probably want something like:

LabelImageType = itk.Image[itk.UC, 3]
MaskFilterType = itk.MaskImageFilter[ImageType,LabelImageType,ImageType]
...
mask_filter.SetMaskImage(inverted_image.astype(itk.UC))

Maybe easier would be functional syntax, something like masked_image = itk.mask_image_filter(image, mask_image=inverted_image.astype(itk.UC)). This should be functionally equivalent to the entire commented-out block.

I don’t have an explanation why C++ variant is slower. I have a possible theory. You are using double pixel type, where you should be using an unsigned char or possibly a short. Erosion filter is expecting to operate on integral pixel types. With pixel represented as doubles, there must be a lot of unnecessary casting happening. If you require precision for the later smoothing operation, you should only use double there, not through the entire processing pipeline.

For larger erosion/dilation radii, it makes sense to investigate approximation via thresholding a distance field, as its running time is (mostly) independent of the radius. If your objects are close to the image edges, you usually need to pad the image for these approximations to work correctly.

LuOSS · January 30, 2025, 7:55pm

I think you are correct. Changing the data type in the the binary to a short or char makes the C++ version take equivalent time to the python version so the primary question is solved. Don’t know if that was all that was needed. BinaryErode is still strangely slow, much slower than BinaryDilate or BinaryContour, and slower than the scipy.ndimage equivalent. I didn’t know BinaryContour existed and it does what I was using BinaryErode for so its all good. For python I have now used SimpleITK which provides the functionality I need for the moment and it works very well.

Thank you very much for the help!