The image resample filter in ITK is slow. Can we use the GPU to accelerate it?
That is possible, but not implemented. Implementing it would be a lot of work.
Why do you think it would be a lot of work? Would any design change be needed? Couldn’t it be another OpenCL-based imaging filter? Or are you concerned that the time to transfer the image between CPU and GPU would cancel out the performance gain?
Implementing all the interpolators to work efficiently on the GPU will be a lot of work. Even if we were to implement just linear and nearest neighbor, it is still not trivial.
Is it complicated because the generic framework of interpolators would be hard to port to OpenCL? A simple image-to-image filter doing linear or NN interpolation on 3D volumes (including single-slice volumes) would cover almost all use cases.
Yes, I think so.
I agree. Still, it would probably take days to implement it for an experienced person.
OK, thanks for the clarifications, we agree then.
What is the reason for not using CUDA?
The texture memory would provide a quite simple way to create an interpolator for linear and NN interpolation (at least with affine transformations).
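To make the per-voxel work concrete, here is a minimal CPU sketch in C++ of what such a kernel would compute for each output voxel: map the output index through an affine transform, then sample the input with NN or (tri)linear interpolation. On a GPU the sampling step is what the texture unit would do in hardware. `Volume`, `Affine`, `Interp`, and `resample` are hypothetical names for illustration, not ITK or CUDA API, and clamping out-of-range indices to the edge is my assumption (a real filter might write a default pixel value instead).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal volume type (not an ITK class); x varies fastest.
struct Volume {
  int nx, ny, nz;
  std::vector<float> data;
  Volume(int x, int y, int z)
      : nx(x), ny(y), nz(z), data(static_cast<std::size_t>(x) * y * z, 0.0f) {
    assert(x > 0 && y > 0 && z > 0);
  }
  std::size_t offset(int x, int y, int z) const {
    return (static_cast<std::size_t>(z) * ny + y) * nx + x;
  }
  float& at(int x, int y, int z) { return data[offset(x, y, z)]; }
  float at(int x, int y, int z) const { return data[offset(x, y, z)]; }
};

// Affine map from output index space to input index space: p_in = A * p_out + t.
struct Affine {
  std::array<double, 9> A;  // row-major 3x3
  std::array<double, 3> t;
};

enum class Interp { Nearest, Linear };

// Assumption: out-of-range indices are clamped to the nearest edge voxel.
inline int clampIndex(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

float sampleNearest(const Volume& v, double x, double y, double z) {
  return v.at(clampIndex(static_cast<int>(std::lround(x)), v.nx),
              clampIndex(static_cast<int>(std::lround(y)), v.ny),
              clampIndex(static_cast<int>(std::lround(z)), v.nz));
}

float sampleLinear(const Volume& v, double x, double y, double z) {
  const int x0 = static_cast<int>(std::floor(x));
  const int y0 = static_cast<int>(std::floor(y));
  const int z0 = static_cast<int>(std::floor(z));
  const double fx = x - x0, fy = y - y0, fz = z - z0;
  auto fetch = [&v](int i, int j, int k) -> double {
    return v.at(clampIndex(i, v.nx), clampIndex(j, v.ny), clampIndex(k, v.nz));
  };
  // Interpolate along x, then y, then z. For a single-slice volume the
  // clamped z neighbours coincide, so this reduces to bilinear interpolation.
  const double c00 = fetch(x0, y0, z0) * (1 - fx) + fetch(x0 + 1, y0, z0) * fx;
  const double c10 = fetch(x0, y0 + 1, z0) * (1 - fx) + fetch(x0 + 1, y0 + 1, z0) * fx;
  const double c01 = fetch(x0, y0, z0 + 1) * (1 - fx) + fetch(x0 + 1, y0, z0 + 1) * fx;
  const double c11 = fetch(x0, y0 + 1, z0 + 1) * (1 - fx) + fetch(x0 + 1, y0 + 1, z0 + 1) * fx;
  const double c0 = c00 * (1 - fy) + c10 * fy;
  const double c1 = c01 * (1 - fy) + c11 * fy;
  return static_cast<float>(c0 * (1 - fz) + c1 * fz);
}

// One loop iteration corresponds to one GPU work-item / thread.
Volume resample(const Volume& in, int nx, int ny, int nz, const Affine& m, Interp mode) {
  Volume out(nx, ny, nz);
  for (int k = 0; k < nz; ++k)
    for (int j = 0; j < ny; ++j)
      for (int i = 0; i < nx; ++i) {
        const double px = m.A[0] * i + m.A[1] * j + m.A[2] * k + m.t[0];
        const double py = m.A[3] * i + m.A[4] * j + m.A[5] * k + m.t[1];
        const double pz = m.A[6] * i + m.A[7] * j + m.A[8] * k + m.t[2];
        out.at(i, j, k) = (mode == Interp::Nearest) ? sampleNearest(in, px, py, pz)
                                                    : sampleLinear(in, px, py, pz);
      }
  return out;
}
```

The sketch works purely in index space; a real ITK filter would also have to fold origin, spacing, and direction of both images into the affine matrix before launching the kernel.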
Because it is owned by NVIDIA and therefore not open-source and limited to their graphics cards?
They kind of opened it up a few years ago. I don’t think there are any implementations of CUDA other than nVidia’s. GPU processing was added to ITK with version 4.0, around 2010. OpenCL was the open standard for that back then. So no CUDA.
I was asking because RTK has some basis for an ITK CUDA implementation.
Furthermore, elastix has an OpenCLResampler that could be a starting point for someone eager to have a GPU resample image filter.
LLVM/Clang has a CUDA implementation ( https://llvm.org/docs/CompileCudaWithLLVM.html ), though I’ve not tried it myself for anything other than small code samples, yet, due to the lack of CMake support: https://gitlab.kitware.com/cmake/cmake/issues/16586 (which has finally, after 3 years, seen some work as of yesterday!)
An alternative approach, which would be compatible with most hardware (not just nVidia GPUs), could be to use SYCL. Eigen already supports it as a backend, so I guess that would be a more obvious choice.
However, as the image types* in OpenCL support nearest and linear interpolation ( https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#page=329 ), this would be the simplest, and probably most performant, solution. For any more advanced interpolation, like B-spline, the way elastix does it (as @Gordian mentioned) looks like the right approach.
*[Although image types are not supported on all hardware, they are supported on practically all GPUs (I’ve yet to encounter one which doesn’t), and, I think, most x86 CPUs that support OpenCL 1.2 or newer. They are not supported on Xeon Phi co-processors (but most OpenCL code doesn’t perform well on those anyway), and probably not on more exotic hardware either, like DSPs and FPGAs, but I haven’t programmed on these platforms.]
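One detail worth knowing before relying on the sampler: as I read the filtering rules in the OpenCL 1.2 specification, texel centers sit at half-integer coordinates when unnormalized coordinates are used, so sampling exactly on voxel `i` means passing `i + 0.5`. Below is a small 1D emulation in C++ of nearest and linear filtering with clamp-to-edge addressing. `readNearest`, `readLinear`, and `clampToEdge` are hypothetical helper names, not OpenCL functions, and real hardware may compute the weights at reduced precision, so results can differ slightly from this float version.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Emulates clamp-to-edge addressing: indices outside [0, n-1] snap to the border.
int clampToEdge(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

// Emulates nearest filtering with unnormalized coordinates:
// the texel index is floor(u), so texel i covers [i, i+1).
float readNearest(const std::vector<float>& texels, float u) {
  assert(!texels.empty());
  const int n = static_cast<int>(texels.size());
  return texels[clampToEdge(static_cast<int>(std::floor(u)), n)];
}

// Emulates linear filtering with unnormalized coordinates:
// i0 = floor(u - 0.5), i1 = i0 + 1, weight a = frac(u - 0.5).
float readLinear(const std::vector<float>& texels, float u) {
  assert(!texels.empty());
  const int n = static_cast<int>(texels.size());
  const float s = u - 0.5f;
  const float fl = std::floor(s);
  const int i0 = clampToEdge(static_cast<int>(fl), n);
  const int i1 = clampToEdge(static_cast<int>(fl) + 1, n);
  const float a = s - fl;
  return (1.0f - a) * texels[i0] + a * texels[i1];
}
```

With texels `{10, 20, 30}`, `readLinear(texels, 0.5f)` returns the first texel unchanged, while `readLinear(texels, 1.0f)` lands halfway between the first two. A resampling kernel that computes continuous input indices therefore needs to add 0.5 per axis before reading the image.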