What is the status of ITK-GPU development? Is anyone currently working on it? There seems to be some OpenCL support; are there any CUDA updates besides the small CUDA ITK project released a few years ago?
Hi Nick,
One of the promising efforts is ITKArrayFire, but this is very much a WIP.
We have another ITK performance discussion coming which would be a good time to discuss more:
Hm, I see. Very interested in this… I guess I’ll bring it up at that meeting then. Thanks!
Hello,
is there any progress with the integration of ITKArrayFire?
How did the performance discussion end?
Hi @Gordian,
My development on ITKArrayFire is currently on hold, but if you or someone else would like to take it up, I would be happy to discuss next steps.
Thanks,
Matt
Dear all,
I’m just picking up this thread because I was wondering how far OpenCL has found its way into ITK.
I recently implemented a set of OpenCL kernels and a wrapper for the ImageJ/Java-verse. Would it make sense to make them available in the ITK-verse? Or is there already a collection of OpenCL-kernels somewhere which I can just extend?
I’m searching for a good entry point, code- and documentation-wise. I found this rather dated wiki page:
https://itk.org/Wiki/ITK/Release_4/GPU_Acceleration
And the source code for the OpenCL stuff lives here, right?
Am I missing any major resource regarding ITK’s OpenCL support?
Any link would be helpful! Thanks!
Cheers,
Robert
Dear Robert,
Amazing work on CLIJ!
Would it make sense to make them available in the ITK-verse?
Yes, that would be awesome!
Or is there already a collection of OpenCL-kernels somewhere which I can just extend?
Correct. There are also the modules ITKGPUFiniteDifference, ITKGPUImageFilterBase, ITKGPUSmoothing, ITKGPUThresholding, and ITKGPUPDEDeformableRegistration.
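For reference, here is a minimal sketch of how those existing GPU modules are typically used (this assumes ITK was configured with `ITK_USE_GPU=ON` and a working OpenCL runtime; double-check the exact filter and header names in your ITK version):

```cpp
// Sketch: swap itk::Image for itk::GPUImage and use a GPU-enabled filter.
// Requires ITK built with ITK_USE_GPU=ON and an OpenCL runtime.
#include "itkGPUImage.h"
#include "itkGPUMeanImageFilter.h"
#include "itkImageFileReader.h"
#include "itkImageFileWriter.h"

int main(int argc, char * argv[])
{
  using GPUImageType = itk::GPUImage<float, 2>;

  auto reader = itk::ImageFileReader<GPUImageType>::New();
  reader->SetFileName(argv[1]);

  // Data is moved to the device when the GPU filter runs, and copied back
  // only when the CPU buffer is actually requested downstream.
  auto smoother = itk::GPUMeanImageFilter<GPUImageType, GPUImageType>::New();
  smoother->SetInput(reader->GetOutput());
  smoother->SetRadius(2);

  auto writer = itk::ImageFileWriter<GPUImageType>::New();
  writer->SetInput(smoother->GetOutput());
  writer->SetFileName(argv[2]);
  writer->Update();

  return 0;
}
```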
Am I missing any major resource regarding ITK’s OpenCL support?
Here’s another resource.
Overall, OpenCL adoption in ITK did not take off because
- The OpenCL API is cumbersome. A lot of low-level resource management is required; it is more of a C API than a convenient C++ API. Keeping kernel code in separate sources is awkward, and there are no templates as in C++, which means more work to write N-D algorithms that work across many pixel types.
- Builds, packaging, and deployment were an issue.
SYCL addresses all these issues. It uses elegant, standard, modern C++ that is a joy to work with. It is single-source, and a lot of the low-level management and book-keeping has been abstracted away nicely. For deployment, it uses SPIR, which addresses the previous deployment issues. It has been under development for quite some time by the Khronos Group, and implementations are maturing. It builds against the OpenCL API / headers, so it can work across hardware accelerators with minimal driver issues. The most interesting implementation is the cross-platform, upstream Clang support under development by Intel.
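To give a sense of what single-source looks like, here is a minimal sketch in SYCL 2020 style (not ITK code; the header name varies by implementation and the `AddConstant` helper is just illustrative):

```cpp
// Minimal single-source SYCL sketch: host and device code live in one C++
// translation unit, and the kernel is templated over the pixel type.
#include <sycl/sycl.hpp>   // <CL/sycl.hpp> on older implementations
#include <vector>

template <typename TPixel>
void AddConstant(std::vector<TPixel> & pixels, TPixel value)
{
  sycl::queue queue;  // default device selection (CPU, GPU, ...)
  {
    sycl::buffer<TPixel, 1> buffer(pixels.data(), sycl::range<1>(pixels.size()));
    queue.submit([&](sycl::handler & cgh) {
      sycl::accessor data(buffer, cgh, sycl::read_write);
      cgh.parallel_for(sycl::range<1>(pixels.size()),
                       [=](sycl::id<1> i) { data[i] += value; });
    });
  }  // buffer destruction waits for the kernel and copies the result back
}

int main()
{
  std::vector<float> image(1024, 1.0f);
  AddConstant(image, 2.0f);  // the same template works for double, int, ...
  return 0;
}
```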
It would be awesome if we had an open source SYCL N-D image algorithm library that we could re-use in ImageJ and ITK. We can then serve Java, C++, Python, and Web Platform environments. The ITK C++ OpenCL infrastructure could be re-used to enable applications of the kernels when streaming on large datasets and transparently avoiding CPU-GPU memory transfers in processing pipelines. We can also easily build native cross-platform Python packages. We have a convenient repository template that provides the infrastructure for generating and building Pythonic, binary, cross-platform packages from the C++ code. There are pocl Python packages we can depend on to provide driver interfaces that work across CPUs and GPUs. With itk.js, eventually we may be able to just rebuild to have the code available on the Web Platform.
Another reason SYCL is interesting: like OpenCL, it works across GPUs, whether AMD, NVIDIA, or mobile device GPUs. In addition, Xilinx and Intel are working to add support for their FPGAs. This is interesting for low-power, real-time ultrasound imaging or for processing the extremely high throughput output on newer microscopy systems.
Alright, thanks for the links! I’ll see if I can build ITK with OpenCL and plug in some of our kernels. That should give me an idea of how high the effort is. The OpenCL-related issues you mentioned were mostly solved in CLIJ, by the way: e.g., we use an image-type-agnostic dialect of OpenCL and have nicely working memory management.
SYCL sounds very interesting, but I have the feeling it’s not widely adopted by the community yet. I couldn’t find many adoptable code examples…
When it comes to joint efforts for building a GPU-backend that is usable from ITK and ImageJ - count me in! I’d be happy to support these efforts.
Thanks again for the hints! I’ll keep you posted.
Cheers,
Robert
Awesome, Robert! Please let us know if you run into any issues.
Just as a brief contribution to this topic, I’d like to point out that in many cases you can get GPU acceleration for free with minimal maintenance burden by making sure that you use BLAS instead of other numerical libraries (via GPU-specific implementations of BLAS like cuBLAS and clBLAS). I know that in the past there was an effort to refactor ITK’s use of numerical libraries here. I’m not certain where that went, but it might be worth pursuing a refactor first.
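To illustrate the point (a sketch, not ITK’s actual numerical layer; `ApplyMatrix` is a made-up helper): code written against the standard CBLAS interface does not change when the linked implementation changes, so a GPU-backed BLAS can be swapped in at build or deployment time.

```cpp
// Sketch: the call site stays the same whether the linked BLAS is a CPU
// implementation (OpenBLAS, MKL, ...) or a GPU-backed one; only the library
// linked against changes.
#include <cblas.h>
#include <vector>

// y = A * x, with A stored row-major as rows x cols.
void ApplyMatrix(const std::vector<double> & A,
                 const std::vector<double> & x,
                 std::vector<double> & y,
                 int rows, int cols)
{
  cblas_dgemv(CblasRowMajor, CblasNoTrans, rows, cols,
              /*alpha=*/1.0, A.data(), /*lda=*/cols, x.data(), /*incx=*/1,
              /*beta=*/0.0, y.data(), /*incy=*/1);
}
```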
Also, just as a brief 2 cents: Hacker News (which should always be taken with a grain of salt) doesn’t seem very bullish on the OpenCL ecosystem in general. The recent release of OpenCL 3.0 has been met with a great deal of derision from HN readers (see here). In this light, I think it’s dangerous to fully commit to an OpenCL implementation (despite its technical merits), since CUDA seems to have become the de facto standard for both NVIDIA and AMD GPUs at this point. The introduction of Intel’s oneAPI to the market also adds complications, although I’m not confident that it will be particularly competitive given CUDA’s first-mover advantage.
@jpellman good points. We are not fully committed to only OpenCL, but we should still make it available as an option given, most importantly, @haesleinhuepf’s amazing work, but also the existing support in ITK and the fact that it has the best support across a diverse set of computing architectures.
The RTK module has CUDA support, which we could also make more broadly available as another option.
oneAPI is fresher, but there is appeal in its syntax, as it is a step towards standard, elegant, modern C++ (see C++ Executors). It has a promising future because of its broad hardware support and its ability to re-use existing CUDA infrastructure, although we will see how long that future takes to materialize. E.g., cuBLAS is re-used here: