I notice there are quite a few “accelerator”-type options for ITK builds, but the documentation on what they do and what they affect is sparse to non-existent.
Can anyone point me at some docs, or enlighten me as to how much benefit I can actually expect if I go about flipping these switches on?
I see there are also the GPU filters, but it looks like those aren’t drop-in, in the sense that you need to change your code to use them. Are you aware of any benchmarks there?
I have been trying to get MKL with TBB working in ITK (v5.2rc03) with no luck so far.
FFTW gives me good performance (with the ITK_USE_FFTW* options turned on): 45+% occupancy on all 32 cores.
I was unable to get any performance boost from cuFFT (turned on using ITK_USE_CUFFTW).
MKL seems to run sequentially no matter what I do, i.e. whether I turn on Module_ITKTBB or change ITK_DEFAULT_THREADER to TBB. I am using the oneAPI TBB binary release from Release oneTBB 2021.1.1 · oneapi-src/oneTBB · GitHub.
Can you please guide me on what I am missing, in terms of configuration or otherwise?
The hardware is a 32-core Intel CPU with a Tesla K20c GPU.
Update 1: I found that ITK_USE_MKL_WITH_TBB was commented out, which is why it was always sequential. The numbers make sense now. However, when I uncommented that section and used MKL support without Module_ITKTBB turned ON, I got the following runtime error:
terminate called after throwing an instance of 'itk::ExceptionObject'
what(): ../Modules/Core/Common/src/itkMultiThreaderBase.cxx:408:
itk::ERROR: ITK has been built without TBB support!
Aborted (core dumped)
Update 2: With Module_ITKTBB turned on, the above error is gone, but there isn’t much difference in performance. The cores only reach a maximum of 11% occupancy, and only occasionally. Given below is the ldd output of the executable. I am guessing that the two different TBBs are the cause of this low performance. Please confirm whether that is the case, and suggest a workaround if so.
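For anyone who wants to check the two-TBB suspicion on their own build, here is a minimal sketch that filters ldd output down to the distinct libtbb entries. It is only an illustration: `list_tbb_libs` and `./my_itk_app` are hypothetical names, not part of ITK, and it assumes a Linux system with the usual `name => path (address)` ldd line format.

```shell
#!/usr/bin/env bash
# Read ldd output on stdin and print each distinct libtbb.so entry once.
# Prints the resolved path when the line has the form "name => path (addr)",
# otherwise the first field. Two or more lines of output suggest that two
# different TBB copies are linked into the same process.
list_tbb_libs() {
  grep 'libtbb\.so' | awk '{ print ($2 == "=>") ? $3 : $1 }' | sort -u
}

# Usage (./my_itk_app is a hypothetical executable name):
#   ldd ./my_itk_app | list_tbb_libs
```

If this prints more than one path, pointing both Module_ITKTBB and MKL at the same TBB installation is probably the first thing to try.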
I found out there is only a minor performance difference between MKL+TBB and FFTW. On Linux, it is more convenient to use FFTW, while on Windows MKL+TBB is easier.
Okay, there is some progress with respect to finding TBB and MKL automatically.
Using oneMKL from the oneAPI Base Toolkit, both MKL and TBB are found automatically, like a breeze. I am using the same TBB for both Module_ITKTBB and MKL.
However, I am still facing a similar performance issue, where the FFTW-based ITK build gives better performance than the MKL-based one. The only difference between these two ITK builds is whether ITK_USE_MKL is ON or OFF. All other CMake options are identical.
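For reference, a comparison like this could be configured roughly as follows. This is a sketch under assumptions, not my exact command lines: the paths are hypothetical, Release mode is assumed, ITK_USE_FFTWF and ITK_USE_FFTWD are my reading of the "ITK_USE_FFTW*" options, and `cmake -S/-B` needs CMake 3.13 or newer.

```shell
# Hypothetical source and build locations; adjust to your checkout.
ITK_SRC="$HOME/src/ITK"

# Options shared by both builds (option names as mentioned in this thread;
# Release mode and the exact FFTW flags are assumptions).
COMMON_FLAGS="-DCMAKE_BUILD_TYPE=Release -DModule_ITKTBB=ON -DITK_DEFAULT_THREADER=TBB -DITK_USE_FFTWF=ON -DITK_USE_FFTWD=ON"

# Build A: ITK_USE_MKL left OFF.
cmake -S "$ITK_SRC" -B "$HOME/build/ITK-fftw" $COMMON_FLAGS -DITK_USE_MKL=OFF

# Build B: identical except that ITK_USE_MKL is ON.
cmake -S "$ITK_SRC" -B "$HOME/build/ITK-mkl" $COMMON_FLAGS -DITK_USE_MKL=ON
```

Keeping the shared flags in one variable makes it easier to be sure that ITK_USE_MKL really is the only thing that differs between the two build trees.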
Also, you haven’t mentioned why FFTs don’t seem to show any improvement in performance even when I set the cuFFT values manually. I haven’t looked at the cuFFT CMake code in ITK yet. Hints from you guys might improve my chances of figuring out the problem.