Not able to get RTK speedups on GPU

Hi.

I am using RTK for iterative reconstruction. Everything works except when I try to run on the GPU. The code compiles and runs, but the GPU-compiled programs are slower than the CPU versions. nvtop shows that the process is present on the GPU, but that it uses 0% of the GPU while it hammers the CPU. This is true of RTK code that I wrote as well as the ready-made RTK applications such as rtkfdk, admmtv, conjugategradient, etc. I did a clean build of ITK with the flags ITK_USE_GPU and RTK_USE_CUDA. Below is a list of system details (for reference) and the display from nvtop. Any pointers or advice would be appreciated.

Ross

OS: Ubuntu 22.04
Graphics card: NVIDIA RTX 500
Driver Version: 535.183.01
CUDA Version: 12.2

@simon.rit might answer.

Thx

Could you please provide a concrete example of what you’re trying to do? For rtkfdk, you need to pass the option --hardware cuda. For iterative reconstruction, you need to select CUDA forward and back projectors, e.g. --fp CudaRayCast --bp CudaVoxelBased.
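
As a rough illustration of where those options go, here is a minimal shell sketch. It assumes RTK was built with RTK_USE_CUDA and that geometry.xml and projections.mha (placeholder file names) are in the current directory. The run helper is hypothetical, not part of RTK: it prints each command and executes it only if the program is found on the PATH.

```shell
#!/bin/sh
# run: hypothetical helper (not part of RTK). Prints the command line,
# then executes it only if the program is found on the PATH.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    fi
}

# FDK: the device is selected with --hardware.
run rtkfdk -p . -r projections.mha -g geometry.xml -o fdk_cuda.mha \
    --hardware cuda

# Iterative reconstruction: the device is selected per projector.
run rtkconjugategradient -p . -r projections.mha -g geometry.xml \
    -o cg_cuda.mha -n 10 --fp CudaRayCast --bp CudaVoxelBased
```

Without these options the applications run their CPU code paths even in a CUDA-enabled build, which would match the 0% GPU usage described above.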

Aha. I see these now in the command-line options for these executables. Let me try these options and see how it goes.

This seems to work. Thanks so much for your help!

Hi,

Thanks so much for your help. RTK is really a great resource, and I have been studying how it is put together. Great design and software.

I am still having some trouble getting the GPU to engage consistently. When I run the applications that are built into RTK from the command line, things work as expected: the application uses the GPU consistently and does not use the CPU much. I check this with nvtop on Linux. So for the commands:

Create a simulated geometry

rtksimulatedgeometry -n 180 -o geometry.xml

Create projections of the phantom file

rtkprojectshepploganphantom -g geometry.xml -o projections.mha --spacing 2 --dimension 800

Reconstruct

rtkregularizedconjugategradient -p . -r projections.mha -o regularizedrecon.mha -g geometry.xml --spacing 2,2,16 --dimension 800,800,100 --tviter 4 --gammatv 10.0 -n 10 --tikhonov 1.0 --gammalaplacian 1.0 --fp CudaRayCast --bp CudaVoxelBased

I get the following GPU behavior:

This makes sense: everything seems to be computed on the GPU, with very little CPU use during the iterative reconstruction. However, if I write my own CG iterative code (.cxx attached), I get different behavior for the same geometry and reconstruction volume. There is a burst of GPU activity at each iteration, and then the process hammers the CPU for several seconds. Like this:

I figure that some piece of the processing is not being done on the GPU, presumably a filter that I have not GPU-enabled. However, I have not yet been able to figure out which one. I have verified that the projections and the reconstruction volume are CUDA images, and all of the filters are templated on CUDA images.
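
Since nvtop only gives a live display, the same observation can also be logged as text. This is a minimal sketch, assuming nvidia-smi from the NVIDIA driver is installed. gpu_trace is a hypothetical helper, not an RTK or NVIDIA tool, and the binary name in the usage comment is assumed from the attached .cxx file.

```shell
#!/bin/sh
# gpu_trace: hypothetical helper. Samples GPU utilization once per second
# with nvidia-smi while a command runs, and writes the samples as CSV.
gpu_trace() {  # usage: gpu_trace CSVFILE COMMAND [ARGS...]
    trace=$1; shift
    sampler=""
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used \
            --format=csv -l 1 >"$trace" &
        sampler=$!
    fi
    "$@"                      # run the program being observed
    status=$?
    if [ -n "$sampler" ]; then
        kill "$sampler" 2>/dev/null   # stop the background sampler
    fi
    return "$status"
}

# Example (binary name assumed from the attached .cxx file):
#   gpu_trace my_app_gpu.csv ./FirstCudaReconstructionCG
```

The resulting CSV should show the bursts of GPU activity separated by idle periods, and it can be pasted into a post directly.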

Any suggestions would be appreciated.

Regards,

Ross

FirstCudaReconstructionCG.cxx (5.0 KB)

I don’t know what the issue could be; I would have expected the same behaviour. Is your code actually slower than the command-line application?
My suggestion would be to run both programs with exactly the same parameters and to turn on the CMake option RTK_PROBE_EACH_FILTER. The --verbose option of the command-line tool will then report the time spent in each filter. You can do the same in your code (see code here) and compare the printed results to find out which filter is causing this.
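
To answer the first question before digging into individual filters, the two programs can be timed side by side. This is a minimal sketch: timed is a hypothetical helper, the name ./FirstCudaReconstructionCG is an assumption based on the attached .cxx file, and the input files are assumed to be in the current directory. It measures wall-clock time only, in whole seconds.

```shell
#!/bin/sh
# timed: hypothetical helper. Runs a command, saves its output to a log,
# and prints the elapsed wall-clock time in whole seconds.
timed() {  # usage: timed LOGFILE COMMAND [ARGS...]
    log=$1; shift
    start=$(date +%s)
    "$@" >"$log" 2>&1
    status=$?
    end=$(date +%s)
    echo "$1: $((end - start)) s (exit status $status), output in $log"
}

# Identical reconstruction parameters for both programs.
timed rtk_app.log rtkregularizedconjugategradient -p . -r projections.mha \
    -o regularizedrecon.mha -g geometry.xml --spacing 2,2,16 \
    --dimension 800,800,100 --tviter 4 --gammatv 10.0 -n 10 \
    --tikhonov 1.0 --gammalaplacian 1.0 \
    --fp CudaRayCast --bp CudaVoxelBased --verbose

# Assumed name of the binary built from FirstCudaReconstructionCG.cxx.
timed my_app.log ./FirstCudaReconstructionCG
```

With RTK_PROBE_EACH_FILTER enabled, rtk_app.log should also hold the per-filter report printed by --verbose, which can then be compared with whatever your own program prints.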