ITK ERROR: CUDA ERROR: out of memory

My data size is 5.03 GB with type uint16. When FDK is used for reconstruction, the data are cast to float, so the data size becomes 10.04 GB. My output size is set to 1.31 GB. When reconstructing the data with FDK under these settings, the following error appears:

RuntimeError: C:\runner\_work\im\src\rtkCudaFFTProjectionsConvolutionImageFilter.cu:83:
ITK ERROR: CUDA ERROR: out of memory

My GPU has about 23 GB of memory. Before the error message appeared, the GPU was already using about 10 GB, which indicates that the data had been loaded onto the GPU. According to the error message, line 83 tries to allocate new space on the GPU, but my GPU memory is running low, so the error occurs.
In this case, can I use two GPUs? I set os.environ['CUDA_VISIBLE_DEVICES'] = '0, 1', but it doesn't work.
Any reply will be appreciated.

No, ITKCudaCommon is single-GPU; you cannot use two GPUs for the computation. The GPU is automatically selected by the Nvidia driver.

Thank you for your reply. So I can only solve this problem by reducing the data size. Also, may I ask: to ensure a smooth FDK reconstruction in Python, must the GPU memory be at least twice the size of the data? There seems to be no such limitation when using C++. Is it because Python and C++ load data differently?

There is a solution: you can stream the reconstruction in pieces. This is done in the C++ code.

Thanks to the fantastic pipeline mechanism of ITK, if you don't update the projections but only the end of the pipeline (the streaming filter), it will reconstruct the image piece by piece and only load the required parts of the projections. That can be done in Python too; a sketch of the idea follows.
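A minimal Python sketch of the streamed pipeline, assuming RTK's Python package is built with CUDA support; the file names, output geometry, number of divisions, and the availability of itk.StreamingImageFilter for the CudaImage type are illustrative assumptions, not details from this thread:

```python
# Hedged sketch: streamed FDK reconstruction with RTK's Python bindings.
# File names and output geometry below are placeholders for your own data.
import itk
from itk import RTK as rtk

ImageType = itk.CudaImage[itk.F, 3]

# Read the acquisition geometry (placeholder file name).
geometry_reader = rtk.ThreeDCircularProjectionGeometryXMLFileReader.New()
geometry_reader.SetFilename('geometry.xml')
geometry_reader.GenerateOutputInformation()
geometry = geometry_reader.GetOutputObject()

# Projections reader; with streaming, only the sub-stack needed for the
# current piece is actually read and sent to the GPU.
projections = rtk.ProjectionsReader[ImageType].New()
projections.SetFileNames(['projections.mha'])  # placeholder file list

# Empty volume defining the output size, spacing and origin.
source = rtk.ConstantImageSource[ImageType].New()
source.SetSize([512, 512, 512])
source.SetSpacing([0.5, 0.5, 0.5])
source.SetOrigin([-127.75, -127.75, -127.75])
source.SetConstant(0.0)

fdk = rtk.CudaFDKConeBeamReconstructionFilter.New()
fdk.SetInput(0, source.GetOutput())
fdk.SetInput(1, projections.GetOutput())
fdk.SetGeometry(geometry)

# Stream the *end* of the pipeline: the volume is reconstructed piece by
# piece, so only a fraction of the data is on the GPU at any time.
# (Assumes itk.StreamingImageFilter is wrapped for CudaImage in your build.)
streamer = itk.StreamingImageFilter[ImageType, ImageType].New()
streamer.SetInput(fdk.GetOutput())
streamer.SetNumberOfStreamDivisions(8)  # more divisions -> less memory

# The C++ rtkfdk application also splits along the z direction:
splitter = itk.ImageRegionSplitterDirection.New()
splitter.SetDirection(2)
streamer.SetRegionSplitter(splitter)

streamer.Update()  # update the streamer only, not the projections reader
itk.imwrite(streamer.GetOutput(), 'fdk.mha')
```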

There shouldn't be any difference if the code is similar. There must be a difference somewhere, maybe an extra Update().
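For example (continuing the hypothetical sketch above), the difference can be as small as where Update() is called:

```python
# Calling Update() on the reader forces *all* projections to be loaded
# at once, which defeats the streaming and can exhaust GPU memory:
projections.Update()  # avoid this when streaming

# Updating only the last filter lets the pipeline pull the projections
# one piece at a time:
streamer.Update()
```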


Thank you for your advice. I asked this question because, with the same data, C++ can reconstruct correctly even though the GPU memory is only 15.9 GB. I will check the differences between my codes.