Speed up the reconstruction

I use CudaFDK to reconstruct my data, and it is much faster than the CPU version. But I would still like to shorten the run time. I used the profiler in PyCharm to print the time spent in each part of the process, as shown below:

Is there any way to speed up the process? I read that itk.MultiThreaderBase can provide multithreading, but I don’t know how to apply it here.
Any reply will be appreciated.
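For readers who want the same kind of timing breakdown outside PyCharm, here is a minimal sketch using Python's standard cProfile and pstats modules. The function reconstruct is a hypothetical stand-in for the real pipeline (importing itk/RTK and running CudaFDK); it is not part of the original post.

```python
import cProfile
import io
import pstats
import time


def reconstruct():
    # Hypothetical stand-in for the real work (import itk/RTK, run CudaFDK).
    time.sleep(0.05)


def profile_call(func, top=10):
    """Run func under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


report = profile_call(reconstruct)
print(report)
```

Sorting by cumulative time puts expensive one-off steps such as module loading near the top, next to the reconstruction call itself.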

GPU classes are parallel by their very nature. The only other significant function is create_module, taking 32 seconds. This might be the loading and import of all the Python modules. If you have itkConfig.LazyLoading = False in your code, try removing it or setting it to True.

Thank you for your reply.
I did not have itkConfig.LazyLoading = False in my code. To rule out the default setting, I set itkConfig.LazyLoading = True explicitly before the rest of the code runs, but the speed did not improve.
Could the reason be that I use from itk import XXX rather than import itk, which has the same effect as LazyLoading in that it avoids loading all modules?

Most of the time is spent in GPUrec. You may want to optimize m_ProjectionSubsetSize. I don’t see many other solutions, except maybe the block size in CUDA, but this is currently hard-coded. Don’t hesitate to report positive or negative results, this is interesting to us too!
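One way to look for a good subset size is to time the reconstruction for a few candidate values and keep the fastest. The sketch below is only a timing harness: fake_reconstruction is a hypothetical stand-in, and the setter name mentioned in its comment (SetProjectionSubsetSize) is an assumption to check against your RTK version.

```python
import time


def sweep_subset_sizes(run_reconstruction, sizes, repeats=3):
    """Return {size: best wall-clock seconds} over `repeats` runs per size."""
    timings = {}
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_reconstruction(size)
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


def fake_reconstruction(subset_size):
    # Hypothetical stand-in. With RTK, this is where the FDK filter would be
    # configured (assumed setter: SetProjectionSubsetSize) and updated.
    time.sleep(0.001)


timings = sweep_subset_sizes(fake_reconstruction, [2, 4, 8, 16, 32])
best_size = min(timings, key=timings.get)
for size, seconds in timings.items():
    print(f"subset size {size}: {seconds:.4f} s")
print(f"fastest: {best_size}")
```

Taking the best of several runs per size keeps one-off costs, such as the first GPU call, from distorting the comparison.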

Thank you for your reply. I’ll try changing the value of m_ProjectionSubsetSize.
In these runs, a lot of the time was spent on the first call into RTK, at rtk.CudaImage.

As suggested, I set itkConfig.LazyLoading = True, but it made no difference. Is there a better way to reduce this time?

ITK’s package loading time is notoriously long. I don’t have a practical solution. One solution is to reduce the number of types at compilation time with the CMake variables ITK_WRAP_*, e.g., no double but float only, 3D and 4D only, etc. But I don’t have a set of predefined variables which would work for RTK and (most likely) many configurations are not supported.

Thank you for your advice.
In that case, the second run should be faster, because the modules do not all have to be located again.

Usual ITK loading time is 3-8 seconds, so 30 seconds is way too excessive. I have no further suggestions about how to reduce it.

When I use rtk, an error like this appears:
[screenshot: 20240507105910]
It does not affect the subsequent code execution, though. Could this be the reason rtk takes so long to load?
My pip list is shown below:

itk                       5.4rc1
itk-core                  5.4rc1
itk-cudacommon-cuda116    1.0.1
itk-filtering             5.4rc1
itk-io                    5.4rc1
itk-numerics              5.4rc1
itk-registration          5.4rc1
itk-rtk-cuda116           2.5.0
itk-segmentation          5.4rc1

This is not my experience but it really depends on what you use. The following code on my (recent) laptop

import time


def custom_callback(name, progress):
    global mod_start_time
    if progress == 0:
        mod_start_time = time.time()
    if progress == 1:
        t = time.time() - mod_start_time
        print(f"Loaded {name} in {t:.2f} s.")

import itkConfig
itkConfig.ImportCallback = custom_callback
from itk import RTK as rtk
start_time = time.time()
rtk.ThreeDCircularProjectionGeometry.New()
print("--- %s seconds ---" % (time.time() - start_time))

gives

Loaded ITKPyBase in 0.24 s.
Loaded ITKCommon in 0.51 s.
Loaded ITKImageSources in 0.04 s.
Loaded ITKStatistics in 0.11 s.
Loaded ITKImageFilterBase in 0.68 s.
Loaded ITKTransform in 0.11 s.
Loaded ITKImageFunction in 0.09 s.
Loaded ITKImageGrid in 0.79 s.
Loaded ITKFFT in 0.40 s.
Loaded ITKMesh in 0.19 s.
Loaded ITKSpatialObjects in 0.09 s.
Loaded ITKImageCompose in 0.07 s.
Loaded ITKImageStatistics in 0.46 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 5.54 s.
Loaded ITKThresholding in 0.79 s.
Loaded ITKConvolution in 0.08 s.
Loaded ITKSmoothing in 0.19 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.16 s.
Loaded ITKImageFeature in 0.32 s.
Loaded ITKFiniteDifference in 0.12 s.
Loaded ITKDisplacementField in 0.06 s.
Loaded ITKRegistrationCommon in 0.27 s.
Loaded ITKImageNoise in 0.28 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.01 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.00 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.00 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded RTK in 3.60 s.
Loaded ITKIOImageBase in 3.60 s.
--- 15.493510484695435 seconds ---

if the modules have already been read from disk (it’s longer when it’s not in cache). >15 s is excessive, I agree!
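A possible refinement of the timing callback, as a sketch: the version above keeps a single global start time, so if one module's start is reported before another module begins loading, the first one ends up timed from the later start. Whether ITK actually nests the callbacks this way is an assumption on my part. Keeping one start time per module name avoids the question. make_import_timer is a hypothetical helper name, and the nested load at the end is simulated so that the sketch runs without itk.

```python
import time


def make_import_timer(clock=time.perf_counter):
    """Build an ImportCallback-style function with one start time per module."""
    started = {}
    durations = {}

    def callback(name, progress):
        if progress == 0:
            started[name] = clock()
        elif progress == 1 and name in started:
            durations[name] = clock() - started.pop(name)

    return callback, durations


callback, durations = make_import_timer()

# Hook-up, as in the snippet above (requires itk to be installed):
#   import itkConfig
#   itkConfig.ImportCallback = callback
#   from itk import RTK as rtk
#   rtk.ThreeDCircularProjectionGeometry.New()

# Simulated nested load, so the sketch runs without itk.
callback("Outer", 0)
callback("Inner", 0)
time.sleep(0.01)
callback("Inner", 1)
time.sleep(0.01)
callback("Outer", 1)

for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Loaded {name} in {seconds:.2f} s.")
```

With per-name bookkeeping, an outer module's time includes whatever it loads in between, so the individual times should not be summed to get the total.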

I have never encountered this issue. Does it occur with a fresh install in a separate python environment?

I have run the code and the output is shown below. I am using Python 3.9.
The relative loading times of the components follow the same pattern as yours, but my loading times are longer overall.

Loaded ITKPyBase in 0.43 s.
Loaded ITKCommon in 0.91 s.
Loaded ITKImageSources in 0.04 s.
Loaded ITKStatistics in 0.15 s.
Loaded ITKImageFilterBase in 1.34 s.
Loaded ITKTransform in 0.20 s.
Loaded ITKImageFunction in 0.10 s.
Loaded ITKImageGrid in 1.62 s.
Loaded ITKFFT in 0.84 s.
Loaded ITKMesh in 0.16 s.
Loaded ITKSpatialObjects in 0.11 s.
Loaded ITKImageCompose in 0.11 s.
Loaded ITKImageStatistics in 0.87 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 11.99 s.
Loaded ITKThresholding in 1.17 s.
Loaded ITKConvolution in 0.11 s.
Loaded ITKSmoothing in 0.36 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.32 s.
Loaded ITKImageFeature in 0.60 s.
Loaded ITKFiniteDifference in 0.23 s.
Loaded ITKDisplacementField in 0.09 s.
Loaded ITKRegistrationCommon in 0.31 s.
Loaded ITKImageNoise in 0.64 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.01 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.00 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.00 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded CudaCommon in 0.08 s.
Loaded RTK in 9.38 s.
Loaded ITKIOImageBase in 9.38 s.
--- 32.5577392578125 seconds ---

The problem persists even when I create a new Python environment.
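To compare two such logs module by module rather than by eye, the "Loaded ... in ... s." lines can be parsed and ranked. This is a small sketch; the three sample lines per run are copied from the two outputs in this thread, and the helper names are hypothetical.

```python
import re

# Matches lines such as "Loaded ITKCommon in 0.51 s."
LINE = re.compile(r"Loaded (\S+) in ([\d.]+) s\.")


def parse_load_times(log_text):
    """Return {module name: seconds} for every 'Loaded X in Y s.' line."""
    return {m.group(1): float(m.group(2)) for m in LINE.finditer(log_text)}


def slowest(times, n=3):
    """Return the n (name, seconds) pairs with the largest load time."""
    return sorted(times.items(), key=lambda kv: kv[1], reverse=True)[:n]


reference = parse_load_times("""Loaded ITKImageIntensity in 5.54 s.
Loaded RTK in 3.60 s.
Loaded ITKImageGrid in 0.79 s.""")

mine = parse_load_times("""Loaded ITKImageIntensity in 11.99 s.
Loaded RTK in 9.38 s.
Loaded ITKImageGrid in 1.62 s.""")

for name, seconds in slowest(mine):
    ratio = seconds / reference[name]
    print(f"{name}: {seconds:.2f} s ({ratio:.1f}x the reference run)")
```

If the ratio is roughly the same for every module, the slowdown is more likely general (disk or CPU speed) than tied to one particular module.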