Speed up the reconstruction

I use CudaFDK to reconstruct my data, and it is much faster than on the CPU, but I would still like to shorten the run time. I used the profiler in PyCharm to see where the time is spent:

[PyCharm profiler screenshot]

Is there any way to speed up the process? I found here that itk.MultiThreaderBase can provide multithreading, but I don’t know how to apply it here.
Any reply will be appreciated.

GPU classes are parallel by their very nature. The only other significant function is create_module, taking 32 seconds. This might be the loading and import of all the Python modules. If you have itkConfig.LazyLoading = False in your code, try removing it or setting it to True.

Thank you for your reply.
I didn’t use itkConfig.LazyLoading = False in my code. To rule out the default settings, I set itkConfig.LazyLoading = True before the rest of the code runs, but the speed did not improve.
Could the reason be that I use from itk import XXX rather than import itk, which has the same effect as LazyLoading in that it avoids loading all modules?

Most of the time is spent in GPUrec. You may want to optimize m_ProjectionSubsetSize. I don’t see many other solutions, except maybe the block size in CUDA, but this is currently hard-coded. Don’t hesitate to report positive or negative results, this is interesting to us too!
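One way to look for a good value is to time the reconstruction for a few candidate subset sizes. The sketch below is only an illustration: reconstruct is a hypothetical callable that you would write yourself, which sets the subset size on your FDK filter (I am assuming the filter exposes a SetProjectionSubsetSize setter for this member) and then runs the pipeline. The candidate values are arbitrary.

```python
import time


def sweep_subset_sizes(reconstruct, candidates, repeats=3):
    """Return {subset_size: best wall-clock seconds} for each candidate.

    `reconstruct` is a user-supplied (hypothetical) callable that takes the
    subset size, configures the filter with it and runs the reconstruction.
    """
    timings = {}
    for size in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            reconstruct(size)
            # Keep the best run so one-off costs (first GPU call, disk
            # cache misses) do not dominate the comparison.
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


def fastest(timings):
    """Return the subset size with the smallest measured time."""
    return min(timings, key=timings.get)
```

Taking the best of several repeats is a deliberate choice here, since the first GPU call in a process tends to include one-time initialization that has nothing to do with the subset size.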

Thank you for your reply, I’ll try changing the value of m_ProjectionSubsetSize.
In these runs, a lot of time was spent on the first call to RTK, such as rtk.CudaImage.

As suggested, I set itkConfig.LazyLoading = True, but it didn’t help. Is there a better way to reduce this time?

ITK’s package loading time is notoriously long. I don’t have a practical solution. One solution is to reduce the number of types at compilation time with the CMake variables ITK_WRAP_*, e.g., no double but float only, 3D and 4D only, etc. But I don’t have a set of predefined variables which would work for RTK and (most likely) many configurations are not supported.

Thank you for your advice.
In this case, the second run will be faster, because it is not necessary to find all the modules again during the second run.

Usual ITK loading time is 3-8 seconds, so 30 seconds is far too long. I have no further suggestions about how to reduce it.
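To compare loading times between machines or environments, it helps to measure them in a fresh interpreter so that nothing is already cached in sys.modules. A minimal sketch using only the standard library follows; the function names are made up for illustration, and the measured time includes interpreter start-up, which the second helper subtracts.

```python
import subprocess
import sys
import time


def cold_import_seconds(statement, python=sys.executable):
    """Time `statement` in a new Python process (wall clock, in seconds)."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the statement fails to run.
    subprocess.run([python, "-c", statement], check=True)
    return time.perf_counter() - start


def import_overhead_seconds(statement, python=sys.executable):
    """Same measurement with the bare interpreter start-up time subtracted."""
    baseline = cold_import_seconds("pass", python)
    return max(0.0, cold_import_seconds(statement, python) - baseline)
```

For the case discussed here, the statement could be "from itk import RTK as rtk; rtk.ThreeDCircularProjectionGeometry.New()". Running it twice in a row also shows how much of the time comes from reading the modules from disk the first time.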

When I use rtk, an error like this appears:
[screenshot: 20240507105910]
It does not affect the subsequent code execution, though. Could it be the reason rtk takes so long to load?
My pip list is below:

itk                       5.4rc1
itk-core                  5.4rc1
itk-cudacommon-cuda116    1.0.1
itk-filtering             5.4rc1
itk-io                    5.4rc1
itk-numerics              5.4rc1
itk-registration          5.4rc1
itk-rtk-cuda116           2.5.0
itk-segmentation          5.4rc1

This is not my experience but it really depends on what you use. The following code on my (recent) laptop

import time


def custom_callback(name, progress):
    global mod_start_time
    if progress == 0:
        mod_start_time = time.time()
    if progress == 1:
        t = time.time() - mod_start_time
        print(f"Loaded {name} in {t:.2f} s.")

import itkConfig
itkConfig.ImportCallback = custom_callback
from itk import RTK as rtk
start_time = time.time()
rtk.ThreeDCircularProjectionGeometry.New()
print("--- %s seconds ---" % (time.time() - start_time))

gives

Loaded ITKPyBase in 0.24 s.
Loaded ITKCommon in 0.51 s.
Loaded ITKImageSources in 0.04 s.
Loaded ITKStatistics in 0.11 s.
Loaded ITKImageFilterBase in 0.68 s.
Loaded ITKTransform in 0.11 s.
Loaded ITKImageFunction in 0.09 s.
Loaded ITKImageGrid in 0.79 s.
Loaded ITKFFT in 0.40 s.
Loaded ITKMesh in 0.19 s.
Loaded ITKSpatialObjects in 0.09 s.
Loaded ITKImageCompose in 0.07 s.
Loaded ITKImageStatistics in 0.46 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 5.54 s.
Loaded ITKThresholding in 0.79 s.
Loaded ITKConvolution in 0.08 s.
Loaded ITKSmoothing in 0.19 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.16 s.
Loaded ITKImageFeature in 0.32 s.
Loaded ITKFiniteDifference in 0.12 s.
Loaded ITKDisplacementField in 0.06 s.
Loaded ITKRegistrationCommon in 0.27 s.
Loaded ITKImageNoise in 0.28 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.01 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.00 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.00 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded RTK in 3.60 s.
Loaded ITKIOImageBase in 3.60 s.
--- 15.493510484695435 seconds ---

if the modules have already been read from disk (it’s longer when it’s not in cache). >15 s is excessive, I agree!
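To compare such logs between machines, the "Loaded ... in ... s." lines can be parsed and ranked. This is a small sketch with the standard library only; the helper names are hypothetical and the sample lines are copied from the output above.

```python
import re

# Matches lines such as "Loaded ITKCommon in 0.51 s."
LOAD_LINE = re.compile(r"Loaded (\S+) in ([0-9.]+) s\.")


def parse_load_log(text):
    """Return (module, seconds) pairs in the order they were printed."""
    return [(m.group(1), float(m.group(2))) for m in LOAD_LINE.finditer(text)]


def slowest(entries, n=3):
    """Return the n entries with the largest reported time."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)[:n]


sample = """\
Loaded ITKCommon in 0.51 s.
Loaded ITKImageGrid in 0.79 s.
Loaded ITKImageIntensity in 5.54 s.
Loaded RTK in 3.60 s.
"""
entries = parse_load_log(sample)
print(slowest(entries, 2))
```

Note that the per-module figures should probably not be summed to get the total: the callback keeps a single global start time, so if module loads are nested, an inner load overwrites the start time of the outer one. The identical 3.60 s reported for RTK and ITKIOImageBase suggests this is happening.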

I have never encountered this issue. Does it occur with a fresh install in a separate python environment?

I have run the code and the output is below; I am using Python 3.9.
The loading times of the individual components follow the same trend, but my loading times are longer.

Loaded ITKPyBase in 0.43 s.
Loaded ITKCommon in 0.91 s.
Loaded ITKImageSources in 0.04 s.
Loaded ITKStatistics in 0.15 s.
Loaded ITKImageFilterBase in 1.34 s.
Loaded ITKTransform in 0.20 s.
Loaded ITKImageFunction in 0.10 s.
Loaded ITKImageGrid in 1.62 s.
Loaded ITKFFT in 0.84 s.
Loaded ITKMesh in 0.16 s.
Loaded ITKSpatialObjects in 0.11 s.
Loaded ITKImageCompose in 0.11 s.
Loaded ITKImageStatistics in 0.87 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 11.99 s.
Loaded ITKThresholding in 1.17 s.
Loaded ITKConvolution in 0.11 s.
Loaded ITKSmoothing in 0.36 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.32 s.
Loaded ITKImageFeature in 0.60 s.
Loaded ITKFiniteDifference in 0.23 s.
Loaded ITKDisplacementField in 0.09 s.
Loaded ITKRegistrationCommon in 0.31 s.
Loaded ITKImageNoise in 0.64 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.01 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.00 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.00 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded CudaCommon in 0.08 s.
Loaded RTK in 9.38 s.
Loaded ITKIOImageBase in 9.38 s.
--- 32.5577392578125 seconds ---

This problem persists even when I create a new Python environment.

Which OS and Python version?

My OS is Windows 10 Professional and the Python version is 3.9.

I created a new Python environment and built ITK with CMake instead of using pip install. The loading time is much shorter. I am confused by this result because I didn’t reduce the number of modules.
Actually, using pip would be more convenient for me, but loading rtk takes too long.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
Loaded ITKPyBase in 0.19 s.
Loaded ITKCommon in 0.44 s.
Loaded ITKImageSources in 0.03 s.
Loaded ITKStatistics in 0.11 s.
Loaded ITKImageFilterBase in 0.28 s.
Loaded ITKTransform in 0.11 s.
Loaded ITKImageFunction in 0.07 s.
Loaded ITKImageGrid in 0.58 s.
Loaded ITKFFT in 0.27 s.
Loaded ITKMesh in 0.11 s.
Loaded ITKSpatialObjects in 0.10 s.
Loaded ITKImageCompose in 0.04 s.
Loaded ITKImageStatistics in 0.28 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 2.43 s.
Loaded ITKThresholding in 0.53 s.
Loaded ITKConvolution in 0.05 s.
Loaded ITKSmoothing in 0.12 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.11 s.
Loaded ITKImageFeature in 0.23 s.
Loaded ITKFiniteDifference in 0.09 s.
Loaded ITKDisplacementField in 0.06 s.
Loaded ITKRegistrationCommon in 0.18 s.
Loaded ITKImageNoise in 0.17 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.02 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.01 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.01 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded RTK in 2.43 s.
Loaded ITKIOImageBase in 2.43 s.
--- 9.409522771835327 seconds ---

This is probably because you have activated fewer wrapping types in your compilation options. Can you provide the output of grep ITK_WRAP CMakeCache.txt in your binary directory?
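If grep is not available (e.g. on Windows), a few lines of Python can extract the same entries. This is a rough sketch with hypothetical helper names; it relies on cache entries having the form NAME:TYPE=VALUE and, unlike grep, it skips comment lines and only matches names that start with the prefix.

```python
from pathlib import Path


def parse_cache_text(text, prefix="ITK_WRAP"):
    """Return {name: (type, value)} for cache entries whose name starts with prefix."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments ("#" and "//") and anything malformed.
        if not line or line.startswith(("#", "//")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        name, _, kind = key.partition(":")
        if name.startswith(prefix):
            entries[name] = (kind, value)
    return entries


def grep_cache(cache_path, prefix="ITK_WRAP"):
    """Read a CMakeCache.txt file and return its matching entries."""
    text = Path(cache_path).read_text(errors="replace")
    return parse_cache_text(text, prefix)
```

Calling grep_cache("CMakeCache.txt") from the binary directory and printing the result should give the same information as the grep command.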

Sorry for taking so long to reply. The file is attached.
CMakeCache.txt (566.8 KB)
I have some issues using the ITK Python package generated by CMake. I currently use ITK through WrapITK.pth. Could I instead copy the files from the path listed in the .pth file into site-packages, so that I can use ITK as if I had installed it with pip?

Unlike the pypi packages, your CMakeCache.txt indicates that you do not wrap 4D images (ITK_WRAP_IMAGE_DIMS:STRING=2;3), double (ITK_WRAP_double:BOOL=OFF) and unsigned_short (ITK_WRAP_unsigned_short:BOOL=OFF).
I think you cannot pip install ITK compiled like this, but you could generate wheels, if that’s what you want, using the ITKPythonPackage suite.

Following your advice, I tried to use ITKPythonPackage to generate wheels, but I failed.
Can you give me some advice on how to build ITK 5.4rc1 and RTK 2.5 with GPU support and with ITK_WRAP_IMAGE_DIMS:STRING=2;3 set?

Have you checked the tutorial? I don’t have a better solution than ITKPythonPackage.