I use CudaFDK to reconstruct my data, and it is much faster than the CPU version. I would still like to shorten the run time. I used the profiler in PyCharm to print the time spent in each part of the process, as shown below:
Is there any way to speed up the process? I found here that itk.MultiThreaderBase can enable multithreading, but I don’t know how to apply it here.
Any reply will be appreciated.
GPU classes are parallel by their very nature. The only other significant function is create_module, taking 32 seconds. This might be the loading and import of all the Python modules. If you have itkConfig.LazyLoading = False in your code, try removing it or setting it to True.
Thank you for your reply.
I didn’t set itkConfig.LazyLoading = False in my code. To rule out the default setting, I set itkConfig.LazyLoading = True before the rest of the code runs, but the speed did not improve.
Could the reason be that I use from itk import XXX rather than import itk, which already has the same effect as LazyLoading in that it avoids loading all modules?
Most of the time is spent in GPUrec. You may want to optimize m_ProjectionSubsetSize. I don’t see many other solutions, except maybe the block size in CUDA, but this is currently hard-coded. Don’t hesitate to report positive or negative results, this is interesting to us too!
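One way to try this is to time the reconstruction for a few candidate subset sizes and keep the fastest one. The helper below is only a sketch: run_reconstruction is a hypothetical callable that you would write around your own pipeline, and the commented usage assumes that the Python wrapping exposes SetProjectionSubsetSize on your FDK filter (here called fdk, also a hypothetical name).

```python
import time

def sweep_subset_sizes(run_reconstruction, sizes, repeats=3):
    """Return {size: best wall-clock seconds} for each candidate subset size."""
    results = {}
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_reconstruction(size)  # user-supplied, see usage sketch below
            best = min(best, time.perf_counter() - start)
        results[size] = best
    return results

def fastest(results):
    """Return the subset size with the smallest measured time."""
    return min(results, key=results.get)

# Hypothetical usage with an existing, already configured filter `fdk`:
#
# def run_reconstruction(size):
#     fdk.SetProjectionSubsetSize(size)  # assumed setter for m_ProjectionSubsetSize
#     fdk.Modified()                     # force a re-run even if the size is unchanged
#     fdk.Update()
#
# timings = sweep_subset_sizes(run_reconstruction, [4, 8, 16, 32])
# print(timings, "-> fastest:", fastest(timings))
```

Taking the best of several repeats reduces the influence of one-off costs such as the first GPU initialisation, which would otherwise penalise whichever size is measured first.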
Thank you for your reply, I’ll try changing the value of m_ProjectionSubsetSize.
In these runs, a lot of the time is spent on the first call to RTK, e.g. rtk.CudaImage.
Following the suggestion, I set itkConfig.LazyLoading = True, but it did not help. Is there a better way to reduce this time?
ITK’s package loading time is notoriously long. I don’t have a practical solution. One solution is to reduce the number of types at compilation time with the CMake variables ITK_WRAP_*, e.g., no double but float only, 3D and 4D only, etc. But I don’t have a set of predefined variables which would work for RTK and (most likely) many configurations are not supported.
This is not my experience but it really depends on what you use. The following code on my (recent) laptop
import time

def custom_callback(name, progress):
    global mod_start_time
    if progress == 0:
        mod_start_time = time.time()
    if progress == 1:
        t = time.time() - mod_start_time
        print(f"Loaded {name} in {t:.2f} s.")

import itkConfig
itkConfig.ImportCallback = custom_callback
from itk import RTK as rtk

start_time = time.time()
rtk.ThreeDCircularProjectionGeometry.New()
print("--- %s seconds ---" % (time.time() - start_time))
gives
Loaded ITKPyBase in 0.24 s.
Loaded ITKCommon in 0.51 s.
Loaded ITKImageSources in 0.04 s.
Loaded ITKStatistics in 0.11 s.
Loaded ITKImageFilterBase in 0.68 s.
Loaded ITKTransform in 0.11 s.
Loaded ITKImageFunction in 0.09 s.
Loaded ITKImageGrid in 0.79 s.
Loaded ITKFFT in 0.40 s.
Loaded ITKMesh in 0.19 s.
Loaded ITKSpatialObjects in 0.09 s.
Loaded ITKImageCompose in 0.07 s.
Loaded ITKImageStatistics in 0.46 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 5.54 s.
Loaded ITKThresholding in 0.79 s.
Loaded ITKConvolution in 0.08 s.
Loaded ITKSmoothing in 0.19 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.16 s.
Loaded ITKImageFeature in 0.32 s.
Loaded ITKFiniteDifference in 0.12 s.
Loaded ITKDisplacementField in 0.06 s.
Loaded ITKRegistrationCommon in 0.27 s.
Loaded ITKImageNoise in 0.28 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.01 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.00 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.00 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded RTK in 3.60 s.
Loaded ITKIOImageBase in 3.60 s.
--- 15.493510484695435 seconds ---
if the modules have already been read from disk (it’s longer when it’s not in cache). >15 s is excessive, I agree!
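To see at a glance which modules dominate a log like this, the lines can be parsed and ranked. This is a minimal sketch that assumes the exact "Loaded NAME in T s." format printed by the callback above; the sample text is just a few lines copied from the output.

```python
import re

# Matches lines such as "Loaded ITKCommon in 0.51 s."
LINE = re.compile(r"Loaded (\S+) in ([0-9.]+) s\.")

def parse_load_log(text):
    """Return [(module, seconds)] for every 'Loaded X in T s.' line."""
    return [(m.group(1), float(m.group(2))) for m in LINE.finditer(text)]

def slowest(entries, n=3):
    """Return the n entries with the longest load time, slowest first."""
    return sorted(entries, key=lambda e: e[1], reverse=True)[:n]

sample = """Loaded ITKCommon in 0.51 s.
Loaded ITKImageIntensity in 5.54 s.
Loaded RTK in 3.60 s.
Loaded ITKPath in 0.02 s."""

for name, seconds in slowest(parse_load_log(sample)):
    print(f"{name}: {seconds:.2f} s")
```

Running the same functions on logs from two installations makes it easy to compare them module by module.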
I have run the code and the output is below. I am using Python 3.9.
The relative loading time of each module follows the same trend, but my loading times are longer overall.
Loaded ITKPyBase in 0.43 s.
Loaded ITKCommon in 0.91 s.
Loaded ITKImageSources in 0.04 s.
Loaded ITKStatistics in 0.15 s.
Loaded ITKImageFilterBase in 1.34 s.
Loaded ITKTransform in 0.20 s.
Loaded ITKImageFunction in 0.10 s.
Loaded ITKImageGrid in 1.62 s.
Loaded ITKFFT in 0.84 s.
Loaded ITKMesh in 0.16 s.
Loaded ITKSpatialObjects in 0.11 s.
Loaded ITKImageCompose in 0.11 s.
Loaded ITKImageStatistics in 0.87 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 11.99 s.
Loaded ITKThresholding in 1.17 s.
Loaded ITKConvolution in 0.11 s.
Loaded ITKSmoothing in 0.36 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.32 s.
Loaded ITKImageFeature in 0.60 s.
Loaded ITKFiniteDifference in 0.23 s.
Loaded ITKDisplacementField in 0.09 s.
Loaded ITKRegistrationCommon in 0.31 s.
Loaded ITKImageNoise in 0.64 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.01 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.00 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.00 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded CudaCommon in 0.08 s.
Loaded RTK in 9.38 s.
Loaded ITKIOImageBase in 9.38 s.
--- 32.5577392578125 seconds ---
This problem persists even when I create a new Python environment.
I then created a new Python environment and built ITK with CMake instead of installing it with pip. The loading time is much shorter. I am very confused by this result because I didn’t reduce the number of modules.
Actually, using pip would be more convenient for me, but the RTK loading time is too long.
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
Loaded ITKPyBase in 0.19 s.
Loaded ITKCommon in 0.44 s.
Loaded ITKImageSources in 0.03 s.
Loaded ITKStatistics in 0.11 s.
Loaded ITKImageFilterBase in 0.28 s.
Loaded ITKTransform in 0.11 s.
Loaded ITKImageFunction in 0.07 s.
Loaded ITKImageGrid in 0.58 s.
Loaded ITKFFT in 0.27 s.
Loaded ITKMesh in 0.11 s.
Loaded ITKSpatialObjects in 0.10 s.
Loaded ITKImageCompose in 0.04 s.
Loaded ITKImageStatistics in 0.28 s.
Loaded ITKPath in 0.02 s.
Loaded ITKImageIntensity in 2.43 s.
Loaded ITKThresholding in 0.53 s.
Loaded ITKConvolution in 0.05 s.
Loaded ITKSmoothing in 0.12 s.
Loaded ITKOptimizers in 0.02 s.
Loaded ITKImageGradient in 0.11 s.
Loaded ITKImageFeature in 0.23 s.
Loaded ITKFiniteDifference in 0.09 s.
Loaded ITKDisplacementField in 0.06 s.
Loaded ITKRegistrationCommon in 0.18 s.
Loaded ITKImageNoise in 0.17 s.
Loaded ITKIOBMP in 0.00 s.
Loaded ITKIOBioRad in 0.00 s.
Loaded ITKIOBruker in 0.00 s.
Loaded ITKIOGDCM in 0.02 s.
Loaded ITKIOIPL in 0.00 s.
Loaded ITKIOGE in 0.01 s.
Loaded ITKIOGIPL in 0.00 s.
Loaded ITKIOHDF5 in 0.01 s.
Loaded ITKIOJPEG in 0.00 s.
Loaded ITKIOJPEG2000 in 0.00 s.
Loaded ITKIOTIFF in 0.00 s.
Loaded ITKIOLSM in 0.00 s.
Loaded ITKIOMINC in 0.00 s.
Loaded ITKIOMRC in 0.00 s.
Loaded ITKIOMeta in 0.00 s.
Loaded ITKIONIFTI in 0.00 s.
Loaded ITKIONRRD in 0.00 s.
Loaded ITKIOPNG in 0.00 s.
Loaded ITKIOStimulate in 0.00 s.
Loaded ITKIOVTK in 0.00 s.
Loaded ITKIORAW in 0.01 s.
Loaded ITKBridgeNumPy in 0.03 s.
Loaded RTK in 2.43 s.
Loaded ITKIOImageBase in 2.43 s.
--- 9.409522771835327 seconds ---
This is probably because you have activated less wrapping types in your compilation options. Can you provide the output of grep ITK_WRAP CMakeCache.txt in your binary directory?
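If grep is not available (e.g. on Windows), the same information can be pulled out with a few lines of Python. This is only a sketch: it assumes the usual NAME:TYPE=VALUE layout of CMakeCache.txt, skips INTERNAL bookkeeping entries, and the sample lines are illustrative.

```python
def wrap_options(cache_text):
    """Return {name: value} for every ITK_WRAP* entry of a CMakeCache.txt."""
    options = {}
    for line in cache_text.splitlines():
        line = line.strip()
        if not line.startswith("ITK_WRAP"):
            continue  # comments and unrelated variables
        key, _, value = line.partition("=")
        name, _, kind = key.partition(":")
        if kind == "INTERNAL":
            continue  # e.g. "...-ADVANCED:INTERNAL=1" bookkeeping entries
        options[name] = value
    return options

# Illustrative sample; for a real build, read the file instead, e.g.
# text = open("CMakeCache.txt").read()   # path relative to your binary directory
sample = """// Dimensions to wrap
ITK_WRAP_IMAGE_DIMS:STRING=2;3
ITK_WRAP_double:BOOL=OFF
ITK_WRAP_double-ADVANCED:INTERNAL=1
CMAKE_BUILD_TYPE:STRING=Release"""

for name, value in sorted(wrap_options(sample).items()):
    print(f"{name} = {value}")
```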
Sorry for taking so long to reply. The file is attached: CMakeCache.txt (566.8 KB)
I have some questions about using the ITK Python package built with CMake. I currently use it through WrapITK.pth. Could I instead copy the paths listed in the .pth file into site-packages, so that I can use itk as if I had installed it with pip?
Unlike the pypi packages, your CMakeCache.txt indicates that you do not wrap 4D images (ITK_WRAP_IMAGE_DIMS:STRING=2;3), double (ITK_WRAP_double:BOOL=OFF) and unsigned_short (ITK_WRAP_unsigned_short:BOOL=OFF).
I don’t think you can pip install ITK compiled like this, but you could generate wheels, if that’s what you want, using the ITKPythonPackage suite.
Following your advice, I tried to use ITKPythonPackage to generate wheels, but I failed.
Can you give me some advice on how to generate itk5.4rc1 and rtk2.5 wheels with GPU support and with ITK_WRAP_IMAGE_DIMS:STRING=2;3?