Speed up the reconstruction

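The GPU classes are already parallel by their very nature, so there is not much to gain by trying to parallelize them further. The only other function that takes a significant amount of time is create_module, at 32 seconds. This might be the loading and importing of all the Python modules. If your code sets itkConfig.LazyLoading = False, try removing that line or setting it to True, so that modules are loaded only when they are actually used.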
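To check where the time goes before and after changing itkConfig.LazyLoading, you can profile the script and sort by cumulative time. Below is a minimal sketch using only the standard library's cProfile and pstats. The helper name top_cumulative and the stand-in workload busy_work are hypothetical; replace busy_work with your own reconstruction entry point. In such a profile, an entry named create_module is probably the import machinery loading compiled extension modules, which would fit the module-loading explanation above.

```python
import cProfile
import pstats


def top_cumulative(func, *args, limit=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return the `limit`
    entries with the largest cumulative time as
    (function name, file name, cumulative seconds) tuples."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    rows = [
        (name, filename, cumtime)
        for (filename, _line, name), (_cc, _nc, _tt, cumtime, _callers)
        in stats.stats.items()
    ]
    # Largest cumulative time first, like sort_stats("cumulative").
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


def busy_work(n):
    # Hypothetical stand-in workload; replace with your reconstruction code.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for name, filename, cumtime in top_cumulative(busy_work, 200_000, limit=5):
        print(f"{cumtime:8.3f} s  {name}  ({filename})")
```

Running it once with the LazyLoading line and once without it should show whether the time spent in create_module drops.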