extern templates to reduce build time

Hello,

In OTB (which is heavily based on ITK) we have been looking at the possibility of introducing explicit instantiation of common template classes in the dynamic libraries we build, together with “extern template” declarations in headers to reduce build times significantly. A quick grep in ITK source shows that this C++11 feature is barely used in ITK, and it seems like it is for another purpose (dynamic_cast issues).

After a few tests on our side, the best approach seems to be:

  • In the header file: declare common template class specializations as “extern template”
  • In the cxx file: explicitly instantiate the same specializations
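
To make the two steps concrete, here is a minimal single-file sketch. The `demo::Image` class, the file names in the comments and the `SumPixels` function are made up for illustration; they are not ITK or OTB code. In a real project the three sections would live in separate files, and the instantiation would be compiled into the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- Image.h (hypothetical header shipped with the library) ----
namespace demo {

template <typename TPixel, unsigned int VDimension>
class Image {
public:
  explicit Image(std::size_t numberOfPixels)
    : m_Buffer(numberOfPixels, TPixel()) {}

  void Fill(const TPixel& value) {
    for (std::size_t i = 0; i < m_Buffer.size(); ++i) {
      m_Buffer[i] = value;
    }
  }

  TPixel GetPixel(std::size_t index) const {
    assert(index < m_Buffer.size());
    return m_Buffer[index];
  }

  std::size_t GetNumberOfPixels() const { return m_Buffer.size(); }

private:
  std::vector<TPixel> m_Buffer;
};

// Explicit instantiation declaration: every translation unit that includes
// this header is told that Image<float, 2> is instantiated elsewhere.
extern template class Image<float, 2>;

}  // namespace demo

// ---- Image.cxx (hypothetical source file compiled into the library) ----
namespace demo {

// Explicit instantiation definition: the one place where the members of
// Image<float, 2> are actually generated.
template class Image<float, 2>;

}  // namespace demo

// ---- client code (would normally only include Image.h) ----
float SumPixels() {
  demo::Image<float, 2> image(4);
  image.Fill(1.5f);
  float sum = 0.0f;
  for (std::size_t i = 0; i < image.GetNumberOfPixels(); ++i) {
    sum += image.GetPixel(i);
  }
  return sum;
}
```

Note that the compiler may still instantiate inline member functions in client code for inlining purposes, so the saving depends on how much of the class is defined inline.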

This means that client code will not need to instantiate a template type (reducing build time significantly). It will, however, need to link against the corresponding library. The benefit of this approach is supported by running a quick:

$ nm -g --demangle $(find . -name "*.o") | sort | uniq -c | sort -n | grep " W "

in a build directory, and noticing that in a full build, the compiler instantiates the same template types tens of thousands of times. (why it does not keep a cache is beyond me…). This can be seen with the compiler option -ftime-report, also.

This approach is promising (roughly a 15% reduction in build time in my initial tests, and I think more can be achieved), but raises two issues which are probably worth discussing with ITK devs:

  1. How to handle specialization on a type that is in another module, or another library? For example in OTB we have some:
    itk::ImageSource<otb::VectorImage<double, 2u> >

(and many more). Where should the “extern template” declarations and explicit instantiations for this type be in the code? extern template declarations require the header of the type on which the template is specialized to be included (I think). This poses no issues for itk::Image, but some issues for more complex templating.

  2. How to handle MSVC and gcc incompatibilities regarding library export keywords (declspec, dllexport, _EXPORT macros & co). This issue is explained well by this page.

From my research I found that ITK handles that somewhat with EXPORT_EXPLICIT macros. Other C++ projects out there have tackled that issue as well. See for example:

https://bugs.chromium.org/p/chromium/issues/detail?id=420695
https://chromium.googlesource.com/chromium/src/+/66.0.3359.158/base/export_template.h
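
As I understand it, the core of the problem is that MSVC wants `__declspec(dllexport)` on the explicit instantiation definition, while gcc/clang want the visibility attribute on the `extern template` declaration. Below is a rough sketch of a macro pair that handles this, loosely following the idea of Chromium's export_template.h. The names `MYLIB_BUILDING`, `MYLIB_TEMPLATE_DECLARE`, `MYLIB_TEMPLATE_DEFINE` and the `mylib::Range` class are hypothetical, and a static-library build would presumably define both macros as empty.

```cpp
#include <cassert>

// Hypothetical: normally set by the build system while compiling the library
// itself (e.g. -DMYLIB_BUILDING), not hard-coded. Defined here only so that
// this single-file sketch compiles on its own.
#define MYLIB_BUILDING

#if defined(_MSC_VER)
#  if defined(MYLIB_BUILDING)
     // MSVC, building the DLL: annotate the definition, not the declaration.
#    define MYLIB_TEMPLATE_DECLARE
#    define MYLIB_TEMPLATE_DEFINE __declspec(dllexport)
#  else
     // MSVC, client code: import the instantiation from the DLL.
#    define MYLIB_TEMPLATE_DECLARE __declspec(dllimport)
#    define MYLIB_TEMPLATE_DEFINE
#  endif
#else
   // gcc/clang: the visibility attribute goes on the declaration.
#  define MYLIB_TEMPLATE_DECLARE __attribute__((visibility("default")))
#  define MYLIB_TEMPLATE_DEFINE
#endif

// ---- header part ----
namespace mylib {

template <typename T>
class Range {
public:
  Range(T low, T high) : m_Low(low), m_High(high) { assert(low <= high); }

  // Returns value limited to the closed interval [low, high].
  T Clamp(T value) const {
    return value < m_Low ? m_Low : (m_High < value ? m_High : value);
  }

private:
  T m_Low;
  T m_High;
};

extern template class MYLIB_TEMPLATE_DECLARE Range<int>;

}  // namespace mylib

// ---- source part (compiled into the library) ----
namespace mylib {

template class MYLIB_TEMPLATE_DEFINE Range<int>;

}  // namespace mylib
```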

So, questions:

  • Would ITK be interested in using extern templates to reduce build time? Which approach would you take?
  • How to make extern templates portable across MSVC and gcc/clang? Something like Chromium’s export_template.h? How does this interact with CMake’s generation of _EXPORT macros? Maybe it needs to be a patch all the way in CMake’s generate_export_header?

Thanks

Hi Victor,

Explicit instantiation support would be a welcome contribution to ITK. This has been investigated a few times in the past, but we finally may have the infrastructure to make it feasible.

We can use ITK’s Python wrapping infrastructure to generate the explicit templates. This would re-use all the information about which template instantiations make sense, which is defined for the entire toolkit. Explicit instantiation started in this way some time ago, but it was removed because it was not maintained. However, it could be revitalized by adding a new directory in ITK/Wrapping/Generators/Explicit/ with the appropriate CMake configuration.

This will also enable other ITK extension modules to add their own type instantiations, e.g. for itk::ImageSource<otb::VectorImage<double, 2u> >, using the external module system.
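
As a rough illustration of what an extension module could end up containing, here is a single-file sketch in which the module that owns the template argument also owns the declaration and the instantiation. The `upstream` and `downstream` namespaces are stand-ins for ITK and an extension module such as OTB; the classes are simplified placeholders, not the real itk::ImageSource or otb::VectorImage.

```cpp
#include <cassert>

// ---- upstream toolkit header (stand-in for the ITK side) ----
namespace upstream {

template <typename TOutputImage>
class ImageSource {
public:
  typedef TOutputImage OutputImageType;

  OutputImageType* GetOutput() { return &m_Output; }

private:
  OutputImageType m_Output;
};

}  // namespace upstream

// ---- downstream header (stand-in for the extension module) ----
namespace downstream {

template <typename TPixel, unsigned int VDimension>
class VectorImage {
public:
  void SetNumberOfComponentsPerPixel(unsigned int n) {
    assert(n > 0);
    m_Components = n;
  }

  unsigned int GetNumberOfComponentsPerPixel() const { return m_Components; }

private:
  unsigned int m_Components = 1;
};

}  // namespace downstream

// Still in the downstream header: both class definitions are visible here,
// so this is a place where the declaration can legally be written.
namespace upstream {

extern template class ImageSource<downstream::VectorImage<double, 2> >;

}  // namespace upstream

// ---- downstream source file (compiled into the downstream library) ----
namespace upstream {

template class ImageSource<downstream::VectorImage<double, 2> >;

}  // namespace upstream
```

Client code that includes the downstream header then links against the downstream library to get the instantiation.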

Yes, making extern templates portable across compilers, static and shared library configurations, and platforms is tricky. However, we managed to do it in ITK, and these classes can be used as an example.

But, this could be auto-generated by the system in CMake to produce a header that is included when explicit instantiation is enabled.

Thanks for the feedback!

Do you have a link where I could read about ITK’s Python wrapping infrastructure? I’m not so familiar with it. In particular where is the list of types that make sense to explicitly instantiate?

Regarding export macros, I wonder if instead of writing custom CMake code per project to generate those macros, it might be nicer if CMake’s own generate_export_header did that? Do you think CMake developers would be interested in adding that functionality?

Yes, more information on ITK’s Python wrapping infrastructure can be found in the System Overview / Wrapping section and the Module Wrapping section of the ITK Software Guide.

Contributing extern template support to generate_export_header is a good idea. A path forward is to add a namespaced version of generate_export_header to ITK/CMake/, demonstrate that we have it working across systems in a stable way, then push it upstream.

Thanks,
Matt