Add a parallel compression method to NRRD and/or MetaImage?

It used to be faster to write compressed images, because compression was faster than hard disk transfer rates. Not any more! Solid state disks are now fast enough that (single-threaded) compression is the bottleneck. We still mostly use outdated single-threaded gzip compression for most image formats.

What do you think about adding something more modern, e.g. zstd, to NRRD and/or MetaImage?

+1 zstd adds a huge performance and compression ratio bump.

Should we contact the teem group and see if they are willing to contribute to the effort?

@glk maintains Teem / NRRD. Gordon, what do you think?

That sounds intriguing. Thanks for mentioning me by name or else I would not have noticed this.
I should learn the answer to this myself, but what are the 3rd party dependencies of zstd (like how png depends on zlib)?

By quickly glancing at their CMake sources, it seems not to have external dependencies.

So how would it parallelize the compression?

It depends on PThreads on *nix and Kernel32 on Windows which provide threading capabilities. I assume it also depends on STL and CRT. Libraries which do not depend on these system things are rare and usually of questionable utility.

I meant it does not depend on some other non-standard library.

It looks like it is vanilla C as opposed to C++.

That explains the need for PThreads, instead of using C++11 threads :smiley:
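
To make the threading question concrete, here is a minimal sketch of chunked parallel compression. It is not zstd's actual implementation: it uses Python's standard-library gzip as a stand-in and relies on the fact that concatenated gzip members form a valid gzip stream. The function name `compress_parallel`, the chunk size, and the worker count are all assumptions chosen for illustration.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor


def compress_parallel(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently; each chunk becomes one gzip member.

    Hypothetical helper for illustration only; gzip stands in for zstd here.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so members stay in sequence.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


# Roughly 3 MiB of test data: some noise followed by highly compressible zeros.
payload = os.urandom(1 << 16) + bytes(3 << 20)
packed = compress_parallel(payload)

# gzip.decompress reads all concatenated members back as one byte string.
restored = gzip.decompress(packed)
```

The trade-off of this approach is that each chunk is compressed independently, so the ratio can be slightly worse than a single stream over the whole buffer. Whether the threads actually overlap depends on the interpreter releasing its lock during compression, so treat any speedup as something to measure rather than assume.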

Cool - so it’s feasible to add this new encoding to NRRD, and I’m intrigued by the possibility. Is there anyone willing to help me? I would like a side-effect of this to be a re-synching of ITK’s NrrdIO and the Teem sources from which it originates.

What seems like a logical way forward to me is to add zstd as another third party library to ITK, have MetaIO and NrrdIO depend on it (in addition to zlib), and make NRRD and MetaIO support zstd as an additional encoding (alongside raw and gzip).
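
As a rough illustration of what "an additional encoding" means on the NRRD side, the sketch below writes a tiny attached-header NRRD blob whose `encoding:` field selects how the data after the blank line is stored. The helper `nrrd_bytes` and the `ENCODERS` table are hypothetical names for this example, and a `zstd` entry is shown only as a commented-out placeholder, since no such encoding value exists in the format as discussed here.

```python
import gzip

# Map from the NRRD "encoding" field value to a function that encodes the data.
# Supporting a new encoding would mean adding one entry here (and a decoder
# on the reading side). The zstd line is hypothetical and left commented out.
ENCODERS = {
    "raw": lambda voxels: voxels,
    "gzip": gzip.compress,
    # "zstd": some_zstd_compress,  # assumption: not an existing NRRD encoding
}


def nrrd_bytes(voxels: bytes, sizes: tuple, encoding: str = "gzip") -> bytes:
    """Build a minimal uint8 NRRD file (attached header) in memory."""
    if encoding not in ENCODERS:
        raise ValueError(f"unsupported encoding: {encoding}")
    header = (
        "NRRD0004\n"
        "type: uint8\n"
        f"dimension: {len(sizes)}\n"
        f"sizes: {' '.join(str(s) for s in sizes)}\n"
        f"encoding: {encoding}\n"
        "\n"  # a blank line separates the header from the data
    )
    return header.encode("ascii") + ENCODERS[encoding](voxels)


blob = nrrd_bytes(bytes(64), (4, 4, 4))
```

A reader would dispatch on the same field value, which is why older readers would reject files written with an encoding they do not know.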

I don’t know why ITK’s copy diverged from official Teem. But Teem has a lot of other stuff in addition to NRRD, so maybe that was the reason. @matt.mccormick @blowekamp @hjmjohnson can you give more background? @glk what would be required to re-synchronize ITK and Teem?

We now have good infrastructure in place to pull from and push contributions to upstream third party projects with git subtree. We just have not enabled it for NRRD yet. We should do that as part of this effort :slight_smile:

Yea, Teem has a lot of stuff that doesn’t make any sense for ITK. So, when Nrrd was first added to ITK, I wrote some scripts to extract just those portions of Teem (not just which files, but which parts of which files) that are built to make NrrdIO (look at Utilities/NrrdIO/unteem.pl). I sort of regret that now, because it always means that updates to NrrdIO from the ITK side have to be merged back into Teem in a more tedious/manual manner. I forget when this last happened (sometime after 2009).

Matt - can you describe more how this “pull from and push contributions to upstream third party projects” might work? Or, is there another third-party project for which it is working, which could be an informative example?

Essentially, the Git subtree merge process is,

  • Create a new orphan branch in the ITK repository. This has a completely independent history. This is where we keep the NrrdIO files.
  • Whenever we want to perform an update, we run a script that pulls in the new NrrdIO sources, possibly redacts files we don’t need, etc. – similar to your unteem.pl. We create a new commit on this branch from the update.
  • Git has the ability to merge this branch into a subdirectory of the main master branch. This allows us to (theoretically) keep any modifications we have to make, like symbol name mangling related changes, etc, with Git intelligently performing the merge.
  • We have a client-side hook that checks for changes to Nrrd code, suggests submitting them upstream, and includes instructions on how to do so:

The process is automated with a shell script. We have common driver scripts for third party repositories managed by Git, but we could use something similar for Teem if it is still in SVN.

The driver scripts:

Some third party libraries currently managed this way:

A blog post on ITK’s Git subtrees:

https://blog.kitware.com/stay-in-sync-with-upstream-via-git-subtrees/

Thanks for the detailed information. Work on Teem continues to be an unfunded effort, which I continue to do in a scatterbrained as-needed way, so I appreciate your patience and willingness to explain rudimentary things.

It sounds like the first step is getting all the Teem source onto GitHub, right? SourceForge says it’s not dead yet, but I’m skeptical.

Your work on Teem is appreciated, Gordon. It is a valuable asset to the community!

Yes, migration of Teem sources to GitHub is not absolutely necessary, but it would be a good first step that could help make this effort easier and the maintenance of Teem easier in the long term.

Not to get off on too much of a tangent, but do you have any tips on how I can get a little bit of development support for Teem? The two main things I need are (1) complete refresh of all the CMakeLists (they were written a long time ago when Teem was first being eyed by NAMIC, but haven’t gotten updated to more modern CMake idioms, afaik), and (2) getting Teem CTest’d on the same variety of platforms as ITK is tested on. Currently, the only modern CMake stuff for anything in Teem is the way that ITK builds NrrdIO (not all of Teem, and not all of the most recent Teem), and only NrrdIO (not all of Teem) benefits from the testing reported on the ITK dashboard. Part of being scatterbrained is feeling like I’ve said all of this before, maybe even to you, years ago, but I’m not sure if there was a concrete plan.

GitHub would help in this case, too, due to the availability of free CI services, a larger developer community, and tools for collaboration.