Add a parallel compression method to NRRD and/or MetaImage?

I was getting ready to add zstd to ITK, in order to enable its use in MetaIO and NRRD. But before doing so I decided to compile it locally and try it with my own example. Unfortunately, it turns out the multi-threaded support is an advanced and experimental feature at the moment. I have asked a question about when that might change. Let’s see what they answer.

In light of this, I also quickly checked out LZ4. The situation there does not really seem better with respect to multi-threading.

They recently decided to stabilize the multithreading API, so I entered an issue to track this.

Considering that the nrrd user community needs several major features (as shown by the current discussion and a similar one here) and is ready to contribute, but there seems to be no efficient way of doing so in the teem repository (not on GitHub, contains lots of irrelevant features, not modern C++, etc.), I would suggest creating a new nrrd library.

If everyone is OK with that, we could start from teem’s nrrd, add tests, make obvious cleanups (e.g., leverage C++11), and add new features (random file access, faster zip compression, etc.).

Which GitHub organization should we use?

What should be the repository name? NrrdIO, NrrdTools, QuickNrrd, SuperNrrd, …?

@hjmjohnson @pieper @jcfr


My preference would be the InsightSoftwareConsortium.

FYI: I just worked on reviving the long-stagnant NIFTI library as well: https://github.com/NIFTI-Imaging/nifti_clib . While this effort is non-trivial, it will hopefully make long-term maintenance easier.

Hans


@lassoan, thanks for keeping this moving.

I’d vote for the name CppNrrd. Also, a new top-level organization makes sense (NrrdTools is good).

This could become a central place to host nrrd libraries in other languages, like matlab, python and javascript. Reference documentation of the format and sample data could be shared by all these implementations instead of having them all over the place like they are now.

I’d also vote for starting with a very specific conformance statement of which nrrd use cases will be supported, and have tests and example data for each of them.

Off the top of my head, I’d suggest the new C++ implementation needs to support at least the following features in order to be a viable replacement for the current library:

  • scalar/vector/tensor volumes, 2D, 3D, 3D+t
  • dwi/dti extensions (gradient tables, measurement frames…)
  • .seg.nrrd
  • .nhdr and data files

This shouldn’t be hard since for the most part they just expose the raw buffers and headers. Adding parallel compression and other new features would be great.
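
To make the "raw buffers and headers" point concrete, here is a minimal sketch of an attached-header NRRD round trip. It is only an illustration, not the teem implementation: the helper names `write_nrrd` and `read_nrrd` are hypothetical, and it assumes little-endian `int16` data with `gzip` encoding and none of the optional fields (spacing, dwi/dti key-value pairs, detached `.nhdr` data files).

```python
import gzip
import struct


def write_nrrd(values, sizes):
    """Serialize int16 samples as an attached-header, gzip-encoded NRRD blob (hypothetical helper)."""
    header = "\n".join([
        "NRRD0004",                               # magic line with format version
        "type: int16",
        f"dimension: {len(sizes)}",
        "sizes: " + " ".join(str(s) for s in sizes),
        "endian: little",
        "encoding: gzip",
    ]) + "\n\n"                                   # a blank line terminates the header
    payload = gzip.compress(struct.pack(f"<{len(values)}h", *values))
    return header.encode("ascii") + payload


def read_nrrd(blob):
    """Parse the header fields and decode the samples (int16, little-endian, gzip only)."""
    head, _, payload = blob.partition(b"\n\n")    # first blank line ends the header
    lines = head.decode("ascii").split("\n")
    if not lines[0].startswith("NRRD"):
        raise ValueError("not a NRRD file")
    fields = dict(
        line.split(": ", 1) for line in lines[1:] if not line.startswith("#")
    )
    raw = gzip.decompress(payload)
    values = list(struct.unpack(f"<{len(raw) // 2}h", raw))
    return fields, values


samples = [0, 1, -2, 300, -400, 5]
blob = write_nrrd(samples, sizes=(2, 3))
fields, decoded = read_nrrd(blob)
```

A conformance test suite could start from round trips like this one, with one sample file per supported use case.
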

Just to note: replacing hard-to-maintain legacy nrrd code makes sense, and nrrd is a good lightweight option for many purposes, but personally I’ll put more effort into better DICOM support for many of the same use cases.


It would be great if we could store image data in DICOM files using high-speed compression/decompression and random access. Maybe what we learn from modern NRRD implementation (and hopefully even source code) can be used for DICOM files in the future.

CppNrrd could work, but it might not be fair to claim such a general name. I often find this a real issue in Python, where random people create a pyXYZ library and publish it on PyPI. For example, there is already a pynrrd implementation - if we don’t add any distinctive word then we could run into a name conflict if we want to release a Python version.


Good point - if we choose to support only a subset of nrrd (or a superset?) then maybe we should make that explicit in the naming. And as a magic number or version in the header.

I think this thread is related to this other issue. Since I support NRRD without using your code, I would ask that these changes become part of the formal NRRD specification and are done with full awareness that they will break compatibility with older tools. This may well be worth it - zstd is very fast and is gaining widespread traction. However, using pigz would provide faster compression without breaking compatibility and pre-filtered zstd (e.g. blosc) might be much better suited for the nature of NRRD data than pure zstd.


Was there any movement on moving NRRD to GitHub?

We had other priorities and so we have not started carving out NRRD IO from Teem and moving it to GitHub.

Another simple solution to this problem is to use the cloudflare Zlib for x86_64 compilations, which is a version of the classic zlib library updated to use modern instructions included in CPUs for the last 9 years (e.g. since 32nm Westmere). This provides x2 the performance of the classic zlib. Since it is a direct replacement for zlib, it is simple to integrate in code.
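
As a rough illustration of what "direct replacement" means for calling code, the sketch below measures gzip throughput through the ordinary zlib API. Nothing in it is specific to CloudFlare: the function name `gzip_throughput` is hypothetical, and which zlib build the interpreter is actually linked against depends on the environment, so this is only a way to compare before and after swapping the library.

```python
import time
import zlib


def gzip_throughput(data, level=6):
    """Compress data into a gzip container and return (compressed bytes, MB/s)."""
    start = time.perf_counter()
    # wbits=31 selects the gzip container instead of the raw zlib wrapper.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    compressed = compressor.compress(data) + compressor.flush()
    elapsed = time.perf_counter() - start
    rate = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return compressed, rate


sample = bytes(range(256)) * 4096            # about 1 MB of synthetic data
compressed, rate = gzip_throughput(sample)
print(f"zlib {zlib.ZLIB_RUNTIME_VERSION}: {rate:.1f} MB/s")
```

The same script can be run unchanged against two builds; only the reported library version and throughput should differ.
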

While pigz can give better than x2 performance on a computer with more than two cores, it does have a couple of disadvantages:

  1. It is not a direct replacement for zlib, so you have to change your code.
  2. It uses multiple CPUs, and this can slow down other threads that are running.
  3. In typical usage, one saves the uncompressed data to disk and then compresses that file with pigz. This additional set of reads and writes has a penalty versus using zlib to directly write. This penalty is particularly huge on clusters that have slow disk I/O. If you are using Unix and pigz 2.3.4 or later, you can use piped mode to directly write your file with pigz, but this again requires extensive modification of your code.

My tests on gz compression for about an hour's worth of MRI data to a fast SSD using a 6-core (12-thread) CPU are here - each series was saved as a single volume. In brief

  • x1.0 raw uncompressed
  • x3.2 piped pigz
  • x4.4 pigz
  • x10.1 CloudFlare zlib
  • x20.6 zlib
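
Assuming the figures above are save times relative to the uncompressed case (lower is faster), which is my reading rather than something stated explicitly, the speedup of each method over classic zlib can be worked out like this:

```python
# Relative save times copied from the list above (assumed: time relative to raw).
relative_time = {
    "raw uncompressed": 1.0,
    "piped pigz": 3.2,
    "pigz": 4.4,
    "CloudFlare zlib": 10.1,
    "zlib": 20.6,
}

baseline = relative_time["zlib"]
speedup_vs_zlib = {name: baseline / t for name, t in relative_time.items()}

for name, factor in speedup_vs_zlib.items():
    print(f"{name}: {factor:.2f}x faster than zlib")
```

Under that reading, CloudFlare zlib comes out at roughly twice the speed of classic zlib, and both pigz variants at more than four times.
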

These values would change with a different number of cores or different I/O speed. The gz format was not designed with parallel processing in mind, so pigz exploits the tiny 32 KB gz dictionary - each thread just overlaps the chunk from the prior thread to the point where each will provide the same output. This does mean pigz is more effective for larger datasets and does not scale linearly. In addition, some stages are not processed in parallel.
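
The chunking idea can be sketched in a few lines. Note that this is a simplification and not how pigz itself works: instead of priming each chunk with the previous 32 KB and emitting a single stream, the sketch compresses each chunk as an independent gzip member and concatenates them, which standard gzip readers also accept. The function name `parallel_gzip` and its default parameters are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data, chunk_size=1 << 20, workers=4, level=6):
    """Compress chunks independently and concatenate them as gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]                       # still emit one valid (empty) member
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so members are joined in sequence.
        members = pool.map(lambda c: gzip.compress(c, compresslevel=level), chunks)
        return b"".join(members)


payload = bytes(range(256)) * 20000          # about 5 MB of synthetic data
packed = parallel_gzip(payload)
restored = gzip.decompress(packed)           # handles multi-member streams
```

Because each member starts with an empty dictionary, small chunks compress worse than one continuous stream, which is one reason this kind of scheme pays off mainly on larger datasets.
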

In brief, the CloudFlare zlib is a very easy way to double the speed of compression with minimal changes in code. dcm2niix provides a simple make file to show how this change can be provided as an option for cmake.

To summarize my opinions

  • CloudFlare zlib doubles performance, is easy to integrate and does not require changes to the NRRD format. It is completely seamless (assuming the user’s hardware supports it).
  • As noted in the start of this post, gz is not a modern compression format. It was designed for single threaded computers with very limited RAM. From my perspective, zstd is the best open source compression currently available. It is being developed by an outstanding team and will likely address the associated weaknesses, but I would consider these before deciding if this will become one of the designated compression schemes for NRRD:
    • Current versions of zstd are mostly single-threaded (except for IO). This limits performance.
    • Most medical images are saved as 16-bit integers. The most significant bits change much less frequently than the least significant. zstd ignores this and therefore misses a great opportunity for compression (using blosc to swizzle bytes demonstrates this).
    • While zstd is at the Pareto frontier for open source, the proprietary Kraken generally performs even better. I suspect a lot of this is tuning for the x86_64 architecture. Before mandating a new compression format for NRRD, I would ideally like to feel that it provides world-class performance. I am certainly not recommending a proprietary format; it just makes me feel that there is potential for either a better format than zstd or a better implementation of zstd. On the other hand, I think that Kraken accepts very slow compression in order to optimize for fast decompression. Therefore, this final comment may be moot - zstd may indeed be a much more balanced tool when one considers both compression and decompression.
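
The byte-swizzling point about 16-bit data can be demonstrated with a small sketch. It uses zlib from the Python standard library as a stand-in for zstd, and a hand-written shuffle rather than blosc, so it only illustrates the principle. The data are synthetic (a slowly changing high byte and a noisy low byte), and the helper names `shuffle16` and `unshuffle16` are hypothetical.

```python
import random
import struct
import zlib


def shuffle16(raw):
    """Group all first bytes of each 16-bit sample, then all second bytes."""
    return raw[0::2] + raw[1::2]


def unshuffle16(shuffled):
    """Inverse of shuffle16 (expects an even number of bytes)."""
    half = len(shuffled) // 2
    out = bytearray(len(shuffled))
    out[0::2] = shuffled[:half]
    out[1::2] = shuffled[half:]
    return bytes(out)


# Synthetic 16-bit samples: high byte changes every 500 samples, low byte is noise.
rng = random.Random(0)
samples = [(i // 500) * 256 + rng.randrange(256) for i in range(20000)]
raw = struct.pack(f"<{len(samples)}H", *samples)

plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffle16(raw)))
print(f"plain: {plain_size} bytes, shuffled: {shuffled_size} bytes")
```

Real images will behave differently from this synthetic signal, so the benefit should be measured on representative data before drawing conclusions.
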

Replacing regular zlib by Cloudflare’s fork sounds like easy picking. Do you want to give it a try @neurolabusc?

@jcfr It looks like Slicer is still using zlib 1.2.3. As a first step, it might make sense to update to zlib 1.2.11 (the same one ITK has).

I got the CloudFlare build to work from the standard make, which was a useful proof of concept but not good for distribution.

@ningfei deserves all the credit for developing the dcm2niix cmake files that run on such a large variety of operating systems, distributions and hardware. I recognize that he has elegantly consolidated this into a few small files that one could copy-and-paste from. However, the quality of his work has been so high that I have never invested any time becoming familiar with cmake files. Therefore, updating the ITK cmake is out of my wheelhouse.

That would make sense. Is it something you could help with?

We currently use commontk/zlib, a CMake'ified zlib that adds support for the ZLIB_MANGLE_PREFIX option. This allows configuring the library so that its symbols are properly prefixed.

Ideally, madler/zlib should be updated to support this option. This would streamline integration in ITK, VTK, …


https://github.com/zlib-ng/zlib-ng is trying to merge the changes made in all the various forks (chromium, intel, cloudflare) as well as their own work, including a modern build system.
