File format for general-purpose medical image computing

lassoan · April 23, 2021, 4:55pm

We encounter so many issues due to redundancies and inconsistencies within the nifti file format. Am I alone with being frustrated by this? I try to educate users one by one to not use nifti except for brain imaging, but of course this is not very effective.

@dzenanz @matt.mccormick @zivy @hjmjohnson @jhlegarreta @pieper

Do you agree that something needs to be done to dissuade people from using a file format as poorly defined as current nifti?

What do you think would be a good solution?

Promote NRRD instead? It is a simple, already well-established file format that could be used instead of nifti for general-purpose applications.
Fix nifti? Simplify, remove/clarify inconsistencies and redundancies should be possible, it would require very small modifications in the implementations, the main challenge would be making current nifti users agree that change is needed.
Start promoting a more modern, more powerful file format, which will serve future needs of the medical image computing community? The file format should be simple enough and/or have small, high-quality implementation in all major programming languages and environments (complicated beasts, such as HDF5; or Python-only implementation would not be ideal).
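To make the redundancy mentioned above concrete: a NIfTI-1 header can carry two independent voxel-to-world transforms, the "qform" (quaternion, offsets and pixdim fields) and the "sform" (the three srow rows), and nothing in the file forces them to agree. Below is a minimal sketch in plain Python that builds the qform matrix with the formula from the NIfTI-1 header documentation and compares it with an sform. All numeric values are hypothetical and chosen only for illustration.

```python
import math


def qform_affine(b, c, d, qoffset, pixdim, qfac=1.0):
    """Return the 3x4 voxel-to-world matrix encoded by the NIfTI-1 qform fields."""
    # The quaternion is stored without its scalar part; recover it from unit length.
    a = math.sqrt(max(0.0, 1.0 - (b * b + c * c + d * d)))
    rot = [
        [a * a + b * b - c * c - d * d, 2 * b * c - 2 * a * d, 2 * b * d + 2 * a * c],
        [2 * b * c + 2 * a * d, a * a + c * c - b * b - d * d, 2 * c * d - 2 * a * b],
        [2 * b * d - 2 * a * c, 2 * c * d + 2 * a * b, a * a + d * d - c * c - b * b],
    ]
    # Columns are scaled by the voxel size; qfac (+1 or -1) applies to the third axis.
    scale = [pixdim[0], pixdim[1], pixdim[2] * qfac]
    return [[rot[r][k] * scale[k] for k in range(3)] + [qoffset[r]] for r in range(3)]


def max_abs_difference(m1, m2):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for row1, row2 in zip(m1, m2) for x, y in zip(row1, row2))


# Hypothetical header content: identity rotation, 2 mm voxels.
qform = qform_affine(b=0.0, c=0.0, d=0.0,
                     qoffset=(-90.0, -126.0, -72.0),
                     pixdim=(2.0, 2.0, 2.0))

# Hypothetical sform written by another tool: the z offset differs by 2 mm.
sform = [
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -70.0],
]

difference = max_abs_difference(qform, sform)
print("largest qform/sform mismatch (mm):", difference)
```

A reader that prefers the qform and one that prefers the sform would place this image 2 mm apart, and both would be following the header.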

pieper · April 23, 2021, 5:44pm

Yes, I share your frustration with the overuse of nifti in fields where it does not belong.

Promoting nrrd is a practical short term thing, but it might be better to build on a more modern base. Perhaps defining a set of medical imaging conventions on zarr.

zivy · April 23, 2021, 6:10pm

Hello @lassoan,

“Happy” to see it’s not just me. Given the widespread usage of nifti I was beginning to think it was just me not loving the format. Personally, I use nrrd as a general purpose format, when I get to choose.

I am very much opposed to independently developing a completely new format; there are too many already. Piggybacking off of an existing format like zarr sounds reasonable to me. I’ve recently been working a bit with fluorescence microscopy images, and I was happy working either with their native hdf5 based format, zarr’ish, or converting them to nrrd. If of interest, we shared samples on zenodo (in hdf5, in nrrd).

@pieper +1 for zarr

lassoan · April 23, 2021, 7:00pm

zarr with medical image computing conventions seems to make a lot of sense. We should coordinate with other groups (napari, etc.).

For widespread adoption, we would need C++ implementation so that for example ITK and VTK can read/write natively. There are discussions and some preliminary implementation.

jhlegarreta · April 24, 2021, 12:50am

As far as I understand, the discussion @lassoan sparked is not strictly about @crossmanith's issue. The issue reported by @crossmanith, as @dzenanz points out, looks more like a loss-of-precision issue due to:

Multiple processing steps applied to the image, and
Most probably, due to internal issues within such steps.

I do not know whether @crossmanith is using NIfTI for neuroimaging data. If she is, then she is using NIfTI for the “appropriate” modality, and the issue is still about the loss of precision, internal issues, or misuse of the relevant information in the pipeline. So, sorry @crossmanith if this is another message that adds to your thread while diverting from your question (maybe the relevant part can be moved to a dedicated topic).

And then, if she is dealing with neuroimaging data, we’d actually be facing those very same limitations in the neuroimaging community, or the ITK community at least (potentially derived from third-party libraries ITK uses to deal with NIfTI).

I am by no means an expert in the NIfTI format. I have witnessed your frustration with it, @lassoan, as well as @hjmjohnson's or @zivy’s. But I’d say that the neuroimaging community also faces some great challenges when using file formats and standards other than or besides NIfTI. @pieper knows these better than I do.

If people use NIfTI to deal with medical imaging data, I guess it is because:

It allows users to avoid dragging N DICOM files (i.e. tens or hundreds of them) and have a series in a single file.
Other formats are less known to them.
The inconsistencies, problems, shortcomings, etc. of a given format are not well documented, or are very sparsely documented, or are not sufficiently demonstrated with code and examples.

My feeling is that an additional problem in all this is the different implementations or interpretations of a given standard or file format.

When I had to deal with non-neuroimaging medical imaging data, I used to convert the DICOM files to MetaImages. Not sure if that was fair or appropriate either, but the choice was probably influenced by what was customary where I was, because we (or I) did not know enough about its potential problems or shortcomings, or did not know NRRD or other formats enough.

I do not know Zarr so I cannot say in which ways it is better.

As for the specific questions:

Promote NRRD instead? It is a simple, already well-established file format that could be used instead of nifti for general-purpose applications.

Promoting a given format involves great resources and effort (e.g. a practical workshop in every major medical imaging meeting/conference to demonstrate the pitfalls of a given format and benefits of another one (?), online resources, etc.). I have seen you @lassoan and also others persevering not to use NIfTI for non-neuroimaging medical data throughout the years, but it looks like those efforts are not hitting the target unfortunately.

Fix nifti? Simplify, remove/clarify inconsistencies and redundancies should be possible, it would require very small modifications in the implementations, the main challenge would be making current nifti users agree that change is needed.

Not sure about the meaning of fixing NIfTI. Whether the problem lies in the standard/file format itself or in its implementations/(mis-)interpretations, I guess it has to be fixed, since I assume those problems have an impact even on neuroimaging data. I’d say that the Python neuroimaging community heavily relies on nibabel to deal with NIfTI (after converting the DICOM files to NIfTI with some tool, possibly dcm2niix). So I’d dare to say that dcm2niix/nibabel maintainers would be happy to be part of such discussions and fixes.

Start promoting a more modern, more powerful file format, which will serve future needs of the medical image computing community? The file format should be simple enough and/or have small, high-quality implementation in all major programming languages and environments (complicated beasts, such as HDF5; or Python-only implementation would not be ideal).

I would not be for promoting and implementing a new format; I doubt we could gather the critical mass of users and developers needed to implement it.

Currently, my personal time to deal with these issues is limited. Also because my understanding of them is limited.

In summary, I believe that any known shortcoming of a given format, or any misintended use of it, has to be documented, be made available/demonstrated through code and examples, across several modalities and at a single place so that whenever this or similar discussions arise, or it is argued that a format has such limitations, a link to it would suffice and be enlightening enough/self-explanatory.

PS: Very sorry for the lengthy message.

spinicist · April 24, 2021, 7:40am

Hello,

I see the joys of Nifti have come up again

I am a neuroimaging person. I use Nifti everyday. I also despair at the orientation issues. In my ideal world, I would also like to see Nifti dropped or improved.

However we don’t live in my ideal world and I’ll repeat what I’ve said here before: derailing or redirecting the Nifti juggernaut would be a massive undertaking. There are a lot of tools out there that understand Nifti and only Nifti. I will happily make a small bet that every result submitted to Human Brain Mapping went through Nifti at some point.

To underline this, I was actually in a GE Webinar yesterday where they proudly announced that they had worked with Chris Rorden to fix dcm2niix to work with GE Dicoms. So now all three major MRI vendors have kind of agreed to support Nifti. The entire BIDS ecosystem is founded upon Nifti, and BIDS is now big business.

I think an important point to raise is that beyond getting left/right and voxel dimensions correct, neuroimaging research often does not care hugely about orientation, because everything gets registered to the MNI space anyway. Because orientation information used to be so unreliable, the attitude became “make the tools robust against orientation problems” rather than “fix the orientation problems”. And given the number of citations FSL has for flirt, it is hard to argue against this attitude.

If anyone is serious about trying to effect change here, I would suggest talking to Chris Rorden (dcm2niix is now the de facto gateway to Nifti), and also the BIDS committee. You might make headway with them by pointing out a sensible file format would allow them to dispense with the .json “sidecar” that they currently use.

lassoan · April 24, 2021, 6:09pm

@jhlegarreta @spinicist thanks a lot for your valuable insights.

Replacing nifti with something else for neuroimaging applications would have enormous cost - far outweighing the benefits. The neuroimaging community will just have to deal with the problems of this file format that they created.

What would be important is to use something better for general-purpose medical image computing. Something that is less ambiguous, less redundant, more flexible, and allows better performance.

NRRD would be a small improvement in all these, but even if we invest time into a better NRRD library, parallel compression, etc. the possibilities would be rather limited.

On the other hand, something like Zarr would allow significant jump forward. It would require a bit of work to figure out how to encode essential medical imaging metadata, how to interface it efficiently with existing C++ libraries, such as ITK, VTK, and dcm2niix.
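As a rough illustration of what "encoding essential medical imaging metadata" could look like, here is a minimal sketch that stores image geometry as a JSON-serializable dictionary (the kind of object Zarr keeps as array attributes) and uses it to map a voxel index to a physical point. The attribute names (`coordinate_system`, `spacing`, `origin`, `direction`) are assumptions modeled on ITK's image geometry, not an agreed convention, and the values are hypothetical.

```python
import json

# Hypothetical attribute names and values; no such convention is defined in this thread.
image_meta = {
    "coordinate_system": "LPS",
    "spacing": [0.5, 0.5, 2.0],        # mm per voxel along each image axis
    "origin": [-120.0, -100.0, 35.0],  # physical position of voxel (0, 0, 0)
    "direction": [                     # columns are the image axis directions
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ],
}


def index_to_physical(index, meta):
    """Map a voxel index to a physical point: origin + direction * (spacing * index)."""
    scaled = [i * s for i, s in zip(index, meta["spacing"])]
    return [
        o + sum(d * v for d, v in zip(row, scaled))
        for o, row in zip(meta["origin"], meta["direction"])
    ]


# Round-trip through JSON, as the metadata would be stored on disk.
encoded = json.dumps(image_meta)
decoded = json.loads(encoded)

point = index_to_physical((10, 20, 3), decoded)
print(point)
```

The geometry is stated exactly once, so there is no second transform that could disagree with it.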

Involving Chris Rorden in the discussion is a great idea. @neurolabusc what do you think about starting to use Zarr or similar modern file format for general-purpose medical image computing applications?

It would be great to hear from @matt.mccormick, too. He may have first-hand experience in using ITK with Zarr, and with the feasibility of a native reader/writer in ITK.

spinicist · April 26, 2021, 9:55am

@lassoan Ah, I understand your aims better now.

I wasn’t aware of Zarr before this. I watched Joe Jevnik: Zarr vs. HDF5 | PyData New York 2019 - YouTube and it looks good. But also it looks similar enough to HDF5 that I suspect if you were careful about how you defined the imaging meta-data, it would not actually matter which of the two was used for storage.

pieper · April 26, 2021, 2:57pm

For those who are working in neuroimaging, particularly diffusion, you may be interested in these scripts which implement this math. Also there’s now a -e y flag for dcm2niix that outputs nhdr files.

lassoan · April 26, 2021, 3:54pm

HDF looks good on paper, but the fact that it failed to become popular in medical image computing in 30 years is quite telling. I don’t know what prevented HDF5 from succeeding: maybe it came too early, was too complex, did not have a sufficiently robust and high-quality implementation, lacked a surrounding ecosystem of tools, did not have good governance (failed to establish strong communities and conventions), or a combination of these.

There is a HDF5-based image format (MINC2) in ITK, but it is rarely used outside the group that invented it. ITK can store transforms in HDF5, which can be considered as some amount of success, but people still use txt and displacement field image formats if they want to have a look at the data or read/write transforms without ITK.

It seems reasonable to have a fresh start and give a chance to another format than HDF5. Zarr is much better suited for software development culture today with its Python-friendliness, simpler, smaller API, easy extensibility and distribution of extensions and tools, no central authority, no ABI incompatibility, and its built-in support of multithreading, locking, caching, many storage backends.

gdevenyi · April 29, 2021, 8:19pm

I come from MINC land, which originally was NetCDF (MINC1) and became HDF5 (MINC2). It had a full ecosystem pre Linux (developed on IRIX/Octane systems) and was forward-looking about storing huge data and being able to access chunks of it at a time. It also had all its code in CVS and had a complete automake/autoconf build system.

I think the reason we are where we are today is that NIFTI is “implementable” without a library, whereas using MINC required a full build system (and a proper toolchain), something fledgling neuroimagers lacked. Instead, they had MATLAB, which, thanks to the simplicity of the standard, could do something that resembled NIFTI without a compiled library.
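To illustrate how little is needed to read the format without a library, here is a minimal sketch that parses a few fields of the fixed 348-byte NIfTI-1 header using only Python's `struct` module. The byte offsets follow the NIfTI-1 header layout; the header itself is synthetic, built in memory with hypothetical values, and the sketch assumes little-endian byte order and ignores extensions, big-endian files and most fields.

```python
import struct

NIFTI1_HEADER_SIZE = 348


def make_example_header():
    """Build a synthetic little-endian NIfTI-1 header (hypothetical values)."""
    header = bytearray(NIFTI1_HEADER_SIZE)
    struct.pack_into("<i", header, 0, NIFTI1_HEADER_SIZE)            # sizeof_hdr
    struct.pack_into("<8h", header, 40, 3, 64, 64, 30, 1, 1, 1, 1)   # dim[8]
    struct.pack_into("<hh", header, 70, 4, 16)                       # datatype (int16), bitpix
    struct.pack_into("<8f", header, 76,
                     1.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0)         # pixdim[8]
    struct.pack_into("<f", header, 108, 352.0)                       # vox_offset
    struct.pack_into("<hh", header, 252, 1, 1)                       # qform_code, sform_code
    struct.pack_into("<4s", header, 344, b"n+1\0")                   # magic
    return bytes(header)


def parse_nifti1_header(raw):
    """Extract a handful of fields from a little-endian NIfTI-1 header."""
    if len(raw) < NIFTI1_HEADER_SIZE:
        raise ValueError("header too short")
    (sizeof_hdr,) = struct.unpack_from("<i", raw, 0)
    if sizeof_hdr != NIFTI1_HEADER_SIZE:
        raise ValueError("not a little-endian NIfTI-1 header")
    dim = struct.unpack_from("<8h", raw, 40)
    datatype, bitpix = struct.unpack_from("<hh", raw, 70)
    pixdim = struct.unpack_from("<8f", raw, 76)
    (vox_offset,) = struct.unpack_from("<f", raw, 108)
    qform_code, sform_code = struct.unpack_from("<hh", raw, 252)
    ndim = dim[0]  # dim[0] holds the number of dimensions
    return {
        "shape": tuple(dim[1:1 + ndim]),
        "spacing": tuple(pixdim[1:1 + ndim]),
        "datatype": datatype,
        "bitpix": bitpix,
        "vox_offset": vox_offset,
        "qform_code": qform_code,
        "sform_code": sform_code,
        "magic": raw[344:348],
    }


info = parse_nifti1_header(make_example_header())
print(info)
```

The same simplicity is part of the problem discussed in this thread: the header has no room for richer metadata, and both `qform_code` and `sform_code` can be set at once.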

The file format reference lives here:
https://en.wikibooks.org/wiki/MINC/SoftwareDevelopment/MINC2.0_File_Format_Reference

May be worth at least looking at it for how the meta-data is structured, a key feature that BIDS had to redo from scratch because NIFTI can’t store it internally.

Edit: the ecosystem still exists, and we have many users, mostly via the training of the many neuroimaging groups of Montreal, all the code lives at McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University · GitHub

mihail.isakov · April 30, 2021, 6:02pm

Is the issue with flipped axes - ITK Loading of MINC Files Flips Orientations - fixed?

There is one old unmerged PR.

gdevenyi · April 30, 2021, 6:40pm

Yes, and no.

Edit: oops you linked the PR.

Yeah, my patch fixes it, however it also would change how everything works so there are backward-compatibility questions which have not been resolved.

matt.mccormick · May 3, 2021, 8:27pm

Yes, I recommend Zarr as a preferred file format for medical images of The Future.

Zarr has many advantages:

A simple format, supporting multi-dimensional arrays and metadata, with broad adoption across the scientific computing community.
Excellent compression.
Distributed, parallel computing friendly.
Web friendly.

ITK already has Zarr support in the Python interface.

Here is an example:

Binder:

Notebook (github.com): KitwareMedical/ITKIOScanco/blob/master/examples/ConvertScancoVolumesToOpenStandardFormats.ipynb

"Convert Scanco Volumes to Open Standards": this notebook illustrates how to convert Scanco microCT images into open standard file formats while preserving critical image metadata.

When working with xarray, we preserve metadata, keep explicit dimension names (a common source of confusion and errors!), and when we slice the data array, the spatial information, i.e. xarray coords, is updated correctly.

In terms of orientation, it would be nice to default to LPS like we default to UTF-8 nowadays for text encoding.
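For context on what an LPS default involves: LPS (used by DICOM and ITK) and RAS (used by NIfTI) differ only in the sign of the first two physical axes, so converting a voxel-to-world matrix between them means negating its first two rows. A minimal sketch in plain Python, with a hypothetical affine:

```python
def ras_to_lps(affine):
    """Convert a 4x4 voxel-to-world affine between RAS and LPS world coordinates.

    The conversion negates the first two world axes, so applying it twice
    returns the original matrix.
    """
    return [
        [-v for v in affine[0]],
        [-v for v in affine[1]],
        list(affine[2]),
        list(affine[3]),
    ]


def apply_affine(affine, index):
    """Map a voxel index (i, j, k) to world coordinates."""
    i, j, k = index
    return [row[0] * i + row[1] * j + row[2] * k + row[3] for row in affine[:3]]


# Hypothetical RAS affine: 2 mm isotropic voxels, axis-aligned.
ras_affine = [
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]
lps_affine = ras_to_lps(ras_affine)

ras_point = apply_affine(ras_affine, (10, 20, 30))
lps_point = apply_affine(lps_affine, (10, 20, 30))
print(ras_point, lps_point)
```

The voxel data is untouched; only the meaning of the world axes changes, which is why a single agreed default removes a whole class of left/right mistakes.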

With regard to HDF5, Zarr was meant to provide the HDF5 interface in Python and the same hierarchical support with attributes and binary arrays, but it has 1) a simpler architecture that is easier to implement and code against, and is 2) more web-friendly and 3) more friendly to parallel writes.
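To give a feel for the "simpler architecture" point, here is a minimal sketch that writes and reads an uncompressed Zarr v2 array by hand, using only the Python standard library: a directory holding a `.zarray` JSON document, a `.zattrs` JSON document, and one raw file per chunk. It assumes a 2D int16 array whose shape is an exact multiple of the chunk shape and no compressor; the `spacing` attribute is a hypothetical example, not a defined convention. In practice one would use an existing Zarr library.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_zarr_v2(store, rows, chunk_shape, attrs):
    """Write a 2D int16 array as an uncompressed Zarr v2 directory store."""
    store.mkdir(parents=True, exist_ok=True)
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunk_shape
    array_meta = {
        "zarr_format": 2,
        "shape": [n_rows, n_cols],
        "chunks": [c_rows, c_cols],
        "dtype": "<i2",       # little-endian 16-bit signed integer
        "compressor": None,   # chunks are stored as raw bytes
        "fill_value": 0,
        "order": "C",         # row-major layout inside each chunk
        "filters": None,
    }
    (store / ".zarray").write_text(json.dumps(array_meta))
    (store / ".zattrs").write_text(json.dumps(attrs))
    for ci in range(0, n_rows, c_rows):
        for cj in range(0, n_cols, c_cols):
            chunk = [rows[r][c]
                     for r in range(ci, ci + c_rows)
                     for c in range(cj, cj + c_cols)]
            # Chunk files are named by their chunk grid index, e.g. "1.0".
            name = f"{ci // c_rows}.{cj // c_cols}"
            (store / name).write_bytes(struct.pack(f"<{len(chunk)}h", *chunk))


def read_element(store, r, c):
    """Read one element by opening only the chunk that contains it."""
    meta = json.loads((store / ".zarray").read_text())
    c_rows, c_cols = meta["chunks"]
    raw = (store / f"{r // c_rows}.{c // c_cols}").read_bytes()
    offset = ((r % c_rows) * c_cols + (c % c_cols)) * 2  # 2 bytes per int16
    (value,) = struct.unpack_from("<h", raw, offset)
    return value


rows = [[r * 10 + c for c in range(4)] for r in range(4)]
store = Path(tempfile.mkdtemp()) / "example.zarr"
write_zarr_v2(store, rows, (2, 2), {"spacing": [0.5, 0.5]})  # hypothetical attribute

value = read_element(store, 3, 1)
files = sorted(p.name for p in store.iterdir())
print(value, files)
```

Because each chunk is a separate file (or object, on cloud storage), independent chunks can be read or written in parallel without a shared file handle.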

Zarr has implementations in:

Python
Java
Rust
JavaScript
C++ – in addition to z5 there is also xtensor-zarr
If I recall correctly, someone is working on a C implementation…

The geospatial imaging community is using Zarr heavily through xarray and the microscopy community is using zarr through NGFF. It makes sense to do the same in the medical imaging community. I am trying to bridge these communities so the data model, e.g. spatial information, is handled in the same or at least in a compatible way. This is WIP. There are regular open Zarr community meetings, and I would encourage anyone interested to attend. Their current focus is Zarr v3, though.

lassoan · May 4, 2021, 3:50pm

Thanks a lot, this is very useful information.

Is there a plan to add support for Zarr in ITK proper (maybe via an ITK remote module)?

Does it work with xtensor as well (it could be interesting, because xtensor-zarr looks nice, too)?

Is the metadata structure specified somewhere? Is it mature and likely to remain the same in the future, or is it still in an experimental phase? It would be useful to have something like a MIC-Zarr specification similar to the OME-Zarr that you referred to. Since OME-Zarr is still just a draft, maybe we could agree on a common format (or at least the basics of storing images and labels could be the same). There are a few limitations of OME-Zarr: for example, there seems to be no proper support for overlapping labels (it does not seem to be possible to store them in a 4D array), and it would be nice to define a standard for storing basic label metadata (label name, terminology).
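The overlapping-labels limitation can be shown with a toy case. A label map stores one integer per voxel, so a voxel that belongs to two segments can only keep one of them; a stack of binary masks (one layer per segment, which makes a 3D segmentation a 4D array) keeps both. The sketch below uses a one-dimensional strip of voxels and hypothetical segment names, and says nothing about how any particular specification stores labels.

```python
# Two hypothetical segments on a strip of 6 voxels; they overlap at voxels 2 and 3.
segments = {
    "vessel": [0, 1, 1, 1, 0, 0],
    "tumor":  [0, 0, 1, 1, 1, 0],
}


def to_label_map(segments):
    """Flatten segments into one integer per voxel; later segments overwrite earlier ones."""
    n_voxels = len(next(iter(segments.values())))
    label_map = [0] * n_voxels
    for value, name in enumerate(segments, start=1):
        for i, inside in enumerate(segments[name]):
            if inside:
                label_map[i] = value
    return label_map


def to_mask_stack(segments):
    """Keep one binary layer per segment (an extra leading dimension)."""
    return [list(mask) for mask in segments.values()]


label_map = to_label_map(segments)
mask_stack = to_mask_stack(segments)

# Voxels recovered for the first segment ("vessel") from each representation.
vessel_from_label_map = [i for i, v in enumerate(label_map) if v == 1]
vessel_from_stack = [i for i, v in enumerate(mask_stack[0]) if v]
print(label_map, vessel_from_label_map, vessel_from_stack)
```

In the flattened label map, the vessel has lost the voxels it shares with the tumor, while the stacked representation preserves them.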

Agreed. This would result in correct behavior most of the time, even when users ignore the patient orientation information.

matt.mccormick · May 5, 2021, 3:02am

Yes! Some work was started in this remote module. But, perhaps it would be worthwhile trying an xarray-zarr backed implementation. Are you interested in exploring and experimenting?

The metadata organization is discussed with the OME-Zarr folks in the ngff issue tracker. Ideally, we can avoid multiple unnecessary data organizations!

Discussed more: Support for multi-channel labels · Issue #19 · ome/ngff · GitHub

There is a start the omero. @joshmoore may have an idea of plans for integrating or extending this more in OME-Zarr.

joshmoore · May 5, 2021, 10:19am

Hi everyone. Sorry, I’m a poor lurker at best. Thanks for the ping, @matt.mccormick.

I’ll add on top of @matt.mccormick’s pointer, kindly funded by CZI.

There are some edge cases, but in general, I think of HDF5 as a superset of Zarr. Certainly in our work with OME-Zarr I’d like to see any Zarr be transformable to HDF5 and back again without loss. Now that NetCDF has adopted Zarr as a backend beside HDF5, I would hope that the two would become ever more compatible.

netcdf-c has support.

Some background: our work on this is motivated by the fact that our domain’s primary format (TIFF) is simply not keeping up. For remote access, which is a significant requirement, Zarr also exceeds HDF5. If you’re inclined, you can read more in:

https://doi.org/10.1101/2021.03.31.437929

At the moment, our focus is on defining the metadata that will turn a Zarr into a multi-dimensional image, which I assume is most of interest to all of you. That is ongoing and certainly wouldn’t count as mature, but the intent is very much to support each intermediate released version. I think each domain community will need to have a discussion regarding at what point on the roadmap they are interested in getting involved (if at all). But in general, big support from my side for sharing as many layers of metadata & implementations as we can!

The closest at the moment are “label properties”: Add label properties description to spec by joshmoore · Pull Request #3 · ome/ngff · GitHub . Not sure if that covers “terminology”.

~Josh

spinicist · May 7, 2021, 11:21am

Okay, looks like I need to learn more about zarr then. And my corner of the universe had only just caught up with HDF5!