In making the second version of VoxLogicA (see https://github.com/vincenzoml/voxlogica), which is, all in all, the interpreter of a simple domain-specific programming language, we are exploring custom memory management solutions.
Indeed, SimpleITK already manages memory internally, so I understand that very often the memory occupied by the output of a filter may come from previous filters, and is not necessarily allocated each time. However:
I am unsure what I need to do (from .NET, actually) to let SimpleITK reuse buffers: do I need to “dispose” the Image object?
Given the current structure of the interpreter, I would be very happy if I could specify the output buffer for each filter invocation (e.g. as an additional argument to “execute”). Is this even remotely possible, either by patching the source code or by a hack? Or should I just forget about it?
So there are two or three things here related to SimpleITK memory management.
SimpleITK uses lazy copying for basic image operations (not filtering). This allows functions and methods to return a C++ stack object and to copy the image without performing a deep copy. In many ways the SimpleITK Image becomes the equivalent of a smart pointer to the ITK image.
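To make the lazy-copying idea concrete, here is a minimal toy sketch of a copy-on-write handle in plain Python. It is not SimpleITK's implementation; the names `_Buffer`, `LazyImage`, `copy`, `set_pixel` and `shares_buffer_with` are hypothetical and exist only for illustration.

```python
class _Buffer:
    """Pixel storage shared by one or more handles (hypothetical helper)."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.owners = 1  # number of LazyImage handles pointing at this buffer


class LazyImage:
    """Toy copy-on-write image handle; NOT the real SimpleITK Image."""

    def __init__(self, data=b"", _buffer=None):
        self._buf = _buffer if _buffer is not None else _Buffer(data)

    def copy(self):
        # Cheap copy: both handles now point at the same buffer.
        self._buf.owners += 1
        return LazyImage(_buffer=self._buf)

    def get_pixel(self, index):
        return self._buf.data[index]

    def set_pixel(self, index, value):
        # The deep copy is deferred until a shared buffer is written to.
        if self._buf.owners > 1:
            self._buf.owners -= 1
            self._buf = _Buffer(self._buf.data)
        self._buf.data[index] = value

    def shares_buffer_with(self, other):
        return self._buf is other._buf


a = LazyImage(b"\x01\x02\x03")
b = a.copy()                      # no pixel data copied yet
shared_before_write = a.shares_buffer_with(b)
b.set_pixel(0, 9)                 # the write triggers the deep copy
shared_after_write = a.shares_buffer_with(b)
```

In this sketch the copy is only paid for when one of the handles is modified, which is why returning or copying an image handle can be cheap.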
The normal behavior of a SimpleITK (and ITK) filter is to allocate a new Image (buffer) for the output.
ITK does have an InPlaceImageFilter base class which provides the option to “steal” the input’s buffer and re-use it for the output image’s buffer. In SimpleITK this is implemented with C++ move semantics and rvalue references. SimpleITK filters based on an ITK InPlaceImageFilter have additional Execute methods which take an Image &&image1 argument. This works seamlessly in C++ with SimpleITK memory management, allowing chaining of functions and std::move, and it is used to implement the in-place operators (e.g. +=). Unfortunately, other languages don’t have a mapping for rvalue references.
There is the possibility of adding something like an “ExecuteInPlace” method. But that would put the image1 input into an invalid/unusable/null state, which could be hazardous or unexpected to the SimpleITK user.
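The hazard can be sketched with a toy model in plain Python. This is only an illustration of the idea, not an existing SimpleITK API: `Image`, `execute`, and `execute_in_place` below are hypothetical names, and the "filter" is just an add-constant operation.

```python
class Image:
    """Toy image that owns a list of pixels (hypothetical, not SimpleITK)."""

    def __init__(self, pixels):
        self._pixels = pixels  # becomes None once the buffer is taken

    @property
    def valid(self):
        return self._pixels is not None

    def pixels(self):
        if self._pixels is None:
            raise ValueError("image buffer was taken by an in-place filter")
        return self._pixels


def execute(image, offset):
    # Normal behavior: allocate a new buffer for the output.
    return Image([p + offset for p in image.pixels()])


def execute_in_place(image, offset):
    # Hypothetical in-place variant: take the input's buffer, reuse it
    # for the output, and leave the input in an unusable state.
    buffer = image.pixels()
    image._pixels = None
    for i in range(len(buffer)):
        buffer[i] += offset
    return Image(buffer)


src = Image([1, 2, 3])
original_buffer = src.pixels()
out = execute_in_place(src, 10)
reused = out.pixels() is original_buffer   # the output reuses the input buffer
```

After the call, any later use of `src` raises an error, which is exactly the surprising state described above for languages without rvalue references.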
Thanks for your prompt reply. The InPlaceImageFilter bit is very interesting (and could perhaps be ported to other languages by a special method?), but I don’t really need the filter to be in place; I need to be able to specify a target image: the interpreter can invoke thousands of different filters, and it knows at runtime which buffers will no longer be used, so these can be reused directly. Especially in conjunction with ImportImageFilter, that would permit the interpreter to fully manage memory.
So, summing up, what I need is a method of the filter (or a function) that would “set” the output buffer before execution, so that a new one is not created.
In-place filter operation can result in significant performance improvements, due to the reduced memory bandwidth the filters need and the optimizations this allows.
SimpleITK does not support the micro-management of memory.
C++ ITK Images use the ImportImageContainer as the interface for managing image memory. Some customization might be possible at the C++ level with custom classes. Also, the ITK interface provides the general “Graft” method of Images (and other data objects), which may be of interest. Many basic ITK image filters just take the input image(s), perform a basic operation, then set the output, while other ITK filters are constructed with an internal mini-pipeline, which has more complicated internal memory management with multiple grafts and/or in-place filter operations. For these more complicated filters a pre-allocated output image buffer may not be reused as expected. There are already a large number of features and knobs for tweaking memory management in the ITK pipeline, which can improve efficiency and reduce memory footprints. I am not aware of a custom memory manager for ITK yielding significant performance improvements.
It is not currently available in SimpleITK, but I suggested a candidate implementation could be to add something like an ExecuteInPlace method where appropriate.
The best way to start understanding grafting is to understand the ITK filtering pipeline: how filters are implemented and how the pipeline executes. The ITK Software Guide would be the place to start with that. These elements of the pipeline have been removed from the SimpleITK interface.
In the C++ sense, deleting an image frees the image buffer, so the memory can be reused. It is generally a C++ memory manager implementation detail whether that memory buffer gets returned to the OS or is reserved for future reuse by the C++ memory manager. I have successfully executed thousands of filters per second in parallel in Python. The cost of memory management varies significantly with sizes, operating systems, etc. When the system is running out of memory and swapping, memory management may be expensive.
From a SimpleITK interface, the idea of an ExecuteWithOutput(inImage1, inImage2, outImage1) seems appropriate. I would start/experiment with one of the SimpleITK filters that is manually implemented (not generated with the templating code), such as CastImageFilter or PasteImageFilter.
IMHO the next step should be to prototype some minimal implementation to determine if there is any advantage to this micro-memory management approach before investing much time.
The current plan for VoxLogicA2 is to permit, in fact, calls to thousands of filters as the norm, as I’m planning to let it operate against whole datasets, instead of single images. After the result of one operation has been consumed by all the consumers, the buffer can be returned to the OS.
However (this is very important!) we can avoid the cost of returning to the OS, because in most cases other filters will need images of exactly the same size and pixel type.
So I’ve designed the new interpreter in such a way as to reuse the buffers when possible.
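As a rough illustration of that kind of reuse, here is a minimal sketch of a buffer pool keyed by image size and pixel type, in plain Python. It is not VoxLogicA's actual code and it does not touch SimpleITK; `BufferPool`, `acquire`, `release`, and the `BYTES_PER_PIXEL` table are hypothetical names chosen for the example.

```python
import math
from collections import defaultdict

# Assumed pixel sizes in bytes, for illustration only.
BYTES_PER_PIXEL = {"uint8": 1, "int16": 2, "float32": 4}


class BufferPool:
    """Toy pool that hands back released buffers of matching size and type."""

    def __init__(self):
        self._free = defaultdict(list)  # (size, pixel_type) -> free buffers
        self.allocations = 0            # how many fresh buffers were created

    def acquire(self, size, pixel_type):
        key = (tuple(size), pixel_type)
        if self._free[key]:
            # Reuse a released buffer; note it still holds stale pixel data.
            return self._free[key].pop()
        self.allocations += 1
        return bytearray(math.prod(size) * BYTES_PER_PIXEL[pixel_type])

    def release(self, buffer, size, pixel_type):
        # Called once every consumer of the result is done with it.
        self._free[(tuple(size), pixel_type)].append(buffer)


pool = BufferPool()
first = pool.acquire((4, 4), "float32")
pool.release(first, (4, 4), "float32")
second = pool.acquire((4, 4), "float32")   # same size and type: reused
third = pool.acquire((4, 4), "uint8")      # different type: new allocation
```

Here `second` is the very same object as `first`, so only two allocations happen for three requests. Whether this beats simply freeing and reallocating depends on the underlying memory manager, which is what the measurements are meant to determine.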
In the next days or weeks, I will provide two tests: one that calls ITK WITHOUT calling dispose, until memory exhaustion, and one that calls dispose when appropriate, and I will measure the overhead.
Indeed, I can in the meantime implement the interpreter calling dispose when needed, and accept the overhead.