Is it possible to propagate some information to the upstream pipeline?

remicres · January 4, 2024, 8:37am

Hello,

Is it possible to propagate some kind of dynamic (“current-requested-region-related”) information from a mapper to the source, through the filters of a pipeline?

My use case is the following:

one reader,
a pipeline composed of several filters,
one writer.

All filters support streaming. The writer generates its output piece by piece using a classic streaming strategy (e.g. tiled). Now, I would like the writer to transmit the itk::ImageRegion start and size of the next requested region upstream, so that this information finally reaches the reader.

I am working with remote (http) streams, and it could be really useful for the reader to know in advance which chunk to grab next: this way it could fetch the next chunk in a separate thread while feeding the current one to the downstream filters.
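A minimal sketch of this double-buffering idea, in plain C++17 and independent of ITK: while the caller consumes the chunk for the current region, the chunk for the announced next region is fetched on another thread. `Region`, `Chunk`, `FetchFn` and `PrefetchingReader` are hypothetical names; the fetch function stands in for whatever HTTP request the reader would actually issue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical 2D region (start index + size), in the spirit of itk::ImageRegion<2>.
struct Region {
  long x, y, w, h;
  bool operator==(const Region& o) const {
    return x == o.x && y == o.y && w == o.w && h == o.h;
  }
};

using Chunk = std::vector<float>;
// Hypothetical blocking fetch of one region, e.g. an HTTP range request.
using FetchFn = std::function<Chunk(const Region&)>;

// Double buffering: the chunk for the announced next region is fetched on a
// background thread while the caller processes the current chunk.
class PrefetchingReader {
 public:
  explicit PrefetchingReader(FetchFn fetch) : fetch_(std::move(fetch)) {}

  // Returns the chunk for `current` (reusing the prefetch when the guess was
  // right), then starts fetching `next`, if any, in the background.
  Chunk Read(const Region& current, const std::optional<Region>& next) {
    Chunk result;
    if (pending_ && pendingRegion_ == current) {
      result = pending_->get();  // waits only for the remaining transfer
      ++hits_;
    } else {
      if (pending_) pending_->wait();  // discard a wrong guess
      result = fetch_(current);       // synchronous fallback
    }
    pending_.reset();
    if (next) {
      pendingRegion_ = *next;
      pending_ = std::async(std::launch::async, fetch_, *next);
    }
    return result;
  }

  std::size_t Hits() const { return hits_; }

 private:
  FetchFn fetch_;
  std::optional<std::future<Chunk>> pending_;
  Region pendingRegion_{};
  std::size_t hits_ = 0;
};
```

If the announced region turns out to be wrong, the sketch simply fetches the requested one synchronously, so a bad hint costs time but not correctness.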

I am used to the ITK architecture (OTB developer here), but I don’t see whether or how this is doable within the current framework.
I know some workarounds are possible (e.g. using TileHint in the image metadata), but they would not provide any synchronization between the writer and the reader (i.e. the reader would not have the actual info on the next region requested by the writer). Another approach would be to make the reader aware of the next blocks on its own, but that would require some serious assumptions about the streaming strategy.

Any thoughts?

blowekamp · January 4, 2024, 2:19pm

This is a complex problem with many “features” which could be supported or not supported to fit within the ITK framework.

Do any of the intermediate filters “expand” the requested regions? If they do, the requested regions reaching the Reader will overlap, and some chunks could need to be read multiple times.
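To illustrate the overlap issue with a small sketch (plain C++, hypothetical `Region` type with half-open bounds): a neighborhood filter pads each requested region by its radius, similar in spirit to ImageRegion::PadByRadius() followed by Crop(), so two adjacent, disjoint output tiles end up requesting overlapping input regions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical 2D region with half-open bounds: [x0, x1) x [y0, y1).
struct Region { long x0, y0, x1, y1; };

// Pads a requested region by `radius` pixels and clamps it to the largest
// possible region, as a neighborhood filter would when computing its input
// requested region.
Region PadAndCrop(const Region& r, long radius, const Region& largest) {
  return {std::max(r.x0 - radius, largest.x0), std::max(r.y0 - radius, largest.y0),
          std::min(r.x1 + radius, largest.x1), std::min(r.y1 + radius, largest.y1)};
}

// Number of pixels shared by two regions (0 if they are disjoint).
long OverlapArea(const Region& a, const Region& b) {
  long w = std::min(a.x1, b.x1) - std::max(a.x0, b.x0);
  long h = std::min(a.y1, b.y1) - std::max(a.y0, b.y0);
  return (w > 0 && h > 0) ? w * h : 0;
}
```

With a 100x100 image split into two 50-column halves and a radius of 2, the padded requests share a 4x100 strip, i.e. 400 pixels that would be read twice without a cache.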

Only a couple of streaming strategies are commonly used: usually just (D-1)-dimensional “slices” or N-D regular blocks.
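For reference, a sketch of these two strategies as region generators, in plain C++ with a hypothetical N-D `Region<D>` (index + size): one-pixel-thick slices along the last dimension, and regular blocks visited with dimension 0 varying fastest. This only illustrates the idea; it is not ITK's splitter code, which may order or size the pieces differently.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical N-D region: start index and size per dimension.
template <std::size_t D>
struct Region {
  std::array<long, D> index;
  std::array<long, D> size;
};

// (D-1)-dimensional slices: one slice per index along the last dimension.
template <std::size_t D>
std::vector<Region<D>> Slices(const Region<D>& whole) {
  std::vector<Region<D>> out;
  for (long k = 0; k < whole.size[D - 1]; ++k) {
    Region<D> r = whole;
    r.index[D - 1] = whole.index[D - 1] + k;
    r.size[D - 1] = 1;
    out.push_back(r);
  }
  return out;
}

// N-D regular blocks of edge `block`, dimension 0 varying fastest;
// blocks on the upper borders are truncated to stay inside `whole`.
template <std::size_t D>
std::vector<Region<D>> Blocks(const Region<D>& whole, long block) {
  std::array<std::size_t, D> counts{};
  std::size_t total = 1;
  for (std::size_t d = 0; d < D; ++d) {
    counts[d] = static_cast<std::size_t>((whole.size[d] + block - 1) / block);
    total *= counts[d];
  }
  std::vector<Region<D>> out;
  for (std::size_t n = 0; n < total; ++n) {
    Region<D> r{};
    std::size_t rest = n;  // mixed-radix decomposition of the block number
    for (std::size_t d = 0; d < D; ++d) {
      long b = static_cast<long>(rest % counts[d]);
      rest /= counts[d];
      r.index[d] = whole.index[d] + b * block;
      r.size[d] = std::min(block, whole.size[d] - b * block);
    }
    out.push_back(r);
  }
  return out;
}
```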

Your proposal seems to focus on directly supporting prefetching of the streaming regions. It may be better to consider it as a “Prefetching Caching Asynchronous Reader or Filter”. That is, this “Caching Filter” manages multiple sub-processes/threads reading regions, and it caches those output regions. To generate its output, the filter assembles the cached regions into the requested region coming from downstream in the pipeline.
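A rough sketch of such a caching stage, assuming a fixed 2D tile grid, non-negative coordinates, and a hypothetical blocking `FetchFn` per tile (`TileCache` is likewise a made-up name). Tiles are fetched asynchronously and cached as `std::shared_future`s, and any requested region is assembled from the tiles it touches; a tile that was not prefetched is fetched on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <map>
#include <utility>
#include <vector>

// Hypothetical 2D caching stage over a fixed grid of square tiles.
class TileCache {
 public:
  using Tile = std::vector<float>;  // row-major, tileSize x tileSize
  // Hypothetical blocking fetch of tile (tx, ty), e.g. one HTTP request.
  using FetchFn = std::function<Tile(long tx, long ty)>;

  TileCache(long tileSize, FetchFn fetch) : ts_(tileSize), fetch_(std::move(fetch)) {}

  // Starts fetching a tile in the background unless it is already cached.
  void Prefetch(long tx, long ty) {
    auto key = std::make_pair(tx, ty);
    if (cache_.count(key) == 0)
      cache_.emplace(key, std::async(std::launch::async, fetch_, tx, ty).share());
  }

  // Assembles [x0, x0+w) x [y0, y0+h) from the tiles it touches. Per-pixel
  // lookups keep the sketch short; a real filter would copy whole rows.
  std::vector<float> Assemble(long x0, long y0, long w, long h) {
    std::vector<float> out(static_cast<std::size_t>(w * h));
    for (long y = y0; y < y0 + h; ++y) {
      for (long x = x0; x < x0 + w; ++x) {
        long tx = x / ts_, ty = y / ts_;
        Prefetch(tx, ty);  // no-op if the tile was already requested
        const Tile& t = cache_.at({tx, ty}).get();  // blocks until available
        out[static_cast<std::size_t>((y - y0) * w + (x - x0))] =
            t[static_cast<std::size_t>((y % ts_) * ts_ + (x % ts_))];
      }
    }
    return out;
  }

 private:
  long ts_;
  FetchFn fetch_;
  // No eviction here; a real cache would bound its memory use.
  std::map<std::pair<long, long>, std::shared_future<Tile>> cache_;
};
```

The prefetching policy (which tiles to request ahead) is left outside the cache, so the same class could be driven by a splitter-based predictor or by a heuristic.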

My 2 cents.

remicres · January 4, 2024, 3:05pm

Hello @blowekamp, thanks for your prompt reply.

“Prefetching Caching Asynchronous Reader or Filter”

Yes, that would be a more generic approach, suited to pipelines. I really like the idea!

I will think about this approach. If we implement it, we’ll let you know.

Thanks

blowekamp · January 4, 2024, 3:59pm

In terms of “predicting” the requested region sequence, consider looking at the ImageRegionSplitter class hierarchy. These classes are used in a few places in ITK to generate sequences of regions for streaming, and such a splitter could be a parameter of the proposed new caching filter.
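The benefit of reusing a splitter is that, when the writer and the caching filter share the same deterministic strategy, the next requested region can be computed rather than guessed. A sketch with a hypothetical `BandSplitter` (horizontal bands along the slowest dimension); its `NumberOfSplits()`/`Split()` methods only mimic the roles of `GetNumberOfSplits()`/`GetSplit()` in ITK's `ImageRegionSplitterBase` and are not that API.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical 2D region: start (x, y) and size (w, h).
struct Region2 {
  long x, y, w, h;
  bool operator==(const Region2& o) const {
    return x == o.x && y == o.y && w == o.w && h == o.h;
  }
};

// Hypothetical deterministic splitter: cuts `whole` (h > 0) into at most
// `requested` (> 0) horizontal bands along y, the last band truncated.
struct BandSplitter {
  Region2 whole;
  long requested;

  long RowsPerPiece() const { return (whole.h + requested - 1) / requested; }
  long NumberOfSplits() const {
    long rows = RowsPerPiece();
    return (whole.h + rows - 1) / rows;
  }
  Region2 Split(long i) const {
    long rows = RowsPerPiece();
    long y0 = whole.y + i * rows;
    return {whole.x, y0, whole.w, std::min(rows, whole.y + whole.h - y0)};
  }
};

// With the writer's splitter known, the region requested after `current`
// follows directly from the piece sequence.
std::optional<Region2> PredictNext(const BandSplitter& splitter, const Region2& current) {
  long n = splitter.NumberOfSplits();
  for (long i = 0; i + 1 < n; ++i)
    if (splitter.Split(i) == current) return splitter.Split(i + 1);
  return std::nullopt;  // last piece, or not part of this sequence
}
```

For a region 10 rows high and 4 requested pieces, this yields bands of 3, 3, 3 and 1 rows, and the prediction after the last band is empty.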

remicres · January 9, 2024, 2:11pm

Hello @blowekamp ,

I made a POC for OTB here.
We will try it in pre-production to check how it behaves in real conditions. For now it looks like it’s working nicely, and I believe this could be an interesting feature to bring into ITK in the future.

To be improved:

Working on N dimensions (for now, the code deals only with spatial dimensions x and y)
the heuristic is really simplistic for now

I think it should be quite straightforward to make the code generic enough for N dimensions; I don’t see any major issue with that.
The current implementation assumes that the next requested region will follow the same trajectory as the previous one, and tries to guess when the shift changes dimension (this part could be improved quite easily). Requested regions that are not cached are fetched in the sequential part (i.e. in GenerateData(), after the caching thread finishes), so it still works (!) even when the guessed next requested region is completely elsewhere.
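A sketch of a trajectory heuristic of this kind, assuming 2D tiles scanned row by row (left to right, top to bottom): keep the last horizontal shift (or use the tile width right after a row change), wrap to the next row when the shift leaves the image, and crop the guess to the largest possible region. `GuessNext` and `Region2` are hypothetical names, and this is not the POC's actual code; a wrong guess only costs a synchronous fetch.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical 2D region: start (x, y) and size (w, h).
struct Region2 {
  long x, y, w, h;
  bool operator==(const Region2& o) const {
    return x == o.x && y == o.y && w == o.w && h == o.h;
  }
};

// Guesses the region requested after `cur`, given the one before it (`prev`).
std::optional<Region2> GuessNext(const Region2& prev, const Region2& cur,
                                 const Region2& largest) {
  // Reuse the last horizontal shift if the last step was horizontal,
  // otherwise (e.g. right after a row change) step by the tile width.
  long dx = (cur.y == prev.y && cur.x != prev.x) ? cur.x - prev.x : cur.w;
  long tw = std::max(prev.w, cur.w);  // edge tiles may be narrower
  Region2 g{cur.x + dx, cur.y, tw, cur.h};
  if (g.x < largest.x || g.x >= largest.x + largest.w) {
    // The shift "changes dimension": wrap to the first tile of the next row.
    g = {largest.x, cur.y + cur.h, tw, cur.h};
    if (g.y >= largest.y + largest.h) return std::nullopt;  // end of image
  }
  // Crop to the largest region (last tile of a row or column may be smaller).
  g.w = std::min(g.w, largest.x + largest.w - g.x);
  g.h = std::min(g.h, largest.y + largest.h - g.y);
  return g;
}
```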

matt.mccormick · January 22, 2024, 9:12pm

Neat!

These contributions are welcome in ITK. They can be more easily distributed as a Remote Module.

Another approach for this use case could be a custom “writer” that creates threads for multiple output regions, where each thread has its own instance of the reader and pipeline filters.
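A sketch of that writer-side scheme in plain C++: worker threads pull region indices from a shared atomic counter, and each worker owns a private pipeline created by a factory, so no pipeline object is shared between threads. `Pipeline`, `PipelineFactory` and `WriteInParallel` are hypothetical names; in an ITK/OTB setting each worker would build its own reader and filter instances, and results would be written to the output file instead of being kept in memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for "reader + filters": produces the pixels of one
// output region, identified here by its index in the streaming sequence.
struct Pipeline {
  virtual ~Pipeline() = default;
  virtual std::vector<float> Update(std::size_t regionIndex) = 0;
};
// Must be safe to call concurrently: every worker calls it once.
using PipelineFactory = std::function<std::unique_ptr<Pipeline>()>;

// Workers pull region indices from a shared counter and run their own
// private pipeline on each region they claim.
std::vector<std::vector<float>> WriteInParallel(std::size_t numRegions, unsigned numThreads,
                                                const PipelineFactory& makePipeline) {
  std::vector<std::vector<float>> results(numRegions);  // one slot per region
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&] {
      std::unique_ptr<Pipeline> pipeline = makePipeline();  // private reader + filters
      for (std::size_t i = next++; i < numRegions; i = next++)
        results[i] = pipeline->Update(i);  // each slot written by one thread only
    });
  }
  for (std::thread& w : workers) w.join();
  return results;
}
```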