Multithreading - Backward incompatibilities

simon.rit · June 4, 2018, 2:42pm

Hi,
I tried to compile the RTK code with ITK v5.0a02. You have made changes which seem to be backward incompatible, not only at compilation (e.g. these errors), but also at execution (e.g., this test).
I assume this is intended as backward incompatible changes were planned to occur with the increment of ITK major version. Is there any documentation of the backward incompatible changes and how we should adapt our codes from ITK v4.13 to the coming ITK v5.0? That would be extremely useful.
Thanks in advance,
Simon

dzenanz · June 4, 2018, 9:15pm

The migration guide can be found here. Description of ITKV4_COMPATIBILITY is currently missing from it, I will expand the document today/tomorrow. Please provide feedback about how useful it is, if there is anything missing etc.

Edit: ITKV4_COMPATIBILITY is already described. No need to update right now. Update will be needed after this patch branch is merged.

simon.rit · June 5, 2018, 6:11am

Thanks, exactly what I was looking for. First question: what happens to filters that use threadId for something else than reporting? There is no other solution than disabling DynamicMultiThreading? For example, we use it to compute intermediate results per thread, see, e.g.,
https://github.com/SimonRit/RTK/blob/master/include/rtkTotalVariationImageFilter.hxx#L206

dzenanz · June 5, 2018, 11:31am

There were about a dozen filters which used threadId to store intermediate results in an array. All those filters had DynamicMultiThreading disabled. Without re-writing the algorithm, there is no other solution.

Edit: in your case, you might get away with using atomic m_SumOfSquareRoots, and add to it directly, e.g.: m_SumOfSquareRoots += sumOfSquareRoots;

simon.rit · June 8, 2018, 11:57am

Ok, thanks. I have tried to do it, other users can find my patch here if they want a more complex example than the one you give on the migration guide. I think the document you pointed out lacks a reference to when renaming ThreadedGenerateData is not sufficient. This is clearly explained in itkMultiThreader.h.

I haven’t used your atomic suggestion, I guess I would have to do a proper performance test to check that it does not cost anything (which would be surprising to me, atomic additions are not costless on the GPU).

On a side note, I personally don’t understand the choice to have changed the default multi-threader, particularly since you have explicitly set dynamic multi-threading in ITK filters. You would have kept backward compatibility at not cost. But I’m sure I have missed important conversation on this topic, I should spend more time on discourse.

Thanks again for the help!

dzenanz · June 8, 2018, 1:23pm

I created a patch which should address this. Please review.

dzenanz · June 8, 2018, 1:25pm

I didn’t notice that class was for GPU

The default multi-threader has been changed for performance reasons. The PlatformMultiThreader has higher per-thread overhead, and always creating a new thread makes debugging/profiling difficult.

Also, having more work units than threads helps with load balancing in cases of unequal per-pixel work or the application sharing processor with other heavy computations (which possibly affects different threads unevenly).

simon.rit · June 8, 2018, 1:41pm

It’s not for GPU, I wasn’t clear, sorry. I’m just saying that my only experience with atomic adds it’s on the gpu and it’s better to avoid them if you can to improve computation time. I don’t know if the same is true on the CPU.

Thanks for the explanation. I understand why you prefer using dynamic threading. But you would have had it without changing the default behavior since you have explicitely set DynamicThreadingOn in ITK filters.

dzenanz · June 8, 2018, 1:48pm

Atomic operations might become a bottleneck if you are doing hundred thousand or more of them per second, and some memory address is contested a lot. For non-extreme scenarios it is probably cheaper to do a do few dozen atomic adds than to allocate, aggregate and free an array which holds intermediate data.

We changed the default so user code gets the better performance by default. Especially since opting out is relatively easy.