Derivation of mutual information in EMMA algorithm for image alignment

This is a question about the derivative of the mutual information between two images with respect to a transformation between them. It appears in one of Paul Viola’s papers, which introduced this concept for the first time. I asked this question on the DSP Stack Exchange, but it did not get much attention, so I am bringing it to this group. To avoid repetition, I am only giving a link to the question on Stack Exchange.

Please let me know if anyone has any advice about the expression.

@hjmjohnson might have an answer.