Rigid registration with mutual information gives extremely poor results

I am using the Mattes Mutual Information Image-to-Image Metric v4 as the similarity metric for three-dimensional rigid registration. When I register two completely identical images, the initial gradient is not zero and the registration result actually becomes significantly worse. Why does this happen, and how can it be avoided?

The gradient does not have to be exactly 0, just close to it. If you set up your optimization parameters well, the optimizer should eventually find the optimal alignment, both for real data and for your special test case. If the optimizer parameters are poorly chosen, for example a very large initial step size, the first step may take you very far from the optimal alignment and the optimizer may never be able to recover from that.
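
To make the step-size point concrete, here is a minimal toy sketch in plain Python. It does not use ITK. Everything in it is an assumption made for illustration: the 1-D Gaussian-shaped cost, the constants `SIGMA` and `GRADIENT_BIAS`, and the function names. The update rule only loosely imitates a regular-step gradient descent scheme (fixed step length along the gradient direction, shortened whenever the gradient changes sign), so treat it as a picture of the failure mode rather than a model of the real metric.

```python
import math

SIGMA = 5.0           # assumed width of the metric's capture range (toy value)
GRADIENT_BIAS = 1e-3  # assumed small nonzero gradient at perfect alignment


def metric_gradient(x):
    """Derivative of the toy cost -exp(-x^2 / (2 * SIGMA^2)), plus a small bias."""
    true_gradient = (x / SIGMA**2) * math.exp(-x**2 / (2 * SIGMA**2))
    return true_gradient + GRADIENT_BIAS


def regular_step_descent(initial_step, x=0.0, relaxation=0.5,
                         min_step=1e-4, max_iterations=200):
    """Move a fixed step against the gradient; shrink the step on a sign change."""
    step = initial_step
    previous_sign = 0
    for _ in range(max_iterations):
        if step < min_step:
            break
        sign = 1 if metric_gradient(x) > 0 else -1
        if previous_sign and sign != previous_sign:
            step *= relaxation  # direction flipped, so we overshot: shorten the step
        x -= step * sign
        previous_sign = sign
    return x


if __name__ == "__main__":
    # x = 0 is the perfect alignment (identical images).
    print("small initial step:", regular_step_descent(initial_step=1.0))
    print("large initial step:", regular_step_descent(initial_step=50.0))
```

In this toy setup, the run with the small initial step oscillates around the optimum and settles within a small fraction of a unit of it, even though the starting gradient is not zero. The run with the large initial step jumps out of the region where the cost has any useful slope on the very first iteration and then keeps drifting away, which is the "never recovers" behavior described above.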

You can experiment with tuning the registration parameters manually, which is useful for learning how the various registration algorithms work. However, if you just want to register images, I would recommend using higher-level tools, such as Elastix or ANTs. These tools can determine many parameters automatically, so their default registration presets usually work without any manual parameter tuning.