GPU memory leakage for GPUDemonsRegistrationFilter

I need to use GPUDemonsRegistrationFilter in my project. However, I find that it leaks GPU memory. The problem can be reproduced with the following code:

#include "itkGPUDemonsRegistrationFilter.h"
#include "itkHistogramMatchingImageFilter.h"
#include "itkCastImageFilter.h"
#include "itkWarpImageFilter.h"
#include "itkLinearInterpolateImageFunction.h"

#include "itkImportImageFilter.h"
#include "itkImageFileReader.h"
#include "itkImageFileWriter.h"
#include "itkCommand.h"
#include "itkSmartPointer.h"
#include "itkTimeProbe.h"

#include "itkGPUImage.h"
#include "itkGPUKernelManager.h"
#include "itkGPUContextManager.h"
#include <iostream>

void DIF() {
	bool debug = false;

	//Fill some arrays with bogus data
	const unsigned int Dimension = 3;
	typedef float PixelType;
	unsigned int width = 100;
	unsigned int height = 100;
	unsigned int slices = 10;
	unsigned int nump = width * height * slices;
	PixelType *FixedImageArray = new PixelType[nump];
	PixelType *MovingImageArray = new PixelType[nump];
	for (unsigned int i = 0; i < nump; ++i) {
		FixedImageArray[i] = i % 5;
		MovingImageArray[i] = i % 6;
	}


	//Import those arrays as images
	typedef itk::Image< PixelType, Dimension >  FixedImageType;
	typedef itk::Image< PixelType, Dimension >  MovingImageType;

	typedef itk::ImportImageFilter<PixelType, Dimension> FixedImportFilterType;
	typedef itk::ImportImageFilter<PixelType, Dimension> MovingImportFilterType;

	FixedImportFilterType::Pointer FixedImportFilter = FixedImportFilterType::New();
	if (debug) FixedImportFilter->DebugOn();
	MovingImportFilterType::Pointer MovingImportFilter = MovingImportFilterType::New();
	if (debug) MovingImportFilter->DebugOn();

	FixedImportFilterType::IndexType start;
	start[0] = 0;
	start[1] = 0;
	start[2] = 0;
	FixedImportFilterType::SizeType size;
	size[0] = width;
	size[1] = height;
	size[2] = slices;
	FixedImportFilterType::RegionType region;
	region.SetSize(size);
	region.SetIndex(start);
	FixedImportFilter->SetRegion(region);
	double origin[3];
	origin[0] = 0.0;
	origin[1] = 0.0;
	origin[2] = 0.0;
	FixedImportFilter->SetOrigin(origin);
	double spacing[3];
	spacing[0] = 1;
	spacing[1] = 1;
	spacing[2] = 1;
	FixedImportFilter->SetSpacing(spacing);

	MovingImportFilter->SetRegion(region);
	MovingImportFilter->SetOrigin(origin);
	MovingImportFilter->SetSpacing(spacing);

	FixedImportFilter->SetImportPointer(FixedImageArray, width * height * slices, true);
	MovingImportFilter->SetImportPointer(MovingImageArray, width * height * slices, true);

	FixedImageType::Pointer FixedImage = FixedImportFilter->GetOutput();
	if (debug) FixedImage->DebugOn();
	MovingImageType::Pointer MovingImage = MovingImportFilter->GetOutput();
	if (debug) MovingImage->DebugOn();


	//convert to GPUImages
	typedef float                                      InternalPixelType;
	typedef itk::GPUImage< InternalPixelType, Dimension > InternalImageType;
	typedef itk::CastImageFilter< FixedImageType,
		InternalImageType >  FixedImageCasterType;
	typedef itk::CastImageFilter< MovingImageType,
		InternalImageType >  MovingImageCasterType;

	FixedImageCasterType::Pointer fixedImageCaster = FixedImageCasterType::New();
	if (debug) fixedImageCaster->DebugOn();
	MovingImageCasterType::Pointer movingImageCaster = MovingImageCasterType::New();
	if (debug) movingImageCaster->DebugOn();

	fixedImageCaster->SetInput(FixedImportFilter->GetOutput());
	movingImageCaster->SetInput(MovingImportFilter->GetOutput());

	InternalImageType::Pointer GPUFixedImage = fixedImageCaster->GetOutput();
	if (debug) GPUFixedImage->DebugOn();

	InternalImageType::Pointer GPUMovingImage = movingImageCaster->GetOutput();
	if (debug) GPUMovingImage->DebugOn();

	GPUFixedImage->Update();
	GPUMovingImage->Update();

	//Perform GPU Demons Registration
	typedef itk::Vector< float, Dimension >             VectorPixelType;
	typedef itk::GPUImage<  VectorPixelType, Dimension >   DeformationFieldType;
	typedef itk::GPUDemonsRegistrationFilter<
		InternalImageType,
		InternalImageType,
		DeformationFieldType> RegistrationFilterType;

	RegistrationFilterType::Pointer filter = RegistrationFilterType::New();
	if (debug) filter->DebugOn();

	filter->SetFixedImage(GPUFixedImage);
	filter->SetMovingImage(GPUMovingImage);

	filter->SetNumberOfIterations(1);
	filter->SetStandardDeviations(1.0);

	filter->Update();
}

int main(int argc, char **argv) {
	unsigned int numiter = 100000;
	for (unsigned int i = 0; i < numiter; ++i) {
		DIF();
		std::cout << "ITERATION: " << i << std::endl;
	}
}

How can we fix this GPU memory leak? Any suggestions are appreciated.

The problem is not in the filter. Near the beginning of the DIF procedure you have a manual memory allocation:

PixelType *FixedImageArray = new PixelType[nump];

You never deallocate it, hence the leak. You should add near the end of DIF:

delete[] FixedImageArray;

The same applies to MovingImageArray.

PixelType *FixedImageArray = new PixelType[nump];

That only allocates CPU memory, so why would it cause a GPU leak? Moreover, adding delete[] FixedImageArray at the end of DIF causes an error. The following code reports one:

#include "itkGPUDemonsRegistrationFilter.h"
#include "itkHistogramMatchingImageFilter.h"
#include "itkCastImageFilter.h"
#include "itkWarpImageFilter.h"
#include "itkLinearInterpolateImageFunction.h"

#include "itkImportImageFilter.h"
#include "itkImageFileReader.h"
#include "itkImageFileWriter.h"
#include "itkCommand.h"
#include "itkSmartPointer.h"
#include "itkTimeProbe.h"

#include "itkGPUImage.h"
#include "itkGPUKernelManager.h"
#include "itkGPUContextManager.h"
#include <iostream>

void DIF() {
	bool debug = false;

	//Fill some arrays with bogus data
	const unsigned int Dimension = 3;
	typedef float PixelType;
	unsigned int width = 100;
	unsigned int height = 100;
	unsigned int slices = 10;
	unsigned int nump = width * height * slices;
	PixelType *FixedImageArray = new PixelType[nump];
	PixelType *MovingImageArray = new PixelType[nump];
	for (unsigned int i = 0; i < nump; ++i) {
		FixedImageArray[i] = i % 5;
		MovingImageArray[i] = i % 6;
	}


	//Import those arrays as images
	typedef itk::Image< PixelType, Dimension >  FixedImageType;
	typedef itk::Image< PixelType, Dimension >  MovingImageType;

	typedef itk::ImportImageFilter<PixelType, Dimension> FixedImportFilterType;
	typedef itk::ImportImageFilter<PixelType, Dimension> MovingImportFilterType;

	FixedImportFilterType::Pointer FixedImportFilter = FixedImportFilterType::New();
	if (debug) FixedImportFilter->DebugOn();
	MovingImportFilterType::Pointer MovingImportFilter = MovingImportFilterType::New();
	if (debug) MovingImportFilter->DebugOn();

	FixedImportFilterType::IndexType start;
	start[0] = 0;
	start[1] = 0;
	start[2] = 0;
	FixedImportFilterType::SizeType size;
	size[0] = width;
	size[1] = height;
	size[2] = slices;
	FixedImportFilterType::RegionType region;
	region.SetSize(size);
	region.SetIndex(start);
	FixedImportFilter->SetRegion(region);
	double origin[3];
	origin[0] = 0.0;
	origin[1] = 0.0;
	origin[2] = 0.0;
	FixedImportFilter->SetOrigin(origin);
	double spacing[3];
	spacing[0] = 1;
	spacing[1] = 1;
	spacing[2] = 1;
	FixedImportFilter->SetSpacing(spacing);

	MovingImportFilter->SetRegion(region);
	MovingImportFilter->SetOrigin(origin);
	MovingImportFilter->SetSpacing(spacing);

	FixedImportFilter->SetImportPointer(FixedImageArray, width * height * slices, true);
	MovingImportFilter->SetImportPointer(MovingImageArray, width * height * slices, true);

	FixedImageType::Pointer FixedImage = FixedImportFilter->GetOutput();
	if (debug) FixedImage->DebugOn();
	MovingImageType::Pointer MovingImage = MovingImportFilter->GetOutput();
	if (debug) MovingImage->DebugOn();


	//convert to GPUImages
	typedef float                                      InternalPixelType;
	typedef itk::GPUImage< InternalPixelType, Dimension > InternalImageType;
	typedef itk::CastImageFilter< FixedImageType,
		InternalImageType >  FixedImageCasterType;
	typedef itk::CastImageFilter< MovingImageType,
		InternalImageType >  MovingImageCasterType;

	FixedImageCasterType::Pointer fixedImageCaster = FixedImageCasterType::New();
	if (debug) fixedImageCaster->DebugOn();
	MovingImageCasterType::Pointer movingImageCaster = MovingImageCasterType::New();
	if (debug) movingImageCaster->DebugOn();

	fixedImageCaster->SetInput(FixedImportFilter->GetOutput());
	movingImageCaster->SetInput(MovingImportFilter->GetOutput());

	InternalImageType::Pointer GPUFixedImage = fixedImageCaster->GetOutput();
	if (debug) GPUFixedImage->DebugOn();

	InternalImageType::Pointer GPUMovingImage = movingImageCaster->GetOutput();
	if (debug) GPUMovingImage->DebugOn();

	GPUFixedImage->Update();
	GPUMovingImage->Update();

	//Perform GPU Demons Registration
	typedef itk::Vector< float, Dimension >             VectorPixelType;
	typedef itk::GPUImage<  VectorPixelType, Dimension >   DeformationFieldType;
	typedef itk::GPUDemonsRegistrationFilter<
		InternalImageType,
		InternalImageType,
		DeformationFieldType> RegistrationFilterType;

	RegistrationFilterType::Pointer filter = RegistrationFilterType::New();
	if (debug) filter->DebugOn();

	filter->SetFixedImage(GPUFixedImage);
	filter->SetMovingImage(GPUMovingImage);

	filter->SetNumberOfIterations(1);
	filter->SetStandardDeviations(1.0);

	filter->Update();

	delete[] FixedImageArray;
	delete[] MovingImageArray;
}

int main(int argc, char **argv) {
	unsigned int numiter = 100000;
	for (unsigned int i = 0; i < numiter; ++i) {
		DIF();
		std::cout << "ITERATION: " << i << std::endl;
	}
}
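
A likely explanation for the error reported with this version: the third argument to SetImportPointer is true, which in ITK's ImportImageFilter lets the image container manage the buffer and free it later. The delete[] calls at the end of DIF would then free the same memory a second time. The sketch below models only that ownership flag with a hypothetical ImportedBuffer class; it is not ITK code, and the names ImportedBuffer, OwnsBuffer and CallerMustDelete are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an importer that can take ownership of a buffer.
// It models the ownership flag only and is not part of ITK.
class ImportedBuffer {
public:
    ImportedBuffer() = default;
    ImportedBuffer(const ImportedBuffer&) = delete;
    ImportedBuffer& operator=(const ImportedBuffer&) = delete;

    ~ImportedBuffer() { Release(); }

    // If takeOwnership is true, this object calls delete[] on data later,
    // so the caller must not free it again.
    void SetImportPointer(float* data, std::size_t count, bool takeOwnership) {
        assert(data != nullptr && count > 0);
        Release();
        data_ = data;
        count_ = count;
        owns_ = takeOwnership;
    }

    bool OwnsBuffer() const { return owns_; }
    std::size_t Size() const { return count_; }

private:
    // Frees the buffer only when this object owns it.
    void Release() {
        if (owns_) {
            delete[] data_;
        }
        data_ = nullptr;
        count_ = 0;
        owns_ = false;
    }

    float* data_ = nullptr;
    std::size_t count_ = 0;
    bool owns_ = false;
};

// Returns true when the caller is still responsible for calling delete[].
bool CallerMustDelete(const ImportedBuffer& buffer) {
    return !buffer.OwnsBuffer();
}
```

Under this model, passing true means the caller drops its delete[] calls, while passing false means the caller keeps them and must make sure the buffer outlives the importer.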

What happens if you create the image in the normal way, and not via the import filter? You can modify pixel values using iterators.

The GPU memory leak still happens after I create the image in the normal way and delete FixedImageArray/MovingImageArray. The code is:

#include "itkGPUDemonsRegistrationFilter.h"
#include "itkHistogramMatchingImageFilter.h"
#include "itkCastImageFilter.h"
#include "itkWarpImageFilter.h"
#include "itkLinearInterpolateImageFunction.h"

#include "itkImportImageFilter.h"
#include "itkImageFileReader.h"
#include "itkImageFileWriter.h"
#include "itkCommand.h"
#include "itkSmartPointer.h"
#include "itkTimeProbe.h"

#include "itkGPUImage.h"
#include "itkGPUKernelManager.h"
#include "itkGPUContextManager.h"
#include <cstring>
#include <iostream>

void DIF() {
	bool debug = true;

	//Fill some arrays with bogus data
	const unsigned int Dimension = 3;
	typedef float PixelType;
	unsigned int width = 100;
	unsigned int height = 100;
	unsigned int slices = 10;
	unsigned int nump = width * height * slices;
	PixelType *FixedImageArray = new PixelType[nump];
	PixelType *MovingImageArray = new PixelType[nump];
	for (unsigned int i = 0; i < nump; ++i) {
		FixedImageArray[i] = i % 5;
		MovingImageArray[i] = i % 6;
	}


	//Create the images directly and copy the arrays into them
	typedef itk::Image< PixelType, Dimension >  FixedImageType;
	typedef itk::Image< PixelType, Dimension >  MovingImageType;

	FixedImageType::IndexType start;
	start[0] = 0;
	start[1] = 0;
	start[2] = 0;
	FixedImageType::SizeType size;
	size[0] = width;
	size[1] = height;
	size[2] = slices;
	FixedImageType::RegionType region;
	region.SetSize(size);
	region.SetIndex(start);

	double origin[3];
	origin[0] = 0.0;
	origin[1] = 0.0;
	origin[2] = 0.0;

	double spacing[3];
	spacing[0] = 1;
	spacing[1] = 1;
	spacing[2] = 1;

	FixedImageType::Pointer FixedImage = FixedImageType::New();
	FixedImage->SetRegions(region);
	FixedImage->Allocate();
	memcpy(FixedImage->GetBufferPointer(), FixedImageArray, sizeof(PixelType)*width * height * slices);
	FixedImage->SetOrigin(origin);
	FixedImage->SetSpacing(spacing);

	MovingImageType::Pointer MovingImage = MovingImageType::New();
	MovingImage->SetRegions(region);
	MovingImage->Allocate();
	memcpy(MovingImage->GetBufferPointer(), MovingImageArray, sizeof(PixelType)*width * height * slices);
	MovingImage->SetOrigin(origin);
	MovingImage->SetSpacing(spacing);

	//convert to GPUImages
	typedef float                                      InternalPixelType;
	typedef itk::GPUImage< InternalPixelType, Dimension > InternalImageType;
	typedef itk::CastImageFilter< FixedImageType,
		InternalImageType >  FixedImageCasterType;
	typedef itk::CastImageFilter< MovingImageType,
		InternalImageType >  MovingImageCasterType;

	FixedImageCasterType::Pointer fixedImageCaster = FixedImageCasterType::New();
	if (debug) fixedImageCaster->DebugOn();
	MovingImageCasterType::Pointer movingImageCaster = MovingImageCasterType::New();
	if (debug) movingImageCaster->DebugOn();

	fixedImageCaster->SetInput(FixedImage);
	movingImageCaster->SetInput(MovingImage);

	InternalImageType::Pointer GPUFixedImage = fixedImageCaster->GetOutput();
	if (debug) GPUFixedImage->DebugOn();

	InternalImageType::Pointer GPUMovingImage = movingImageCaster->GetOutput();
	if (debug) GPUMovingImage->DebugOn();

	GPUFixedImage->Update();
	GPUMovingImage->Update();

	//Perform GPU Demons Registration
	typedef itk::Vector< float, Dimension >             VectorPixelType;
	typedef itk::GPUImage<  VectorPixelType, Dimension >   DeformationFieldType;
	typedef itk::GPUDemonsRegistrationFilter<
		InternalImageType,
		InternalImageType,
		DeformationFieldType> RegistrationFilterType;

	RegistrationFilterType::Pointer filter = RegistrationFilterType::New();
	if (debug) filter->DebugOn();

	filter->SetFixedImage(GPUFixedImage);
	filter->SetMovingImage(GPUMovingImage);

	filter->SetNumberOfIterations(1);
	filter->SetStandardDeviations(1.0);

	filter->Update();

	delete[] FixedImageArray;
	delete[] MovingImageArray;
}

int main(int argc, char **argv) {
	unsigned int numiter = 100000;
	for (unsigned int i = 0; i < numiter; ++i) {
		DIF();
		std::cout << "ITERATION: " << i << std::endl;
	}
}

This program seems to leak around 45 KB/iteration of main memory on my computer. It doesn’t seem to leak GPU memory. Is this what you observe? How do you measure leaked memory?

I did not notice the CPU memory leak. But why is there a CPU memory leak after I delete FixedImageArray/MovingImageArray? There is no other new or malloc.

More importantly, I use GPU-Z and nvidia-smi.exe to observe the GPU memory usage. The maximum GPU memory usage grows with the number of iterations. For example, in GPU-Z, during iterations 0 to 1000 the maximum usage is 160 MB; at iteration 5000 it is 170 MB. Beyond iteration 10000 it is 190 MB, and at iteration 20000 it is about 215 MB.
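
To turn readings like these into a per-iteration rate, one option is an ordinary least-squares fit of usage against iteration count. The sketch below does that for the four readings quoted above. Pinning each reading to a single iteration (1000, 5000, 10000, 20000) and reading "M" as MB are assumptions, and the Sample struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample {
    double iteration;  // loop counter at the time of the reading
    double usageMB;    // GPU memory reading, assumed to be in MB
};

// Ordinary least-squares slope of usage against iteration (MB per iteration).
double LeakSlopeMBPerIteration(const std::vector<Sample>& samples) {
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());
    double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0;
    for (const Sample& s : samples) {
        sumX += s.iteration;
        sumY += s.usageMB;
        sumXY += s.iteration * s.usageMB;
        sumXX += s.iteration * s.iteration;
    }
    const double denom = n * sumXX - sumX * sumX;
    assert(denom != 0.0);  // all iterations identical would make the fit undefined
    return (n * sumXY - sumX * sumY) / denom;
}

// Readings quoted in the post; the exact iteration for each one is an assumption.
std::vector<Sample> ReportedReadings() {
    return {{1000, 160}, {5000, 170}, {10000, 190}, {20000, 215}};
}
```

With those four points the fitted slope is roughly 0.003 MB, or about 3 KB, per call to DIF. That is a rough estimate only, since the tools report a running maximum rather than usage sampled at a fixed iteration.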

@dzenanz Have you reproduced the GPU memory leak? Do you have any suggestions for solving this problem?

I don’t have a suggestion for solving.

Does this fix have anything to do with your problem? If you try the latest master, does it help?

I have tried the latest master, but the GPU memory leak still exists. This problem is driving me crazy, and fixing this bug is beyond my ability.