Write German Umlauts into MetaData with SimpleITK

Philip · August 24, 2021, 1:15pm

Hello friends,

I’ve got a problem I am not able to solve. I have to write a German word containing the letters ‘ö’ and ‘ä’ into the metadata of our DICOM file. However, SimpleITK’s SetMetaData does not write it the way I need: the umlauts always end up as Ã¶ and Ã¤, no matter what I try.

Unfortunately, I can’t switch to ‘oe’ for ‘ö’ or ‘ae’ for ‘ä’, since our PACS would then no longer match the file to previous scans containing that word.

I was wondering if there is a proper way to get SetMetaData to use the unicode encoding, because I know that that works for our PACS and I’ve seen other datasets do it like that.

SITK-Version: 2.1.0
IDE: Pycharm
Python: 3.6

What I’ve tried so far:

u'ABC\u00e4XYZ'
'ABCäXYZ'.encode() with probably almost every existing encoding
Checked the file with a hex editor: ä is written as the bytes C3 A4, which is wrong for our PACS. The representation the PACS reads correctly would be the single byte E4.
Played with 0008|0005 (Specific Character Set)
Made sure that the string changes every time I try something new, so I would notice if something breaks
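For reference, the byte values seen in the hex editor match what Python produces for the two encodings. A minimal sketch (the variable names are only for illustration):

```python
text = "ä"

utf8_bytes = text.encode("utf-8")      # two bytes: C3 A4
latin1_bytes = text.encode("latin-1")  # one byte:  E4

print(utf8_bytes.hex(), latin1_bytes.hex())

# Reading the UTF-8 bytes as if they were latin-1 yields the garbled
# characters described above.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)
```

So C3 A4 in the file means the string was written as UTF-8, and a reader that assumes latin-1 shows it as Ã¤.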

I am at my wits’ end, so any input is appreciated, even if it’s just telling me that it’s not possible!

Thanks for your help.

dzenanz · August 24, 2021, 1:29pm

Have you tried setting the attribute “Specific Character Set (0008,0005)” to ISO_IR 192 and encoding your metadata strings accordingly?

From DICOM standard:

Note

The ISO 10646-1, 10646-2, and their associated supplements and extensions correspond to the Unicode version 3.2 character set. The ISO IR 192 corresponds to the use of the UTF-8 encoding for this character set.

Philip · August 25, 2021, 8:29am

Hey! Thanks for your answer.

I’ve got it working now. The solution was to use ISO_IR 192 and remove any previous encoding from the strings before writing with SetMetaData.

Philip · August 25, 2021, 10:58am

Actually, I think I was a little too quick to say that it’s working now. The problem still exists. With ISO_IR 192 and no manual encoding, most viewers interpret the string correctly as Unicode, but our PACS apparently can’t or doesn’t handle UTF-8. The bytes are still written in a form the PACS cannot read: instead of the single byte 0xE4 for ä, the file still contains the two bytes 0xC3 0xA4 for one umlaut. That is correct for UTF-8, but not for the encoding the PACS requires, which I guess is latin-1 or something similar.

The problem is that SetMetaData does not accept a latin-1 encoded value. It fails with:

  File "/xyz/PycharmProjects/nifti2dcm/venv/lib/python3.6/site-packages/SimpleITK/SimpleITK.py", line 3384, in SetMetaData
    return _SimpleITK.Image_SetMetaData(self, key, value)
TypeError: in method 'Image_SetMetaData', argument 3 of type 'std::string const &'

Even if I set 0008|0005 to ISO_IR 100.

I don’t know what I am doing wrong at this point.

Any further help is greatly appreciated.

Best regards,
Philip

zivy · August 25, 2021, 1:55pm

Hi @Philip,

This is a feature/problem of the SetMetaData method. It only accepts values of type str. Encoding the string with latin-1 produces a bytes object, which the method rejects. Not sure if it is possible to overcome the issue:

import SimpleITK as sitk

# use a month name for the patient name
month_str = u'März'

# This works, but doesn't solve the specific PACS reading issue
image = sitk.Image([2,2], sitk.sitkUInt8)
image.SetMetaData('0010|0010', month_str)
image.SetMetaData('0008|0005', 'ISO_IR 192')
sitk.WriteImage(image, 'umlat1.dcm')

print(type(month_str))                    # <class 'str'>
print(type(month_str.encode('latin-1')))  # <class 'bytes'>

# This fails with a TypeError because SetMetaData accepts str, not bytes
image.SetMetaData('0008|0005', 'ISO_IR 100')
image.SetMetaData('0010|0010', month_str.encode('latin-1'))

Philip · August 26, 2021, 7:09am

Hi Zivy,

I was hoping this was not the case and I was just too stupid to use it correctly :-), but after trying almost every possible permutation of encodings I was more or less expecting it to be that way.

Since there may be other people with the same problem as me, here is my fix:

After writing the file with SimpleITK, I read it back as a bytearray and replace each two-byte umlaut sequence with the correct single byte. Thankfully that is a pretty minor coding task, since most DICOMs have a nicely structured data layout and I only need to fix three or four different fields.

Don’t forget to adjust the field length afterwards, and only read the header of the file up to where the pixel data begins! It works well so far, and the additional runtime is negligible.
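The fix described above could look roughly like the following. This is only a sketch under several assumptions, not the code actually used: the file is Explicit VR Little Endian, the field has a VR with a 16-bit length (such as PN, LO or SH), it is not nested inside a sequence, and the first occurrence of the tag bytes really is the element in question. The helper name reencode_element is made up for illustration.

```python
import struct


def reencode_element(data: bytes, group: int, element: int) -> bytes:
    """Re-encode one short text element from UTF-8 to latin-1 (hypothetical helper).

    Assumes Explicit VR Little Endian: 4 tag bytes, 2 VR bytes, 2 length bytes.
    """
    tag = struct.pack("<HH", group, element)
    pos = data.find(tag)  # naive search: first occurrence of the tag bytes
    if pos < 0:
        return data  # tag not present, nothing to change

    (length,) = struct.unpack_from("<H", data, pos + 6)
    start = pos + 8
    # Drop the padding space, if any, before re-encoding.
    value = data[start:start + length].decode("utf-8").rstrip(" ")

    new_value = value.encode("latin-1")
    if len(new_value) % 2:
        new_value += b" "  # DICOM values must have an even length

    # Keep tag and VR, write the adjusted length, then the new value.
    header = data[pos:pos + 6] + struct.pack("<H", len(new_value))
    return data[:pos] + header + new_value + data[start + length:]


# Synthetic example: a Patient's Name element holding "März" as UTF-8.
raw = (struct.pack("<HH", 0x0010, 0x0010) + b"PN" + struct.pack("<H", 6)
       + "März".encode("utf-8") + b" ")
fixed = reencode_element(raw, 0x0010, 0x0010)
print(fixed.hex(" "))
```

For a real file this would be applied only to the part before the pixel data, as noted above, and 0008|0005 would presumably need to be ISO_IR 100 so that readers decode the patched bytes as latin-1.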

Thanks for your help!