Handling of combined accents by ITK system tools


(Tim Evain) #1

Hello everyone,

I’m using the itksys tools to do some file parsing, and I’ve encountered a nasty problem with some file paths containing accented characters.
For example, I have this folder name in my database: “aorte_déroulée”. It turns out that the “é” is actually an “e” character followed by a combining acute accent character (I think that data came from a macOS machine).
When I call itksys::Directory::Load() on the parent folder, the name it returns is “aorte_de´roule´e”, which is fine. The problem is that this path is not recognized as valid afterwards: both itksys::Directory::Load("[…]/aorte_de´roule´e") and itksys::SystemTools::FileIsDirectory("[…]/aorte_de´roule´e") fail. Replacing the sequence directly with the precomposed character “é” in the string doesn’t help either.

Any insights on why this is happening and how to fix it?
(If I rename the folder using the precomposed accented letter, it is processed correctly, but I can’t control the file naming in my application.)

Tim

EDIT: It appears to be a Windows-specific issue, as tests on macOS and Linux run fine. I’ve tracked the problem down to:

  • itksys::Directory::Load() fails at Directory.cxx, line 121:
  srchHandle = _wfindfirst_func(
    (wchar_t*)Encoding::ToWindowsExtendedPath(buf).c_str(), &data);

where srchHandle is -1

  • itksys::SystemTools::FileIsDirectory fails at SystemTools.cxx, line 2902:
  DWORD attr =
    GetFileAttributesW(Encoding::ToWindowsExtendedPath(name).c_str());

where attr == INVALID_FILE_ATTRIBUTES


(Matt McCormick) #2

Hi Tim,

Could you please test with this PR that updates KWSys?

If there still are issues, we can fix KWSys.

Thanks,
Matt


(Tim Evain) #3

Hi Matt,

I’ve tried the current master this morning; it doesn’t fix this specific problem :smile:.

Tim


(Matt McCormick) #4

Hi Tim,

Thanks for testing :+1:

Is it possible to create a test that reproduces the issue?

Thanks,
Matt


(Tim Evain) #5

Sure!

Here it is:
CombinedAccents_TestCase.zip (1.9 KB)
It’s a very simple case: it should output the name of the accented folder, but it does not.

Tim


(Matt McCormick) #6

Thanks, Tim!

I created this issue to track this:


(Tim Evain) #7

You’re welcome, thanks a million for the help.

Tim


(Dženan Zukić) #8

@tim-evain please review PR 546. You can find more information in the issue comments.


(Tim Evain) #9

@dzenanz I’ve just tried it on Windows, and switching the default code page solved the issue :+1:.
I will try on other OSes whenever I get a chance.

Thank you and @brad.king for looking into this.

Tim


(Dženan Zukić) #10

@tim-evain do you have a GitHub profile? If so it would be good if you formally approved the PR.


(Tim Evain) #11

I do. I see the PR is closed now; sorry for the delayed answer.
I will try to do so next time, but can I do it as a non-member of the ISC?


(Dženan Zukić) #12

I think you can; anyone can review a PR. But it takes an approving review from someone with write access to enable the green merge button.