tim-evain
(Tim Evain)
February 22, 2019, 3:13pm
1
Hello everyone,
I’m using the itksys tools to do some file parsing and I’ve run into a nasty problem with file paths containing accented characters.
For example, I have this folder name in my database: “aorte_déroulée”. It turns out that the “é” symbol is actually an “e” character followed by a combining acute accent character (I think the data came from a macOS workstation).
When using itksys::Directory::Load() on the parent folder, the returned string is “aorte_de´roule´e”, which is fine. The problem is that this path is not recognized as valid afterward: calling itksys::Directory::Load("[…]/aorte_de´roule´e") or itksys::SystemTools::FileIsDirectory("[…]/aorte_de´roule´e") fails. Replacing the sequence directly with the precomposed “é” character in the string doesn’t help either.
Any insights on why this is happening and how to fix it?
(I can rename the folder using the precomposed accented character, and then it is processed correctly, but I can’t control the file naming in my application.)
Tim
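To make the two spellings of “é” concrete, here is a minimal C++ sketch, assuming the folder name is stored as UTF-8. The helper names (`precomposedE`, `decomposedE`, `toHex`, `folderNameNFC`, `folderNameNFD`, `printForms`) are made up for illustration and are not part of itksys. It only shows that the two forms look the same on screen but differ byte for byte.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// "é" as one precomposed code point, U+00E9, encoded in UTF-8.
inline std::string precomposedE() { return "\xC3\xA9"; }

// "é" as 'e' (U+0065) followed by U+0301 COMBINING ACUTE ACCENT, in UTF-8.
inline std::string decomposedE() { return "e\xCC\x81"; }

// Render the bytes of a string as space-separated lowercase hex.
std::string toHex(const std::string& s)
{
  std::string out;
  char buf[4];
  for (unsigned char c : s) {
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(c));
    if (!out.empty()) out += ' ';
    out += buf;
  }
  return out;
}

// Both names display as "aorte_déroulée" but are different byte sequences.
inline std::string folderNameNFC()
{
  return "aorte_d" + precomposedE() + "roul" + precomposedE() + "e";
}
inline std::string folderNameNFD()
{
  return "aorte_d" + decomposedE() + "roul" + decomposedE() + "e";
}

// Print both forms; a byte-wise comparison treats them as different names.
void printForms()
{
  assert(folderNameNFC() != folderNameNFD());
  std::printf("precomposed: %s\n", toHex(folderNameNFC()).c_str());
  std::printf("decomposed:  %s\n", toHex(folderNameNFD()).c_str());
}
```

Calling `printForms()` from a `main` function prints the hex bytes of each form, which is a quick way to check which spelling a given folder name actually uses.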
EDIT: It appears to be a Windows-specific issue, as tests on macOS and Linux run fine. I’ve tracked down the problem to:
itksys::Directory::Load() fails at (Directory.cxx, line 121)
srchHandle = _wfindfirst_func(
  (wchar_t*)Encoding::ToWindowsExtendedPath(buf).c_str(), &data);
where srchHandle is -1
itksys::SystemTools::FileIsDirectory fails at (SystemTools.cxx, line 2902)
DWORD attr =
  GetFileAttributesW(Encoding::ToWindowsExtendedPath(name).c_str());
where attr == INVALID_FILE_ATTRIBUTES
Hi Tim,
Could you please test with this PR that updates KWSys?
If there still are issues, we can fix KWSys.
Thanks,
Matt
tim-evain
(Tim Evain)
February 25, 2019, 12:12pm
3
Hi Matt,
I’ve tried the current state of master this morning; it doesn’t fix this specific problem.
Tim
Hi Tim,
Thanks for testing.
Is it possible to create a test that reproduces the issue?
Thanks,
Matt
tim-evain
(Tim Evain)
February 25, 2019, 3:59pm
5
Sure!
Here it is:
CombinedAccents_TestCase.zip (1.9 KB)
It’s a very simple case: it should output the name of the accented folder, but it does not.
Tim
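For readers without the attachment, the round trip being tested can be sketched roughly as follows. This is not the content of the zip file: it uses C++17 `std::filesystem` instead of itksys, and the helper names and the scratch folder name `combined_accents_sketch` are hypothetical. It creates a folder whose name contains “e” followed by U+0301, lists the parent, and checks that each listed entry is still accepted as a directory.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: create a folder whose name uses the decomposed form,
// 'e' followed by U+0301 COMBINING ACUTE ACCENT, inside `parent`.
fs::path createDecomposedFolder(const fs::path& parent)
{
  // A char32_t literal lets std::filesystem convert to the native encoding.
  fs::path folder = parent / fs::path(U"aorte_de\u0301roule\u0301e");
  fs::create_directories(folder);
  return folder;
}

// List `parent` and keep the entries that are accepted again as directories.
std::vector<fs::path> listUsableSubdirectories(const fs::path& parent)
{
  std::vector<fs::path> usable;
  for (const fs::directory_entry& entry : fs::directory_iterator(parent)) {
    if (fs::is_directory(entry.path()))
      usable.push_back(entry.path());
  }
  return usable;
}

// Round trip in a scratch directory under the system temp folder.
// Returns true when the listed name still refers to the created folder.
bool roundTripSucceeds()
{
  const fs::path scratch = fs::temp_directory_path() / "combined_accents_sketch";
  fs::remove_all(scratch);
  const fs::path created = createDecomposedFolder(scratch);
  assert(fs::is_directory(created));
  const std::vector<fs::path> found = listUsableSubdirectories(scratch);
  const bool ok = found.size() == 1 && fs::equivalent(found.front(), created);
  fs::remove_all(scratch);
  return ok;
}
```

The failure described in this thread corresponds to the listed name being rejected on the second lookup, which is the step `listUsableSubdirectories` exercises.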
Thanks, Tim!
I created this issue to track this:
opened 04:16PM - 25 Feb 19 UTC
closed 10:30PM - 27 Feb 19 UTC
type:Bug
type:Infrastructure
area:Core
### Description
As discussed on Discourse:
https://discourse.itk.org/t/h… andling-of-combined-accents-by-itk-system-tools/1619/5
reported by Tim Evain,
> I’m using the itksys tools to do some file parsing and I’ve encountered a nasty problem with some filepaths containing accented characters.
For example I have this folder name in my database : “aorte_déroulée”. It appears that in fact, the “é” symbol is a “e” char followed by a combining acute accent char (I think that data came from a MacOS station).
When using itksys::Directory::Load() on the parent folder, the given string is then “aorte_de´roule´e”, which is fine. Problem is that this path is not recognized as valid afterward: using itksys::Directory::Load("[…]/aorte_de´roule´e") or itksys::SystemTools::FileIsDirectory("[…]/aorte_de´roule´e") fails. Replacing directly by the full char “é” in the string doesn’t help either.
### Steps to Reproduce
### Expected behavior
Accented characters are handled.
### Actual behavior
- itksys::Directory::Load() fails on (Directory.cxx, l 121)
```
srchHandle = _wfindfirst_func(
(wchar_t*)Encoding::ToWindowsExtendedPath(buf).c_str(), &data);
```
where `srchHandle` is `-1`
- itksys::SystemTools::FileIsDirectory fails on (SystemTools.cxx, l 2902)
```
DWORD attr =
GetFileAttributesW(Encoding::ToWindowsExtendedPath(name).c_str());
```
where `attr == INVALID_FILE_ATTRIBUTES`
### Reproducibility
Recent ITK Git `master` (with updated KWSys) was tested.
### Environment
> It appears to be a Windows-specific issue, as tests on MacOS and Linux run fine.
### Additional Information
Test case from Tim attached.
[CombinedAccents_TestCase.zip](https://github.com/InsightSoftwareConsortium/ITK/files/2901365/CombinedAccents_TestCase.zip)
CC: @bradking
tim-evain
(Tim Evain)
February 25, 2019, 4:32pm
7
You’re welcome, thanks a million for the help.
Tim
dzenanz
(Dženan Zukić)
February 27, 2019, 4:17pm
8
@tim-evain please review PR 546. You can find more information in the issue comments.
tim-evain
(Tim Evain)
February 27, 2019, 5:04pm
9
@dzenanz I’ve just tried it on Windows, and switching the default codepage has solved the issue.
I will try on other OSes whenever I get a chance.
Thank you and @brad.king for looking into this.
Tim
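As a rough illustration of why the codepage matters, the sketch below compares two ways of widening the same narrow string. It assumes the path bytes are UTF-8, and it uses a plain per-byte widening (Latin-1-like) as a stand-in for a legacy single-byte codepage. That stand-in is an assumption for illustration, not what itksys or Windows does internally, and the function names are hypothetical. The point is only that the two interpretations yield different code points for the decomposed “é”, so the resulting wide path would name a folder that does not exist.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Minimal sketch: assumes well-formed input and performs no validation.
std::vector<std::uint32_t> decodeUtf8(const std::string& s)
{
  std::vector<std::uint32_t> out;
  for (std::size_t i = 0; i < s.size();) {
    const unsigned char b = static_cast<unsigned char>(s[i]);
    std::uint32_t cp = 0;
    std::size_t extra = 0;
    if (b < 0x80)      { cp = b;        extra = 0; }  // 1-byte sequence (ASCII)
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }  // 2-byte sequence
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }  // 3-byte sequence
    else               { cp = b & 0x07; extra = 3; }  // 4-byte sequence
    ++i;
    for (std::size_t k = 0; k < extra && i < s.size(); ++k, ++i)
      cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    out.push_back(cp);
  }
  return out;
}

// Widen each byte on its own, as a single-byte (Latin-1-like) codepage would.
std::vector<std::uint32_t> widenPerByte(const std::string& s)
{
  std::vector<std::uint32_t> out;
  for (unsigned char b : s) out.push_back(b);
  return out;
}

// Check the decomposed "é" under each interpretation.
void checkDecomposedE()
{
  const std::string nfd = "e\xCC\x81";
  assert(decodeUtf8(nfd).size() == 2);    // U+0065, U+0301
  assert(widenPerByte(nfd).size() == 3);  // U+0065, U+00CC, U+0081
}
```

Pure ASCII paths are identical under both interpretations, which fits the observation that only the accented folder name was affected.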
dzenanz
(Dženan Zukić)
February 27, 2019, 8:53pm
10
@tim-evain do you have a GitHub profile? If so it would be good if you formally approved the PR.
tim-evain
(Tim Evain)
February 28, 2019, 11:07am
11
I do. I see the PR is closed now. Sorry for the delayed answer.
I will try to do so next time, but can I do it as a non-member of the ISC?
dzenanz
(Dženan Zukić)
February 28, 2019, 3:09pm
12
I think you can. I think anyone can review a PR. But it takes an approving review from someone with write access to enable the green merge button.