maven-issues mailing list archives

From "Andreas Gudian (JIRA)" <>
Subject [jira] (SUREFIRE-1137) Problem with Umlauts in stdout
Date Sun, 01 Feb 2015 10:44:18 GMT


Andreas Gudian commented on SUREFIRE-1137:

I should have answered last night already: I was able to reproduce the problem on my
local Windows machine by encoding the test Java file as UTF-8 and using UTF-8 in the pom.
Stack traces and error messages are correctly encoded in the output XML, but the sysout doesn't
survive the journey, just as Jürgen describes.

In my main Maven process, {{Charset.defaultCharset()}} is windows-1252, whereas in the
forked VM {{Charset.defaultCharset()}} is UTF-8. The current implementation relies on the
default charset being the same in both the main process and the forked process, hence the
encoding garbage.
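
The mismatch can be reproduced in isolation. The following is only a minimal sketch, not Surefire code: the class and method names are made up for illustration, and the two charsets are hard-coded to the values observed above instead of being read from two real processes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encodes text with one charset and decodes the resulting bytes with
    // another, mimicking a writer and a reader that disagree on the encoding.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        // Assumption: these stand in for the default charsets of the two JVMs.
        Charset forkCharset = StandardCharsets.UTF_8;
        Charset mainCharset = Charset.forName("windows-1252");

        // The umlaut is written as a Unicode escape so that the source file's
        // own encoding cannot interfere with the demonstration.
        String original = "J\u00fcrgen";

        // Fork writes UTF-8, main reads windows-1252: the two UTF-8 bytes of
        // the umlaut (0xC3 0xBC) become two separate characters.
        String garbled = roundTrip(original, forkCharset, mainCharset);

        // Both sides agree: the text survives unchanged.
        String intact = roundTrip(original, forkCharset, forkCharset);

        System.out.println("garbled length: " + garbled.length());
        System.out.println("intact equals original: " + intact.equals(original));
    }
}
```

The garbled string is one character longer than the original, because the two-byte UTF-8 sequence for the umlaut is decoded as two single-byte windows-1252 characters.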

* If I don't pass file.encoding to the forked VM, then the forked VM also uses windows-1252.
* If I pass -Dfile.encoding=UTF-8 in the MAVEN_OPTS to the main process, then System.getProperty("file.encoding")
is "UTF-8", but {{Charset.defaultCharset()}} _remains windows-1252_ - I was not able
to change the defaultCharset of the main process with a system property.
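
To compare the two values in each process, a small probe can help. This is a hypothetical diagnostic class (the name is invented here), a sketch that only reads the system property and the default charset separately so that any divergence between them is visible; what it prints depends entirely on the JVM, the platform, and the options it is started with.

```java
import java.nio.charset.Charset;

public class DefaultCharsetProbe {

    // Reports the file.encoding system property and the JVM's default
    // charset side by side; the two are looked up independently.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running the same class once in the main process's environment and once with the arguments given to the fork should show whether the two JVMs agree.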

But the documentation is quite clear on that: you're not supposed to change the defaultCharset
by using file.encoding, but instead change the system's locale / language settings. Meh.

I'm not really sure yet what to make of this. I could pass the fork's defaultCharset back
to the main process so that it can properly recode the stream into UTF-8. I could pass the
main's defaultCharset to the fork and use that one for encoding the String in PrintStream's
print(String) method (although that may cause strange side effects with other ways of using
that print stream). Or I could convert any print stream activity in the fork to UTF-16
(although not every charset can transform all its characters to UTF-16 and back again, which
is why I tried to rely on the default charset in the first place)...
So I might go with the first option, but I still need to think about it (to see if it really
is the right thing to do).
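
As a rough illustration of the first option, the main process could decode the fork's bytes with the charset name the fork reports and re-encode them as UTF-8. This is a sketch under assumptions, not the Surefire implementation: the class and method names are hypothetical, and how the charset name would actually travel from the fork to the main process is left open.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ForkOutputRecoder {

    // Decodes bytes that the fork produced with its own default charset
    // (passed in by name) and re-encodes the text as UTF-8.
    static byte[] recodeToUtf8(byte[] forkBytes, String forkCharsetName) {
        Charset forkCharset = Charset.forName(forkCharsetName);
        String text = new String(forkBytes, forkCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: the fork reported windows-1252 as its default charset.
        String reportedCharset = "windows-1252";
        byte[] fromFork = "J\u00fcrgen".getBytes(Charset.forName(reportedCharset));

        byte[] utf8 = recodeToUtf8(fromFork, reportedCharset);

        // The umlaut is one byte in windows-1252 and two bytes in UTF-8.
        System.out.println(fromFork.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

One caveat with this approach: if the fork's charset is a multi-byte one and the stream is recoded chunk by chunk, a character may be split across two chunks, so the decoding would have to work on complete messages or keep decoder state between chunks.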

If you guys have an idea here, let me know.

> Problem with Umlauts in stdout
> ------------------------------
>                 Key: SUREFIRE-1137
>                 URL:
>             Project: Maven Surefire
>          Issue Type: Bug
>          Components: Maven Surefire Plugin
>    Affects Versions: 2.18
>         Environment: Linux
>            Reporter: Jürgen Zeller
>            Assignee: Andreas Gudian
>         Attachments:
> When using Cp1252 as file encoding, the generated Surefire stdout report contains invalid
> characters when run on Linux. When running the same test on Windows, everything is fine.
> A similar problem was reported in SUREFIRE-998

