maven-issues mailing list archives

From "Jukka Harkki (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MCHANGELOG-142) UTF-8 Encoding doubled
Date Thu, 04 Feb 2016 14:14:40 GMT

     [ https://issues.apache.org/jira/browse/MCHANGELOG-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Harkki updated MCHANGELOG-142:
------------------------------------
    Description: 
Creating the changelog.xml file encodes the text as UTF-8 twice if the git commit comments are
already in UTF-8. For example, if the outputEncoding property is set to ISO-8859-1, the output is
(shown as an od dump):
{code}
0004060 7375 7420 696f 696d 616d 6e61 6d20 c379
          u   s       t   o   i   m   i   m   a   a   n       m   y   ├
0004100 73b6 6c20 7369 a4c3 6b79 6573 7373 a4c3
          Â   s       l   i   s   ├   ñ   y   k   s   e   s   s   ├   ñ
{code}
And when set to UTF-8 the output is:
{code}
0004060 6d69 6d69 6161 206e 796d 83c3 b6c2 2073
          i   m   i   m   a   a   n       m   y   ├   â   ┬   Â   s
{code}
With UTF-8 output the Scandinavian umlauts are garbled. The correct UTF-8 byte sequence for the
letter "ö" is C3 B6, but the file contains C3 83 C2 B6.
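To make the dumps easier to read, here is a small sketch that prints the UTF-8 bytes of "ö" and rearranges od words into file byte order. It assumes the dumps were taken with od's default 16-bit word output on a little-endian machine, which matches the character rows above (for example 7375 7420 shown as "u s", space, "t"). The class and method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class OdWordDemo {
    /** Formats bytes as space-separated upper-case hex, e.g. "C3 B6". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Converts 4-digit od words such as "83c3 b6c2" into file byte order.
     * Assumption: little-endian 16-bit words, so the low byte comes first in the file.
     */
    static String odWordsToBytes(String words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.substring(2, 4).toUpperCase())
              .append(' ')
              .append(w.substring(0, 2).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-8 bytes of U+00F6 (ö)
        System.out.println(hex("\u00f6".getBytes(StandardCharsets.UTF_8)));
        // The two words around the umlaut in the UTF-8 dump
        System.out.println(odWordsToBytes("83c3 b6c2"));
    }
}
```

Read this way, the ISO-8859-1 dump contains C3 B6 between "y" and "s", while the UTF-8 dump contains C3 83 C2 B6 in the same place.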

The ISO-8859-1 setting would do for the site documentation, but since the changelog.xml header
then declares ISO-8859-1 encoding while the content is UTF-8, the rest of the process fails to
handle the umlauts.

I modified the writeChangelogXml() method of class ChangeLogReport by commenting out the writer
change made for MCHANGELOG-86:
{code}
        PrintWriter pw = new PrintWriter(new BufferedOutputStream(new FileOutputStream(outputXML)));
        pw.write(changelogXml.toString());
        pw.flush();
        pw.close();
        // MCHANGELOG-86
//        Writer writer = WriterFactory.newWriter( new BufferedOutputStream( new FileOutputStream( outputXML ) ),
//                                                 getOutputEncoding() );
//        writer.write(changelogXml.toString());
//        writer.flush();
//        writer.close();
{code}
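One caveat about the workaround above: the PrintWriter(OutputStream) constructor encodes with the JVM's default charset, so the bytes written can differ from one machine to another. The following is only a sketch of writing a string with an explicitly named charset using the standard java.nio.file API. It is not the plugin's code or a proposed patch, and the class name, method name and temporary file are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExplicitCharsetWrite {
    /**
     * Writes xml to a temporary file using the named charset and returns the
     * bytes that ended up on disk. I/O failures are rethrown unchecked to keep
     * the sketch short.
     */
    static byte[] writeAndRead(String xml, String encodingName) {
        Charset charset = Charset.forName(encodingName);
        try {
            Path target = Files.createTempFile("changelog", ".xml");
            try (Writer writer = Files.newBufferedWriter(target, charset)) {
                writer.write(xml);
            }
            byte[] written = Files.readAllBytes(target);
            Files.delete(target);
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "my\u00f6s" is the word visible in the dumps (myös)
        byte[] utf8 = writeAndRead("my\u00f6s", "UTF-8");
        byte[] latin1 = writeAndRead("my\u00f6s", "ISO-8859-1");
        System.out.println(utf8.length + " bytes as UTF-8, " + latin1.length + " bytes as ISO-8859-1");
    }
}
```

With a correctly decoded string, the umlaut takes two bytes (C3 B6) in UTF-8 and one byte (F6) in ISO-8859-1, so the file agrees with whatever encoding the XML header declares.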

The Writer may be encoding the text a second time, since a couple of lines above the change set
is already created with the encoding information. However, this is just a wild guess, since I did
not check the implementation of changelogSet.toXML() or writer.write():
{code}
            String changeset = changelogSet.toXML(getOutputEncoding());
{code}
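One hypothesis that would be consistent with both dumps (unconfirmed, since the report does not identify where it happens) is that the UTF-8 bytes from git are decoded as ISO-8859-1 somewhere before the file is written. The sketch below only illustrates that hypothesis with plain JDK calls; the class and method names are invented for the example and nothing here is taken from the plugin's source.

```java
import java.nio.charset.StandardCharsets;

final class DoubleEncodingSketch {
    /** Formats bytes as space-separated upper-case hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Hypothetical step: UTF-8 bytes are wrongly decoded as ISO-8859-1. */
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "ö" becomes the two characters U+00C3 U+00B6
        String garbled = misread("\u00f6");

        // Written as ISO-8859-1, the original UTF-8 bytes reappear
        System.out.println(hex(garbled.getBytes(StandardCharsets.ISO_8859_1)));

        // Written as UTF-8, each of the two characters is encoded again
        System.out.println(hex(garbled.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Under this assumption the ISO-8859-1 setting yields C3 B6 and the UTF-8 setting yields C3 83 C2 B6, which are the byte sequences seen in the two od dumps.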


  was:
Creating changelog.xml file doubles UTF-8 encoding if the git comment information is already
UTF-8 format. For example: if property outputEncoding is set to ISO-8859-1 the output is (shown
as od dump):
{code}
0004060 7375 7420 696f 696d 616d 6e61 6d20 c379
          u   s       t   o   i   m   i   m   a   a   n       m   y   ├
0004100 73b6 6c20 7369 a4c3 6b79 6573 7373 a4c3
          Â   s       l   i   s   ├   ñ   y   k   s   e   s   s   ├   ñ
{code}
And when set to UTF-8 the output is:
{code}
0004060 6d69 6d69 6161 206e 796d 83c3 b6c2 2073
          i   m   i   m   a   a   n       m   y   ├   â   ┬   Â   s
{code}
The result of UTF-8 encoding is that scandinavian umlauts are garbled. Code C3 B6 is the right
for "ö"-letter.

The ISO-8859-1 format would do for the site documentation otherwise but since the file xml
header says ISO-8859-1 encoding, rest of the process fails.

I modified class ChangeLogReport method writeChangelogXml() by commenting out issue MCHANGELOG-86
writer change:
{code}
        PrintWriter pw = new PrintWriter(new BufferedOutputStream(new FileOutputStream(outputXML)));
        pw.write(changelogXml.toString());
        pw.flush();
        pw.close();
        // MCHANGELOG-86
//        Writer writer = WriterFactory.newWriter( new BufferedOutputStream( new FileOutputStream( outputXML ) ),
//                                                 getOutputEncoding() );
//        writer.write(changelogXml.toString());
//        writer.flush();
//        writer.close();
{code}

It might be there is double escaping in Writer since couple of lines above the change set
is created with encoding information. However, this is just a wild guess since I did not check
out implementation of changelogSet.toXML() or writer.write()
{code}
            String changeset = changelogSet.toXML(getOutputEncoding());
{code}



> UTF-8 Encoding doubled
> ----------------------
>
>                 Key: MCHANGELOG-142
>                 URL: https://issues.apache.org/jira/browse/MCHANGELOG-142
>             Project: Maven Changelog Plugin
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Jukka Harkki
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
