commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-459) CPIO fails decoding multibyte name entries
Date Mon, 09 Jul 2018 09:53:00 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536735#comment-16536735 ]

ASF GitHub Bot commented on COMPRESS-459:
-----------------------------------------

GitHub user ctron opened a pull request:

    https://github.com/apache/commons-compress/pull/67

    [COMPRESS-459] Fix reading of multibyte name entries

    This fixes COMPRESS-459 by using the name's byte count from the field
    in the stream instead of relying on the assumption that each character
    is exactly one byte, which isn't true for UTF-8, UTF-16 or other
    multi-byte character encodings.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ctron/commons-compress feature/fix_COMPRESS_459_1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/commons-compress/pull/67.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #67
    
----
commit 715352d3343358539a40b9623a9a00beb115ff30
Author: Jens Reimann <jreimann@...>
Date:   2018-07-09T09:41:43Z

    [COMPRESS-459] Fix reading of multibyte name entries
    
    This fixes COMPRESS-459 by using the name's byte count from the field
    in the stream instead of relying on the assumption that each character
    is exactly one byte, which isn't true for UTF-8, UTF-16 or other
    multi-byte character encodings.

----


> CPIO fails decoding multibyte name entries
> ------------------------------------------
>
>                 Key: COMPRESS-459
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-459
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.9, 1.17
>            Reporter: Jens Reimann
>            Priority: Major
>
> When a CPIO archive uses a multi-byte encoding (e.g. UTF-8) and an entry has a name
> containing multi-byte characters, the decoder fails.
> The problem IMHO is the "getHeaderPadCount" method, which assumes a single byte per character:
>  
> {code:java}
>     public int getHeaderPadCount(){
>         if (this.alignmentBoundary == 0) { return 0; }
>         int size = this.headerSize + 1;  // Name has terminating null
>         if (name != null) {
>             size += name.length();
>         }
>         final int remain = size % this.alignmentBoundary;
>         if (remain > 0){
>             return this.alignmentBoundary - remain;
>         }
>         return 0;
>     }
> {code}
> However, for UTF-8 this holds only when every character in the name is encoded as a single byte.
>  
> Also, it wouldn't be enough to call "String#getBytes(…)", as re-encoding the name might
> not reproduce the underlying bytes.
> The proper solution would be to take the name size, as read from the CPIO stream,
> and pass it to the entry.
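
The mismatch described in the issue can be illustrated with a minimal, standalone sketch. It is not the code from the pull request: the class name, the headerPadCount signature and the nameSizeInBytes parameter are hypothetical, and it assumes a 110-byte header with a 4-byte alignment boundary and a name size field that already counts the terminating null. String#getBytes is used here only to show that the character count and the byte count differ; as noted above, a real fix would take the size from the stream.

```java
import java.nio.charset.StandardCharsets;

public class CpioNamePadding {

    /**
     * Padding needed after header and name so that the next field is aligned.
     * nameSizeInBytes is assumed to be the value read from the stream,
     * including the terminating null byte (hypothetical parameter).
     */
    static int headerPadCount(int headerSize, long nameSizeInBytes, int alignmentBoundary) {
        if (alignmentBoundary == 0) {
            return 0;
        }
        final long size = headerSize + nameSizeInBytes;
        final int remain = (int) (size % alignmentBoundary);
        return remain > 0 ? alignmentBoundary - remain : 0;
    }

    public static void main(String[] args) {
        final String name = "\u00e4.txt";  // 5 characters, 6 bytes in UTF-8
        final int charCount = name.length();
        final int byteCount = name.getBytes(StandardCharsets.UTF_8).length;

        // Assumed values for illustration only.
        final int headerSize = 110;
        final int alignment = 4;

        // Character-based size, as in the quoted method: 110 + 1 + 5 = 116, pad 0.
        System.out.println("pad by chars: " + headerPadCount(headerSize, charCount + 1, alignment));
        // Byte-based size: 110 + 1 + 6 = 117, pad 3.
        System.out.println("pad by bytes: " + headerPadCount(headerSize, byteCount + 1, alignment));
    }
}
```

With these assumed values the two computations disagree by three bytes, so a reader relying on the character count would start reading the entry data at the wrong offset.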



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
