jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Updated] (OAK-7279) segment-tar update from java 7 to java 8 may break persisted names using invalid characters
Date Tue, 24 Apr 2018 09:10:00 GMT

     [ https://issues.apache.org/jira/browse/OAK-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Dürig updated OAK-7279:
-------------------------------
    Labels: tech-debt  (was: )

> segment-tar update from java 7 to java 8 may break persisted names using invalid characters
> -------------------------------------------------------------------------------------------
>
>                 Key: OAK-7279
>                 URL: https://issues.apache.org/jira/browse/OAK-7279
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Julian Reschke
>            Priority: Minor
>              Labels: tech-debt
>
> segment-tar relies on {{String.getBytes()}} when persisting strings such as item names.
> The problem is that the behavior of this method changed in Java 8 with respect to invalid strings (here: strings containing null characters or unpaired surrogates).
> In Java 7, these would round-trip, as Java was using the so-called "modified UTF-8" encoding (see https://docs.oracle.com/javase/6/docs/api/java/io/DataInput.html#modified-utf-8). This produces byte sequences that are *not* valid UTF-8.
> Java 7 will read them back, but Java 8 will map the non-conforming byte sequences to the Unicode replacement character (U+FFFD). Note in particular that multiple child entries might end up with identical names as a consequence.
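
The name collision described above can be sketched as follows. This is only an illustration, not segment-tar code: the class name {{SurrogateCollisionDemo}}, the helper {{decode}}, and the sample byte arrays are hypothetical. The bytes are the three-byte surrogate-style encodings of U+D800 and U+D801, which are not valid UTF-8, each prefixed with the letter "a".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Oak.
class SurrogateCollisionDemo {

    // Decodes the way new String(bytes, UTF_8) does on Java 8 and later:
    // malformed input is silently replaced rather than reported.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'a' followed by the surrogate-style encoding of U+D800
        byte[] first = {0x61, (byte) 0xED, (byte) 0xA0, (byte) 0x80};
        // 'a' followed by the surrogate-style encoding of U+D801
        byte[] second = {0x61, (byte) 0xED, (byte) 0xA0, (byte) 0x81};

        String name1 = decode(first);
        String name2 = decode(second);

        // Two distinct persisted names become indistinguishable after decoding.
        System.out.println("names collide: " + name1.equals(name2));
        System.out.println("contains U+FFFD: " + (name1.indexOf('\uFFFD') >= 0));
    }
}
```

On Java 8 or later, both checks should report true, because the decoder substitutes U+FFFD for each ill-formed sequence and the original distinction between the two names is lost.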
> I'm not sure about the severity of this, or whether something needs to be done about it. AFAIC, this is another good reason to reject invalid strings as early as possible in the stack.
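
One way to reject such strings early might look like the sketch below. It assumes that "invalid" means exactly the two cases named in this issue (null characters and unpaired surrogates); the class {{NameValidator}} and its methods are hypothetical and do not exist in Oak.

```java
// Hypothetical helper; a sketch of early validation, not an Oak API.
final class NameValidator {

    private NameValidator() {
    }

    // Returns true if s contains no null character and no unpaired surrogate.
    static boolean isWellFormed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\0') {
                return false; // null character
            }
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    // Throws before the string can reach any persistence code.
    static String requireWellFormed(String s) {
        if (!isWellFormed(s)) {
            throw new IllegalArgumentException("name contains a null character or an unpaired surrogate");
        }
        return s;
    }
}
```

A caller would run {{NameValidator.requireWellFormed(name)}} at the API boundary, so that the persistence layer never sees a string whose encoded form depends on the Java version.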
> cc [~mduerig]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
