jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OAK-3395) RevisionGC fails for JCR paths having line feed characters
Date Wed, 16 Sep 2015 09:17:46 GMT

     [ https://issues.apache.org/jira/browse/OAK-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Chetan Mehrotra updated OAK-3395:
    Attachment: OAK-3395-2.patch

Thanks Thomas for the extensive review!

Attached is the [updated patch|^OAK-3395-2.patch].

bq. by the way, even now an exception is thrown if the last character is a backslash

In the previous patch that case was handled, because {{unescapingRequired}} checked that any
'\' found is followed by at least one more character in the string, so the out-of-bounds access
in unescape could not happen. However, your suggestion for simplifying {{unescapingRequired}}
looks better, so I have now added a check there.
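
For readers without the patch at hand, here is a minimal sketch of the kind of escaping being discussed. It is not the code from OAK-3395-2.patch: the class name {{LineBreakEscaper}} and both method names are hypothetical, and it assumes only '\', line feed and carriage return need escaping. The point it illustrates is the explicit bounds check for a trailing backslash inside unescape.

```java
// Illustrative sketch only; NOT the code from OAK-3395-2.patch.
// LineBreakEscaper, escape and unescape are hypothetical names.
final class LineBreakEscaper {
    private LineBreakEscaper() {}

    /** Escapes '\', LF and CR so the result never contains a raw line break. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverses escape(); a trailing or unknown escape is rejected explicitly. */
    static String unescape(String s) {
        if (s.indexOf('\\') < 0) {
            return s; // nothing was escaped, so no work is needed
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            // Bounds check: a backslash must be followed by one more char.
            if (i == s.length() - 1) {
                throw new IllegalArgumentException("Dangling backslash in: " + s);
            }
            char next = s.charAt(++i);
            switch (next) {
                case '\\': sb.append('\\'); break;
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                default:
                    throw new IllegalArgumentException("Unknown escape \\" + next);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, {{unescape(escape(x))}} returns {{x}} for any input, and a string ending in a single backslash fails with a descriptive exception rather than an index error.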

bq. There should be unit tests for the special cases as well (code coverage should be 100%).

Done now

In addition, I have now used {{RandomStringUtils}} from Commons Lang (test-scoped dependency), as
the logic you suggested might not generate valid Unicode chars. The Commons Lang utility ensures
that proper Unicode chars are generated.

Further, regarding my earlier concern:

bq. I am not very sure if escaping as implemented would work fine for unicode chars (involving
surrogate pair i.e. those not in BMP)

I checked the docs [1], which mention the following:

...if an application scans a char sequence for HTML tags, checking each char individually,
it knows that these tags only use characters from the Basic Latin block. If the text being
scanned contains supplementary characters, then these characters cannot be confused with the
tag characters, because UTF-16 represents supplementary characters using code units whose
values are not used for BMP characters. 

So this confirms that the char-by-char processing approach used would work fine: the chars being
searched for are from the ASCII set, and in a surrogate pair (which uses 2 chars) it is not
possible for either char to have a value from the ASCII range (or, more broadly, a value used by
any BMP character).
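
The property above can also be checked directly with the JDK. The following is a small sketch, not part of the patch; the class and method names are hypothetical. It walks every supplementary code point and verifies that each of its UTF-16 code units lies in the surrogate range, and it shows that a plain char-by-char count of '\' is unaffected by a supplementary character in the middle of the string.

```java
// Illustrative check only; class and method names are hypothetical.
final class SurrogateScanCheck {
    private SurrogateScanCheck() {}

    /** Counts occurrences of target by inspecting each UTF-16 code unit on its own. */
    static int countCodeUnit(String s, char target) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                n++;
            }
        }
        return n;
    }

    /** True if every code unit of every supplementary code point is a surrogate. */
    static boolean supplementaryUnitsAreAlwaysSurrogates() {
        for (int cp = Character.MIN_SUPPLEMENTARY_CODE_POINT;
             cp <= Character.MAX_CODE_POINT; cp++) {
            for (char unit : Character.toChars(cp)) {
                if (unit < Character.MIN_SURROGATE || unit > Character.MAX_SURROGATE) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Since the surrogate range is U+D800 to U+DFFF, no code unit of a pair can equal '\', line feed or carriage return, which is all the escaping logic looks for.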

[1] http://www.oracle.com/us/technologies/java/supplementary-142654.html

> RevisionGC fails for JCR paths having line feed characters
> ----------------------------------------------------------
>                 Key: OAK-3395
>                 URL: https://issues.apache.org/jira/browse/OAK-3395
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.3.7, 1.2.6, 1.0.21
>         Attachments: OAK-3395-1.patch, OAK-3395-2.patch
> RevisionGC fails with an error while processing any id (derived from a JCR path) that contains a line feed or carriage return character.
> This happens because it relies on Oak Commons StringSort and ExternalSort, which work with line-delimited strings; an id containing a line break breaks this sorting logic. The error reported looks like:
> {noformat}
> java.lang.AssertionError: Invalid id /1442211320
> 	at org.apache.jackrabbit.oak.plugins.document.util.Utils.getDepthFromId(Utils.java:337)
> 	at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:38)
> 	at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:30)
> 	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
> 	at java.util.TimSort.sort(TimSort.java:203)
> 	at java.util.TimSort.sort(TimSort.java:173)
> 	at java.util.Arrays.sort(Arrays.java:659)
> 	at java.util.Collections.sort(Collections.java:217)
> 	at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortAndSave(ExternalSort.java:279)
> 	at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:218)
> 	at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:257)
> 	at org.apache.jackrabbit.oak.commons.sort.StringSort$PersistentState.sort(StringSort.java:191)
> 	at org.apache.jackrabbit.oak.commons.sort.StringSort.sort(StringSort.java:88)
> 	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.ensureSorted(VersionGarbageCollector.java:383)
> 	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.getDocIdsToDelete(VersionGarbageCollector.java:274)
> 	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDeletedDocuments(VersionGarbageCollector.java:296)
> 	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDocuments(VersionGarbageCollector.java:241)
> 	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectDeletedDocuments(VersionGarbageCollector.java:154)
> 	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:105)
> 	at org.apache.jackrabbit.oak.plugins.document.VersionGCDeletionTest.gcWithPathsHavingNewLine(VersionGCDeletionTest.java:203)
> {noformat}
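
The failure mode in the description can be reproduced in miniature without Oak. The sketch below is only an illustration under stated assumptions: the class name, the method name and the ids are made up, and it merely stands in for any code that stores one record per line, as the description says StringSort and ExternalSort do. An id that contains a line feed comes back as two records, and the second fragment is no longer a valid id.

```java
// Illustrative only; this is not Oak code and the ids are placeholders.
final class LineDelimitedRecordsDemo {
    private LineDelimitedRecordsDemo() {}

    /** Writes ids one per line, then reads them back by splitting on line feeds. */
    static String[] writeThenReadBack(String... ids) {
        String fileContent = String.join("\n", ids);
        return fileContent.split("\n", -1);
    }
}
```

Calling {{writeThenReadBack("/a/b", "/a/c")}} returns the two ids unchanged, while a single id such as {{"/a/line1\nline2"}} comes back as two records, {{/a/line1}} and {{line2}}. Escaping line breaks before writing and unescaping after reading avoids this split.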

This message was sent by Atlassian JIRA