jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-333) 1000 character path limit in MongoMK
Date Mon, 03 Dec 2012 11:03:58 GMT

    [ https://issues.apache.org/jira/browse/OAK-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508646#comment-13508646 ]

Thomas Mueller commented on OAK-333:
------------------------------------

One way to solve this (potential) problem is to shrink long paths using a replacement table,
similar to what we do in Jackrabbit 2.x with the name index.

(A) One way to shrink the paths is to have a 'replacement table' for long path elements
(similar to MS-DOS filenames). For example, path elements longer than 19 bytes could be replaced
with a shorter version ("/x/longPathElementBlaBlaBla/anotherLongPlathElement/y" could be replaced
with "/x/longPath~1/anotherLo~2/y", using the name index "longPath~1" = "longPathElementBlaBlaBla"
and "anotherLo~2" = "anotherLongPlathElement"). This would limit each element to 19 bytes,
so that a path of up to 1000 bytes could always contain 50 elements (each element takes at most
20 bytes, because the '/' is also one byte). Deeper paths are still a potential problem, but
only in very rare cases (one could say "misbehaving applications").
The disadvantage is that shortening is required even if the path itself is quite short and
only one path element is long. So shortening would be used quite a lot.
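
A minimal sketch of variant (A), in Python for brevity (this is not Oak code). The
NameIndex class, the 8-byte prefix and the counter-based alias format are assumptions made
for illustration, so the aliases differ slightly from the example above; only the 19-byte
element limit and the "~" marker come from the text.

```python
MAX_ELEMENT_BYTES = 19  # per-element limit from the text
PREFIX_BYTES = 8        # assumption: how much of the long name is kept
MARKER = "~"            # marker from the text; it may have to be another character


class NameIndex:
    """Hypothetical in-memory stand-in for the replacement table."""

    def __init__(self):
        self.alias_to_name = {}
        self.name_to_alias = {}

    def alias_for(self, name):
        alias = self.name_to_alias.get(name)
        if alias is None:
            # Cut on a byte boundary; drop a partially cut multi-byte character.
            raw = name.encode("utf-8")[:PREFIX_BYTES]
            prefix = raw.decode("utf-8", errors="ignore")
            alias = f"{prefix}{MARKER}{len(self.alias_to_name) + 1}"
            self.alias_to_name[alias] = name
            self.name_to_alias[name] = alias
        return alias


def shorten_path(path, index):
    """Replace every element longer than MAX_ELEMENT_BYTES with its alias."""
    parts = path.split("/")
    short = [
        index.alias_for(p) if len(p.encode("utf-8")) > MAX_ELEMENT_BYTES else p
        for p in parts
    ]
    return "/".join(short)


def expand_path(path, index):
    """Map aliases back to the original elements; other elements pass through."""
    return "/".join(index.alias_to_name.get(p, p) for p in path.split("/"))


index = NameIndex()
original = "/x/longPathElementBlaBlaBla/anotherLongPlathElement/y"
short = shorten_path(original, index)
```

With a 1000-byte limit, 1000 // (MAX_ELEMENT_BYTES + 1) gives the 50 elements mentioned
above. Note that expand_path cannot tell an alias from a real element that happens to have
the same name, which is why the marker character matters.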

(B) Another solution is to shrink the combination of multiple elements
("/x/longPathElementBlaBlaBla/anotherLongPlathElement/y" would be replaced with "~1/y" using
the name index "~1" = "/x/longPathElementBlaBlaBla/anotherLongPlathElement"). One could even
allow recursive replacement, so that there is no limit. This would avoid having to shorten
paths that are short and contain just one or a few longer elements. The advantage is that shortening
is only required for extreme cases.
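
A minimal sketch of variant (B) with recursive replacement, again in Python and not Oak
code. The PathIndex class and the "~N" alias numbering are assumptions for illustration.
The limit is a parameter so that the short example path can trigger shortening; elements
that are themselves over the limit are not handled here.

```python
MARKER = "~"  # marker from the text; it may have to be another character


class PathIndex:
    """Hypothetical in-memory stand-in for the replacement table."""

    def __init__(self):
        self.alias_to_path = {}
        self.path_to_alias = {}

    def alias_for(self, path):
        alias = self.path_to_alias.get(path)
        if alias is None:
            alias = f"{MARKER}{len(self.alias_to_path) + 1}"
            self.alias_to_path[alias] = path
            self.path_to_alias[path] = alias
        return alias


def shorten(path, index, limit):
    """Replace the parent with an alias if the parent exceeds `limit` bytes."""
    parent, _, name = path.rpartition("/")
    if len(parent.encode("utf-8")) <= limit:
        return path
    # Recursive replacement: the stored parent is itself kept in short form.
    stored_parent = shorten(parent, index, limit)
    return index.alias_for(stored_parent) + "/" + name


def expand(path, index):
    """Resolve leading aliases until the path no longer starts with the marker."""
    while path.startswith(MARKER):
        alias, _, rest = path.partition("/")
        path = index.alias_to_path[alias] + "/" + rest
    return path


original = "/x/longPathElementBlaBlaBla/anotherLongPlathElement/y"

one_level = PathIndex()
short_once = shorten(original, one_level, limit=30)

nested = PathIndex()
short_nested = shorten(original, nested, limit=20)
```

With limit=30 only the 51-byte parent is replaced, giving "~1/y" as in the example above.
With limit=20 the parent's own parent is replaced first, so the stored entries chain and
expand has to follow two aliases.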

Each MicroKernel would need to know exactly when shortening is required, without having to
read the replacement table each time. For variant (A) this is quite easy (shortening is required
for all elements longer than 19 bytes); for variant (B) it would be a bit more complex: shortening
is required if the total length of the parent of the path exceeds 500 bytes (only the parent
is shortened). It is also required for each path element that exceeds 500 bytes.
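
Both rules can be checked from the path alone, without touching the replacement table. A
small Python sketch of the two checks, using the 19-byte element limit stated for variant
(A) and the 500-byte limits for variant (B); the function names are made up for
illustration.

```python
MAX_ELEMENT_BYTES_A = 19  # variant (A): per-element limit
MAX_BYTES_B = 500         # variant (B): limit for the parent and for each element


def byte_len(text):
    return len(text.encode("utf-8"))


def needs_shortening_a(path):
    """Variant (A): any single element longer than 19 bytes."""
    return any(byte_len(e) > MAX_ELEMENT_BYTES_A for e in path.split("/"))


def needs_shortening_b(path):
    """Variant (B): parent longer than 500 bytes, or any element over 500 bytes."""
    parent, _, _name = path.rpartition("/")
    if byte_len(parent) > MAX_BYTES_B:
        return True
    return any(byte_len(e) > MAX_BYTES_B for e in path.split("/"))


# 60 elements of 9 bytes each: short elements, but a 590-byte parent.
deep_path = "/" + "/".join(["abcdefghi"] * 60)
```

The deep_path example needs no shortening under variant (A) but does under variant (B),
while "/x/longPathElementBlaBlaBla/y" is the opposite case.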

Above, it is assumed that "~" is illegal within a path. I guess this is not the case, so another
character needs to be used as the shortening character.


                
> 1000 character path limit in MongoMK
> ------------------------------------
>
>                 Key: OAK-333
>                 URL: https://issues.apache.org/jira/browse/OAK-333
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mk, mongomk
>    Affects Versions: 0.5
>            Reporter: Mete Atamel
>            Assignee: Mete Atamel
>            Priority: Minor
>         Attachments: OAK-333.patch
>
>
> In an infinite loop, try to add nodes one under another to get N0/N1/N2...NN. At some
> point, the current parent node will not be found and the current commit will fail. I think
> this happens when the path length exceeds 1000 characters. Is this enough for a path? This
> way I was able to create only 222 levels in the tree (and my node names were really short:
> N1, N2 ...).
> There's an automated test for this: NodeExistsCommandMongoTest.testTreeDepth
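
The 222 levels reported above are consistent with a 1000-character limit. A quick Python
check, assuming the path has the form "/N0/N1/N2/..." and that a path of exactly 1000
characters is still accepted (both are assumptions, the issue does not spell them out):

```python
def max_depth(limit=1000):
    """Count how many nested nodes N0, N1, ... fit into `limit` characters."""
    length = 0
    depth = 0
    while True:
        step = len(f"/N{depth}")  # one more level adds "/" plus the node name
        if length + step > limit:
            return depth
        length += step
        depth += 1


levels = max_depth(1000)
```

N0 to N9 take 3 characters each including the '/', N10 to N99 take 4, and N100 onwards
take 5, so 30 + 360 + 122 * 5 = 1000 characters are used by exactly 222 levels.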

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
