jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-333) 1000 character path limit in MongoMK
Date Fri, 07 Dec 2012 10:59:21 GMT

    [ https://issues.apache.org/jira/browse/OAK-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526313#comment-13526313 ]

Thomas Mueller edited comment on OAK-333 at 12/7/12 10:58 AM:
--------------------------------------------------------------

It would still have the same basic performance problems as indexing the hash, once the path
gets too long. 

In (B) above, I described a solution that is somewhat similar but avoids the problem: instead of
shrinking the end of the path, I suggested shrinking the beginning of the path. So if the
path exceeds a limit, the first limit/2 characters are replaced with an index, which could
actually be the hash code. So

{code}
/a/very/long/path/that/exceeds/a/length/limit
{code}

would be converted to

{code}
<id(/a/very/long/path/that)>/exceeds/a/length/limit
{code}

Instead of a simple hash, I would use a lookup table. This lookup table would normally be
empty (as normally there are no long paths). If a path is too long, then the left 50% of the
path is stored there, so that each path that starts with /a/very/long/path/that uses the same
shorter prefix.

Similar to the name index we use in Jackrabbit, the id of the long prefix could be the hash
code of the prefix.

That way, similar paths stay on the same MongoDB shard.
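
The prefix lookup table described above could be sketched roughly as follows. This is only an illustration in Java, not code from Oak or MongoMK: the class PathPrefixTable, its methods, the <id> marker syntax and the in-memory map are all assumptions. Cutting at the last '/' at or before limit/2, so that the prefix ends on a segment boundary as in the example, is one possible reading of "the first limit/2 characters".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the prefix lookup table; not part of Oak or MongoMK. */
final class PathPrefixTable {
    private final int limit;
    // id -> long prefix; stays empty as long as no path exceeds the limit
    private final Map<String, String> prefixById = new HashMap<>();

    PathPrefixTable(int limit) {
        this.limit = limit;
    }

    /** Replaces the left part of an over-long path with "<id>". */
    String shorten(String path) {
        if (path.length() <= limit) {
            return path;
        }
        // Cut at the last '/' at or before limit/2 (assumption, see lead-in).
        int cut = path.lastIndexOf('/', limit / 2);
        if (cut <= 0) {
            return path; // no usable segment boundary; left unchanged in this sketch
        }
        String prefix = path.substring(0, cut);
        // The id is the hash code of the prefix; on a collision with a
        // different prefix, a numeric suffix is appended.
        String base = Integer.toHexString(prefix.hashCode());
        String id = base;
        for (int n = 1; prefixById.containsKey(id) && !prefix.equals(prefixById.get(id)); n++) {
            id = base + "-" + n;
        }
        prefixById.put(id, prefix);
        // Note: the result can still exceed the limit if the remainder is long;
        // this sketch does not handle that case.
        return "<" + id + ">" + path.substring(cut);
    }

    /** Restores the original path from a shortened one. */
    String expand(String stored) {
        if (!stored.startsWith("<")) {
            return stored; // real paths start with '/', so this one was never shortened
        }
        int end = stored.indexOf('>');
        return prefixById.get(stored.substring(1, end)) + stored.substring(end + 1);
    }
}
```

With a limit of 44, the 45-character example path above is cut after /a/very/long/path/that, and any other over-long path under that prefix gets the same <id>, which is what would keep such paths adjacent when sorting or sharding by the stored key.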
                
> 1000 character path limit in MongoMK
> ------------------------------------
>
>                 Key: OAK-333
>                 URL: https://issues.apache.org/jira/browse/OAK-333
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mk, mongomk
>    Affects Versions: 0.5
>            Reporter: Mete Atamel
>            Assignee: Mete Atamel
>            Priority: Minor
>         Attachments: OAK-333.patch
>
>
> In an infinite loop, try to add nodes one under another to get N0/N1/N2...NN. At some
> point, the current parent node will not be found and the current commit will fail. I think
> this happens when the path length exceeds 1000 characters. Is this enough for a path? This way
> I was able to create only 222 levels in the tree (and my node names were really short: N1,
> N2 ...)
> There's an automated test for this: NodeExistsCommandMongoTest.testTreeDepth
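
The 222 levels reported above fit a 1000-character limit, which can be checked with simple arithmetic. The sketch below assumes each level contributes a '/' plus a node name N0, N1, N2, ... and that a path of exactly the limit is still accepted; the class and method names are hypothetical and nothing here calls MongoMK.

```java
/** Hypothetical helper: counts how many levels /N0/N1/... fit in a length limit. */
final class PathDepth {
    static int maxDepth(int maxLength) {
        int length = 0;
        int depth = 0;
        while (true) {
            // One path segment: the '/' separator plus the node name "N<depth>".
            int segment = 1 + ("N" + depth).length();
            if (length + segment > maxLength) {
                return depth;
            }
            length += segment;
            depth++;
        }
    }

    public static void main(String[] args) {
        System.out.println(PathDepth.maxDepth(1000)); // prints 222
    }
}
```

The count works out as 10 names of 2 characters, 90 of 3 and 122 of 4, each with one separator: 30 + 360 + 610 = 1000.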

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
