jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-1926) UnmergedBranch state growing with empty BranchCommit leading to performance degradation
Date Wed, 23 Jul 2014 06:56:38 GMT

    [ https://issues.apache.org/jira/browse/OAK-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071411#comment-14071411 ]

Chetan Mehrotra edited comment on OAK-1926 at 7/23/14 6:54 AM:
---------------------------------------------------------------

The following notes are based on a discussion with [~mreutegg] on this issue

* DocumentNodeStore needs to keep track of UnmergedBranches to distinguish revisions which
are part of a branch
* If a process terminates with some pending UnmergedBranches then the info for those branches
remains present in the root document's revision map and can only be removed if we do a garbage
collection and remove all commits which were part of those branches. Until that happens we need
to maintain the in-memory state
* Loading of unmerged branch was done in [1461193|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mongomk/src/main/java/org/apache/jackrabbit/mongomk/prototype/MongoMK.java?r1=1461193&r2=1461192&pathrev=1461193]
* Currently there are the following problems with unmerged branches
** A - The check for a revision being part of a branch is costly - The check as currently implemented
does not distinguish between live in-memory branches and persisted unmerged branches. To
simplify the check we distinguish between the two types, and for persisted unmerged branches
we keep a set of their revisions and first do a lookup there to confirm whether a rev is part of
an unmerged branch before doing the actual check
** B - Tracking of branches which are not merged - An unmerged branch state would be persisted
in two cases
*** The client did not merge the branch - In this case we may be able to detect that a branch
has gone out of scope (possibly via a WeakReference on DocumentNodeStoreBranch) and will not
be merged. In such a case we know the commits done in that branch and can perform a cleanup
*** The Oak process exited suddenly - In this case the branch commit info would be lost and we
would have to resort to GC
** C - Unmerged Rev GC (OAK-1981) - Once we implement a full GC, such branch state can
be collected in that GC
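
The lookup described under A could be sketched roughly as below. This is only an illustration under assumptions, not the actual Oak code: the names {{LiveBranch}}, {{UnmergedBranchIndex}}, {{loadPersisted}} and {{isUnmergedBranchCommit}} are hypothetical, and revisions are modelled as plain strings.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical, simplified model of a live branch: only the revisions of its commits. */
final class LiveBranch {
    final Set<String> commits = ConcurrentHashMap.newKeySet();

    boolean containsCommit(String rev) {
        return commits.contains(rev);
    }
}

/** Hypothetical index (assumption for illustration, not Oak's UnmergedBranches class). */
final class UnmergedBranchIndex {
    // Branches created by the running process; usually only a handful.
    private final List<LiveBranch> liveBranches = new CopyOnWriteArrayList<>();
    // Revisions of unmerged branch commits found in persisted state at startup.
    private final Set<String> persistedUnmergedRevs = ConcurrentHashMap.newKeySet();

    void loadPersisted(Collection<String> revs) {
        persistedUnmergedRevs.addAll(revs);
    }

    LiveBranch createBranch() {
        LiveBranch branch = new LiveBranch();
        liveBranches.add(branch);
        return branch;
    }

    void merged(LiveBranch branch) {
        liveBranches.remove(branch);
    }

    boolean isUnmergedBranchCommit(String rev) {
        // Single hash lookup covers all persisted unmerged branches.
        if (persistedUnmergedRevs.contains(rev)) {
            return true;
        }
        // Only the live branches still need to be scanned.
        for (LiveBranch branch : liveBranches) {
            if (branch.containsCommit(rev)) {
                return true;
            }
        }
        return false;
    }
}
```

With this split, the cost of the check no longer grows with the number of persisted unmerged branches (750 in the reported run); only the live branches are iterated.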

For now, as part of this bug, we would implement #A as that should reduce the performance issue,
and later we can go for #B and #C
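
For the later step #B, the out-of-scope detection via WeakReference could look roughly like the sketch below. It is an assumption-laden illustration only: {{BranchHandle}}, {{BranchRef}} and {{AbandonedBranchTracker}} are hypothetical names standing in for DocumentNodeStoreBranch and its bookkeeping, and revisions are plain strings.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for the branch object held by client code. */
final class BranchHandle {
    final List<String> commits = new CopyOnWriteArrayList<>();
}

/** Weak reference that keeps the commit list reachable after the branch itself is gone. */
final class BranchRef extends WeakReference<BranchHandle> {
    final List<String> commits;

    BranchRef(BranchHandle branch, ReferenceQueue<BranchHandle> queue) {
        super(branch, queue);
        // Share the list instance; it does not point back to the branch,
        // so it does not prevent the branch from being collected.
        this.commits = branch.commits;
    }
}

/** Hypothetical tracker for branches that go out of scope without being merged. */
final class AbandonedBranchTracker {
    private final ReferenceQueue<BranchHandle> queue = new ReferenceQueue<>();
    // Strong references to the refs themselves, otherwise they would be collected too.
    private final Set<BranchRef> unmerged = ConcurrentHashMap.newKeySet();

    BranchRef track(BranchHandle branch) {
        BranchRef ref = new BranchRef(branch, queue);
        unmerged.add(ref);
        return ref;
    }

    void merged(BranchRef ref) {
        unmerged.remove(ref);
    }

    /** Returns the commit revisions of branches that became unreachable while unmerged. */
    List<String> collectAbandonedCommits() {
        List<String> abandoned = new ArrayList<>();
        Reference<? extends BranchHandle> r;
        while ((r = queue.poll()) != null) {
            BranchRef ref = (BranchRef) r;
            if (unmerged.remove(ref)) {
                abandoned.addAll(ref.commits);
            }
        }
        return abandoned;
    }
}
```

A background job could call {{collectAbandonedCommits()}} periodically and clean up the returned revisions. This only helps while the process is alive; after a sudden exit the in-memory info is lost and GC (#C) is still needed.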


> UnmergedBranch state growing with empty BranchCommit leading to performance degradation
> ---------------------------------------------------------------------------------------
>
>                 Key: OAK-1926
>                 URL: https://issues.apache.org/jira/browse/OAK-1926
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk
>    Affects Versions: 1.0.1
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.1
>
>
> In some cluster deployments it has been seen that the in-memory state of UnmergedBranches
> contains a large number of empty commits. For example, in one of the runs there were 750 entries
> in the UnmergedBranches and each Branch had empty branch commits.
> If there is a large number of UnmergedBranches then read performance degrades, because the
> logic for determining revision validity currently scans all branches.
> Below is part of the UnmergedBranch state
> {noformat}
> Branch 1
> 1 -> br146d2edb7a7-0-1 (true) (revision: "br146d2edb7a7-0-1", clusterId: 1, time: "2014-06-25 05:08:52.903", branch: true)
> 2 -> br146d2f0450b-0-1 (true) (revision: "br146d2f0450b-0-1", clusterId: 1, time: "2014-06-25 05:11:40.171", branch: true)
> Branch 2
> 1 -> br146d2ef1d08-0-1 (true) (revision: "br146d2ef1d08-0-1", clusterId: 1, time: "2014-06-25 05:10:24.392", branch: true)
> Branch 3
> 1 -> br146d2ed26ca-0-1 (true) (revision: "br146d2ed26ca-0-1", clusterId: 1, time: "2014-06-25 05:08:15.818", branch: true)
> 2 -> br146d2edfd0e-0-1 (true) (revision: "br146d2edfd0e-0-1", clusterId: 1, time: "2014-06-25 05:09:10.670", branch: true)
> Branch 4
> 1 -> br146d2ecd85b-0-1 (true) (revision: "br146d2ecd85b-0-1", clusterId: 1, time: "2014-06-25 05:07:55.739", branch: true)
> Branch 5
> 1 -> br146d2ec21a0-0-1 (true) (revision: "br146d2ec21a0-0-1", clusterId: 1, time: "2014-06-25 05:07:08.960", branch: true)
> 2 -> br146d2ec8eca-0-1 (true) (revision: "br146d2ec8eca-0-1", clusterId: 1, time: "2014-06-25 05:07:36.906", branch: true)
> Branch 6
> 1 -> br146d2eaf159-1-1 (true) (revision: "br146d2eaf159-1-1", clusterId: 1, time: "2014-06-25 05:05:51.065", counter: 1, branch: true)
> Branch 7
> 1 -> br146d2e9a513-0-1 (true) (revision: "br146d2e9a513-0-1", clusterId: 1, time: "2014-06-25 05:04:26.003", branch: true)
> {noformat}
> [~mreutegg] suggested that these branches might belong to revisions which resulted
> in a collision, and upon checking that indeed appears to be the case (the value true in brackets
> above indicates that). Further, given the age of these revisions, it looks like they get populated
> at startup itself
> *Fix*
> * Need to check why we need to populate the UnmergedBranch
> * Possibly implement some purge job which would remove such stale entries



--
This message was sent by Atlassian JIRA
(v6.2#6252)
