jackrabbit-oak-issues mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OAK-2685) Track root state revision when reading the tree
Date Mon, 10 Aug 2015 14:26:45 GMT

     [ https://issues.apache.org/jira/browse/OAK-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated OAK-2685:
    Fix Version/s: 1.0.19

Merged into 1.0 branch: http://svn.apache.org/r1695086

> Track root state revision when reading the tree
> -----------------------------------------------
>                 Key: OAK-2685
>                 URL: https://issues.apache.org/jira/browse/OAK-2685
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, mongomk
>            Reporter: Marcel Reutegger
>            Assignee: Marcel Reutegger
>              Labels: performance
>             Fix For: 1.3.0, 1.2.3, 1.0.19
>         Attachments: OAK-2685.patch
> Currently the DocumentNodeState has two revisions:
> - {{getRevision()}} returns the read revision of this node state. This revision was used
>   to read the node state from the underlying {{NodeDocument}}.
> - {{getLastRevision()}} returns the revision at which this node state was last modified.
>   This revision also reflects changes made further down the tree, even when the node state
>   itself was not directly affected by a change.
> The lastRevision of a state is then used as the read revision of the child node states.
> This avoids reading the entire tree again with a different revision after the head revision
> changed because of a commit.
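The following Java sketch illustrates the two revisions and the rule for reading children described above. It is a simplified, hypothetical model: NodeStateSketch, its fields and the path-to-lastRevision map are assumptions standing in for DocumentNodeState and NodeDocument, and revisions are reduced to plain long timestamps. It shows how the read revision drifts back in time while descending into a subtree that has not changed recently.

```java
import java.util.Map;

// Hypothetical, simplified model of the two revisions; not Oak's DocumentNodeState.
// Revisions are plain long timestamps instead of Oak Revision objects.
record NodeStateSketch(String path, long readRevision, long lastRevision) {

    // Current scheme: a child is read at the parent's lastRevision,
    // not at the parent's readRevision.
    NodeStateSketch child(String name, Map<String, Long> lastRevisions) {
        String childPath = path.equals("/") ? "/" + name : path + "/" + name;
        // Assumes the child exists in the lookup table (a stand-in for NodeDocument).
        long childLastRevision = lastRevisions.get(childPath);
        return new NodeStateSketch(childPath, lastRevision, childLastRevision);
    }
}

class ReadRevisionDemo {
    public static void main(String[] args) {
        // Head is r120, the last commit was r100 (it touched /a); /b last changed at r40.
        Map<String, Long> lastRevisions = Map.of("/a", 100L, "/b", 40L, "/b/c", 30L);
        NodeStateSketch root = new NodeStateSketch("/", 120, 100);
        NodeStateSketch b = root.child("b", lastRevisions);
        NodeStateSketch c = b.child("c", lastRevisions);
        // Prints the read revisions along /, /b and /b/c.
        System.out.println(root.readRevision() + " -> " + b.readRevision() + " -> " + c.readRevision());
    }
}
```

Running ReadRevisionDemo prints 120 -> 100 -> 40: the states below /b are read at r40 even though the root was read at r120, which is where the two problems below originate.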
> This approach has at least two problems related to comparing node states:
> - It does not work well with the current DiffCache implementation and reduces the hit rate
>   of this cache. The DiffCache is proactively populated after a commit. The key for a diff
>   is a combination of the previous and current commit revisions and the path. The value
>   lists which child nodes were added, removed or changed. As the comparison of node states
>   proceeds and traverses the tree, the revision of a state may go back in time because the
>   lastRevision is used as the read revision of the child nodes. This causes misses in the
>   diff cache, because the revisions no longer match the previous and current commit
>   revisions used to create the cache entries. OAK-2562 tried to address this by keeping the
>   read revision for child nodes at the read revision of the parent in calls to
>   compareAgainstBaseState() when there is a diff cache hit. However, it turns out that node
>   state comparison does not always start at the root state. The {{EventQueue}}
>   implementation in oak-jcr starts at the paths indicated by the listener's filter. This
>   means OAK-2562 is not effective in this case, and the diff has to be calculated again for
>   a pair of revisions that differs from the one of the original commit.
> - When a diff is calculated for a parent with many child nodes, the {{DocumentNodeStore}}
>   performs a query on the underlying {{DocumentStore}} to get the child nodes modified after
>   a given timestamp. This timestamp is derived from the lower of the two lastRevisions of the
>   parent node states being compared. The query becomes problematic for the {{DocumentStore}}
>   if the timestamp is too far in the past, which happens when the parent node (and its
>   sub-tree) has not been modified for some time. For example, the {{MongoDocumentStore}} has
>   indexes on the _id and _modified fields. But if there are many child nodes, the _id index
>   is not very helpful, and if the timestamp is too far in the past, the _modified index is
>   not selective either. This problem was already reported in OAK-1970 and linked issues.
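A minimal sketch of the cache-miss effect from the first problem, assuming a diff cache keyed by (previous commit revision, current commit revision, path) as described above. DiffCacheSketch, DiffCacheDemo and the "changed: ..." values are hypothetical and not Oak's DiffCache API; revisions are plain long values. The entries are populated with the commit's own revisions, while the lookup for /a/x uses read revisions derived from the lastRevisions of its parent states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a diff cache keyed by (previous revision, current
// revision, path); not Oak's DiffCache API. Revisions are plain longs.
final class DiffCacheSketch {

    record Key(long from, long to, String path) {}

    private final Map<Key, String> entries = new HashMap<>();
    private int hits;
    private int misses;

    // Populated proactively after a commit, using the commit's own revisions.
    void put(long from, long to, String path, String childChanges) {
        entries.put(new Key(from, to, path), childChanges);
    }

    // Returns the cached child changes, or null on a miss (diff must be recomputed).
    String get(long from, long to, String path) {
        String diff = entries.get(new Key(from, to, path));
        if (diff == null) {
            misses++;
        } else {
            hits++;
        }
        return diff;
    }

    int hits() { return hits; }

    int misses() { return misses; }
}

class DiffCacheDemo {
    public static void main(String[] args) {
        DiffCacheSketch cache = new DiffCacheSketch();
        // A commit moves the head from r100 to r110 and changes /a/x/y.
        cache.put(100, 110, "/", "changed: a");
        cache.put(100, 110, "/a", "changed: x");
        cache.put(100, 110, "/a/x", "changed: y");

        // The /a states are read at the root states' lastRevisions (r100, r110): hit.
        System.out.println("/a   -> " + cache.get(100, 110, "/a"));
        // Before the commit /a was last modified at r60, so the /a/x states are
        // read at r60 and r110: the key does not match and the lookup misses.
        System.out.println("/a/x -> " + cache.get(60, 110, "/a/x"));
        System.out.println("hits=" + cache.hits() + ", misses=" + cache.misses());
    }
}
```

The entry for /a/x exists, but only under (100, 110); because the read revision of the base state went back to r60, the lookup cannot find it and the diff has to be computed again.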
> Both of the above problems could be addressed by keeping track of the read revision of the
> root node state in each node state as the tree is traversed. The revision of the root state
> would then be used, for example, to derive the timestamp for the _modified constraint in the
> query. Because the revision of the root state is recent, the _modified constraint is highly
> selective, and the index on it would be the preferred choice.
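The following sketch contrasts the two ways of deriving the lower bound for the _modified constraint. It assumes that, under the proposed scheme, the lower of the two root read revisions is used, which the description does not spell out. State, currentLowerBound, proposedLowerBound and the sample _modified values are hypothetical, and matching() only counts how many documents in the collection a "_modified >= bound" range would cover, as a rough stand-in for index selectivity.

```java
import java.util.Arrays;

// Hypothetical sketch; not Oak's DocumentNodeStore. Revisions and _modified
// values are plain long timestamps.
final class ModifiedSinceSketch {

    // Only the revisions of a parent node state that matter for the query.
    record State(long lastRevision, long rootRevision) {}

    // Current scheme: bound from the lower of the two parent lastRevisions.
    static long currentLowerBound(State before, State after) {
        return Math.min(before.lastRevision(), after.lastRevision());
    }

    // Proposed scheme (assumed): bound from the lower of the two root read revisions.
    static long proposedLowerBound(State before, State after) {
        return Math.min(before.rootRevision(), after.rootRevision());
    }

    // Number of documents a "_modified >= bound" range would cover.
    static long matching(long[] modified, long bound) {
        return Arrays.stream(modified).filter(m -> m >= bound).count();
    }

    public static void main(String[] args) {
        // Parent last changed at 1000; the old root state was read at 4000 and a
        // commit at 5000 changed one of the parent's children.
        State before = new State(1000, 4000);
        State after = new State(5000, 5000);
        // _modified values of documents across the whole collection.
        long[] modified = {500, 800, 1000, 1200, 2500, 3000, 3900, 5000};
        long oldBound = currentLowerBound(before, after);
        long newBound = proposedLowerBound(before, after);
        System.out.println("current:  _modified >= " + oldBound + " -> " + matching(modified, oldBound));
        System.out.println("proposed: _modified >= " + newBound + " -> " + matching(modified, newBound));
    }
}
```

With these sample values, the bound derived from the older lastRevision (1000) covers six documents, while the bound derived from the root revision (4000) covers only the document changed by the latest commit. Assuming the base state reflects every change up to its root read revision, any child that differs between the two states must have been modified after that revision, so the narrower range does not lose results.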

This message was sent by Atlassian JIRA
