jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
Date Mon, 18 Dec 2017 06:36:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294556#comment-16294556 ]

Chetan Mehrotra commented on OAK-6353:

With the new document order traversal based indexing, significant performance improvements were observed.

For a large repository (255M Mongo documents, 66M nodes under /content and 4.2M assets), indexing
previously completed in 13.66 h. Compared to that, document order based indexing completed in
3.469 h.
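As a quick check of the two timings above (plain arithmetic on the reported numbers, nothing beyond them), the measured speedup on this repository is a little under 4x, saving roughly 10.2 hours:

```latex
\text{speedup} = \frac{13.66\ \text{h}}{3.469\ \text{h}} \approx 3.94,
\qquad
\Delta t = 13.66\ \text{h} - 3.469\ \text{h} = 10.191\ \text{h}
```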

With this, the initially planned implementation is done. Specific issues can be opened later for
further improvements. Possible future enhancements:

# Prefetch the previous documents before doing the Mongo traversal - This may reduce the time needed to resolve a NodeDocument to a NodeState
# Mongo query optimizations
## Avoid fetching nodes under hidden paths at all
## Only fetch those documents from Mongo which are under the included paths - This can be done by using a JavaScript function
# Sorting optimization - Sort each batch in memory as nodes are being read and just write the sorted files
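The sorting optimization in the last item amounts to an external merge sort. Below is a minimal sketch, assuming paths arrive as plain strings in document order, that each in-memory batch is sorted and spilled to a temporary file, and that the files are then combined with a k-way merge. `BatchSortSketch`, `sortInBatches`, `spill` and `merge` are hypothetical names, not Oak APIs. Paths are compared element by element so that a node's descendants stay contiguous; plain `String` order would place `/a-b` between `/a` and `/a/b`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BatchSortSketch {

    // Hypothetical path comparator: compares path elements one by one so that all
    // descendants of a node sort directly after it ("/a", "/a/b", "/a-b").
    static final Comparator<String> PATH_ORDER = (a, b) -> {
        String[] x = a.split("/");
        String[] y = b.split("/");
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = x[i].compareTo(y[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    };

    // Sorts one in-memory batch and writes it to a temporary file, one path per line.
    static Path spill(List<String> batch, Path dir) throws IOException {
        batch.sort(PATH_ORDER);
        Path file = Files.createTempFile(dir, "batch-", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, batch, StandardCharsets.UTF_8);
        batch.clear();
        return file;
    }

    // Reads paths in arbitrary (document) order and spills a sorted file every batchSize entries.
    static List<Path> sortInBatches(Iterable<String> paths, int batchSize, Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        for (String path : paths) {
            batch.add(path);
            if (batch.size() >= batchSize) {
                files.add(spill(batch, dir));
            }
        }
        if (!batch.isEmpty()) {
            files.add(spill(batch, dir));
        }
        return files;
    }

    // Current line of one sorted file plus the reader it came from.
    static final class Head {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }

    // K-way merge: repeatedly emits the smallest head among all sorted files.
    static List<String> merge(List<Path> files) throws IOException {
        PriorityQueue<Head> queue = new PriorityQueue<>((h1, h2) -> PATH_ORDER.compare(h1.line, h2.line));
        List<BufferedReader> readers = new ArrayList<>();
        List<String> merged = new ArrayList<>();
        try {
            for (Path file : files) {
                BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    queue.add(new Head(first, reader));
                }
            }
            while (!queue.isEmpty()) {
                Head head = queue.poll();
                merged.add(head.line);
                String next = head.reader.readLine();
                if (next != null) {
                    queue.add(new Head(next, head.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sort-sketch");
        dir.toFile().deleteOnExit();
        // Paths in the arbitrary order a document traversal might return them.
        List<String> documentOrder = List.of("/content/b", "/", "/a-b", "/content", "/a/b", "/a", "/content/a");
        List<Path> files = sortInBatches(documentOrder, 3, dir);
        // prints: 3 files: [/, /a, /a/b, /a-b, /content, /content/a, /content/b]
        System.out.println(files.size() + " files: " + merge(files));
    }
}
```

For brevity `merge` collects the result into a list; an implementation meant for a repository of this size would stream the merged output to a file so that memory use stays bounded by the batch size.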

The documentation also needs to be updated.

> Use Document order traversal for reindexing performed on DocumentNodeStore setups
> ---------------------------------------------------------------------------------
>                 Key: OAK-6353
>                 URL: https://issues.apache.org/jira/browse/OAK-6353
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>         Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
> [~tmueller] suggested [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442] that document order traversal can be faster than the current mode of path based traversal. Initial tests indicate that such a traversal can be an order of magnitude faster.
> So this task is meant to implement such an approach and see if it can be a viable indexing mode for DocumentNodeStore based setups.
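Document order traversal visits documents in whatever order the store returns them instead of walking the tree parent by parent, so each document has to be mapped back to its path and filtered on its own. Below is a minimal sketch of that per-document step, assuming DocumentNodeStore ids of the plain form `depth:path` (for example `2:/content/a`; long paths use hashed ids, which this sketch ignores) and treating any path element that starts with `:` as hidden. `DocumentOrderFilterSketch` and its methods are hypothetical, not Oak APIs. The comment above proposes pushing the hidden-path and included-path filters into the Mongo query itself; here they are applied on the client only to show the predicates.

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentOrderFilterSketch {

    // Hypothetical helper: strips the "depth:" prefix from a plain (non-hashed) document id.
    static String pathFromId(String id) {
        int colon = id.indexOf(':');
        return id.substring(colon + 1);
    }

    // A path is treated as hidden if any of its elements starts with ':' (e.g. ":data").
    static boolean isHiddenPath(String path) {
        for (String element : path.split("/")) {
            if (element.startsWith(":")) {
                return true;
            }
        }
        return false;
    }

    // True if the path equals one of the included roots or lies below it.
    static boolean isIncluded(String path, List<String> includedPaths) {
        for (String root : includedPaths) {
            if (root.equals("/") || path.equals(root) || path.startsWith(root + "/")) {
                return true;
            }
        }
        return false;
    }

    // Visits ids in storage (document) order and keeps only the paths worth indexing.
    static List<String> indexablePaths(Iterable<String> idsInDocumentOrder, List<String> includedPaths) {
        List<String> result = new ArrayList<>();
        for (String id : idsInDocumentOrder) {
            String path = pathFromId(id);
            if (!isHiddenPath(path) && isIncluded(path, includedPaths)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("2:/content/b", "1:/libs", "3:/oak:index/lucene/:data", "1:/content", "2:/content/a");
        // prints: [/content/b, /content, /content/a]
        System.out.println(indexablePaths(ids, List.of("/content")));
    }
}
```

The surviving paths are still in document order, so they would need to be sorted by path before an indexer that expects hierarchical order can consume them.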

This message was sent by Atlassian JIRA
