jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
Date Wed, 20 Dec 2017 06:04:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297924#comment-16297924 ]

Chetan Mehrotra commented on OAK-6353:
--------------------------------------

Some performance numbers for reindexing a repository with 255M Mongo documents, 66M nodes under
/content, and 4.2M assets:

# Normal NodeStore traversal - 13.66 h

*Document Traversal*

A - Default setup 

# Total time - 3.469 h
## Time in dumping - 2.405 h
## Time in sorting - 39.87 min
###  Batch sorting - 19.13 min
###  Merging - 20.71 min
## Indexing - 24 min
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 43.6 GB
#* index size - 2.5 GB

{noformat}
2017-12-15 16:48:34 Proceeding to index [/oak:index/damAssetLucene2] upto checkpoint head {}
2017-12-15 19:12:55 Dumped 65472172 nodestates in json format in 2.405 h
2017-12-15 19:12:55 Compression enabled while sorting : false (oak.indexer.useZip)
2017-12-15 19:12:55 Delete original dump from traversal : true (oak.indexer.deleteOriginal)
2017-12-15 19:12:55 Max heap memory (GB) to be used for merge sort : 3 (oak.indexer.maxSortMemoryInGB)
2017-12-15 19:12:57 Sorting with memory 3.2 GB (estimated 12.6 GB)
2017-12-15 19:32:05 Batch sorting done in 19.13 min with 29 files of size 43.6 GB to merge
2017-12-15 19:32:05 Removing the original file temp/flat-file-store/store.json
2017-12-15 19:52:50 Merging of sorted files completed in 20.71 min
2017-12-15 19:52:50 Sorting completed in 39.87 min
2017-12-15 19:52:50 Estimated node count to be traversed for reindexing under / is [65472172]
2017-12-15 20:16:35 Indexing report
    - /oak:index/damAssetLucene2*(4407265)
2017-12-15 20:16:43 Indexing completed for indexes [/oak:index/damAssetLucene2] in 3.469 h (12488171 ms)
{noformat}
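
The dump, batch-sort, and merge phases in the log above have the shape of an external merge sort: sort fixed-size batches in memory, spill each to its own file, then merge the sorted files. A minimal Python sketch of that idea follows. It assumes one entry per line and plain string ordering; the comparator and file format the indexer actually uses are not shown in this comment, and `external_sort`, `batch_size`, and `tmp_dir` are hypothetical names used only for illustration.

```python
import heapq
import os
import tempfile


def external_sort(lines, batch_size, tmp_dir):
    """Yield the given newline-free strings in sorted order using bounded memory."""
    chunk_paths = []
    batch = []

    def flush():
        # Batch sorting: sort one in-memory batch and spill it to its own file.
        if not batch:
            return
        batch.sort()
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            out.writelines(line + "\n" for line in batch)
        chunk_paths.append(path)
        batch.clear()

    for line in lines:
        batch.append(line)
        if len(batch) >= batch_size:
            flush()
    flush()  # spill whatever is left in the last, partial batch

    # Merging: stream every sorted chunk through a k-way merge.
    files = [open(path, encoding="utf-8") for path in chunk_paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        # Close and delete the temporary chunk files once merging ends.
        for f in files:
            f.close()
        for path in chunk_paths:
            os.remove(path)
```

Only one batch is held in memory at a time, which is the role a setting such as oak.indexer.maxSortMemoryInGB would play in bounding the batch size; the number of chunk files (29 in the run above) then follows from the dump size divided by the batch size.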

B - Compression enabled in sorting

# Total time - 3.811 h
## Time in dumping - 2.929 h
## Time in sorting - 29.56 min
###  Batch sorting - 17.67 min
###  Merging - 11.87 min
## Indexing - 24 min
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 5.5 GB
#* index size - 2.5 GB

{noformat}
2017-12-19 10:56:00  Proceeding to index [/oak:index/damAssetLucene2] upto checkpoint head {}
2017-12-19 13:51:50 oreBuilder - Dumped 65469575 nodestates in json format in 2.929 h (43.6 GB)
2017-12-19 13:51:50 oreBuilder - Compression enabled while sorting : true (oak.indexer.useZip)
2017-12-19 13:51:50 oreBuilder - Delete original dump from traversal : true (oak.indexer.deleteOriginal)
2017-12-19 13:51:50 oreBuilder - Max heap memory (GB) to be used for merge sort : 3 (oak.indexer.maxSortMemoryInGB)
2017-12-19 13:51:52 Sorter - Sorting with memory 3.2 GB (estimated 12.6 GB)
2017-12-19 14:09:32 Sorter - Batch sorting done in 17.67 min with 29 files of size 5.5 GB to merge
2017-12-19 14:09:32 Sorter - Removing the original file temp/flat-file-store/store.json
2017-12-19 14:21:25 Sorter - Merging of sorted files completed in 11.87 min
2017-12-19 14:21:25 Sorter - Sorting completed in 29.56 min
2017-12-19 14:21:26 Estimated node count to be traversed for reindexing under / is [65469575]
2017-12-19 14:44:30 Indexing report
    - /oak:index/damAssetLucene2*(4407265)
2017-12-19 14:44:30 Reindexing completed
2017-12-19 14:44:30 Switched the async lane for indexes at [/oak:index/damAssetLucene2] back to there original lanes
2017-12-19 14:44:39 Indexing completed for indexes [/oak:index/damAssetLucene2] in 3.811 h (13718589 ms)
{noformat}
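
For a quick side-by-side of the two runs, the figures above can be turned into speedups over the normal traversal and into per-phase shares of the total. A small Python sketch using only the numbers reported in this comment (the names `runs` and `summarize` are illustrative):

```python
# Timings copied from the runs above, in hours.
BASELINE_H = 13.66  # normal NodeStore traversal

runs = {
    "A (default)": {"total": 3.469, "dump": 2.405, "sort": 39.87 / 60},
    "B (compressed sort)": {"total": 3.811, "dump": 2.929, "sort": 29.56 / 60},
}


def summarize(run):
    """Return the speedup over the baseline and the share of each phase."""
    return {
        "speedup": BASELINE_H / run["total"],
        "dump_share": run["dump"] / run["total"],
        "sort_share": run["sort"] / run["total"],
    }


for name, run in runs.items():
    s = summarize(run)
    print(
        f"{name}: {s['speedup']:.2f}x faster, "
        f"dump {s['dump_share']:.0%}, sort {s['sort_share']:.0%} of total"
    )
```

This works out to roughly 3.9x for run A and 3.6x for run B. Compression cut sorting by about 10 minutes and shrank the chunked files from 43.6 GB to 5.5 GB, but run B's dump phase took about half an hour longer (2.929 h vs 2.405 h), which accounts for its higher total. In both runs the dump phase takes the largest share of the time.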

> Use Document order traversal for reindexing performed on DocumentNodeStore setups
> ---------------------------------------------------------------------------------
>
>                 Key: OAK-6353
>                 URL: https://issues.apache.org/jira/browse/OAK-6353
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.7.13, 1.8
>
>         Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442] that document order traversal can be faster than the current mode of path based traversal. Initial tests indicate that such a traversal can be an order of magnitude faster.
> So this task is meant to implement such an approach and see if it can be a viable indexing mode for DocumentNodeStore based setups.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
