jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-7122) Implement script to compare lucene indexes logically
Date Fri, 05 Jan 2018 10:25:03 GMT

    [ https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312866#comment-16312866 ]

Chetan Mehrotra edited comment on OAK-7122 at 1/5/18 10:24 AM:
---------------------------------------------------------------

Implemented the script at [1]. Currently it builds up the structure in memory. If this proves
to be problematic for large indexes, we can look into building the structure on the file system.

*Usage*

{code}
java -DindexPath=/path/to/indexing-result/indexes/lucene/data \
    -jar oak-run-*.jar \
    console /path/to/segmentstore \
    ":load https://raw.githubusercontent.com/chetanmeh/oak-console-scripts/master/src/main/groovy/lucene/luceneIndexDumper.groovy"
{code}

[1] https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene


was (Author: chetanm):
Implemented the script at [1]. Currently it builds up the structure in memory. If this proves
to be problematic for large indexes, we can look into building the structure on the file system.

[1] https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

> Implement script to compare lucene indexes logically
> ----------------------------------------------------
>
>                 Key: OAK-7122
>                 URL: https://issues.apache.org/jira/browse/OAK-7122
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>
> With Document Traversal based indexing we have implemented a newer indexing logic. To validate that the index produced by it is the same as the one produced by the existing indexing flow, we need to implement a script which enables comparing the index content logically.
> This was recently discussed on the Lucene mailing list [1], and the suggestion there was that it can be done by un-inverting the index. To enable that, we need to implement a script which can:
>
> # Open a Lucene index
> # Map the Lucene Document to path of node
> # For each document, determine all the fields associated with it (stored and non-stored)
> # Dump this content to a file sorted by path, with the fields on each line sorted by name
> Such dumps can then be generated for the old and new indexes and compared via a simple text diff
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55
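
The dump step described above can be illustrated with a minimal sketch. It is not the linked Groovy script and does not read a Lucene index; it assumes the per-document fields have already been collected into an in-memory map keyed by node path. The class name IndexDumpSketch, the method dump, the "|" and "=" separators, and the sorting of multi-valued fields are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns "path -> (field name -> values)" into a
// deterministic text dump so that two dumps can be compared line by line.
final class IndexDumpSketch {

    // One line per document, ordered by path; fields on each line ordered by name.
    public static String dump(Map<String, Map<String, List<String>>> docs) {
        StringBuilder sb = new StringBuilder();
        // TreeMap iterates keys in sorted order, which gives "sorted by path".
        for (Map.Entry<String, Map<String, List<String>>> doc
                : new TreeMap<>(docs).entrySet()) {
            sb.append(doc.getKey());
            // A second TreeMap gives "field name sorted by name" within the line.
            for (Map.Entry<String, List<String>> field
                    : new TreeMap<>(doc.getValue()).entrySet()) {
                // Assumption: value order is not significant, so sort the values
                // as well to avoid spurious differences between dumps.
                List<String> values = new ArrayList<>(field.getValue());
                Collections.sort(values);
                sb.append('|').append(field.getKey()).append('=').append(values);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Because the output is fully ordered, two indexes with the same logical content should produce identical text regardless of the order in which documents were added, so writing each dump to a file and running a plain text diff is enough to spot differences.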



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
