jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-6535) Synchronous Lucene Property Indexes
Date Mon, 25 Sep 2017 13:25:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179017#comment-16179017 ]

Chetan Mehrotra edited comment on OAK-6535 at 9/25/17 1:24 PM:
---------------------------------------------------------------

This feature is now ready for review.

* On GitHub - see [here|https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-6535]
* As a single patch - see [here|^OAK-6535-v1.diff]
* See the [wiki|https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes] for more background

h2. Implementation Details

*Indexing*
{{LuceneIndexEditor}} now supports a {{PropertyUpdateCallback}}, which is invoked for each
indexed property change. For this feature we provide a {{PropertyIndexUpdateCallback}}, which
performs the property index update according to the property index type.

For non-unique sync indexes it uses {{ContentMirrorStoreStrategy}}, and for unique indexes it uses {{UniqueIndexStoreStrategy}}.
See the wiki for the storage format.

For non-unique indexes it disables the default pruning.

For unique indexes each index entry also stores a timestamp (as epoch time) in {{jcr:created}}.
Note that it is not of type Calendar.
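
The dispatch described above can be sketched roughly as follows. This is only an illustrative model and not the code in the patch: the class {{SyncIndexUpdateSketch}}, its in-memory maps and the {{propertyUpdated}} signature are hypothetical stand-ins for the real callback and store strategies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of the sync property index update; not the Oak API. */
final class SyncIndexUpdateSketch {

    /** Entry of a unique index: the path plus the creation time as epoch millis (a long, not a Calendar). */
    static final class UniqueEntry {
        final String path;
        final long createdMillis;

        UniqueEntry(String path, long createdMillis) {
            this.path = path;
            this.createdMillis = createdMillis;
        }
    }

    // Non-unique index: property value -> all paths carrying that value
    final Map<String, Set<String>> nonUniqueEntries = new HashMap<>();
    // Unique index: property value -> single entry with its timestamp
    final Map<String, UniqueEntry> uniqueEntries = new HashMap<>();

    /** Invoked once per indexed property change; dispatches on the index type. */
    void propertyUpdated(String path, String value, boolean unique, long nowMillis) {
        if (unique) {
            uniqueEntries.put(value, new UniqueEntry(path, nowMillis));
        } else {
            nonUniqueEntries.computeIfAbsent(value, k -> new TreeSet<>()).add(path);
        }
    }
}
```

The timestamp is kept as a plain {{long}} here to mirror the note that {{jcr:created}} holds epoch time rather than a Calendar value.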

*Query*
On the query side {{IndexPlanner}} checks whether the definition supports sync indexes. If so,
it determines which sync index can be used. For a query only one of the sync indexes can be used.
It follows these rules:

* If any unique index is found then it is given preference
* If multiple non-unique sync indexes are found then the first one is used

In case of a unique index the entryCount is set to 1 so that this index reports almost the lowest
cost.
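
As a rough illustration of the selection rule and the entry count override, assuming a simplified {{SyncIndex}} descriptor (the class and method names below are hypothetical and not taken from {{IndexPlanner}}):

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the sync index selection rule; not the IndexPlanner code. */
final class SyncIndexSelectionSketch {

    /** Minimal descriptor of a sync index candidate for a query. */
    static final class SyncIndex {
        final String propertyName;
        final boolean unique;

        SyncIndex(String propertyName, boolean unique) {
            this.propertyName = propertyName;
            this.unique = unique;
        }
    }

    /** Prefers any unique index; otherwise falls back to the first candidate. */
    static Optional<SyncIndex> select(List<SyncIndex> candidates) {
        for (SyncIndex candidate : candidates) {
            if (candidate.unique) {
                return Optional.of(candidate);
            }
        }
        return candidates.isEmpty() ? Optional.empty() : Optional.of(candidates.get(0));
    }

    /** A unique index reports an entry count of 1 so that its cost is close to the lowest. */
    static long entryCount(SyncIndex index, long defaultEstimate) {
        return index.unique ? 1 : defaultEstimate;
    }
}
```

Only one index is ever returned, which matches the constraint that a query can use a single sync index.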

After planning, {{LucenePropertyIndex}} checks whether the planner has identified any sync index.
If so, it returns a concatenated iterator in which the iterator provided by the property index (via
{{HybridPropertyIndexLookup}}) comes first.
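
The ordering can be pictured with a plain iterator concatenation. The helper below is a hypothetical, self-contained sketch and not the code used by {{LucenePropertyIndex}}; it does not deduplicate results, and whether the patch does so is not covered here.

```java
import java.util.Iterator;

/** Hypothetical sketch: results of the sync property index are returned before the Lucene results. */
final class ConcatIteratorSketch {

    static <T> Iterator<T> concat(Iterator<? extends T> first, Iterator<? extends T> second) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return first.hasNext() || second.hasNext();
            }

            @Override
            public T next() {
                // Drain the first iterator completely before touching the second one;
                // second.next() throws NoSuchElementException once both are exhausted.
                return first.hasNext() ? first.next() : second.next();
            }
        };
    }
}
```

Called as {{concat(syncResults, luceneResults)}}, the paths found in the sync property index are consumed before any Lucene result.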

*Cleanup*

This feature configures a {{PropertyIndexCleaner}} job which is triggered periodically (default
frequency: every 10 min) and does the following:

# First, changes the head bucket if there is any change in the current head bucket state for a
non-unique sync index. This change is merged
# For non-unique sync indexes, cleans up old orphan buckets
# For unique indexes, scans the index entries and removes those whose {{jcr:created}}
is older than the lastIndexTo time of the index's indexer lane, i.e. the entries which have already been
moved to the Lucene index. In doing this it also keeps a threshold, which defaults
to 1 hr
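
One possible reading of the unique index cleanup step is sketched below. It assumes that an entry becomes removable once its timestamp is older than the lane's last indexed time minus the threshold; the class, the map based storage and the parameter names are hypothetical and do not come from {{PropertyIndexCleaner}}.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the unique index cleanup; not the PropertyIndexCleaner code. */
final class UniqueIndexCleanupSketch {

    /** Default threshold of 1 hour, as mentioned in the description. */
    static final long DEFAULT_THRESHOLD_MILLIS = TimeUnit.HOURS.toMillis(1);

    /**
     * Removes entries whose creation time (epoch millis) is older than
     * lastIndexedToMillis - thresholdMillis and returns the number of removed entries.
     * The map must support removal through its entry set iterator.
     */
    static int removeIndexedEntries(Map<String, Long> createdByValue,
                                    long lastIndexedToMillis,
                                    long thresholdMillis) {
        long cutoff = lastIndexedToMillis - thresholdMillis;
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = createdByValue.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < cutoff) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Entries newer than the cutoff stay in place, so a value that has not yet reached the persisted Lucene index is still found via the sync index.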

*Misc Points*

# Supports relative properties
# Supports non root indexes

h2. Benchmark

The benchmark can be run via

{noformat}
java -DhybridIndexEnabled=true -DindexingMode=nrt -DsyncIndexing=true -jar oak-benchmark*.jar benchmark HybridIndexTest Oak-Segment-Tar-DS
{noformat}

Here:
* hybridIndexEnabled=true, syncIndexing=true - Enables this feature, i.e. the 'foo' property is indexed in hybrid mode
* hybridIndexEnabled=true, syncIndexing=false - Enables just the NRT mode
* hybridIndexEnabled=false, syncIndexing=false - Enables pure property index mode

{noformat}
# HybridIndexTest                  C     min     10%     50%     90%     max       N Searcher  Mutator  Indexed
Oak-Segment-Tar-DS                 1       4       6       7       9     527    7992  5385539    39400    49890  #nrt,oakCodec,sync
Oak-Segment-Tar-DS                 1       4       6       7      10     114    7462  6834075    34220    46362  #property
Oak-Segment-Tar-DS                 1       4       5       6       8     508    9063  4439786    47797    56844  #nrt,oakCodec
numOfIndexes: 10, refreshDeltaMillis: 1000, asyncInterval: 5, queueSize: 1000, hybridIndexEnabled: true, indexingMode: nrt, useOakCodec: true, cleanerIntervalInSecs: 10, syncIndexing: true
{noformat}


h2. Pending Stuff

*Open Items*

# Support for nodetype index
# Support for reference index 

*Points to discuss*

Apart from the current implementation design, the following aspects need to be discussed:

# Frequency of the cleaner job - Currently it is scheduled to run every 10 mins
# Threshold for unique index cleanup - Currently entries are removed 1 hr after they
make it into the persisted Lucene index

[~tmueller] [~catholicon] [~teofili] Please review the patch. I will keep this open for this
week so that you get time, and plan to merge next week.



> Synchronous Lucene Property Indexes
> -----------------------------------
>
>                 Key: OAK-6535
>                 URL: https://issues.apache.org/jira/browse/OAK-6535
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene, property-index
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>         Attachments: OAK-6535-v1.diff
>
>
> Oak 1.6 added support for Lucene Hybrid Index (OAK-4412). That enables near real time (NRT) support for Lucene based indexes. It also had limited support for sync indexes. This feature aims to take that to the next level and enable support for sync property indexes.
> More details at https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes



