jackrabbit-oak-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-3576) Allow custom extension to augment indexed lucene documents
Date Wed, 16 Dec 2015 06:19:46 GMT

    [ https://issues.apache.org/jira/browse/OAK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059566#comment-15059566
] 

Chetan Mehrotra edited comment on OAK-3576 at 12/16/15 6:19 AM:
----------------------------------------------------------------

[~catholicon] Missed reviewing this new feature due to other on going work. So some late comments
before we consider this as fixed

*SPI Interface*

{code}
public interface IndexFieldProvider {
    Iterable<Field> getAugmentedFields(final String path, final String propertyName,
                                   final NodeState document, final PropertyState property,
                                   final NodeState indexDefinition);
}
{code}

I think going per property is too fine grained. Instead it would be better to just pass in
the NodeState of the root node which was found to be updated. So instead of that this interface
should be like

{code}
    /**
     * Callback to add custom fields for given node
     *
     * @param path path of the document being indexed
     * @param node node being indexed
     * @param indexDefinition nodeState of the index definition node
     *
     * @return fields which are to be added as part of {@link org.apache.lucene.document.Document}
being prepared
     * for node being indexed
     */
    Iterable<Field> getAugmentedFields(String path, NodeState node, NodeState indexDefinition);

    Set<String> getSupportedTypes();
{code}

*How callbacks are selected*

bq. All registered augmentors would get callbacks. It's upto the implementation to decide
to quit early.

Thinking more about it (and you mentioned a similar concern before!) now it would implementation
bit tricky and each implementation would need to figure out how to opt out. And for majority
of cases we do not want these default implementation to mess with indexing logic. So would
be better to add a method {{supportedTypes}} where the implementation can provide a list of
supported nodeTypes for which it wishes to participate. This aligns with the nodeType based
approach used by indexing logic currently. Same rule should be applicable on the query side

This would impact other part of current design. For keeping things fast it would better to
track this providers against the matching {{IndexRule}}

*IndexAugmentorFactory*

We can do away with that and instead provide default noop impls as part of interface itself.
The tracking of multiple implementations can be simplified with use of {{Tracker}}. See {{ConsolidatedCacheStats}}
for an example. Keep stuff consistent with matching {{IndexRule}} and OSGi dynamics would
add some complication. I can take care of this part

*Misc*

* Move the SPI package to {{org.apache.jackrabbit.oak.plugins.index.lucene.spi.fieldprovider}}
and add {{ConsumerType}} annotation. Also add package-info with initial version. This would
ensure that addition of any new class or change in one othe SPI class does not require minor
version update. 
*  Minor nit pick  - Avoid reordering of imports in commit. It causes issue in later merge
to branches!



was (Author: chetanm):
[~catholicon] Missed reviewing this new feature due to other on going work. So some late comments
before we consider this as fixed

*SPI Interface*

{code}
public interface IndexFieldProvider {
    Iterable<Field> getAugmentedFields(final String path, final String propertyName,
                                   final NodeState document, final PropertyState property,
                                   final NodeState indexDefinition);
}
{code}

I think going per property is too fine grained. Instead it would be better to just pass in
the NodeState of the root node which was found to be updated. So instead of that this interface
should be like

{code}
    /**
     * Callback to add custom fields for given node
     *
     * @param path path of the document being indexed
     * @param node node being indexed
     * @param indexDefinition nodeState of the index definition node
     *
     * @return fields which are to be added as part of {@link org.apache.lucene.document.Document}
being prepared
     * for node being indexed
     */
    Iterable<Field> getAugmentedFields(String path, NodeState node, NodeState indexDefinition);

    Set<String> getSupportedTypes();
{code}

*How callbacks are selected*

bq. All registered augmentors would get callbacks. It's upto the implementation to decide
to quit early.

Thinking more about it now it would implementation bit tricky and each implementation would
need to figure out how to opt out. So would be better to add a method {{supportedTypes}} where
the implementation can provide a list of supported nodeTypes for which it wishes to participate.
This aligns with the nodeType based approach used by indexing logic currently. Same rule should
be applicable on the query side

This would impact other part of current design. For keeping things fast it would better to
track this providers against the matching {{IndexRule}}

*IndexAugmentorFactory*

We can do away with that and instead provide default noop impls as part of interface itself.
The tracking of multiple implementations can be simplified with use of {{Tracker}}. See {{ConsolidatedCacheStats}}
for an example. Keep stuff consistent with matching {{IndexRule}} and OSGi dynamics would
add some complication. I can take care of this part

*Misc*

* Move the SPI package to {{org.apache.jackrabbit.oak.plugins.index.lucene.spi.fieldprovider}}
and add {{ConsumerType}} annotation. Also add package-info with initial version. This would
ensure that addition of any new class or change in one othe SPI class does not require minor
version update. 
*  Minor nit pick  - Avoid reordering of imports in commit. It causes issue in later merge
to branches!


> Allow custom extension to augment indexed lucene documents
> ----------------------------------------------------------
>
>                 Key: OAK-3576
>                 URL: https://issues.apache.org/jira/browse/OAK-3576
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>             Fix For: 1.3.13
>
>         Attachments: OAK-3576.jsedding.patch, OAK-3576.wip.patch
>
>
> Following up on http://oak.markmail.org/thread/a53ahsgb3bowtwyq, we should have an extension
point in oak to allow custom code to add fields to documents getting indexed in lucene. We'd
also need to allow extension point to add extra query terms to utilize such augmented fields.
> (cc [~teofili], [~chetanm])



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message