jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
Date Fri, 01 Sep 2017 05:06:01 GMT

    [ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150050#comment-16150050 ]

Chetan Mehrotra commented on OAK-6597:

If I'm right that would cause an excerpt like "My fancy <b>text</b> This isn't
so fancy" or even worse without the space: "My fancy <b>text</b>This isn't so
fancy". Wouldn't it make sense to store each and every nested property in its own analyzed
field (full:_jcr_content/text1) or similar?

Yes, that would be the case, but that is what aggregation leads to. Storing each aggregated
field in its own analyzed field would be tricky, as it would lead to a lot of fields, and the
query would then need to be expanded to cover all such fields. Given that such field names
would vary, we would not be able to construct such a query easily.
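
To make the trade-off concrete, here is a minimal sketch in plain Java. It is not Oak or Lucene code: the class and method names are made up for illustration, the "highlighter" is a naive string replace, and the "full:" + relative path field naming is only an assumption loosely modelled on the example quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, no Oak or Lucene APIs involved.
public class AggregationSketch {

    // Aggregation as discussed above: all aggregated values are
    // concatenated into one analyzed field, separated by a space.
    static String aggregate(List<String> values) {
        return String.join(" ", values);
    }

    // Naive stand-in for a highlighter: wraps every literal occurrence
    // of the term in <b> tags.
    static String highlight(String text, String term) {
        return text.replace(term, "<b>" + term + "</b>");
    }

    // Assumed naming scheme for the alternative: one analyzed field per
    // aggregated property. A query would need one clause per entry.
    static List<String> perPropertyFields(List<String> relativePaths) {
        List<String> fields = new ArrayList<>();
        for (String path : relativePaths) {
            fields.add("full:" + path);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("My fancy text", "This isn't so fancy");
        // prints: My fancy <b>text</b> This isn't so fancy
        System.out.println(highlight(aggregate(values), "text"));
        // One field per aggregated property; the list grows with the content.
        System.out.println(perPropertyFields(
                Arrays.asList("jcr:content/text1", "jcr:content/text2")));
    }
}
```

With a single aggregated field the excerpt runs across property boundaries, as in the quoted example. With per-property fields the excerpt would stay within one property, but the set of fields to query depends on the content structure and is not known up front.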

> rep:excerpt not working for content indexed by aggregation in lucene
> --------------------------------------------------------------------
>                 Key: OAK-6597
>                 URL: https://issues.apache.org/jira/browse/OAK-6597
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.1, 1.7.6
>            Reporter: Dirk Rudolph
>             Fix For: 1.8
>         Attachments: excerpt-with-aggregation-test.patch
> I noticed that properties that are indexed only through aggregation are not considered
for excerpts (highlighting), as they are not indexed as stored fields.
> See the attached patch that implements a test for excerpts in {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at _jcr:content_)
contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either _/content/foo/bar_
or _/content/foo/jcr:content/bar_ but not in both. For the former the excerpt is properly
provided; for the latter it isn't.

This message was sent by Atlassian JIRA
