lucene-java-user mailing list archives

From Duke DAI <duke.dai....@gmail.com>
Subject Re: Standard highlighter returns whole document as a fragment
Date Tue, 11 Aug 2015 15:46:12 GMT
It seems we are encountering the same problem (thread: "bug of
highlighter/SimpleSpanFragmenter, returned longer fragment than expected?").
When you debug, is your fragmenter a SimpleSpanFragmenter? And does
isNewFragment() return false because of the logic below?
boolean isNewFrag = offsetAtt.endOffset() >= (fragmentSize * currentNumFrags)  // <-- true
        && (textSize - offsetAtt.endOffset()) >= (fragmentSize >>> 1);         // <-- FALSE
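
To illustrate how that condition can stay false near the end of the text, here
is a minimal standalone sketch. It does not use Lucene: the class name, the
method, and all the numbers are made up for illustration, and only the boolean
expression is copied from the snippet above.

```java
public class FragmentBoundaryDemo {
    // Same boolean expression as in the snippet above, with the attribute
    // lookups replaced by plain ints (hypothetical stand-in, not Lucene code).
    static boolean isNewFrag(int endOffset, int textSize, int fragmentSize, int currentNumFrags) {
        return endOffset >= (fragmentSize * currentNumFrags)
                && (textSize - endOffset) >= (fragmentSize >>> 1);
    }

    public static void main(String[] args) {
        int fragmentSize = 100;   // assumed fragment size
        int currentNumFrags = 1;  // still inside the first fragment

        // Token ends at offset 120 in a 150-char text: only 30 chars remain,
        // fewer than fragmentSize / 2 = 50, so no new fragment is started.
        System.out.println(isNewFrag(120, 150, fragmentSize, currentNumFrags));  // false

        // Same offset in a 1000-char text: 880 chars remain, so a new fragment starts.
        System.out.println(isNewFrag(120, 1000, fragmentSize, currentNumFrags)); // true
    }
}
```

With these made-up numbers the first clause is true and the second clause is
what blocks the split, which matches the true/FALSE annotations above. Whether
that is what actually happens with Rob's gist would still need to be confirmed
in a debugger.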

I would rather get input from the community than change and maintain the code
myself.

Best regards,
Duke
If not now, when? If not me, who?

On Fri, Aug 7, 2015 at 1:25 AM, Robert Alexander <robalex@gmail.com> wrote:

> Hey everyone,
>
> I ran into an issue with the standard highlighter in 4.10.4 and was hoping
> that someone could help. I'm attempting to fragment a result based on a
> SpanNearQuery. If the words in the query are next to each other, the
> fragmenter will often return one large result containing the entire
> document. If the words are farther apart, it returns fragments of the
> expected size.
>
> I have included an example here in a gist link. The sample creates an index
> in RAM and adds a single document. If I search for "ken" within 3 of "lay",
> I see the problem. If I search for "ken" within 3 of "office", the problem
> goes away. If you debug with the lucene source, you'll see that it seems as
> if textFragmenter.isNewFragment() never returns true (although I understand
> that this is the user group and not the dev group so this may be of limited
> use).
>
> Are there known issues with the standard highlighter and SpanNear queries?
> I am only using the old highlighter because the FVH doesn't appear to
> handle SpanNear queries at all.
>
> Thanks for the help,
>
> Rob
>
> Sample Gist: https://gist.github.com/robalex/97a005f4ee23c71c48f6
>
