lucene-dev mailing list archives

From "David Smiley (JIRA)" <>
Subject [jira] [Commented] (SOLR-6680) DefaultSolrHighlighter can sometimes avoid CachingTokenFilter
Date Mon, 03 Nov 2014 18:35:35 GMT


David Smiley commented on SOLR-6680:

I should point out that the benefit of LUCENE-6033 won't be realized for a multi-valued field
because of the way offset adjusting works (TermOffsetsTokenStream).  I'm not concerned
with optimizing for this case, but should someone else want to take this further, consider
this approach:  don't wrap the TokenStream from the TermVectors.  Instead, grab all the values
of this field and wrap them in a CharSequence implementation that reads from each value in
sequence.  One obstacle: Highlighter expects a String for the value, so it would have to be
modified to accept a CharSequence instead.
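
A minimal sketch of the CharSequence idea, for illustration only. Everything named
here is hypothetical and not taken from the attached patch: the class name
MultiValueCharSequence, the `gap` parameter (padding between values, assumed to
have to match whatever offset gap was applied between values at index time), and
the use of a space as the padding character. It shows only the read-in-sequence
view over the values; the change to Highlighter itself is not covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a read-only view over several field values, read in sequence. */
final class MultiValueCharSequence implements CharSequence {
    private final List<String> values;
    private final int[] starts;   // starts[i] = offset of values.get(i) within the view
    private final int length;     // total length, including padding between values

    /** @param gap number of padding chars between consecutive values (an assumption) */
    MultiValueCharSequence(List<String> values, int gap) {
        if (gap < 0) {
            throw new IllegalArgumentException("gap must be >= 0");
        }
        this.values = new ArrayList<>(values);
        this.starts = new int[this.values.size()];
        int offset = 0;
        for (int i = 0; i < starts.length; i++) {
            starts[i] = offset;
            offset += this.values.get(i).length() + gap;
        }
        // No padding after the last value.
        this.length = starts.length == 0 ? 0 : offset - gap;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Binary search for the last value whose start offset is <= index.
        int lo = 0;
        int hi = starts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= index) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        String value = values.get(lo);
        int local = index - starts[lo];
        // Past the end of the value means the index falls in the padding.
        return local < value.length() ? value.charAt(local) : ' ';
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        // Copies the requested range; a zero-copy view is possible but longer.
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        return subSequence(0, length).toString();
    }
}
```

With values "ab", "" and "cd" and a gap of 1, the view has length 6 and charAt(4)
returns 'c'. No concatenated String is built unless toString() or subSequence()
is called.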

> DefaultSolrHighlighter can sometimes avoid CachingTokenFilter
> -------------------------------------------------------------
>                 Key: SOLR-6680
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.0
>         Attachments: SOLR-6680.patch
>
> The DefaultSolrHighlighter (the most accurate one) is a bit over-eager to wrap the token
> stream in a CachingTokenFilter when hl.usePhraseHighlighter=true.  This wastes memory, and
> it interferes with other optimizations -- LUCENE-6034.  Furthermore, the internal TermOffsetsTokenStream
> (used when TermVectors are used with this) wasn't properly delegating reset().
