lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4656) Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting
Date Mon, 03 Nov 2014 20:00:35 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195001#comment-14195001 ]

David Smiley commented on SOLR-4656:
------------------------------------

bq. It's a little different sense than maxAnalyzedChars in that the unit of measurement is
the number of MV entries rather than the number of characters analyzed, but I could argue
either way.

Sure... but was the per-value overhead itself too heavy for the particular client you did
this for (i.e. a massive number of values), or was it just a matter of value lengths not
being accumulated?

bq. Although it seems kind of late to take away this parameter, should we deprecate it instead?

If there are a large number of values, I guess it has some value.

In my last comment on SOLR-6680 I said that I think multi-value handling should be done a bit
differently: each value would be virtually concatenated/iterated via a CharSequence
wrapper and handed to the highlighter.  Likewise, the TokenStreams of each value could be wrapped
in a concatenating wrapper.  If that were done, I think these parameters would be completely
obsolete, since it would handle the case of a massive number of values.
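
To make the "virtual concatenation" idea concrete, here is a rough sketch of what such a
CharSequence wrapper might look like. It is not code from any patch on SOLR-6680 or SOLR-4656;
the class name ConcatenatedValues, the gap parameter (a stand-in for something like a position
gap between values), and the space used to fill the gap are all assumptions made for
illustration.

```java
import java.util.List;

/**
 * Hypothetical sketch: exposes several field values as one CharSequence
 * without copying them into a single String up front.
 */
final class ConcatenatedValues implements CharSequence {
    private final List<? extends CharSequence> values;
    private final int[] starts; // starts[i] = offset of value i in the virtual sequence
    private final int length;   // total virtual length, including gaps between values

    /** gap is an assumed number of filler characters between adjacent values. */
    ConcatenatedValues(List<? extends CharSequence> values, int gap) {
        this.values = values;
        this.starts = new int[values.size()];
        int offset = 0;
        for (int i = 0; i < values.size(); i++) {
            starts[i] = offset;
            offset += values.get(i).length() + gap;
        }
        // No gap after the last value.
        this.length = values.isEmpty() ? 0 : offset - gap;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        // Binary search for the last value whose start offset is <= index.
        int lo = 0;
        int hi = starts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= index) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        CharSequence value = values.get(lo);
        int local = index - starts[lo];
        // Positions past the end of the value fall in the gap; fill with a space.
        return local < value.length() ? value.charAt(local) : ' ';
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb;
    }

    @Override
    public String toString() {
        return subSequence(0, length).toString();
    }
}
```

With something along these lines, a consumer that only reads a prefix of the sequence never
touches the later values, which is why per-value limits might become unnecessary.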

I'll create a separate issue to accumulate maxAnalyzedChars per value and exit early.
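
As a rough illustration of what "accumulate per value and exit early" could mean, the sketch
below walks the values of a multiValued field and stops once either a value-count limit or a
character budget is used up. It is not the highlighter's actual code: the class and method
names are hypothetical, maxValues and maxChars merely stand in for hl.maxMultiValuedToExamine
and maxAnalyzedChars, and truncating the final value to the remaining budget is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of limiting examined values by count and by accumulated length. */
final class ValueBudget {
    private ValueBudget() {}

    /**
     * Returns the leading portion of the values that fits within both limits.
     * maxValues caps how many values are examined; maxChars caps the total
     * number of characters across all examined values.
     */
    static List<String> selectValues(List<String> values, int maxValues, int maxChars) {
        List<String> selected = new ArrayList<>();
        int charsSoFar = 0;
        for (String value : values) {
            // Exit early once either limit has been reached.
            if (selected.size() >= maxValues || charsSoFar >= maxChars) {
                break;
            }
            int remaining = maxChars - charsSoFar;
            // Assumption: the last value is cut to the remaining budget rather than skipped.
            String portion = value.length() > remaining ? value.substring(0, remaining) : value;
            selected.add(portion);
            charsSoFar += portion.length();
        }
        return selected;
    }
}
```

The point of the early break is that values beyond the budget are never converted or analyzed
at all, however many of them the field holds.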

> Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while
highlighting
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4656
>                 URL: https://issues.apache.org/jira/browse/SOLR-4656
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>    Affects Versions: 4.3, Trunk
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>             Fix For: 4.3, Trunk
>
>         Attachments: SOLR-4656-4x.patch, SOLR-4656-4x.patch, SOLR-4656-trunk.patch, SOLR-4656.patch
>
>
> I'm looking at an admittedly pathological case of many, many entries in a multiValued
field, and trying to implement a way to limit the number examined, analogous to maxAnalyzedChars,
see the patch.
> Along the way, I noticed that we do what looks like unnecessary copying of the fields
to be examined. We call Document.getFields, which copies all of the fields and values to the
returned array. Then we copy all of those to another array, converting them to Strings. Then
we actually examine them. a> this doesn't seem very efficient and b> reduces the benefit
from limiting the number of mv values examined.
> So the attached does two things:
> 1> attempts to fix this
> 2> implements hl.maxMultiValuedToExamine
> I'd _really_ love it if someone who knows the highlighting code takes a peek at the fix
to see if I've messed things up, the changes are actually pretty minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
