lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
Date Thu, 23 Jun 2011 21:32:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054114#comment-13054114 ]

Robert Muir commented on LUCENE-3234:
-------------------------------------

You can change it if you don't mind. However, I agree it would be good to figure out
whether there is an n^2 here. This might have some effect on what the default value should be...
ideally there is some way we could fix the n^2.

Is there a way to turn your test case into a benchmark, or do you have a separate benchmark
(the example you mentioned where it blows up really badly)? This could help in looking at
what's going on.


> Provide limit on phrase analysis in FastVectorHighlighter
> ---------------------------------------------------------
>
>                 Key: LUCENE-3234
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3234
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mike Sokolov
>         Attachments: LUCENE-3234.patch
>
>
> With larger documents, FVH can spend a lot of time trying to find the best-scoring snippet
> as it examines every possible phrase formed from matching terms in the document.  If one is
> willing to accept less-than-perfect scoring by limiting the number of phrases that are
> examined, substantial speedups are possible.  This is analogous to the Highlighter limit on
> the number of characters to analyze.
> The patch includes an artificial test case that shows a speedup of more than 1000x.  In a
> more normal test environment, with English documents and random queries, I am seeing
> speedups of around 3-10x when setting phraseLimit=1, which has the effect of selecting the
> first possible snippet in the document.  Most of our sites operate in this way (just show
> the first snippet), so this would be a big win for us.
> With phraseLimit = -1, you get the existing FVH behavior. At larger values of phraseLimit,
> you may not get substantial speedup in the normal case, but you do get the benefit of
> protection against blow-up in pathological cases.
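
The phraseLimit idea in the description above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the code in LUCENE-3234.patch and not the
FVH API: the class PhraseLimitSketch, the method candidatePhrases, and the modelling of a
"phrase" as a (start, end) pair of matching term positions are all hypothetical. It shows how
a cap bounds an otherwise quadratic enumeration, with a negative limit meaning "no limit".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
final class PhraseLimitSketch {

    /**
     * Enumerates candidate phrases as {start, end} pairs over the matching
     * term positions. Without a limit this visits n*(n+1)/2 pairs; with
     * phraseLimit >= 0 it stops once that many candidates are collected.
     * A negative phraseLimit (e.g. -1) means unlimited.
     */
    static List<int[]> candidatePhrases(int[] positions, int phraseLimit) {
        List<int[]> candidates = new ArrayList<>();
        for (int i = 0; i < positions.length; i++) {
            for (int j = i; j < positions.length; j++) {
                // Check the cap before adding, so a limit of 0 yields nothing.
                if (phraseLimit >= 0 && candidates.size() >= phraseLimit) {
                    return candidates;
                }
                candidates.add(new int[] {positions[i], positions[j]});
            }
        }
        return candidates;
    }
}
```

In this sketch, four matching positions give 10 candidates when phraseLimit = -1 and a single
candidate (the first one found) when phraseLimit = 1.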

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org