lucene-dev mailing list archives

From "Koji Sekiguchi (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
Date Mon, 17 Oct 2011 11:26:11 GMT


Koji Sekiguchi commented on LUCENE-3440:

Hi sebastian,

> Frankly, I didn't run the tests because I thought the changes provided with the last patch
> shouldn't affect the original behavior.
> I'll have a look into it. But this may take some time, due to the fact that I have no knowledge
> about the test-framework.

Ok, no problem. I'll look at the test case (hopefully next week or so). But can you take care
of the following so we can move forward?

Ah, sebastian, I think you needed to check "Grant license to ASF for inclusion in ASF works"
when you attached your patches. Can you remove the latest patches and reattach them with that
flag? Thanks!

> FastVectorHighlighter: IDF-weighted terms for ordered fragments 
> ----------------------------------------------------------------
>                 Key: LUCENE-3440
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 3.5, 4.0
>            Reporter: sebastian L.
>            Priority: Minor
>              Labels: FastVectorHighlighter
>             Fix For: 3.5, 4.0
>         Attachments: LUCENE-3.5-SNAPSHOT-3440-8.patch, LUCENE-3440.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, weight-vs-boost_table01.html, weight-vs-boost_table02.html
>
> The FastVectorHighlighter assigns an equal weight to every term found in a fragment. As a
> result, fragments with a high number of words or, in the worst case, a high number of very
> common words rank higher than fragments that contain *all* of the terms used in the
> original query.
> This patch provides ordered fragments with IDF-weighted terms: 
> total weight = total weight + IDF for unique term per fragment * boost of query; 
> The ranking-formula should be the same, or at least similar, to that one used in
> The patch is simple, but it works for us. 
> Some ideas:
> - A better approach would be moving the whole fragments-scoring into a separate class.
> - Switch scoring via parameter 
> - Exact phrases should be given an even better score, regardless of whether a phrase-query
>   was executed or not
> - edismax/dismax-parameters pf, ps and pf^boost should be observed and corresponding
>   fragments should be ranked higher
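
For illustration only, the weighting described in the issue can be sketched in plain Java as
follows. This is not the attached patch and does not use any Lucene API: the class name
FragmentWeightSketch, its method names, and the document frequencies in the usage note are
hypothetical, and the IDF form 1 + ln(numDocs / (docFreq + 1)) is an assumption chosen for
the sketch. It contrasts the equal-weight scheme (one unit per matched term occurrence) with
the proposed scheme (IDF * query boost, counted once per unique term in a fragment).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene or of the attached patch.
final class FragmentWeightSketch {

    // Assumed IDF form: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Equal weighting: every matched term occurrence contributes the same amount.
    static double equalWeight(List<String> matchedTerms, double queryBoost) {
        return matchedTerms.size() * queryBoost;
    }

    // Proposed weighting:
    //   total weight = total weight + IDF for unique term per fragment * boost of query
    static double idfWeight(List<String> matchedTerms, Map<String, Integer> docFreqs,
                            int numDocs, double queryBoost) {
        Set<String> seen = new HashSet<>();
        double totalWeight = 0.0;
        for (String term : matchedTerms) {
            Integer docFreq = docFreqs.get(term);
            // Skip terms with unknown document frequency and repeated terms.
            if (docFreq == null || !seen.add(term)) {
                continue;
            }
            totalWeight += idf(docFreq, numDocs) * queryBoost;
        }
        return totalWeight;
    }
}
```

With made-up numbers (1000 documents, "the" in 900 of them, "lucene" in 10), a fragment
matching "the" three times gets a higher equal weight than a fragment matching "the" and
"lucene" once each, while the IDF-weighted total ranks the second fragment higher, which is
the behavior the issue asks for.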
