lucene-dev mailing list archives

From Andrzej Bialecki
Subject Re: Similarity.lengthNorm and positionIncrement=0
Date Mon, 13 Oct 2008 05:08:43 GMT
Michael McCandless wrote:
> I agree we should make this possible.  A field should not be "penalized" 
> just because many of its terms had synonyms.
> In your proposed method addition to Similarity, below, 
> numOverlappingTokens would count the number of tokens that had 
> positionIncrement==0?  And then that default impl is fully backwards 
> compatible since it falls back to the current approach of counting the 
> overlapping tokens when computing lengthNorm?

Yes, and yes.

> Maybe in 3.0 we should then switch it to not count overlapping tokens by 
> default.

I'm not sure. There are good arguments for and against it, that's why I 
suggested adding it as an option.

If a typical use case is to submit queries with multiple synonyms, then 
the current method works better, because it prevents excessive score 
boosting from multiple matching synonyms. OTOH, if a typical use case is 
that users submit queries consisting of a single synonym, then the 
proposed method works better.
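
To make the trade-off concrete, here is a minimal standalone sketch, not 
the actual patch. It assumes the 1/sqrt(numTokens) formula used by 
DefaultSimilarity.lengthNorm; the class name LengthNormSketch and the 
discountOverlaps flag are hypothetical names chosen for illustration, and 
only numOverlappingTokens comes from the discussion above.

```java
public class LengthNormSketch {

    // Same shape as the default norm: 1 / sqrt(number of tokens in the field).
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Hypothetical variant: optionally leave out tokens that were indexed with
    // positionIncrement == 0 (for example, injected synonyms).
    static float lengthNorm(int numTokens, int numOverlappingTokens,
                            boolean discountOverlaps) {
        int effective = discountOverlaps
                ? numTokens - numOverlappingTokens
                : numTokens;
        // Clamp to 1 so an empty field does not divide by zero
        // (an assumption of this sketch, not Lucene behavior).
        return lengthNorm(Math.max(effective, 1));
    }

    public static void main(String[] args) {
        // Four original terms, each with three synonyms added at the same
        // position: 16 tokens in total, 12 of them overlapping.
        int numTokens = 16;
        int numOverlappingTokens = 12;

        // Current behavior: synonyms count toward field length.
        System.out.println(lengthNorm(numTokens, numOverlappingTokens, false)); // 0.25

        // Proposed option: synonyms are ignored when computing the norm.
        System.out.println(lengthNorm(numTokens, numOverlappingTokens, true));  // 0.5
    }
}
```

In this example the field's norm doubles when overlapping tokens are 
discounted, which is the "penalty" for synonym expansion that the option 
would remove.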

I'll create a JIRA issue and prepare a patch.

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
