lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: tf and very short text fields
Date Tue, 01 Apr 2014 20:17:43 GMT
Thanks! We'll try that out and report back. I keep forgetting that I want to try BM25, so this
is a good excuse.

wunder

On Apr 1, 2014, at 12:30 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

> Also, if I remember correctly, setting k1 to zero for BM25 automatically omits norms from the
> calculation. So that's easy to play with without reindexing.
> 
> 
> Markus Jelsma <markus.jelsma@openindex.io> wrote: Yes, override TFIDFSimilarity
> and emit 1f in tf(). You can also use BM25 with k1 set to zero in your schema.
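
The k1 = 0 suggestion can be checked against the textbook BM25 term weight. The following is a minimal sketch, not Lucene's implementation: the function name bm25_term_weight, the example document lengths, and the average length of 3.0 are assumptions made for illustration, and the IDF factor is left out because it does not depend on k1.

```python
def bm25_term_weight(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency part of the textbook BM25 score (IDF omitted).

    Hypothetical helper for illustration only; Lucene's BM25Similarity
    stores document length in a lossy encoded norm, so real scores differ.
    """
    if freq == 0:
        return 0.0
    # Length normalization: longer-than-average fields are penalized.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # With k1 == 0 the length_norm term vanishes and freq cancels out.
    return freq * (k1 + 1.0) / (freq + k1 * length_norm)


# "New York": the term "new" occurs once in a 2-token title.
# "New York, New York": it occurs twice in a 4-token title.
# avg_doc_len=3.0 is an assumed value, not taken from any real index.
for k1 in (1.2, 0.0):
    once = bm25_term_weight(1, doc_len=2, avg_doc_len=3.0, k1=k1)
    twice = bm25_term_weight(2, doc_len=4, avg_doc_len=3.0, k1=k1)
    print(f"k1={k1}: once={once:.3f} twice={twice:.3f}")
```

With k1 = 0 the weight is 1.0 for any matching term, whatever its count or the field length, which is the binary term frequency asked about; with the usual default of k1 = 1.2 the two titles get different weights.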
> 
> 
> Walter Underwood <wunder@wunderwood.org> wrote: And here is another peculiarity
> of short text fields.
> 
> The movie "New York, New York" should not be twice as relevant for the query "new york".
> Is there a way to use a binary term frequency rather than a count?
> 
> wunder
> --
> Walter Underwood
> wunder@wunderwood.org
> 
> 
> 

--
Walter Underwood
wunder@wunderwood.org



