lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: About DuplicateFilter
Date Tue, 23 Apr 2019 17:27:34 GMT
How is the score being calculated? Because if it’s the usual scoring algorithm, there will
be very few scores that are exactly identical. And the usual BM25 scores really don’t mean
the documents are “similar”.

This feels like an XY problem. How is “similarity” determined here?

Best,
Erick

> On Apr 22, 2019, at 9:44 PM, kongchao592@163.com wrote:
> 
> Hi!
>    I have some questions about DuplicateFilter.
> I use Lucene to search news. Each news item contains 'id', 'title', 'content', 'pubtime', 'score'
> and so on. The 'score' value is of type Long; the same 'score' means similar news.
> When several news items have the same 'score', I want the result set to keep only the first one.
> The indexed entities look like the following (over 1,000,000,000 items):
> id | title  | content  | pubtime    | score
> ---+--------+----------+------------+------
> 1  | title1 | content1 | 2019-04-23 | 8888
> 2  | title2 | content2 | 2019-04-23 | 9999
> 3  | title3 | content3 | 2019-04-23 | 9999
> 4  | title4 | content4 | 2019-04-23 | 9999
> 5  | title5 | content5 | 2019-04-23 | 8888
> When I search news, I want the result set to contain only id=1 and id=2. How can I do
> this? Please help me!
> 
> 
> kongchao592@163.com
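
For illustration only, the behaviour asked for above (keep the first hit for each distinct 'score' value) can be sketched in plain Java over hits that have already been retrieved. This is a minimal sketch, not Lucene API: the `News` and `FirstPerScore` classes and the `dedupe` method are hypothetical names, and it assumes 'score' is the stored Long field from the question rather than a relevance score. It only covers the hits held in memory, so it says nothing about how to do this efficiently across a very large index.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: News and FirstPerScore are not Lucene classes.
final class News {
    final int id;
    final long score; // the stored 'score' field from the question, not a relevance score

    News(int id, long score) {
        this.id = id;
        this.score = score;
    }
}

final class FirstPerScore {
    // Keeps the first hit seen for each distinct 'score' value, preserving input order.
    static List<News> dedupe(List<News> rankedHits) {
        Set<Long> seen = new HashSet<>();
        List<News> kept = new ArrayList<>();
        for (News hit : rankedHits) {
            // Set.add returns false when the value is already present.
            if (seen.add(hit.score)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

Applied to the five sample rows in the order shown, this keeps id=1 (score 8888) and id=2 (score 9999) and drops ids 3, 4 and 5. Which document counts as "first" depends entirely on the order the hits arrive in, so the sort order of the search decides which duplicate survives.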


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

