lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Relevancy : Keyword stuffing
Date Mon, 16 Mar 2015 18:52:24 GMT

You should start by checking out the "SweetSpotSimilarity" ... it was 
designed largely around the idea of dealing with things like excessively 
verbose titles and keyword stuffing in summary text. You can configure 
your expectation for what a "normal" length doc is, and docs will be 
penalized for being longer than that. Similarly, you can say what a 
'reasonable' tf is, and docs that exceed that won't get any added boost 
(which, in conjunction with the lengthNorm penalty, penalizes docs that 
stuff keywords).

https://lucene.apache.org/solr/5_0_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html

https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.computeLengthNorm.svg
https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.hyperbolicTf.svg
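
As a rough illustration of the two curves linked above, here is a minimal 
Python sketch of their shapes. It is not the Lucene code itself: the 
formulas are written from my reading of the SweetSpotSimilarity javadocs, 
the function and parameter names are made up for this example, and the 
default values are only assumptions that should be checked against the 
version you run.

```python
import math

def length_norm(num_terms, ln_min=1, ln_max=1, steepness=0.5):
    """Plateau-shaped length norm (hypothetical helper, not a Lucene API).

    Returns 1.0 for any length between ln_min and ln_max, and decays
    toward 0 as the length moves outside that "sweet spot".
    """
    # Distance outside the [ln_min, ln_max] range; 0 inside it.
    excess = abs(num_terms - ln_min) + abs(num_terms - ln_max) - (ln_max - ln_min)
    return 1.0 / math.sqrt(steepness * excess + 1.0)

def hyperbolic_tf(freq, tf_min=0.0, tf_max=2.0, base=1.3, xoffset=10.0):
    """S-shaped tf factor bounded by tf_min and tf_max (hypothetical helper).

    (b^x - b^-x) / (b^x + b^-x) is tanh(x * ln b), used here because it
    stays numerically stable for very large term frequencies.
    """
    x = freq - xoffset
    return tf_min + (tf_max - tf_min) / 2.0 * (math.tanh(x * math.log(base)) + 1.0)

if __name__ == "__main__":
    # Assumed sweet spot of 3 to 10 terms, chosen only for illustration.
    for n in (3, 10, 20, 100):
        print("length", n, "norm", round(length_norm(n, 3, 10), 4))
    for tf in (1, 10, 20, 200):
        print("tf", tf, "factor", round(hyperbolic_tf(tf), 4))
```

With these assumed settings, a field of 3 to 10 terms keeps a norm of 1.0 
while longer fields are scaled down, and the tf factor flattens out near 
tf_max, so repeating a term hundreds of times earns almost nothing over 
repeating it a few dozen times.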


-Hoss
http://www.lucidworks.com/
