lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: negative boosting / analysis?
Date Wed, 02 Jul 2008 18:11:49 GMT

I've never really tackled anything like this, but a few things to 
consider / watch out for are:

1) if a doc *only* matches because of the negated field do you really 
want to consider it a match?  Even in the case of dismax, the 
minNrShouldMatch aspect is going to consider your negation 
field a factor, so you might find documents being considered a match even 
though they don't contain enough of the input terms in "normal" fields, 
because some terms are in the negated field.

2) the coord factor might wind up throwing your scores off in weird ways 
... something that matches on the title, the content, and the negation 
field could wind up scoring higher than something that matches only on 
title and content because of coord.
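
To make both points concrete, here is a rough arithmetic sketch. It is not 
Lucene code: it assumes one optional clause per field, a made-up raw score 
of 1.0 per matching field, the boosts from the quoted query string, and the 
classic coord definition (matched clauses / total clauses). The helper 
names and the sample doc are hypothetical, and the doc assumes the negation 
field is fed from text that is not otherwise searched.

```python
# Field boosts taken from the quoted query string: title^2 content^1 negation^0.1
BOOSTS = {"title": 2.0, "content": 1.0, "negation": 0.1}
NORMAL_FIELDS = ("title", "content")


def terms_matched(doc, terms, fields):
    """Count query terms that occur in at least one of the given fields."""
    return sum(1 for t in terms if any(t in doc.get(f, ()) for f in fields))


def coord_score(matched_fields, raw=1.0):
    """Sum of boosted per-field scores, scaled by coord = matched / total clauses."""
    total = sum(BOOSTS[f] * raw for f in matched_fields)
    return total * len(matched_fields) / len(BOOSTS)


# Point 1: "clean" occurs only in the negation field of this hypothetical doc,
# yet it still counts toward a "2 terms must match" requirement.
doc = {"title": {"kitten"}, "negation": {"clean"}}
terms = ["clean", "kitten"]
with_negation = terms_matched(doc, terms, BOOSTS)       # all three fields
normal_only = terms_matched(doc, terms, NORMAL_FIELDS)  # title and content only

# Point 2: matching the negation field raises coord from 2/3 to 3/3, so the
# "negated" doc scores higher, which is the opposite of the intent.
score_without = coord_score(["title", "content"])
score_with = coord_score(["title", "content", "negation"])

print(with_negation, normal_only)
print(score_without, score_with)
```

Note that a boost of 0.1 is still a positive contribution; it only makes 
the negation match count for less, it never subtracts from the score.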

There's a "BoostingQuery" in the Lucene queries contrib that (in theory) 
helps with some of this by rewriting to a BooleanQuery with a custom coord 
function, but I've never looked at it closely.
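
As far as I understand its Javadoc, BoostingQuery takes a "match" query, a 
"context" query, and a boost below 1.0, and documents that also match the 
context query have their score multiplied by that boost. The sketch below 
only illustrates that demotion arithmetic; the function name, the scores, 
and the 0.1 boost are assumptions for illustration, not the contrib 
implementation.

```python
def demoted_score(match_score, matches_context, boost=0.1):
    """Scale the score down when the doc also matches the 'context' (negation) query."""
    return match_score * boost if matches_context else match_score


# (name, score from the normal fields, does it match the negation field?)
docs = [("kitten", 1.2, True), ("puppy", 1.0, False)]

# Rank by demoted score, best first.
ranked = sorted(docs, key=lambda d: demoted_score(d[1], d[2]), reverse=True)
names = [name for name, _, _ in ranked]
print(names)
```

Here the kitten doc stays in the result set but drops below the puppy doc, 
which is closer to "should not include (at least at the top)" than adding 
a small positive boost on the negation field.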

: I'm working on a case where we have review text that may include words that
: describe what the item is *not*.
: 
: Given the text "the kitten is not clean", searching for "clean" should not
: include (at least at the top) the kitten.
: 
: The approach I am considering is to copy the text to a negation field and do
: simple heuristic analysis in a TokenFilter.  This analysis would only keep
: tokens for words that follow "not", then we could add a negative boost for
: this field:
:   title^2 content^1 negation^0.1
: 
: Does this seem like a reasonable approach?  Any other ideas / suggestions /
: pointers?
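
For reference, the "keep tokens that follow not" heuristic described in 
the quoted message could look roughly like this. It is a plain-Python 
sketch of the logic only, not a Lucene TokenFilter; the function name and 
the `window` parameter (how many tokens after "not" to keep) are 
assumptions.

```python
def negated_tokens(tokens, window=1):
    """Keep only the tokens that fall within `window` positions after a "not"."""
    kept = []
    remaining = 0  # how many upcoming tokens are still considered negated
    for tok in tokens:
        if tok.lower() == "not":
            remaining = window
        elif remaining > 0:
            kept.append(tok)
            remaining -= 1
    return kept


result = negated_tokens("the kitten is not clean".split())
print(result)
```

With the example text, only "clean" survives into the negation field.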



-Hoss

