lucene-dev mailing list archives

From Doug Cutting <>
Subject RE: Relevance boosting with the aid of semantic markup
Date Fri, 07 Dec 2001 18:02:32 GMT
> From: Halácsy Péter []
> Doug Cutting wrote:
> >
> >I made a proposal a while back which could also be used to achieve this.
> >It is not the most elegant solution, but a solution nonetheless.
> >
> Why do you say this is not elegant?

Repeating terms to emphasize them is not as elegant as specifying a weight.
Unfortunately, Lucene uses a term's frequency to parse its proximity
(position) information, so frequencies cannot be tweaked without also adding
positions, or else changing the index format.
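
To illustrate why the frequency cannot simply be inflated, here is a minimal sketch of a decoder that reads a flat stream of positions, taking as many entries per document as that document's frequency says. It is a toy model, not Lucene's actual index format or API; the class name `PostingDecodeSketch` and the method `decode` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PostingDecodeSketch {

    /**
     * Splits a flat position stream into one position list per document.
     * freqs[i] says how many entries of the stream belong to document i,
     * so the frequencies are what make the stream parseable at all.
     */
    static List<int[]> decode(int[] freqs, int[] positionStream) {
        List<int[]> perDoc = new ArrayList<>();
        int offset = 0;
        for (int freq : freqs) {
            if (offset + freq > positionStream.length) {
                throw new IllegalStateException("frequency exceeds stored positions");
            }
            perDoc.add(Arrays.copyOfRange(positionStream, offset, offset + freq));
            offset += freq;
        }
        return perDoc;
    }

    public static void main(String[] args) {
        // Doc 0 holds the term at positions 3 and 17; doc 1 holds it at position 5.
        int[] positions = {3, 17, 5};

        // Honest frequencies: the stream splits cleanly into two lists.
        for (int[] docPositions : decode(new int[] {2, 1}, positions)) {
            System.out.println(Arrays.toString(docPositions));
        }

        // Raising doc 0's frequency to 3 as a "weight" makes it swallow
        // doc 1's position, and the decoder then runs out of data.
        try {
            decode(new int[] {3, 1}, positions);
        } catch (IllegalStateException e) {
            System.out.println("inflated frequency: " + e.getMessage());
        }
    }
}
```

In this model the only ways to raise a frequency safely are to append matching positions (which is what repeating the term does) or to change how the stream is laid out.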

> Why can't we store some value for each word? If I could index the stems
> of the words as well, I would give a lower value to them.
> I know a Russian search engine that uses 3 (or 4, I don't remember)
> distinct values to classify each term in the index:
> 1. original word
> 2. stem
> 3. spam
> The priority of the terms is calculated at indexing time and used for 
> ranking.

Would such weighting be per word, or per word occurrence?  Earlier you were
asking for the ability to separately weight word occurrences, e.g. to boost
them if they are emphasized in the text.  That was what I was responding to.

If the desire is rather to boost all occurrences of certain words, that is
much simpler.  One can simply do this at query time by setting a TermQuery
boost.  If one stores different types of terms in different fields, then the
boost can be set by field.
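
As a rough sketch of the per-field idea, the toy scorer below keeps original words and stems in separate fields and applies a query-time boost to each clause. It does not use Lucene classes; `FieldBoostSketch`, `BoostedTerm`, the field names `word` and `stem`, and the boost values 1.0 and 0.5 are all assumptions made for illustration, and the scoring formula (boost times term frequency) is deliberately simplistic.

```java
import java.util.List;
import java.util.Map;

public class FieldBoostSketch {

    /** One query clause: a term in a given field, with a query-time boost. */
    static final class BoostedTerm {
        final String field;
        final String text;
        final float boost;

        BoostedTerm(String field, String text, float boost) {
            this.field = field;
            this.text = text;
            this.boost = boost;
        }
    }

    /**
     * Scores a document as the sum over clauses of boost * term frequency.
     * The document maps field name -> (term -> frequency in that field).
     */
    static float score(Map<String, Map<String, Integer>> doc, List<BoostedTerm> query) {
        float total = 0f;
        for (BoostedTerm clause : query) {
            Map<String, Integer> terms = doc.getOrDefault(clause.field, Map.of());
            total += clause.boost * terms.getOrDefault(clause.text, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Exact-word matches count fully; stem matches count half (assumed values).
        List<BoostedTerm> query = List.of(
                new BoostedTerm("word", "running", 1.0f),
                new BoostedTerm("stem", "run", 0.5f));

        // This document contains "running", so both fields match.
        Map<String, Map<String, Integer>> exact = Map.of(
                "word", Map.of("running", 1),
                "stem", Map.of("run", 1));

        // This document contains "runs", so only the stem field matches.
        Map<String, Map<String, Integer>> stemOnly = Map.of(
                "word", Map.of("runs", 1),
                "stem", Map.of("run", 1));

        System.out.println("exact match score: " + score(exact, query));
        System.out.println("stem-only score:   " + score(stemOnly, query));
    }
}
```

Because the boost lives in the query rather than in the index, it can be changed per search without reindexing and leaves the stored frequencies untouched.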

