lucene-java-user mailing list archives

From Leo Galambos <galam...@com-os2.ms.mff.cuni.cz>
Subject Re: Computing Relevancy Differently
Date Sun, 26 Jan 2003 16:56:55 GMT
> What I'd like to do is get a relevancy-based order in which (a) longer
> documents tend to get more weight than shorter ones, (b) a document body
> with 'X' instances of a query term gets a higher ranking than one with fewer
> than 'X' instances, and (c) a term found in the headline (usually in
> addition to finding the same term in the body) is more highly ranked than
> one with the term only in the body.
> 
> But that's not what happens with the default scoring, and I'd like to change
> that.

I am not a Lucene developer, but:

1) Lucene uses the vector space model. If you want to use a different
model, you must understand what you are doing and change the similarity
calculations. AFAIK you would set the length normalization factor to a
constant value (1.0 or so), so that longer documents are no longer
penalized.
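
To illustrate the effect of a constant normalization factor, here is a
minimal, self-contained sketch. It does not call Lucene; the class name,
the idf value, and the document sizes are made up for illustration, and the
toy formula sqrt(tf) * idf * norm is only an approximation of how a vector
model scores a single term. In Lucene releases that ship a Similarity
class, the hook to override is, as far as I know, lengthNorm(fieldName,
numTerms), but check the Javadoc of your version.

```java
// Toy comparison of length-normalized vs. constant-norm scoring.
// Hypothetical names and numbers; this is not Lucene code.
final class LengthNormSketch {

    /** Default-style normalization: shorter fields get a larger factor. */
    static double defaultNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Constant normalization: field length no longer affects the score. */
    static double constantNorm(int numTerms) {
        return 1.0;
    }

    /** Toy single-term score: sqrt(term frequency) * idf * norm. */
    static double score(int termFreq, double idf, double norm) {
        return Math.sqrt(termFreq) * idf * norm;
    }

    public static void main(String[] args) {
        double idf = 1.5;                      // assumed value, same for both docs
        int shortLen = 100, shortHits = 2;     // short document, few matches
        int longLen = 10000, longHits = 8;     // long document, more matches

        System.out.println("default norm,  short: "
                + score(shortHits, idf, defaultNorm(shortLen)));
        System.out.println("default norm,  long:  "
                + score(longHits, idf, defaultNorm(longLen)));
        System.out.println("constant norm, short: "
                + score(shortHits, idf, constantNorm(shortLen)));
        System.out.println("constant norm, long:  "
                + score(longHits, idf, constantNorm(longLen)));
    }
}
```

With the default-style norm the short document outranks the long one even
though it has fewer matches; with the constant norm the ordering follows
the number of matches, which is closer to requirements (a) and (b) in the
question.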

2) You are trying to search for DATA, not INFORMATION. That is a big
difference. For your task, you might be better off with a simpler engine
based on an RDBMS and B+ trees.

-g-



