lucene-dev mailing list archives

From Doron Cohen <>
Subject Re: search quality - assessment & improvements
Date Tue, 17 Jul 2007 02:47:08 GMT
Chris Hostetter wrote:

> isn't that just a flat line with a slope relative to the specified
> "Slope"?  your pivot just seems to affect the y-intercept (which would be the
> lengthNorm for field containing 0 terms) but doesn't that cancel out of
> any scoring equation since the fieldNorm is multiplied in for all docs?
> it seems like changing the pivot should affect the raw score values you
> get back, but it doesn't seem like it would have much (if any) effect on
> the relative scores of docs with different lengths.
> actually, i must be missing something about your calculation...

An example might help here. Consider:
 Slope = 0.1
 Pivot = 900
 text_score(doc1) = 0.22
 text_score(doc2) = 0.20
 length(doc1) = 900
 length(doc2) = 200

The combined scores (each text score multiplied by the result of the formula below) would be:
 score(doc1) = 0.0073
 score(doc2) = 0.0069
 (doc1 is "better")

So, although doc1 is longer than doc2, it is not punished
much, because it is not longer than the average length (the pivot).

But if the pivot was 200, combined scores would be:
 score(doc1) = 0.0134
 score(doc2) = 0.0141
 (doc2 is "better")

Here, doc1 was punished (compared to doc2), because its length (900) is
now well above the pivot (200), and doc2 is now considered "better" than doc1.

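To double-check the arithmetic, here is a small Python sketch that reproduces both cases. It assumes the combined score is simply text_score times the length norm 1 / sqrt((1 - Slope) * Pivot + (Slope) * Doclen) given at the end of this message; the helper names (pivoted_norm, combined_score) are made up for illustration and are not Lucene APIs.

```python
import math


def pivoted_norm(doclen: float, pivot: float, slope: float) -> float:
    """Length norm: 1 / sqrt((1 - slope) * pivot + slope * doclen)."""
    return 1.0 / math.sqrt((1.0 - slope) * pivot + slope * doclen)


def combined_score(text_score: float, doclen: float, pivot: float, slope: float) -> float:
    """Assumed combination: raw text score times the pivoted length norm."""
    return text_score * pivoted_norm(doclen, pivot, slope)


SLOPE = 0.1
# (text_score, length) for the two docs in the example.
DOCS = {"doc1": (0.22, 900), "doc2": (0.20, 200)}

for pivot in (900, 200):
    scores = {name: combined_score(ts, n, pivot, SLOPE) for name, (ts, n) in DOCS.items()}
    best = max(scores, key=scores.get)
    summary = ", ".join(f"{name}={s:.4f}" for name, s in scores.items())
    print(f"pivot={pivot}: {summary} -> {best} ranks first")
```

It should print doc1 ahead with pivot=900 (0.0073 vs 0.0069) and doc2 ahead with pivot=200 (0.0134 vs 0.0141), i.e. moving the pivot changes the relative order of the two docs, not just the raw score values.
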
So, 'punishing' long docs has more effect when their length
is above the average length. Docs that are shorter than
the average length are not drastically boosted for this,
thereby protecting you from preferring erroneous documents
just because they are very short... makes sense?
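
A rough way to see the "not drastically boosted" part: compare how much a doc is scaled, relative to a doc of exactly pivot length, under the pivoted norm versus a plain 1 / sqrt(length) norm. This is a sketch assuming Pivot = 900 stands in for the average length and Slope = 0.1 as in the example; the function names are hypothetical.

```python
import math

SLOPE = 0.1
PIVOT = 900  # assumed to stand in for the average doc length


def pivoted_norm(doclen: float, pivot: float = PIVOT, slope: float = SLOPE) -> float:
    """Pivoted norm: 1 / sqrt((1 - slope) * pivot + slope * doclen)."""
    return 1.0 / math.sqrt((1.0 - slope) * pivot + slope * doclen)


def plain_norm(doclen: float) -> float:
    """Plain 1 / sqrt(doclen) norm, for comparison."""
    return 1.0 / math.sqrt(doclen)


def boost_vs_pivot(norm, doclen: float, pivot: float = PIVOT) -> float:
    """Factor by which a doc of length `doclen` is scaled relative to a pivot-length doc."""
    return norm(doclen) / norm(pivot)


for length in (10, 100, 900, 9000):
    print(f"len={length:5d}  pivoted x{boost_vs_pivot(pivoted_norm, length):.2f}"
          f"  plain x{boost_vs_pivot(plain_norm, length):.2f}")
```

With these values a 10-term doc gains only about 5% over a 900-term doc under the pivoted norm (x1.05), versus roughly x9.49 with plain 1 / sqrt(length). In general the pivoted boost for a short doc can never exceed 1 / sqrt(1 - Slope), which is what keeps very short docs from dominating.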

> ..from what i can tell, your function rewards longer documents without
> bounds ... did you mean: 1/((1 - Slope) * Pivot + (Slope) * Doclen) ?

Yes, but with a square root:  1 / sqrt((1 - Slope) * Pivot + (Slope) * Doclen)
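
A sketch of this norm as a standalone Python function, with a couple of sanity checks on the "rewards longer documents" concern: the value strictly decreases as Doclen grows (for Slope > 0), and at Doclen == Pivot it equals 1 / sqrt(Pivot) whatever the Slope. The function name and the [0, 1] range check on slope are assumptions for illustration, not anything taken from Lucene.

```python
import math


def pivoted_length_norm(doclen: float, pivot: float, slope: float) -> float:
    """Return 1 / sqrt((1 - slope) * pivot + slope * doclen)."""
    # Assumption: slope is a blend weight in [0, 1]; outside that range the
    # denominator can become negative for some lengths.
    if not 0.0 <= slope <= 1.0:
        raise ValueError("slope must be in [0, 1]")
    return 1.0 / math.sqrt((1.0 - slope) * pivot + slope * doclen)


PIVOT, SLOPE = 900, 0.1
lengths = [1, 50, 200, 900, 5000, 100000]
norms = [pivoted_length_norm(n, PIVOT, SLOPE) for n in lengths]

# Longer docs always get a smaller norm, so length is not rewarded.
print(all(a > b for a, b in zip(norms, norms[1:])))
# At doclen == pivot the slope drops out: both values equal 1/sqrt(pivot).
print(pivoted_length_norm(PIVOT, PIVOT, SLOPE), 1.0 / math.sqrt(PIVOT))
```

The two printed norms should agree up to floating-point rounding. Note also the endpoints of the blend: Slope = 1 reduces to a plain 1 / sqrt(Doclen), and Slope = 0 makes the norm a constant that ignores length entirely.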
