nutch-dev mailing list archives

From "Chirag Chaman" <>
Subject RE: [Nutch-dev] Copy DB by the piece
Date Tue, 28 Jun 2005 16:58:38 GMT
Boosts are multiplied into the "match score" (a.k.a. the tf-idf score).

Thus, pages are not sorted by boost, but by the final score.

Here's an example:

You have 3 pages:

- (blog talking about google)

Let's say the boost factors are 1, 2, and 3 respectively.

Now, you do a search for "google".
Let's take the raw scores to be 50, 20, and 15 for the 3 URLs.

After boosts are applied:

- 50 * 1 = 50
- 20 * 2 = 40
- 15 * 3 = 45
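
The same arithmetic can be sketched in a few lines of Python. This only illustrates the multiply-then-sort idea from the example; it is not how Nutch or Lucene computes scores internally. The labels "page 1" to "page 3" are placeholders for the three pages in the order listed, and the `rank` helper is a hypothetical name.

```python
# Placeholder labels: the pages are simply numbered in the order listed.
pages = [
    {"page": "page 1", "raw_score": 50, "boost": 1},
    {"page": "page 2", "raw_score": 20, "boost": 2},
    {"page": "page 3", "raw_score": 15, "boost": 3},
]


def rank(pages):
    """Return (page, final_score) pairs sorted by final score, highest first."""
    # The final score is the raw match score multiplied by the page boost.
    scored = [(p["page"], p["raw_score"] * p["boost"]) for p in pages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank(pages)
for name, final_score in ranking:
    print(name, final_score)
```

Note that the page with the highest boost does not come out on top here, because its raw score is too low for the boost to make up the difference.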

Thus, you'll get the ranking as: the first page (50), then the third (45), then the second (40).



-----Original Message-----
From: Massimo Miccoli [] 
Sent: Tuesday, June 28, 2005 12:51 PM
Subject: Re: [Nutch-dev] Copy DB by the piece


So is the boost at the top of explain.jsp used for sorting results, i.e. is it
the final value for the rank? If so, the hits on the results pages are not
ordered by boost, because I have hits with low boost in the first positions.


Chirag Chaman wrote:

>The boost gets multiplied at search time.
>This boost has already been applied to the "field norms". A good way 
>to confirm this is to look at a field norm that was originally one (URL 
>or anchor is a good one); it should now be higher. A lot of the other 
>fields are way too small to begin with to show any difference.
>In short, if you see the boost at the top of the explain page, it's 
>definitely there in the field norms, and thus being applied.
>Filangy, Inc.
>We're Improving Search!
>-----Original Message-----
>From: Massimo Miccoli []
>Sent: Tuesday, June 28, 2005 11:21 AM
>Subject: Re: [Nutch-dev] Copy DB by the piece
>Dear Nutch dev,
>I want to know whether the boost calculated for pages from the inlink 
>count at indexing and fetching time is used in the search.
>Using DistributedSearch, it seems that the page boost is not used to 
>calculate the rank of pages. What I see in my result pages is that most 
>pages with a low page boost are on top and some with a high boost are below.
>For example, from explain.jsp:
>1) boost = 5.3968873  score for query = 50.692223
>2) boost = 5.586193   score for query = 46.90389
>3) boost = 6.0371985  score for query = 43.306103
>4) boost = 7.388178   score for query = 37.984783
>So is only the score for query considered when sorting (ranking) the hits?
>I think the rank of a hit must be boost * score for query, or am I wrong?
