lucene-java-user mailing list archives

From Doug Cutting <>
Subject Re: reoot site query results
Date Mon, 06 Dec 2004 17:10:04 GMT
In web search, link information helps greatly.  (This was Google's big 
discovery.)  There are lots more links that point to than to, and 
many (if not most) of these links have the term "slashdot", while links 
to are somewhat less likely to contain 
the term "slashdot".

As Erik hinted, Nutch uses this information.  It keeps a database of 
links that point to each page, indexes their anchor text along with the 
page, and boosts highly linked pages more than less-linked pages.
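
To make the idea concrete, here is a rough sketch in plain Python (no 
Lucene or Nutch calls). It is only an illustration under assumptions, not 
how Nutch actually implements it: the URLs, the page texts, the LINKS 
data, the helper names, and the log(e + inlinks) damping are all made up 
for the example. It groups anchor text by target page, counts query-term 
matches in the page text plus its inbound anchor text, and multiplies by 
a boost that grows slowly with the number of inbound links.

```python
import math
from collections import defaultdict

# Hypothetical crawl data: (source_url, target_url, anchor_text) triples.
LINKS = [
    ("http://a.example/", "http://widgets.example/", "widgets"),
    ("http://b.example/", "http://widgets.example/", "widgets site"),
    ("http://c.example/", "http://widgets.example/", "cool widgets"),
    ("http://d.example/", "http://widgets.example/faq.html", "faq"),
]

# Hypothetical page contents. The sub page mentions the term more often
# than the root page, so text-only scoring would rank it first.
PAGES = {
    "http://widgets.example/": "widgets home",
    "http://widgets.example/faq.html": "widgets widgets widgets faq",
}


def build_link_db(links):
    """Group inbound anchor texts by target page, skipping self-links."""
    inlinks = defaultdict(list)
    for source, target, anchor in links:
        if source != target:
            inlinks[target].append(anchor)
    return inlinks


def link_boost(inlink_count):
    """Assumed damping: grows with inbound links, equals 1.0 for none."""
    return math.log(math.e + inlink_count)


def term_count(term, text):
    """Count whole-word, case-insensitive occurrences of term in text."""
    return text.lower().split().count(term.lower())


def score(term, url, pages, inlinks, use_links=True):
    """Term matches in page text (plus anchors), times the link boost."""
    matches = term_count(term, pages[url])
    if not use_links:
        return float(matches)
    anchors = inlinks.get(url, [])
    matches += sum(term_count(term, a) for a in anchors)
    return matches * link_boost(len(anchors))


def rank(term, pages, links, use_links=True):
    """Return page URLs ordered from highest to lowest score."""
    inlinks = build_link_db(links)
    return sorted(
        pages,
        key=lambda url: score(term, url, pages, inlinks, use_links),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank("widgets", PAGES, LINKS, use_links=False))
    print(rank("widgets", PAGES, LINKS, use_links=True))
```

With text matching alone the sub page sorts first, since it repeats the 
term more often.  Once anchor text and the inbound-link boost are 
included, the root page moves to the top, which is the effect described 
above.  In a real index the anchor text would be indexed as part of the 
document and the boost applied at index time rather than computed per 
query as it is here.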


Chris Fraschetti wrote:
> My lucene implementation works great, its basically an index of many
> web crawls. The main thing my users complain about is say a search for
> "slashdot" will return the
> as the top result
> because the factors i have scoring it determine it as so... but
> obviously in true search engine fashion.. i would like
> to be the very top result... i've added a
> boost to queries that match the hostname field, which helped a little,
> but obviously not a proper solution. Does anyone out there in the
> search engine world have a good schema for determining root websites
> and applying a huge boost to them in one fashion or another? mainly so
> it appears before any sub pages? (assuming the query is in reference
> to that site) ...
