lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: How to Make That Domains Should Be First?
Date Sat, 27 Jul 2013 13:03:25 GMT
Hi - To make this work you'll need a homepage flag, some specific hostname analysis, and
function query boosting. I assume you're still using Nutch, so detecting homepages
is easy using NUTCH-1325. To actually get the homepage flag into Solr you need to modify the
indexer to ingest the HostDB, look for HostDatum values in the reducer, and set the homepage
flag there. You can also modify the CrawlDB update tool to read the HostDB so you'll have
the homepage flag in your CrawlDatums.

In Solr you need some analysis on the host field: split it on dots or make NGrams. Then, using
function queries, you can conditionally check for the existence of the homepage flag and, if
it is set, do a conditional query using the user's search terms. If you set the operator to AND
you'll make sure the homepage only comes at position one if the user only types terms that occur
in the host field. So `wiki spain` won't boost the homepage at all.
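
A rough sketch of the idea in Python, not tested against a live Solr. The field names `is_homepage`, `host_parts`, `title` and `content`, the parameter name `hostq`, and the helper functions are all hypothetical. The first two helpers only imitate locally what dot-splitting plus NGram analysis with an AND operator would match; the last one builds request parameters using Solr's `if()`, `exists()` and `query()` functions with an additive edismax `bf` boost. Whether that boost is strong enough to put the homepage first depends on your scoring, so treat the weights as something to tune.

```python
from urllib.parse import urlencode


def host_terms(host, min_gram=3):
    """Rough stand-in for the Solr analysis: split the hostname on dots
    and add character n-grams of each label."""
    terms = set()
    for label in host.lower().split("."):
        terms.add(label)
        for n in range(min_gram, len(label)):
            for i in range(len(label) - n + 1):
                terms.add(label[i:i + n])
    return terms


def all_terms_in_host(user_query, host):
    """Imitate q.op=AND: every term the user typed must occur in the host."""
    terms = host_terms(host)
    return all(t in terms for t in user_query.lower().split())


def build_params(user_query):
    """Build Solr request parameters (field names are assumptions)."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title content",
        # Sub-query: the user's terms against the analyzed host field, all required.
        "hostq": "{!lucene q.op=AND df=host_parts v=$q}",
        # Only documents carrying the homepage flag get the host-match score added.
        "bf": "if(exists(is_homepage),query($hostq,0),0)",
    }


if __name__ == "__main__":
    print(all_terms_in_host("wikipedia", "www.wikipedia.org"))
    print(all_terms_in_host("wiki spain", "www.wikipedia.org"))
    print(urlencode(build_params("wikipedia")))
```

With this simulation, `wikipedia` matches the host `www.wikipedia.org` on every term, while `wiki spain` fails because `spain` never occurs in the host, so the homepage would receive no boost for that query.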

Relying on URL length would not be a good idea because it doesn't allow for longer hostnames,
or for redirects when a homepage is not on /.

https://issues.apache.org/jira/browse/NUTCH-1325
 
-----Original message-----
> From:Furkan KAMACI <furkankamaci@gmail.com>
> Sent: Friday 26th July 2013 18:11
> To: solr-user@lucene.apache.org
> Subject: How to Make That Domains Should Be First?
> 
> When I search wikipedia the home page of wikipedia is not at first result:
> 
> http://www.wikipedia.org/
> 
> first result is that:
> 
> http://en.wikipedia.org/wiki/Spain
> 
> How can I say that domains of web sites should be first at SolrCloud? (I
> want something like grouping at domains and boosting at url length )
> 
