nutch-dev mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject boosting documents matching a url pattern
Date Tue, 22 Mar 2005 22:25:50 GMT
Hi developers,

Please find in JIRA a plugin that multiplies the boost of a
document when its URL matches a pattern:
http://issues.apache.org/jira/browse/NUTCH-16

Any comments are very welcome!!

Stefan

P.S. I hope the patch I created with Subversion works correctly;
I don't have any experience with Subversion yet. ;-/
In any case, I can upload the plain sources as well. Let me know.



 From the README:
======
The boosting-urlpattern plugin multiplies a document's boost value
when the document's URL matches a given regular expression.
This is useful for intranet search engines that also index some
related external web pages,
but where the intranet / extranet documents should be ranked higher.
It is also useful for ranking particular content types or protocols
higher than others (for example, local PDF files higher than HTML pages).


The multiplier and the regular expression should be configured
carefully in the urlPattern.txt file.
Each line contains one regular expression followed by a space and a
double value.
The file format does not support comments yet.
Initially you will find 3 example entries. Please note that multiplying
a document's boost by 3 or 2 rarely makes sense;
try values like 1.1 or 0.9.
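
To make the file format and the multiplication concrete, here is a minimal Java sketch. It is not the plugin's code: the class UrlPatternBoost, its methods, and the example.org URLs are hypothetical, and it assumes that a pattern must match the whole URL and that every matching line contributes its multiplier. The README does not specify either point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of the urlPattern.txt format; not the plugin's actual code. */
final class UrlPatternBoost {

    /** One parsed line: a compiled regular expression and its multiplier. */
    static final class Rule {
        final Pattern pattern;
        final double multiplier;

        Rule(Pattern pattern, double multiplier) {
            this.pattern = pattern;
            this.multiplier = multiplier;
        }
    }

    /** Parses lines of the form "<regex> <double>"; blank lines are skipped. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Split at the last space so the regex itself may contain spaces.
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                throw new IllegalArgumentException("Expected '<regex> <double>': " + line);
            }
            String regex = line.substring(0, split).trim();
            double multiplier = Double.parseDouble(line.substring(split + 1));
            rules.add(new Rule(Pattern.compile(regex), multiplier));
        }
        return rules;
    }

    /**
     * Multiplies baseBoost by the multiplier of every rule whose pattern
     * matches the whole URL (both choices are assumptions of this sketch).
     */
    static double boost(String url, double baseBoost, List<Rule> rules) {
        double boost = baseBoost;
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).matches()) {
                boost *= rule.multiplier;
            }
        }
        return boost;
    }

    public static void main(String[] args) {
        // Two made-up entries in the urlPattern.txt format.
        List<Rule> rules = parse(Arrays.asList(
                "^file:.*\\.pdf$ 1.1",
                "^http://www\\.example\\.org/.* 0.9"));
        System.out.println(boost("file:/docs/report.pdf", 1.0, rules));
        System.out.println(boost("http://www.example.org/index.html", 1.0, rules));
    }
}
```

Splitting at the last space keeps a line parseable even if the regular expression contains spaces, since the multiplier is always the final token.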

To install this plugin, change the 'plugin.includes' key in your
Nutch configuration:
simply add the following to its value:
|boost-urlpattern
====
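
As an illustration of the installation step in the README, the override might look like the fragment below, assuming it goes into a site-specific configuration file such as nutch-site.xml. The plugin ids listed before |boost-urlpattern are placeholders for whatever your existing 'plugin.includes' value already contains; only the appended suffix comes from the README.

```xml
<property>
  <name>plugin.includes</name>
  <!-- Placeholder value: keep your existing plugin list and append |boost-urlpattern -->
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|boost-urlpattern</value>
</property>
```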





---------------------------------------------------------------
company:		http://www.media-style.com
forum:		http://www.text-mining.org
blog:			http://www.find23.net

