nutch-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Filtering URLs
Date Tue, 05 May 2009 16:24:41 GMT
>Hi Nutch developers,
>
>Is there any way to write a URL filter that allows only certain 
>URLs to be fetched? I would like Nutch to follow only the URLs that 
>I allow, while the seed URLs still get analyzed further.

There are already several plugins that support URL filtering, each of 
which lets you specify the filter rules in a different way. See the 
following plugins:

urlfilter-automaton
urlfilter-domain
urlfilter-prefix
urlfilter-regex
urlfilter-suffix
urlfilter-validator

Which one(s) to use depends on your particular goals.
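As a rough illustration of the allowlist approach, here is a minimal Java sketch of the `+regex` / `-regex` rule style used by the urlfilter-regex configuration file. It assumes the semantics as I understand them (the first rule whose pattern is found in the URL decides, and a URL that matches no rule is rejected). `RegexRuleFilter` is a hypothetical class for illustration, not the plugin's actual source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of "+regex" / "-regex" allowlist rules, modelled on the
// rule-file style of Nutch's urlfilter-regex plugin (not its actual source).
final class RegexRuleFilter {

    // One rule: accept (+) or reject (-) URLs containing a match for the pattern.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    RegexRuleFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // Returns the URL if accepted, or null if rejected.
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null; // first matching rule decides
            }
        }
        return null; // no rule matched: reject by default
    }
}
```

With rules such as `+^https?://(www\.)?example\.com/docs/` followed by a catch-all `-.`, only URLs under `example.com/docs/` pass; everything else hits the reject rule. Putting the catch-all last is what turns the rule list into an allowlist.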

If none of these would work for you, then you can always create a new 
plugin that implements the URLFilter interface.
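For a custom plugin, the core of the work is the `filter` method: return the URL (possibly rewritten) to keep it, or `null` to drop it. The sketch below assumes that contract and uses a local stand-in interface so it compiles without Nutch on the classpath. The real `org.apache.nutch.net.URLFilter` also brings in plugin and Hadoop configuration methods, and a real plugin additionally needs a `plugin.xml` descriptor and an entry in the `plugin.includes` property. `HostAllowlistFilter` is a hypothetical example class.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Local stand-in for org.apache.nutch.net.URLFilter (assumption: only the
// filter method is modelled; the real interface has more to implement).
interface URLFilter {
    // Return the URL (possibly rewritten) to keep it, or null to drop it.
    String filter(String urlString);
}

// Hypothetical filter: keeps only http(s) URLs whose host is in an allowlist.
final class HostAllowlistFilter implements URLFilter {
    private final Set<String> allowedHosts;

    HostAllowlistFilter(Set<String> hosts) {
        Set<String> normalized = new HashSet<>();
        for (String h : hosts) {
            normalized.add(h.toLowerCase(Locale.ROOT)); // compare hosts case-insensitively
        }
        this.allowedHosts = Collections.unmodifiableSet(normalized);
    }

    @Override
    public String filter(String urlString) {
        try {
            URI uri = new URI(urlString);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return null; // relative, opaque (e.g. mailto:) or host-less URL: reject
            }
            boolean httpLike = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
            boolean allowed = allowedHosts.contains(host.toLowerCase(Locale.ROOT));
            return (httpLike && allowed) ? urlString : null;
        } catch (URISyntaxException e) {
            return null; // unparsable URL: reject
        }
    }
}
```

Matching on the parsed host rather than on the raw string avoids accepting something like `http://other.org/?q=example.com` by accident.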

-- Ken
-- 
Ken Krugler
+1 530-210-6378
