nutch-dev mailing list archives

From Piotr Kosiorowski <>
Subject Re: Urlfilter Patch
Date Thu, 01 Dec 2005 21:02:38 GMT
Jérôme Charron wrote:
> build a list of file extensions to include (other ones will be excluded) in
> the fetch process.
I would not like to exclude all other extensions: many of them are valid
for HTML, especially for dynamically generated pages (jsp, asp and cgi,
to name just the obvious ones, plus a lot of custom ones). But the idea
of automatically allowing extensions for which plugins are enabled is a
good one, in my opinion.
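
To make the blacklist-versus-whitelist distinction concrete, here is a minimal sketch of a blacklist-style extension check. It is a hypothetical illustration, not Nutch's URLFilter plugin interface: the class name `SuffixBlacklist`, its two input sets and the sample extensions are all assumptions. The extension is taken from the URL path only, so query strings do not interfere; anything not explicitly forbidden (jsp, asp, cgi or an unknown custom extension) passes; and an extension handled by an enabled parser plugin is allowed even if it appears in the forbidden set.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical blacklist-style extension filter (illustration only,
 * not Nutch's URLFilter interface).
 */
class SuffixBlacklist {
    private final Set<String> forbidden;        // extensions to reject (assumed list)
    private final Set<String> pluginSupported;  // extensions an enabled plugin can parse (assumed input)

    public SuffixBlacklist(Set<String> forbidden, Set<String> pluginSupported) {
        this.forbidden = forbidden;
        this.pluginSupported = pluginSupported;
    }

    /** Returns the lower-cased extension of the URL path, or "" if there is none. */
    static String extension(String url) {
        String path;
        try {
            path = new URI(url).getPath();   // path only: query string and fragment are excluded
        } catch (URISyntaxException e) {
            return "";                       // malformed URL: treat as "no extension"
        }
        if (path == null) return "";         // opaque URIs such as mailto: have no path
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash) return "";         // no dot in the last path segment
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Accepts everything not explicitly forbidden; plugin-supported extensions always pass. */
    public boolean accept(String url) {
        String ext = extension(url);
        if (pluginSupported.contains(ext)) return true;
        return !forbidden.contains(ext);
    }

    public static void main(String[] args) {
        SuffixBlacklist f = new SuffixBlacklist(Set.of("zip", "exe", "pdf"), Set.of("pdf"));
        System.out.println(f.accept("http://example.com/page.jsp?id=1")); // true
        System.out.println(f.accept("http://example.com/setup.exe"));     // false
        System.out.println(f.accept("http://example.com/paper.pdf"));     // true
    }
}
```

A whitelist would have to enumerate every dynamic-page extension in advance, whereas this blacklist only has to know what to reject, which is why unknown custom extensions still get through.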
Anyway, I will try to find my own list of forbidden extensions, which I
prepared based on 80 million URLs: I compiled a list of the most common
extensions and went through it manually. I will try to find it over the
weekend so we can combine it with the list discussed in this thread.
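
The procedure described above (tallying extensions over a large URL sample, then reviewing the most frequent ones by hand) could be sketched roughly as follows. This is an assumption about how such a list might be built, not the tooling actually used; `ExtensionCounter`, `topN` and the sample URLs are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: tally URL path extensions so the most common ones can be reviewed by hand. */
class ExtensionCounter {
    /** Lower-cased extension of the URL path, or "" when there is none or the URL is malformed. */
    static String extension(String url) {
        try {
            String path = new URI(url).getPath();   // ignores query string and fragment
            if (path == null) return "";
            int slash = path.lastIndexOf('/');
            int dot = path.lastIndexOf('.');
            return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        } catch (URISyntaxException e) {
            return "";
        }
    }

    /** Counts each non-empty extension and returns the n most frequent, most frequent first. */
    static List<Map.Entry<String, Integer>> topN(Iterable<String> urls, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : urls) {
            String ext = extension(url);
            if (!ext.isEmpty()) counts.merge(ext, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count descending, breaking ties alphabetically for a stable review order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "http://a.example/index.html", "http://b.example/x.html",
                "http://c.example/run.cgi?q=1", "http://d.example/f.zip",
                "http://e.example/g.zip", "http://f.example/h.zip");
        for (Map.Entry<String, Integer> e : topN(sample, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());  // zip 3, then html 2
        }
    }
}
```

The counting is cheap even for tens of millions of URLs; the manual pass over the top of the list is what decides which extensions end up forbidden.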
