nutch-dev mailing list archives

From Doug Cutting
Subject Re: Urlfilter Patch
Date Thu, 01 Dec 2005 21:11:09 GMT
Jérôme Charron wrote:
> For consistency and ease of Nutch management, why not filter the
> extensions based on the activated plugins?
> By looking at the mime-types defined in the parse-plugins.xml file and the
> activated plugins, we know which content-types will be parsed.
> So, by getting the file extensions associated with each content-type, we can
> build a list of file extensions to include (other ones will be excluded) in
> the fetch process.
> No?

What about a site whose content system serves URLs that end in .foo?
We would exclude them, even though they return HTML.
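
A minimal sketch of the proposal and of the case it misses, assuming a
hard-coded mime-type to extension table in place of whatever
parse-plugins.xml and the activated plugins would really provide. The
class name, the table contents, and the choice to accept URLs with no
extension are all hypothetical; none of this is Nutch's actual
urlfilter API.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ExtensionFilterSketch {

    // Hypothetical stand-in for the mime-type -> extension mapping that
    // would be derived from parse-plugins.xml and the activated plugins.
    static final Map<String, List<String>> EXTENSIONS_BY_MIME_TYPE = Map.of(
            "text/html", List.of("html", "htm"),
            "application/pdf", List.of("pdf"));

    // Collect the extensions of every content-type that will be parsed.
    static Set<String> allowedExtensions(Set<String> parsedMimeTypes) {
        Set<String> allowed = new HashSet<>();
        for (String mimeType : parsedMimeTypes) {
            allowed.addAll(EXTENSIONS_BY_MIME_TYPE.getOrDefault(mimeType, List.of()));
        }
        return allowed;
    }

    // Lower-cased extension of the URL path, or "" if the last path
    // segment has none. Query string and fragment are ignored.
    static String extensionOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    // Assumption: URLs without an extension are accepted; the proposal
    // does not say how they should be handled.
    static boolean accepts(String url, Set<String> allowed) {
        String extension = extensionOf(url);
        return extension.isEmpty() || allowed.contains(extension);
    }

    public static void main(String[] args) {
        Set<String> allowed = allowedExtensions(Set.of("text/html"));
        // An ordinary HTML page passes the filter.
        System.out.println(accepts("http://example.com/index.html", allowed)); // true
        // A page served as text/html under a .foo URL is dropped, because
        // the filter sees only the URL, never the returned Content-Type.
        System.out.println(accepts("http://example.com/article.foo", allowed)); // false
    }
}
```

The second call is the problem case: the decision is made before the
fetch, so the content type the server would have returned never comes
into it.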

