nutch-dev mailing list archives

From Jérôme Charron
Subject Hard-coded Content-type checks
Date Tue, 13 Dec 2005 13:24:30 GMT

I would like to remove all the hard-coded content-type checks scattered
across the parse plugins.
The content-type/plugin-id mapping is now centralized in the
parse-plugins.xml file, so there is no longer any need for the parsers
to check the content-type themselves.
The basic idea was:
1. The developer is responsible for declaring, in the plugin.xml of his
parser, the content-type(s) it handles.
2. The administrator can then use a parser for any content-type he
wants.
3. The ParserFactory warns the administrator if a parser is mapped to a
content-type it was not originally designed to handle (according to its
plugin.xml file).
So there is no longer any need for hard-coded content-type checks.
It is the administrator's responsibility to take care of the
content-type/plugin-id mappings.
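
To make the three points above concrete, here is a minimal sketch of the
idea. It is not the actual ParserFactory code: the class
ParserMappingSketch and its methods declare(), map(), resolve() and
warnings() are hypothetical names used only for illustration. It assumes
the developer's declarations and the administrator's mappings have
already been read into memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT the Nutch ParserFactory.
class ParserMappingSketch {
    // plugin id -> content-types the developer declared (from plugin.xml)
    private final Map<String, Set<String>> declared = new HashMap<>();
    // content-type -> plugin id chosen by the administrator
    private final Map<String, String> mapping = new LinkedHashMap<>();

    // Point 1: the developer declares what the parser was designed for.
    void declare(String pluginId, String... contentTypes) {
        declared.computeIfAbsent(pluginId, k -> new HashSet<>())
                .addAll(Arrays.asList(contentTypes));
    }

    // Point 2: the administrator may map any content-type to any parser.
    void map(String contentType, String pluginId) {
        mapping.put(contentType, pluginId);
    }

    // The mapping alone decides which parser is used; no per-parser check.
    String resolve(String contentType) {
        return mapping.get(contentType);
    }

    // Point 3: report mappings that go beyond what a plugin declared.
    List<String> warnings() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            Set<String> types = declared.get(e.getValue());
            if (types == null || !types.contains(e.getKey())) {
                result.add("WARN: " + e.getValue()
                        + " is mapped to " + e.getKey()
                        + " but does not declare it");
            }
        }
        return result;
    }
}
```

In this sketch, declaring text/html for parse-html and then mapping
application/xhtml+xml to parse-html still resolves to parse-html; the
only consequence is one warning for the administrator.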

For instance, in my use case, I have mapped the application/xhtml+xml
content-type to the parse-html parser.
But with the current hard-coded content-type check in parse-html, the
plugin cannot handle application/xhtml+xml content.

If there are no objections, I will commit these changes within the next
few hours.



