nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <>
Subject [jira] Commented: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType->extensionId mapping
Date Sat, 17 Dec 2005 03:11:34 GMT

Chris A. Mattmann commented on NUTCH-140:

Hey Stefan,

  Mainly, it would be to make them more human-readable. Also, if I go in there and define
aliases for all the parsing plugin extensionIds that currently exist, there will be little
tailoring for the user to do out of the box (similar to what I already did for parse-plugins.xml,
which has most of the mimeTypes in the system in there already out of the box). In my
opinion (and of course, it's just my opinion, so take it with a grain of salt), it's easier
to look at pluginIds such as "parse-html" than at "org.apache.nutch.parse.html.HtmlParser",
or something like that. It's a lot fewer characters to type, too ;) Another advantage is that
it wouldn't change the way the system currently works, i.e., there would be no direct impact
on users who are already used to the mimeType->list of pluginIds mapping in the parse-plugins.xml file.

Just my two cents.

Take care!


> Add alias capability in parse-plugins.xml file that allows mimeType->extensionId mapping
> ----------------------------------------------------------------------------------------
>          Key: NUTCH-140
>          URL:
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>  Environment:  Power Mac OS X 10.4, Dual Processor G5 2.0 GHz, 1.5 GB RAM, although bug is independent of environment
>     Reporter: Chris A. Mattmann
>     Assignee: Chris A. Mattmann
>     Priority: Minor

>  Jerome and I have been talking about an idea to address the current issue raised by Stefan G. about having a mapping of mimeType->list of pluginIds rather than mimeType->list of extensionIds in the parse-plugins.xml file. We've come up with the following proposed update that would seemingly fix this problem.
>   We propose to have the concept of "aliases" in the parse-plugins.xml file, defined at the end of the file, something like:
>  <parse-plugins>
>     ....
>    <mimeType name="text/html">
>       <plugin id="parse-html"/>
>    </mimeType>
>     .....
>    <aliases>
>    <alias name="parse-html" extension-point="org.apache.nutch.parse.html.HtmlParser"/>
>    ....
>    <alias name="parse-html2" extension-point="my.other.html.Parser"/>
>    ....
>    </aliases>
> </parse-plugins>
> What do you guys think? This approach would be flexible enough to allow the mapping of extensionIds to mimeTypes, but without impacting the current "pluginId" concept.
> Comments welcome. 
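
To make the proposal concrete, here is a rough sketch of how the alias lookup could work. It is written in Python rather than Nutch's Java purely for brevity and is not the actual Nutch implementation. The function names load_parse_plugins and resolve are hypothetical, and falling back to the plugin id when no alias is defined is an assumption, since the proposal does not say what should happen in that case.

```python
import xml.etree.ElementTree as ET

# Sample file following the layout proposed in the issue description.
SAMPLE = """\
<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-html"/>
  </mimeType>
  <aliases>
    <alias name="parse-html"
           extension-point="org.apache.nutch.parse.html.HtmlParser"/>
    <alias name="parse-html2" extension-point="my.other.html.Parser"/>
  </aliases>
</parse-plugins>
"""


def load_parse_plugins(xml_text):
    """Return (mimeType -> list of plugin ids, alias name -> extension id)."""
    root = ET.fromstring(xml_text)
    aliases = {
        alias.get("name"): alias.get("extension-point")
        for alias in root.findall("aliases/alias")
    }
    mime_to_plugins = {
        mime.get("name"): [plugin.get("id") for plugin in mime.findall("plugin")]
        for mime in root.findall("mimeType")
    }
    return mime_to_plugins, aliases


def resolve(mime_type, mime_to_plugins, aliases):
    """Map a mimeType to extension ids by way of the alias table.

    Assumption: a plugin id with no alias is returned unchanged, so files
    that define no <aliases> section behave as they do today.
    """
    plugin_ids = mime_to_plugins.get(mime_type, [])
    return [aliases.get(plugin_id, plugin_id) for plugin_id in plugin_ids]


if __name__ == "__main__":
    mime_to_plugins, aliases = load_parse_plugins(SAMPLE)
    print(resolve("text/html", mime_to_plugins, aliases))
```

The mimeType entries still list short plugin ids, so existing files keep their meaning, and the alias table is only consulted at lookup time.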

This message is automatically generated by JIRA.