nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-34) Parsing different content formats
Date Sun, 11 Dec 2005 18:11:08 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-34?page=comments#action_12360147 ] 

Chris A. Mattmann commented on NUTCH-34:
----------------------------------------

Hi Folks,

Just wondering: is this issue taken care of by NUTCH-88? It would seem that at least some
elements of it were (e.g., the single location for parse plugin ordering, and the ability to
have different parse plugins registered to the same mimeType, with priority). The only thing
that isn't really handled by NUTCH-88 is the contentLength addition to the plugin.xml file,
but that could (and IMO should) be split into a separate issue.

I recommend closing this issue as the bulk of it was handled by NUTCH-88.

Cheers,
  Chris


> Parsing different content formats
> ---------------------------------
>
>          Key: NUTCH-34
>          URL: http://issues.apache.org/jira/browse/NUTCH-34
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Reporter: Stephan Strittmatter
>     Priority: Trivial

>
> At the moment Nutch is set up to filter content via the XML config file.
> The number of bytes to load is also set globally there.
> I think it would be better to let the parser plugins "register" themselves in some registry
> where every plugin could tell the fetcher that:
> 1. this document type is wanted (because the parser plugin is 
>    installed and activated)
> 2. how much of the content is required (some plugins need the whole 
>    content and some not)
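
The registry idea in the two points above could be sketched roughly as follows. This is only an illustration, not Nutch code: the class ParserRegistrySketch, its methods, the "lower priority value is tried first" rule, and the use of -1 to mean "needs the whole content" are all assumptions made for the example. The plugin ids are used purely as sample strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a parser-plugin registry; not part of the Nutch API. */
public class ParserRegistrySketch {

    /** What one parser plugin declares for a single MIME type. */
    static final class Registration {
        final String pluginId;
        final int priority;       // assumption: lower value is tried first
        final long requiredBytes; // assumption: -1 means "needs the whole content"

        Registration(String pluginId, int priority, long requiredBytes) {
            this.pluginId = pluginId;
            this.priority = priority;
            this.requiredBytes = requiredBytes;
        }
    }

    private final Map<String, List<Registration>> byMimeType = new HashMap<>();

    /** Called by a parser plugin when it is installed and activated. */
    public void register(String mimeType, String pluginId, int priority, long requiredBytes) {
        List<Registration> regs = byMimeType.computeIfAbsent(mimeType, k -> new ArrayList<>());
        regs.add(new Registration(pluginId, priority, requiredBytes));
        // List.sort is stable, so equal priorities keep registration order.
        regs.sort(Comparator.comparingInt(r -> r.priority));
    }

    /** Point 1: a document type is wanted if at least one parser registered for it. */
    public boolean isWanted(String mimeType) {
        return byMimeType.containsKey(mimeType);
    }

    /**
     * Point 2: how much content the fetcher should load for this type.
     * Returns -1 if any registered parser needs the whole content, otherwise
     * the largest byte count requested, or 0 if no parser wants the type.
     */
    public long bytesToFetch(String mimeType) {
        long max = 0;
        List<Registration> regs =
                byMimeType.getOrDefault(mimeType, Collections.emptyList());
        for (Registration r : regs) {
            if (r.requiredBytes < 0) {
                return -1;
            }
            max = Math.max(max, r.requiredBytes);
        }
        return max;
    }

    /** Plugin ids registered for a MIME type, in priority order. */
    public List<String> pluginsFor(String mimeType) {
        List<String> ids = new ArrayList<>();
        List<Registration> regs =
                byMimeType.getOrDefault(mimeType, Collections.emptyList());
        for (Registration r : regs) {
            ids.add(r.pluginId);
        }
        return ids;
    }
}
```

With something like this, the fetcher would ask the registry per MIME type instead of reading one global byte limit, and several parsers could share a MIME type while still being tried in a defined order.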

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

