nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory
Date Wed, 04 Oct 2006 17:51:22 GMT
     [ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]

Chris A. Mattmann updated NUTCH-379:
------------------------------------

    Attachment: NUTCH-379.Mattmann.100406.patch.txt

Small patch that at least gets started on fixing the larger issue of content URLs and parser
mapping: it forwards the content URL (which the ParserFactory interface expects anyway)
to the getParsers method in the ParserFactory.
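As a rough sketch of the kind of change described (forwarding the content URL to the factory rather than dropping it), the following uses minimal stand-in classes. The class names mirror those in the report, but their fields and method bodies are hypothetical and are not Nutch's actual code. The two-argument getParsers(contentType, url) signature is likewise assumed here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the classes named in NUTCH-379; NOT the real Nutch code.
public class ParseUtilSketch {

    /** Minimal content holder: the fetched URL plus its MIME type. */
    static class Content {
        final String url;
        final String contentType;

        Content(String url, String contentType) {
            this.url = url;
            this.contentType = contentType;
        }
    }

    /** Stand-in factory; the signature getParsers(contentType, url) is assumed. */
    static class ParserFactory {
        String lastUrl; // records the URL most recently passed in, for inspection

        List<String> getParsers(String contentType, String url) {
            lastUrl = url;
            List<String> parsers = new ArrayList<>();
            parsers.add("parser-for-" + contentType); // placeholder parser id
            return parsers;
        }
    }

    /** Stand-in for ParseUtil: asks the factory which parsers to try for a Content. */
    static class ParseUtil {
        private final ParserFactory factory;

        ParseUtil(ParserFactory factory) {
            this.factory = factory;
        }

        List<String> selectParsers(Content content) {
            // The fix described in the report: pass the content's URL through
            // to the factory instead of dropping it.
            return factory.getParsers(content.contentType, content.url);
        }
    }

    public static void main(String[] args) {
        ParserFactory factory = new ParserFactory();
        ParseUtil util = new ParseUtil(factory);
        util.selectParsers(new Content("http://example.com/report.pdf", "application/pdf"));
        System.out.println(factory.lastUrl); // prints http://example.com/report.pdf
    }
}
```

The factory here only records the URL. Actually using it to pick a plugin is the larger, separate issue mentioned in the report.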

> ParseUtil does not pass through the content's URL to the ParserFactory
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-379
>                 URL: http://issues.apache.org/jira/browse/NUTCH-379
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8, 0.9.0, 0.8.1
>         Environment: Power Mac Dual G5, 2.0 GHz, although fix is independent of environment
>            Reporter: Chris A. Mattmann
>         Assigned To: Chris A. Mattmann
>             Fix For: 0.8, 0.9.0, 0.8.1, 0.8.2
>
>         Attachments: NUTCH-379.Mattmann.100406.patch.txt
>
>
> Currently the ParseUtil class, which the Fetcher calls to perform the actual parsing
> of content, does not forward the content's URL for use in the ParserFactory. A bigger
> issue, however, is that the URL (and, for that matter, the pathSuffix) is no longer used to
> determine which parsing plugin should be called. My colleague at JPL discovered that more
> serious bug and will soon file a JIRA issue for it. In the meantime, this small patch
> at least sets up the forwarding of the content's URL to the ParserFactory.
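The report notes that the URL and its pathSuffix no longer influence which parsing plugin is chosen. As a hedged illustration only, the following shows one way a selector *could* consult the URL's path suffix before falling back to a content-type mapping. The class, its methods, and the plugin ids are hypothetical and do not reflect Nutch's actual ParserFactory or its configuration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: pick a parser plugin id by URL path suffix, else by content type.
// Not Nutch's real selection logic; names and plugin ids are illustrative only.
public class SuffixParserSelector {
    private final Map<String, String> bySuffix = new HashMap<>();
    private final Map<String, String> byContentType = new HashMap<>();

    public void mapSuffix(String suffix, String pluginId) {
        bySuffix.put(suffix.toLowerCase(Locale.ROOT), pluginId);
    }

    public void mapContentType(String contentType, String pluginId) {
        byContentType.put(contentType, pluginId);
    }

    /** Lowercased text after the last '.' in the URL's final path segment, or "" if none. */
    static String pathSuffix(String url) {
        String path = URI.create(url).getPath(); // throws IllegalArgumentException on malformed URLs
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return (dot > slash) ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** A suffix match wins; otherwise use the content-type mapping; null if neither matches. */
    public String select(String contentType, String url) {
        String plugin = bySuffix.get(pathSuffix(url));
        return (plugin != null) ? plugin : byContentType.get(contentType);
    }

    public static void main(String[] args) {
        SuffixParserSelector s = new SuffixParserSelector();
        s.mapContentType("text/html", "parse-html");
        s.mapSuffix("rss", "parse-rss");
        System.out.println(s.select("text/html", "http://example.com/feed.rss"));   // parse-rss
        System.out.println(s.select("text/html", "http://example.com/index.html")); // parse-html
    }
}
```

Letting the suffix override the reported content type is only one possible design choice. A real implementation might instead use the suffix as a tie-breaker, or only when the content type is missing or generic.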

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
