nutch-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed
Date Fri, 08 Sep 2006 04:56:23 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-359?page=comments#action_12433315 ] 
            
Otis Gospodnetic commented on NUTCH-359:
----------------------------------------

Looks fine and simple (and has a small typo in the last comment).  Sami is doing 0.8.1 soon,
so I won't mess with this now.

> extraction of links will fail for whole page if one single link cannot be parsed
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-359
>                 URL: http://issues.apache.org/jira/browse/NUTCH-359
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8
>         Environment: Ubuntu Dapper
>            Reporter: Renaud Richardet
>            Priority: Minor
>         Attachments: outlink.diff
>
>
> When Nutch parses the outlinks of a fetched page, the process will fail if a single link
> cannot be parsed (e.g. java.net.MalformedURLException: unknown protocol). The attached patch
> will keep indexing the remaining links on that page even if one fails.
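
For illustration only, the behaviour described in the issue could be sketched
roughly as below. This is not the contents of outlink.diff and not Nutch's
actual parser code; the class name OutlinkSketch, the method extractOutlinks,
and the example.org links are all hypothetical. The idea is simply that the
try/catch sits inside the loop, so a MalformedURLException skips one link
instead of aborting extraction for the whole page.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the code from outlink.diff or from Nutch itself.
final class OutlinkSketch {

    // Parse each candidate link on its own so that one bad link
    // cannot abort extraction of the remaining links.
    static List<URL> extractOutlinks(List<String> candidates) {
        List<URL> outlinks = new ArrayList<>();
        for (String candidate : candidates) {
            try {
                outlinks.add(new URL(candidate));
            } catch (MalformedURLException e) {
                // Skip only this link (e.g. "unknown protocol: foo") and carry on.
                System.err.println("Skipping unparsable link " + candidate
                        + ": " + e.getMessage());
            }
        }
        return outlinks;
    }

    public static void main(String[] args) {
        // Made-up links; the middle one uses a protocol java.net.URL does not know.
        List<String> links = Arrays.asList(
                "http://example.org/a",
                "foo://example.org/b",
                "http://example.org/c");
        System.out.println(extractOutlinks(links));
    }
}
```

With the try/catch wrapped around the whole loop instead, the first
unparsable link would discard every link that follows it, which matches the
failure the reporter describes.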

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
