nutch-dev mailing list archives

From "Christian Johnsson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1448) Redirected urls should be handled more cleanly (more like an outlink url)
Date Sat, 01 Sep 2012 01:14:07 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446570#comment-13446570 ]

Christian Johnsson commented on NUTCH-1448:
-------------------------------------------

Thank you for the information.
Yes, the 1461 is just a quick and ugly fix so it doesn't crash. Good to have until it's properly
fixed. Saves a lot of time searching for corrupt stuff :-)
                
> Redirected urls should be handled more cleanly (more like an outlink url)
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-1448
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1448
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Ferdy Galema
>             Fix For: 2.1
>
>         Attachments: nutch-1448.txt
>
>
> This is specifically for Nutch 2.x. Handling a redirect URL like an outlink is much
cleaner because it makes it simpler to trace how new URLs are added to the webpage database.
Instant fetching of redirects won't work, but this is a small price to pay. (Note that this
currently does not work at all, because the http.max.redirect property has no effect.) I will
attach a patch in the coming days.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
