nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time
Date Tue, 03 Feb 2009 13:19:59 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669947#action_12669947 ]

Andrzej Bialecki  commented on NUTCH-353:
-----------------------------------------

Actually, the problem in the issue description is solved now. I'm closing this one, and the
remaining functionality should be tracked as an enhancement in a separate issue.

> pages that serverside forwards will be refetched every time
> -----------------------------------------------------------
>
>                 Key: NUTCH-353
>                 URL: https://issues.apache.org/jira/browse/NUTCH-353
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Stefan Groschupf
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: doNotRefecthForwarderPagesV1.patch
>
>
> Pages that do a server-side forward are not written back into the crawlDb with a status
> change, and their nextFetchTime is not updated.
> As a result, the same page is refetched again and again. Nutch is then not polite: it
> refetches both the forwarding page and the target page in every segment iteration. It also
> affects scoring, since the forwarding page contributes its score to all outlinks.
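
The refetch loop described in the report can be illustrated with a toy model. This is only a sketch under stated assumptions, not Nutch code: the names `Entry`, `generate`, `update`, `count_fetches`, the `record_redirects` flag, and the 30-day `FETCH_INTERVAL` are all hypothetical and stand in for the crawlDb, the generate step, and the update step.

```python
from dataclasses import dataclass

FETCH_INTERVAL = 30  # hypothetical refetch interval, in days


@dataclass
class Entry:
    """Toy stand-in for one crawl database record (not the Nutch class)."""
    status: str = "unfetched"
    next_fetch_time: int = 0  # day on which the URL becomes due


def generate(crawl_db, now):
    """Select the URLs that are due for fetching on day `now`."""
    return [url for url, e in crawl_db.items() if e.next_fetch_time <= now]


def update(crawl_db, url, redirected, now, record_redirects):
    """Write a fetch result back into the toy crawl database."""
    if redirected and not record_redirects:
        # Behaviour described in the report: the forwarding page gets
        # no status change and no new next_fetch_time.
        return
    entry = crawl_db[url]
    entry.status = "redirected" if redirected else "fetched"
    entry.next_fetch_time = now + FETCH_INTERVAL


def count_fetches(record_redirects, rounds=3):
    """Count how often one forwarding page is fetched over `rounds` days."""
    crawl_db = {"http://example.com/forward": Entry()}
    fetches = 0
    for day in range(rounds):
        for url in generate(crawl_db, day):
            fetches += 1
            update(crawl_db, url, redirected=True, now=day,
                   record_redirects=record_redirects)
    return fetches
```

With `record_redirects=False` the forwarding page stays due and is selected in every round; with `record_redirects=True` it is fetched once and then waits for the interval to expire.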

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

