nutch-dev mailing list archives

From "Sebastian Nagel (Jira)" <>
Subject [jira] [Created] (NUTCH-2776) Fetcher to temporarily deduplicate followed redirects
Date Fri, 20 Mar 2020 18:53:00 GMT
Sebastian Nagel created NUTCH-2776:

             Summary: Fetcher to temporarily deduplicate followed redirects
                 Key: NUTCH-2776
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.16
            Reporter: Sebastian Nagel
             Fix For: 1.17

If the fetcher follows redirects (http.redirect.max > 0), many redirects of a site may point
to the same URL. In this situation, it would be useful if the fetcher could temporarily
(for a configurable time period) deduplicate the redirect targets and skip all redirects to
an already-seen target except the first one. Typical examples of duplicated redirect targets are:
- instead of responding with HTTP status 404:
- a page to accept/decline cookies
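A minimal sketch of the proposed behavior, written in Java because Nutch is a Java project. The class RedirectDeduplicator and its methods are hypothetical and not part of Nutch. The time window (ttlMillis) stands in for the configurable period mentioned above. Keying on the target URL alone is an assumption; a real implementation might scope the cache per host or per fetch queue instead.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Nutch): remembers redirect targets for a
 * limited time so that only the first redirect to a given target is followed.
 */
public class RedirectDeduplicator {

    private final long ttlMillis;
    // Insertion-ordered map: a key is only inserted when absent, so (given
    // non-decreasing timestamps) the oldest entries are always at the head.
    private final LinkedHashMap<String, Long> firstSeen = new LinkedHashMap<>();

    public RedirectDeduplicator(long ttlMillis) {
        if (ttlMillis <= 0) {
            throw new IllegalArgumentException("ttlMillis must be positive");
        }
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns true if a redirect to the target should be followed, i.e. the
     * target was not seen within the last ttlMillis. nowMillis must not
     * decrease between calls.
     */
    public synchronized boolean shouldFollow(String target, long nowMillis) {
        evictExpired(nowMillis);
        if (firstSeen.containsKey(target)) {
            return false; // duplicate target inside the time window: skip it
        }
        firstSeen.put(target, nowMillis);
        return true;
    }

    /** Number of targets currently remembered. */
    public synchronized int size() {
        return firstSeen.size();
    }

    // Drop entries whose window has elapsed, stopping at the first live one.
    private void evictExpired(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() < ttlMillis) {
                break; // all remaining entries are newer
            }
            it.remove();
        }
    }

    public static void main(String[] args) {
        RedirectDeduplicator dedup = new RedirectDeduplicator(60_000L);
        String consent = "https://www.example.com/cookie-consent"; // example URL
        System.out.println(dedup.shouldFollow(consent, 0L));      // true: first redirect
        System.out.println(dedup.shouldFollow(consent, 30_000L)); // false: duplicate
        System.out.println(dedup.shouldFollow(consent, 60_000L)); // true: window expired
    }
}
```

Evicting lazily from the head of an insertion-ordered map keeps memory bounded by the number of distinct targets seen within one window, without a separate cleanup thread.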

This message was sent by Atlassian Jira
