nutch-dev mailing list archives

From "Yossi Tamari (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2511) SitemapProcessor limited by http.content.limit
Date Mon, 19 Feb 2018 16:36:00 GMT
Yossi Tamari created NUTCH-2511:
-----------------------------------

             Summary: SitemapProcessor limited by http.content.limit
                 Key: NUTCH-2511
                 URL: https://issues.apache.org/jira/browse/NUTCH-2511
             Project: Nutch
          Issue Type: Bug
          Components: sitemap
    Affects Versions: 1.14
            Reporter: Yossi Tamari


SitemapProcessor fetches sitemaps through the HTTP protocol plugin, which truncates
every response at http.content.limit (64KB by default). As a result, it can only handle
sitemaps smaller than that limit.

I don't believe that is what users intend when they set http.content.limit: they want
to limit document size, not sitemap size. The sitemap spec explicitly allows sitemaps of
up to 50MB (uncompressed).
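
To illustrate the failure mode, here is a minimal Python sketch (not Nutch code) of what happens when a sitemap larger than the limit is cut off at 65536 bytes. The helper names (build_sitemap, truncate, is_well_formed) and the example.com URLs are hypothetical and exist only for this illustration. It assumes the fetcher simply keeps the first N bytes of the response.

```python
import xml.etree.ElementTree as ET

# 64KB, the default value of http.content.limit cited in this report.
CONTENT_LIMIT = 65536


def build_sitemap(n_urls):
    """Build a well-formed sitemap with n_urls placeholder entries."""
    entries = "".join(
        f"<url><loc>https://example.com/page/{i}</loc></url>"
        for i in range(n_urls)
    )
    doc = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        + entries
        + "</urlset>"
    )
    return doc.encode("utf-8")


def truncate(body, limit):
    """Hypothetical stand-in for a fetcher that keeps only the first `limit` bytes."""
    return body[:limit]


def is_well_formed(body):
    """Return True if the bytes parse as XML, False otherwise."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


full = build_sitemap(5000)           # well over 64KB, far below 50MB
cut = truncate(full, CONTENT_LIMIT)  # what a size-limited fetch would return

print(len(full) > CONTENT_LIMIT)  # the sitemap exceeds the limit
print(is_well_formed(full))       # the complete sitemap parses
print(is_well_formed(cut))        # the truncated copy has no closing </urlset>
```

In this sketch the truncated copy is no longer well-formed XML, so a strict parser rejects it outright. How SitemapProcessor itself reacts to a truncated sitemap is not covered here.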



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
