nutch-dev mailing list archives

From "Yossi Tamari (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2716) Response headers are not stored for a compressed response
Date Mon, 06 May 2019 13:18:00 GMT
Yossi Tamari created NUTCH-2716:
-----------------------------------

             Summary: Response headers are not stored for a compressed response
                 Key: NUTCH-2716
                 URL: https://issues.apache.org/jira/browse/NUTCH-2716
             Project: Nutch
          Issue Type: Bug
          Components: protocol
    Affects Versions: 1.15
            Reporter: Yossi Tamari


Even when store.http.headers=true, the HTTP headers are not saved for a gzipped or deflated
response, because they may contain an incorrect content-length header.

This causes WARCExporter to generate "resource" (headerless) entries instead of "response"
entries.

While I can see why reporting the wrong content-encoding and length may be a bug, removing
all the headers is not a fix.

I am not submitting a patch yet since I'm not sure what the best fix is. My suggestion is to
remove only those two header lines (Content-Encoding and Content-Length) and store the rest of
the headers. If there is no objection, I can submit a patch that does this. Otherwise, what
would be a better fix?
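
A minimal sketch of the proposed approach, assuming the raw headers are available as a single
CRLF-delimited string. The class HeaderFilter and the method stripEncodingHeaders are
hypothetical names used for illustration only; they are not part of Nutch, and an actual patch
would have to fit into the protocol plugin's own response handling. Folded (multi-line) header
values are not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: drops only the two headers that
// become inaccurate once the body has been decompressed, and keeps the rest.
class HeaderFilter {

    static String stripEncodingHeaders(String rawHeaders) {
        List<String> kept = new ArrayList<>();
        for (String line : rawHeaders.split("\r\n")) {
            if (line.isEmpty()) {
                continue; // skip the blank line that terminates the header block
            }
            // Header names are case-insensitive, so compare in lower case.
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("content-encoding:")
                    || lower.startsWith("content-length:")) {
                continue; // no longer matches the stored (decoded) content
            }
            kept.add(line);
        }
        // Re-terminate the header block with an empty line.
        return String.join("\r\n", kept) + "\r\n\r\n";
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Encoding: gzip\r\n"
                + "Content-Length: 123\r\n"
                + "\r\n";
        System.out.print(stripEncodingHeaders(raw));
    }
}
```

With headers filtered this way, the status line and the remaining headers would still be
available to store, so WARCExporter would have what it needs to write a "response" entry.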



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
