nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2729) protocol-okhttp: fix marking of truncated content
Date Tue, 13 Aug 2019 16:46:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906391#comment-16906391 ]

ASF GitHub Bot commented on NUTCH-2729:
---------------------------------------

sebastian-nagel commented on pull request #462: NUTCH-2729 protocol-okhttp: fix marking of
truncated content
URL: https://github.com/apache/nutch/pull/462
 
 
   - request one byte more than the configured content limit (http.content.limit) to detect
     truncations reliably
   - add unit tests for marking of truncations, also for gzip Content-Encoding and chunked
     Transfer-Encoding
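
The first point above can be sketched roughly as follows. This is only an illustration of the "read one byte more than the limit" idea, not the code from the pull request: the class TruncationSketch, the holder Result and the method readWithLimit are hypothetical names, and the sketch assumes a non-negative limit smaller than Integer.MAX_VALUE.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration, not the protocol-okhttp implementation.
class TruncationSketch {

  /** Holds the bytes that were kept and whether more content was available. */
  static final class Result {
    final byte[] content;
    final boolean truncated;

    Result(byte[] content, boolean truncated) {
      this.content = content;
      this.truncated = truncated;
    }
  }

  /**
   * Reads at most limit + 1 bytes. The extra byte is never stored in the
   * result; it only tells us whether the source had more than limit bytes.
   * Assumes 0 <= limit < Integer.MAX_VALUE.
   */
  static Result readWithLimit(InputStream in, int limit) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    int wanted = limit + 1; // one byte more than the limit
    while (out.size() < wanted) {
      int n = in.read(buf, 0, Math.min(buf.length, wanted - out.size()));
      if (n == -1) {
        break; // end of stream reached before limit + 1 bytes
      }
      out.write(buf, 0, n);
    }
    boolean truncated = out.size() > limit;
    byte[] all = out.toByteArray();
    byte[] kept = truncated ? Arrays.copyOf(all, limit) : all;
    return new Result(kept, truncated);
  }

  public static void main(String[] args) throws IOException {
    // Content of exactly the limit size must not be marked as truncated.
    Result exact = readWithLimit(new ByteArrayInputStream(new byte[10]), 10);
    System.out.println("exact size, truncated: " + exact.truncated);
    // One byte over the limit must be marked as truncated.
    Result over = readWithLimit(new ByteArrayInputStream(new byte[11]), 10);
    System.out.println("one byte over, truncated: " + over.truncated);
  }
}
```

Stopping the loop as soon as exactly limit bytes have been read would leave both cases in main indistinguishable, which is why the sketch asks for one additional byte.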
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> protocol-okhttp: fix marking of truncated content
> -------------------------------------------------
>
>                 Key: NUTCH-2729
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2729
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The protocol-okhttp plugin marks content as "truncated" and records the reason for the
> truncation: content limit exceeded, time limit exceeded, or network disconnect during fetch.
> The detection of truncation by content limit has one bug: if the fetched content is exactly
> the size of the content limit, the loop that requests more content exits. Instead, it should
> continue and request one more byte to reliably detect whether the content is truncated.
> Note that the Content-Length header cannot be used to determine truncation reliably:
> it does not indicate the real content length for compressed or chunked content, and it might
> simply be wrong.
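
To illustrate the note on Content-Length, here is a small sketch using only the JDK. For a gzip-encoded response the header describes the compressed body, so comparing it with the content limit says little about the decoded size. The class ContentLengthSketch, the helper gzip and the limit of 1000 bytes are made up for this example and are not taken from Nutch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration, not part of Nutch.
class ContentLengthSketch {

  /** Gzip-compresses the given bytes in memory. */
  static byte[] gzip(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(raw);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    int limit = 1000; // assumed content limit in bytes, for illustration only

    // Highly compressible "page": 12000 identical bytes.
    byte[] page = new byte[12000];
    Arrays.fill(page, (byte) 'a');

    // What a server would send with Content-Encoding: gzip;
    // its Content-Length would be body.length, not page.length.
    byte[] body = gzip(page);

    System.out.println("compressed body below limit: " + (body.length < limit));
    System.out.println("decoded content above limit: " + (page.length > limit));
  }
}
```

With chunked Transfer-Encoding the header is normally absent altogether, so counting the bytes actually read remains the only dependable signal.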



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
