nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses
Date Fri, 11 May 2018 09:55:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471706#comment-16471706 ]

ASF GitHub Bot commented on NUTCH-2575:
---------------------------------------

Omkar20895 commented on issue #327: NUTCH-2575 Storing total number of bytes read after every chunk
URL: https://github.com/apache/nutch/pull/327#issuecomment-388318227
 
 
   Thank you @sebastian-nagel 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> protocol-http does not respect the maximum content-size for chunked responses
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-2575
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2575
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: protocol
>    Affects Versions: 1.14
>            Reporter: Gerard Bouchar
>            Priority: Critical
>             Fix For: 1.15
>
>
> There is a bug in HttpResponse::readChunkedContent that prevents it from stopping when the content read exceeds the maximum allowed size.
> There [is a variable contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404] that is used to check how much content has been read, but it is never updated, so it always keeps its initial value, and [the size check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442] always returns false (unless a single chunk is larger than the maximum allowed content size).
> This allows any server to cause out-of-memory errors on our side.
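
For illustration only, the behaviour described in the issue can be sketched as below. This is not the Nutch code and not the patch from pull request #327: the class ChunkedReadSketch and the helpers readLine and readChunked are hypothetical names, and the chunk parsing is deliberately simplified (no trailers, minimal error handling). The point is only that the running total contentBytesRead is updated after every chunk, so the size check can take effect across chunks.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; names are hypothetical and not taken from Nutch.
public class ChunkedReadSketch {

  // Reads bytes up to the next '\n' and returns the line without CR/LF.
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (c != '\r') {
        sb.append((char) c);
      }
    }
    return sb.toString();
  }

  // Decodes a chunked body, keeping at most maxContent bytes in memory.
  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int contentBytesRead = 0; // running total across all chunks

    while (true) {
      String sizeLine = readLine(in);
      int semi = sizeLine.indexOf(';'); // ignore chunk extensions
      if (semi >= 0) {
        sizeLine = sizeLine.substring(0, semi);
      }
      sizeLine = sizeLine.trim();
      if (sizeLine.isEmpty()) {
        throw new IOException("missing chunk size line");
      }
      int chunkLen = Integer.parseInt(sizeLine, 16);
      if (chunkLen == 0) {
        break; // last chunk
      }

      // Clamp against the running total, not just the current chunk.
      int toRead = Math.min(chunkLen, maxContent - contentBytesRead);
      byte[] buf = new byte[toRead];
      int off = 0;
      while (off < toRead) {
        int n = in.read(buf, off, toRead - off);
        if (n == -1) {
          throw new IOException("unexpected end of stream inside chunk");
        }
        off += n;
      }
      out.write(buf, 0, toRead);

      // The update the issue reports as missing.
      contentBytesRead += toRead;
      if (contentBytesRead >= maxContent) {
        break; // limit reached: stop reading, content is truncated
      }
      readLine(in); // consume the CRLF that follows the chunk data
    }
    return out.toByteArray();
  }
}
```

With a limit of 8 bytes and a body sent as two chunks of 5 and 6 bytes, this sketch stops partway through the second chunk instead of buffering everything the server sends.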



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
