nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses
Date Thu, 10 May 2018 21:04:00 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2575.
------------------------------------
    Resolution: Fixed

Thanks, [~gbouchar]! Thanks, [~omkar20895]!

Solution confirmed, merged PR.

> protocol-http does not respect the maximum content-size for chunked responses
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-2575
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2575
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: protocol
>    Affects Versions: 1.14
>            Reporter: Gerard Bouchar
>            Priority: Critical
>             Fix For: 1.15
>
>
> There is a bug in HttpResponse::readChunkedContent that prevents it from stopping
> once the content read exceeds the maximum allowed size.
> There [is a variable contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404]
> that is used to track how much content has been read, but it is never updated, so it always
> keeps its initial value, and [the size check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442]
> always returns false (unless a single chunk is larger than the maximum allowed content size).
> This allows any server to cause out-of-memory errors on our side.
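
For illustration only, here is a minimal stand-alone sketch of the behaviour the report asks for: a chunked-body reader that updates a running byte counter after every read and caps each chunk against the remaining budget. It is not the merged patch and not the Nutch HttpResponse code. The class ChunkedReaderSketch, its methods, and the maxContent parameter are hypothetical names chosen for this example, and trailer headers are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names and structure are assumptions, not Nutch code.
final class ChunkedReaderSketch {

  /** Reads one line as ASCII, dropping the CR/LF terminator. */
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (c != '\r') sb.append((char) c);
    }
    return sb.toString();
  }

  /** Decodes a chunked body, keeping at most maxContent bytes in memory. */
  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int contentBytesRead = 0;

    while (contentBytesRead < maxContent) {
      String sizeLine = readLine(in);
      int semi = sizeLine.indexOf(';');            // ignore chunk extensions
      if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
      sizeLine = sizeLine.trim();
      if (sizeLine.isEmpty()) break;               // stream ended early
      int chunkLen = Integer.parseInt(sizeLine, 16);
      if (chunkLen <= 0) break;                    // last chunk (or invalid size)

      // Cap this chunk so the running total never exceeds the limit.
      int remaining = Math.min(chunkLen, maxContent - contentBytesRead);
      while (remaining > 0) {
        int n = in.read(buf, 0, Math.min(buf.length, remaining));
        if (n == -1) return out.toByteArray();     // truncated stream
        out.write(buf, 0, n);
        remaining -= n;
        contentBytesRead += n;                     // the update the report says was missing
      }
      if (contentBytesRead < maxContent) readLine(in); // consume CRLF after chunk data
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] wire = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
    System.out.println(readChunked(new ByteArrayInputStream(wire), 8).length);   // prints 8
    System.out.println(readChunked(new ByteArrayInputStream(wire), 100).length); // prints 11
  }
}
```

Because the counter grows with every read, the limit applies to the sum of all chunks rather than to each chunk on its own, so a server that sends many small chunks can no longer grow the buffer without bound.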



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
