nutch-dev mailing list archives

From Doğacan Güney (JIRA) <j...@apache.org>
Subject [jira] Closed: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit
Date Fri, 04 Jan 2008 19:53:34 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney closed NUTCH-560.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0
         Assignee: Doğacan Güney

Fixed as part of NUTCH-559.

> protocol-httpclient reading more bytes than http.content.limit
> --------------------------------------------------------------
>
>                 Key: NUTCH-560
>                 URL: https://issues.apache.org/jira/browse/NUTCH-560
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Joseph M.
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>
> I modified HttpResponse.java in protocol-httpclient to download files to the file system.
> If I set http.content.limit to 5000, it fetches around 5500 to 6000 bytes instead and writes
> them to the file system. There is a calculation mistake in the calculateTryToRead() function.
> {code}
>         int tryAndRead = calculateTryToRead(totalRead);
>         while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
>           totalRead += bufferFilled;
>           out.write(buffer, 0, bufferFilled);
>           tryAndRead = calculateTryToRead(totalRead);
>         }{code}
> The while loop stops when calculateTryToRead() returns a negative value or 0.
>   {code}private int calculateTryToRead(int totalRead) {
>     int tryToRead = Http.BUFFER_SIZE;
>     if (http.getMaxContent() <= 0) {
>       return http.BUFFER_SIZE;
>     } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
>       tryToRead = http.getMaxContent() - totalRead;
>     }
>     return tryToRead;
>   }{code}
> It returns a negative value only once totalRead > http.getMaxContent(), and each in.read()
> call may fill the whole buffer, so more bytes than http.content.limit are read before the
> while loop exits.
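
The behaviour described above can be reproduced outside Nutch. The sketch below is not the change made in NUTCH-559. It is a standalone illustration in which the class name, the method names, the 8 KB buffer size, and the chunked() helper (which simulates a stream returning at most 1500 bytes per read) are all assumptions. copyAsReported() mirrors the loop quoted in the report, and copyBounded() shows one possible way to cap each read at the bytes still allowed. As in the quoted function, a limit <= 0 is treated as "no limit".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ContentLimitSketch {
    // Assumed value; stands in for Http.BUFFER_SIZE.
    static final int BUFFER_SIZE = 8 * 1024;

    // Same arithmetic as the quoted calculateTryToRead(), with the limit passed in.
    static int calculateTryToRead(int totalRead, int maxContent) {
        if (maxContent <= 0) {
            return BUFFER_SIZE;
        }
        return Math.min(BUFFER_SIZE, maxContent - totalRead);
    }

    // Loop shaped like the one in the report: the limit is only checked
    // after a read that may already have filled the whole buffer.
    static int copyAsReported(InputStream in, OutputStream out, int maxContent) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int totalRead = 0;
        int bufferFilled;
        int tryAndRead = calculateTryToRead(totalRead, maxContent);
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
            totalRead += bufferFilled;
            out.write(buffer, 0, bufferFilled);
            tryAndRead = calculateTryToRead(totalRead, maxContent);
        }
        return totalRead;
    }

    // Bounded variant: never ask the stream for more than the bytes still allowed.
    static int copyBounded(InputStream in, OutputStream out, int maxContent) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int totalRead = 0;
        while (true) {
            int want = calculateTryToRead(totalRead, maxContent);
            if (want <= 0) {
                break; // limit reached
            }
            int n = in.read(buffer, 0, want);
            if (n == -1) {
                break; // end of stream
            }
            totalRead += n;
            out.write(buffer, 0, n);
        }
        return totalRead;
    }

    // Hypothetical helper: a stream that returns at most `chunk` bytes per read call.
    static InputStream chunked(byte[] data, int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        int reported = copyAsReported(chunked(data, 1500), new ByteArrayOutputStream(), 5000);
        int bounded = copyBounded(chunked(data, 1500), new ByteArrayOutputStream(), 5000);
        System.out.println("reported loop wrote " + reported + " bytes");
        System.out.println("bounded loop wrote " + bounded + " bytes");
    }
}
```

With the simulated 1500-byte reads and a limit of 5000, the reported loop writes 6000 bytes and the bounded loop writes 5000. The exact overshoot depends on how many bytes each read returns, which is consistent with the 5500 to 6000 range mentioned in the report.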

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

