nutch-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1919) Getting timeout when server returns Content-Length: 0
Date Fri, 16 Jan 2015 11:52:34 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280143#comment-14280143 ]

Hudson commented on NUTCH-1919:
-------------------------------

SUCCESS: Integrated in Nutch-trunk #2936 (See [https://builds.apache.org/job/Nutch-trunk/2936/])
(NUTCH-1919) Getting timeout when server returns Content-Length: 0 (jnioche: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1652391)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java


> Getting timeout when server returns Content-Length: 0 
> ------------------------------------------------------
>
>                 Key: NUTCH-1919
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1919
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>            Reporter: Julien Nioche
>             Fix For: 1.10
>
>         Attachments: NUTCH-1919.patch
>
>
> This has been investigated and fixed in Storm-Crawler [https://github.com/DigitalPebble/storm-crawler/issues/48].
> {quote}
> curl -I "http://www.dailynewslosangeles.com/"
> HTTP/1.1 301 Moved Permanently
> Location: http://www.dailynews.com
> Connection: close
> Content-Length: 0
> Content-Type: text/html; charset=UTF-8
> {quote}
> When fetching with Nutch, we get a timeout exception:
> {quote}
> ./nutch parsechecker -D http.agent.name="PebbleCrawler" "http://www.dailynewslosangeles.com/"
> fetching: http://www.dailynewslosangeles.com/
> Fetch failed with protocol status: exception(16), lastModified=0: java.net.SocketTimeoutException: Read timed out
> {quote}
> The reason for this is that we are trying to read from the stream even though we know that the content length is 0.
> The patch attached fixes the issue.
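
The cause described above (reading from the stream although the declared content length is 0) can be illustrated with a minimal Java sketch. This is not the attached patch and not the code in HttpResponse.java; the class `ContentLengthSketch` and the method `readBody` are hypothetical names used only for illustration, and a negative length is assumed here to mean "no Content-Length header".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not Nutch's HttpResponse implementation.
public class ContentLengthSketch {

    /**
     * Reads a response body from the stream.
     *
     * @param in            stream positioned just after the response headers
     * @param contentLength declared Content-Length, or a negative value if
     *                      the header was absent (assumption of this sketch)
     */
    public static byte[] readBody(InputStream in, int contentLength) throws IOException {
        // A declared length of 0 means no body follows. Calling read() here
        // would block until the socket read timeout fires, so return at once.
        if (contentLength == 0) {
            return new byte[0];
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int total = 0;

        // Read until the declared length is reached, or until end of stream
        // when no length was declared.
        while (contentLength < 0 || total < contentLength) {
            int want = contentLength < 0
                    ? buf.length
                    : Math.min(buf.length, contentLength - total);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // peer closed the connection
            }
            out.write(buf, 0, n);
            total += n;
        }
        return out.toByteArray();
    }
}
```

With a guard like this, a 301 response carrying `Content-Length: 0` returns an empty body without touching the socket, so the redirect can be handled instead of failing with a read timeout.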



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
