nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <>
Subject [jira] [Created] (NUTCH-1919) Getting timeout when server returns Content-Length: 0
Date Thu, 15 Jan 2015 11:25:34 GMT
Julien Nioche created NUTCH-1919:

             Summary: Getting timeout when server returns Content-Length: 0 
                 Key: NUTCH-1919
             Project: Nutch
          Issue Type: Bug
          Components: protocol
            Reporter: Julien Nioche
             Fix For: 1.10

This has been investigated and fixed in Storm-Crawler [].

curl -I ""
HTTP/1.1 301 Moved Permanently
Connection: close
Content-Length: 0
Content-Type: text/html; charset=UTF-8

When fetching with Nutch, we get a timeout exception:

./nutch parsechecker -D"PebbleCrawler" ""
Fetch failed with protocol status: exception(16), lastModified=0:
Read timed out

The cause is that we still try to read from the stream even though we know that
the content length is 0, so the read blocks until the socket times out.
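
To illustrate the idea, here is a minimal sketch of guarding the body read on the declared content length. It is not the attached patch and not the actual Nutch protocol code; the class and method names (ContentLengthGuard, parseContentLength, readBody) are hypothetical and only meant to show the check.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Nutch: shows how a fetcher could skip
// the body read when the server declares Content-Length: 0.
final class ContentLengthGuard {

    /** Parses a Content-Length header value; returns -1 if absent or malformed. */
    static int parseContentLength(String headerValue) {
        if (headerValue == null) {
            return -1;
        }
        try {
            return Integer.parseInt(headerValue.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Reads the response body. A declared length of 0 returns immediately
     * without touching the stream; a negative length means "unknown" and
     * reads until end of stream.
     */
    static byte[] readBody(InputStream in, int contentLength) throws IOException {
        if (contentLength == 0) {
            // Nothing to read: a blocking read here is what would time out.
            return new byte[0];
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int remaining = contentLength;
        while (remaining != 0) {
            int want = remaining < 0 ? buf.length : Math.min(buf.length, remaining);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // end of stream reached before the declared length
            }
            out.write(buf, 0, n);
            if (remaining > 0) {
                remaining -= n;
            }
        }
        return out.toByteArray();
    }
}
```

With a check like this, a response such as the 301 above would yield an empty body immediately instead of waiting for the read to time out.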

The patch attached fixes the issue. 

