nutch-dev mailing list archives

From AJ Chen
Subject Re: fetch performance
Date Sat, 10 Sep 2005 22:22:15 GMT
The default plugin.includes property in Nutch 0.7 does not include 
protocol-httpclient. After adding it, the crawl does recognize https 
URLs.  Thanks.  However, there are still two kinds of errors related to 

(1) NoRouteToHostException.  It occurs very often, for example,

050910 150336 fetching
050910 150336 fetch of failed 
with: java.lang.Exception:
Exception: No route to host: connect

(2) An https URL reached by redirect from an http URL is not recognized. 
It occurs very often, for example,

050910 150341 fetch of failed with: 
java.lang.Exception: org.apache.nutch.protocol.http.HttpException: Not an HTTP 

Any idea what is happening?
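
For reference, the configuration change described at the top of this message might look roughly like the override below, placed in conf/nutch-site.xml. This is only a sketch: the <nutch-conf> root element and every plugin id other than protocol-httpclient are assumptions about a stock 0.7 install, so the value should be copied from the local conf/nutch-default.xml and only protocol-httpclient added to it.

```xml
<?xml version="1.0"?>
<!-- Sketch of a conf/nutch-site.xml override (assumed layout for Nutch 0.7). -->
<nutch-conf>
  <property>
    <name>plugin.includes</name>
    <!-- The value is a regular expression matched against plugin ids.
         Everything except protocol-httpclient is an assumed default list;
         keep whatever the local nutch-default.xml already contains. -->
    <value>nutch-extensionpoints|protocol-http|protocol-httpclient|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>
</nutch-conf>
```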


Andrzej Bialecki wrote:

> AJ Chen wrote:
>> Andrzej, Thanks.
>> A related question: Some of the sites I crawl use https: or redirect 
>> to https:.  Nutch default setting does not recognize https: as valid 
>> url. Is there a way to crawl url starting with "https:"?
> Which version of Nutch? 0.7 recognizes and supports https urls, 
> through the protocol-httpclient plugin.
