nutch-dev mailing list archives

From AJ Chen
Subject Re: fetch performance
Date Sat, 10 Sep 2005 19:14:08 GMT
Andrzej, thanks.
A related question: some of the sites I crawl use https: or redirect to
https:. Nutch's default settings do not recognize https: as a valid URL.
Is there a way to crawl URLs starting with "https:"?


Andrzej Bialecki wrote:

> AJ Chen wrote:
>> Hi Andrzej,
>> Thanks for the suggestion. I'm using the PDF plugin that
>> comes with Nutch from svn. Where can I get the unreleased
>> PDFBox version 0.7.2 that works for you?
>
> If you are not too familiar with the classpath setting in plugin.xml,
> then it's better to just replace the old JAR with the new one, while
> keeping the same name as the old JAR.
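
The JAR swap Andrzej describes above can be sketched roughly as follows. This is only an illustration, not part of Nutch: the helper name replace_jar and the ".bak" backup suffix are assumptions made for the example, and the actual JAR paths depend on your checkout.

```python
import shutil
from pathlib import Path


def replace_jar(old_jar: Path, new_jar: Path) -> Path:
    """Overwrite old_jar with the contents of new_jar, keeping old_jar's name.

    Hypothetical helper for illustration. A backup of the original file is
    written next to it with a ".bak" suffix (an assumption of this sketch),
    and the path of that backup is returned.
    """
    if not old_jar.is_file():
        raise FileNotFoundError(f"old JAR not found: {old_jar}")
    if not new_jar.is_file():
        raise FileNotFoundError(f"new JAR not found: {new_jar}")

    # Keep a copy of the original so the swap can be undone.
    backup = old_jar.with_name(old_jar.name + ".bak")
    shutil.copy2(old_jar, backup)

    # Copy the new JAR over the old one; the file name on disk is unchanged,
    # so plugin.xml does not need to be edited.
    shutil.copy2(new_jar, old_jar)
    return backup
```

Calling replace_jar(Path(old), Path(new)) with the path of the PDFBox JAR inside the PDF plugin directory and the path of the newer PDFBox JAR should leave plugin.xml untouched, since the library name it refers to stays the same.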
