nutch-dev mailing list archives

From AJ Chen <cano...@gmail.com>
Subject Re: fetch performance
Date Sat, 10 Sep 2005 19:14:08 GMT
Andrzej, Thanks.
A related question: some of the sites I crawl use https: or redirect to 
https:. With its default settings, Nutch does not recognize https: as a 
valid URL scheme. Is there a way to crawl URLs starting with "https:"?

-AJ


Andrzej Bialecki wrote:

> AJ Chen wrote:
>
>> Hi Andrzej,
>> Thanks for the suggestion. I'm using the PDF plugin that
>> comes with Nutch from svn.  Where can I get the unreleased
>> PDFBox version 0.7.2 that works for you?
>
>
> http://www.pdfbox.com/dist
>
> If you are not too familiar with the classpath setting in plugin.xml, 
> then it's better to just replace the old JAR with the new one, while 
> keeping the same file name as the old JAR.
>
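The JAR swap suggested in the quoted reply above could be scripted roughly as follows. This is only a minimal sketch in Python, not part of Nutch: the function name `replace_jar_keeping_name`, the ".bak" backup step, and the paths in the usage comment are all hypothetical, and the actual JAR file name inside the plugin directory has to be looked up locally.

```python
import shutil
from pathlib import Path


def replace_jar_keeping_name(old_jar, new_jar):
    """Overwrite old_jar with the contents of new_jar, keeping old_jar's name.

    The original file is first copied to "<old name>.bak" in the same
    directory (an added safety step, not something the thread asks for).
    Returns the path of the backup file.
    """
    old_jar = Path(old_jar)
    new_jar = Path(new_jar)
    if not old_jar.is_file():
        raise FileNotFoundError(old_jar)
    if not new_jar.is_file():
        raise FileNotFoundError(new_jar)

    # Keep a copy of the JAR currently referenced by plugin.xml.
    backup = old_jar.with_name(old_jar.name + ".bak")
    shutil.copy2(old_jar, backup)

    # Copy the new JAR over the old one; the file name stays the same,
    # so the classpath entry in plugin.xml does not need to change.
    shutil.copy2(new_jar, old_jar)
    return backup


# Hypothetical usage; both paths are placeholders, not real Nutch file names:
# replace_jar_keeping_name("plugins/parse-pdf/old-pdfbox.jar",
#                          "downloads/new-pdfbox.jar")
```

Because the file name is unchanged, plugin.xml can stay as it is, and restoring the ".bak" copy undoes the swap.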

