nutch-dev mailing list archives

From Doug Cutting <cutt...@nutch.org>
Subject Re: problems http-client
Date Thu, 05 Jan 2006 21:33:06 GMT
Andrzej Bialecki wrote:
> Hmm... I'm not saying it's flawless, there were surely some mysterious 
> things going on with it. That large crawl you mention, was it with the 
> (recently updated in Nutch) release 3.0? What were the issues?

No, it was in early December, with the previous version.  I don't recall 
the details, but it seemed slower, had a higher error rate, and seemed 
to result in more hung-thread incidents.

> The main advantage of protocol-http is that it's so simple that few 
> things can go wrong, but this also means it's relatively 
> unsophisticated, and adding more advanced features could mean a lot of 
> work. Namely, adding support for https, cookies and authentication.

These are all good reasons to use protocol-httpclient.  But if you don't 
need any of those features, protocol-http presently seems to work better.

Perhaps we should get more feedback on the 3.0 version before we make a 
decision?

Doug
