incubator-droids-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Niocchi - java asynchronous crawl library released
Date Sun, 18 Oct 2009 14:45:04 GMT
Lukáš Vlček wrote:
> Hi,
> 
> I just noticed that Niocchi has been released recently.
> http://www.niocchi.com/
> 
> Niocchi is a java asynchronous crawl library implemented with NIO. It is 
> designed to crawl several thousands of hosts in parallel on a single 
> low-end server. It is currently being used in production by Enormo 
> <http://www.enormo.com/> to crawl thousands of websites daily, and 
> by Vitalprix <http://www.vitalprix.com/>.

Well, of course we should optimize our use of resources, and we could 
check what this library can offer. But I doubt that optimizations at 
this level would significantly increase crawling speed: low-level IO 
handling is rarely the bottleneck. Most of the time the bottleneck is 
the politeness limits (the maximum rate of requests per host).
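
To put rough numbers on the politeness argument, here is a back-of-envelope 
sketch in Python. It is not taken from Droids or Niocchi: the function names, 
the host names, the 5-second delay and the fetch times are all illustrative 
assumptions, and the model is idealized (every host is fetched in parallel, 
requests to a single host are serial, and the delay is measured between 
request starts).

```python
from typing import Mapping


def max_crawl_rate(num_hosts: int, delay_per_host_s: float) -> float:
    """Upper bound on pages/second if every host accepts one request per delay."""
    return num_hosts / delay_per_host_s


def crawl_time(pages_per_host: Mapping[str, int],
               delay_per_host_s: float,
               fetch_time_s: float) -> float:
    """Wall-clock seconds for an idealized, fully parallel crawler.

    Assumptions (hypothetical model): hosts are fetched in parallel,
    requests to one host are serial, and consecutive request starts to
    the same host are at least delay_per_host_s apart.
    """
    # A new request to a host cannot start before the previous one has
    # finished, nor before the politeness delay has elapsed.
    gap = max(delay_per_host_s, fetch_time_s)
    finish_times = [
        (pages - 1) * gap + fetch_time_s
        for pages in pages_per_host.values()
        if pages > 0
    ]
    # The crawl ends when the slowest host is done.
    return max(finish_times, default=0.0)


if __name__ == "__main__":
    workload = {"a.example": 100, "b.example": 10}
    slow_io = crawl_time(workload, delay_per_host_s=5.0, fetch_time_s=0.2)
    fast_io = crawl_time(workload, delay_per_host_s=5.0, fetch_time_s=0.02)
    print(f"slow IO: {slow_io:.2f} s, fast IO: {fast_io:.2f} s")
    print(f"ceiling for 1000 hosts: {max_crawl_rate(1000, 5.0):.0f} pages/s")
```

Under these made-up numbers, cutting the per-request fetch time tenfold 
shortens the whole crawl by well under a second out of roughly 495, because 
the largest host dictates the schedule. The only ways to raise the ceiling 
are to crawl more hosts at once or to relax the per-host delay.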


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

