incubator-droids-dev mailing list archives

From Chapuis Bertil <bchap...@agimem.com>
Subject Re: Droids suitability for a 100M+ page crawl
Date Fri, 25 Mar 2011 21:56:55 GMT
Hi Otis,

If you have to handle a very large set of tasks, you may be interested
in having a look at the Hazelcast distributed queue. You should be
able to use it with the current version of the trunk, since DROIDS-56
has been applied. In such a cluster, no task will be lost in case of
failure.
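
To give a rough idea of the shape this takes, here is a minimal sketch of a
worker that drains a task queue. It assumes only that the queue is a
java.util.concurrent.BlockingQueue, which the Hazelcast queue interface
extends; a local LinkedBlockingQueue stands in for it here so the snippet
runs without a cluster. The class and method names are made up for
illustration, and the actual Droids wiring is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueSketch {

    // Takes URLs off the queue until it is empty and returns them in the
    // order they were handled. A real worker would fetch and parse each
    // page here and offer newly discovered links back to the same queue.
    static List<String> drain(BlockingQueue<String> queue) {
        List<String> handled = new ArrayList<>();
        String url;
        while ((url = queue.poll()) != null) {
            handled.add(url);
        }
        return handled;
    }

    public static void main(String[] args) {
        // Local stand-in (assumption): in a cluster this would be the
        // distributed queue obtained from Hazelcast instead.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.offer("http://example.com/a");
        queue.offer("http://example.com/b");

        System.out.println(drain(queue));
    }
}
```

Because the worker only depends on the BlockingQueue interface, swapping the
local queue for a distributed one should not require changing the loop.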

Best regards,

Bertil

Sent from my iPad

On Mar 25, 2011, at 9:09 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:

> Hi,
>
> Somebody (Paul?) mentioned using Droids for doing a 50M page crawl.  Anyone else
> using Droids for crawls of that size?
>
> I'm asking because I have a need to do a "semi-vertical" crawl on up to 10K
> domains and I'm considering Droids vs. Nutch.  This may translate to several
> times that many different servers - say 100K.  And that may translate to a few
> 100M web pages.  Too big for Droids without having a persistent link queue,
> right?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
