Hi Otis,
If you have to handle a very large set of tasks, you may be interested
in having a look at the Hazelcast distributed queue. You should be
able to use it with the current version of the trunk, since droids-56
has been applied. In such a cluster, no task will be lost in case of
failure.
Best regards,
Bertil
On Mar 25, 2011, at 9:09 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Hi,
>
> Somebody (Paul?) mentioned using Droids for doing a 50M page crawl. Anyone else
> using Droids for crawls of that size?
>
> I'm asking because I have a need to do a "semi-vertical" crawl on up to 10K
> domains and I'm considering Droids vs. Nutch. This may translate to several
> times that many different servers, say 100K. And that may translate to a few
> hundred million web pages. Too big for Droids without a persistent link queue,
> right?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>