nutch-dev mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject Re: MapReduce in Nutch
Date Tue, 29 Mar 2005 08:52:17 GMT
> What about running one fetcher on each node 24/7? Each fetcher would
> take segments from a global queue. Other parts of the system do not
> have to wait until the to-fetch queue is depleted before doing the DB
> update and new segment generation. So basically adding a queue will
> allow pipelining of the time consuming work, namely fetching, db
> update and segment generation. And we will not end up waiting for one
> or two fetchers to finish their job.
>
I agree. Maybe we can get this to work by using groups. We can put some
workers in a fetch group and let them do the fetching.
Besides the fetch group, we would have a preprocessing group that does
the rest.
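
To make the group idea concrete, here is a rough sketch in plain Java.
It is not Nutch code: the class name FetchPipelineSketch, the run
method, the queue names and the "updated:" marker are all made up for
illustration, and the fetch and DB update steps are only simulated.
Each group is a thread pool reading from its own queue, so neither
group has to wait for the other to finish a whole batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: two worker groups connected by queues.
final class FetchPipelineSketch {
    // Marker that tells a worker to stop; assumes no segment has this name.
    private static final String STOP = "__STOP__";

    static List<String> run(List<String> segments, int fetchers, int processors) {
        BlockingQueue<String> toFetch = new LinkedBlockingQueue<>();
        BlockingQueue<String> fetched = new LinkedBlockingQueue<>();
        List<String> done = Collections.synchronizedList(new ArrayList<>());

        ExecutorService fetchGroup = Executors.newFixedThreadPool(fetchers);
        ExecutorService processGroup = Executors.newFixedThreadPool(processors);

        // Fetch group: take a segment from the global queue, "fetch" it,
        // and hand it over to the other group.
        for (int i = 0; i < fetchers; i++) {
            fetchGroup.execute(() -> drain(toFetch, fetched::add));
        }
        // Second group: does the rest (simulated DB update here).
        for (int i = 0; i < processors; i++) {
            processGroup.execute(() -> drain(fetched, s -> done.add("updated:" + s)));
        }

        try {
            toFetch.addAll(segments);
            // One stop marker per fetcher, queued behind the real work.
            for (int i = 0; i < fetchers; i++) {
                toFetch.add(STOP);
            }
            fetchGroup.shutdown();
            fetchGroup.awaitTermination(1, TimeUnit.MINUTES);

            // All fetchers are finished, so the second group can be stopped too.
            for (int i = 0; i < processors; i++) {
                fetched.add(STOP);
            }
            processGroup.shutdown();
            processGroup.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new ArrayList<>(done);
    }

    // Worker loop: handle items until the stop marker shows up.
    private static void drain(BlockingQueue<String> in, Consumer<String> work) {
        try {
            for (String s = in.take(); !STOP.equals(s); s = in.take()) {
                work.accept(s);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling FetchPipelineSketch.run(segments, 2, 2) returns one "updated:"
entry per segment, in no particular order. In a real 24/7 setup the
queues would never be closed and the workers would sit on separate
nodes; the stop markers are only there so the sketch terminates.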

Does that make sense?

Stefan

