nutch-dev mailing list archives

From "Ledio Ago" <>
Subject RE: [Nutch-dev] distributed search
Date Tue, 20 Dec 2005 01:25:22 GMT

Based on what you're saying, this tool splits a fetchlist into several fetchlists
so that we can crawl/fetch the URLs from different fetchers, right?

If so, that is not what I'm after.  I'm trying to split an existing index
into smaller partitions, so that I can make those partitions searchable from
multiple Nutch searchers, i.e. distributed search.



-----Original Message-----
From: Rafi Iz []
Sent: Monday, December 19, 2005 4:49 PM
Subject: Re: [Nutch-dev] distributed search

Check the following command:
FetchListTool (-local | -ndfs <namenode:port>) <db>  <segment_dir> 
[-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers] 
[-adddays numDays]

This command calls a function named emitMultipleLists, which emits
several fetchlists, so that you can fetch across several machines.

bin/nutch ......


>From: Stefan Groschupf <>
>Subject: Re: [Nutch-dev] distributed search
>Date: Tue, 20 Dec 2005 00:38:22 +0100
>>By the way, is there an easy way to split the index I already have?
>>I would hate to recrawl all of the 1.9MM URLs again and waste bandwidth.
>Well, I do not know of any tool, shipped with Nutch or otherwise, that
>does it; maybe there is one.
>But writing a Java class that creates two smaller indexes from one large
>one is very easy, an hour of work at most.
>Just check any of the existing Lucene tutorials, the Lucene Javadoc, or the
>Lucene book.
>BTW, Erik Hatcher's book "Lucene in Action" is a MUST for all Nutch users.
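
To make the suggestion above a bit more concrete, here is a minimal sketch of
only the planning step of such a splitter: deciding which document numbers go
into which smaller index. It deliberately contains no Lucene calls. The class
name IndexPartitionPlan and the method ranges are hypothetical, made up for
illustration. A real splitter would take maxDoc from the source index and, for
each range, copy the documents into a separate index, skipping deleted
documents. Note that reading a document back from an index only returns its
stored fields, so a copy-based split may lose fields that were indexed but not
stored.

```java
/**
 * Hypothetical helper (not part of Nutch or Lucene): plans how to cut an
 * index holding maxDoc documents into contiguous partitions.
 */
final class IndexPartitionPlan {

    /**
     * Returns one half-open range [start, end) of document numbers per
     * partition. Partition sizes differ by at most one document.
     */
    static int[][] ranges(int maxDoc, int parts) {
        if (maxDoc < 0 || parts < 1) {
            throw new IllegalArgumentException(
                "maxDoc must be >= 0 and parts must be >= 1");
        }
        int[][] result = new int[parts][2];
        int base = maxDoc / parts;   // minimum size of every partition
        int extra = maxDoc % parts;  // the first 'extra' partitions get one more
        int start = 0;
        for (int p = 0; p < parts; p++) {
            int size = base + (p < extra ? 1 : 0);
            result[p][0] = start;         // inclusive
            result[p][1] = start + size;  // exclusive
            start += size;
        }
        return result;
    }
}
```

For example, IndexPartitionPlan.ranges(10, 3) yields the ranges [0,4), [4,7)
and [7,10). Each resulting index could then be served by its own searcher.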
