nutch-dev mailing list archives

From Dawid Weiss
Subject Re: Nutch crawled results for Clustering with Carrot2
Date Thu, 07 May 2009 09:11:44 GMT


You can fetch documents from Nutch indexes (which are Lucene indexes) and then 
feed them to the clustering algorithm directly, as explained in the Carrot2 examples.

There are several examples you can choose to start from -- some of them accept 
raw data, some of them use a Lucene document source.
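
To make the raw-data route a bit more concrete, here is a minimal sketch (not taken from the Carrot2 examples) of turning stored index fields into a plain list of title/snippet/URL documents, which is roughly the shape of input a clustering algorithm expects. The class and method names are hypothetical, and the field names "title", "content" and "url" are assumptions that should be checked against what your index actually stores. Reading the hits from the Lucene index and calling the clustering algorithm are left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RawDocuments {

    /** Plain holder for the three fields passed on to the clustering step. */
    static final class RawDocument {
        final String title;
        final String snippet;
        final String url;

        RawDocument(String title, String snippet, String url) {
            this.title = title;
            this.snippet = snippet;
            this.url = url;
        }
    }

    /**
     * Converts stored index fields into raw documents.
     * The field names "title", "content" and "url" are assumptions;
     * check them against the fields your index actually stores.
     */
    static List<RawDocument> fromStoredFields(List<Map<String, String>> hits) {
        List<RawDocument> documents = new ArrayList<>();
        for (Map<String, String> hit : hits) {
            String title = hit.getOrDefault("title", "");
            String snippet = hit.getOrDefault("content", "");
            String url = hit.getOrDefault("url", "");
            // Skip entries with no text at all: there is nothing to cluster on.
            if (title.isEmpty() && snippet.isEmpty()) {
                continue;
            }
            documents.add(new RawDocument(title, snippet, url));
        }
        return documents;
    }

    public static void main(String[] args) {
        // Stand-in for hits read from the index (hypothetical sample data).
        List<Map<String, String>> hits = List.of(
            Map.of("title", "Apache Nutch",
                   "content", "Open source web crawler",
                   "url", "http://example.org/nutch"),
            Map.of("url", "http://example.org/empty"));

        List<RawDocument> documents = fromStoredFields(hits);
        System.out.println(documents.size() + " document(s) ready for clustering");
    }
}
```

The resulting list is what you would then map onto the document type of whichever clustering library you use.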

If you need ultimate flexibility, go with the raw-data example:


Gaurang Patel wrote:
> Hi all,
> Does anyone know how I can use the Nutch crawled results for clustering
> with the Carrot2 clustering engine? What I want is different from the Carrot2
> clustering plugin that comes with Nutch. I want to write my own code for
> retrieving the document list from the Nutch crawled results, and then supply
> this list to the Carrot2 algorithm.
> Any kind of quick help will be appreciated.
> Regards,
> Gaurang
