nutch-dev mailing list archives

From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: Nutch crawled results for Clustering with Carrot2
Date Thu, 07 May 2009 09:11:44 GMT

Gaurang,

You can fetch documents from Nutch indexes (which are Lucene indexes) and then
feed them to the clustering algorithm directly, as explained in the
integration section of the Carrot2 manual:

http://download.carrot2.org/head/manual/index.html#section.integration
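A minimal sketch of the first step, in plain Java so it runs without Nutch or Lucene on the classpath. It assumes each hit has already been read from the index as a map of stored field names to values (in real code these would come from the Lucene document's stored fields). The field names `title`, `content` and `url`, the `RawDocument` record and the `toDocuments` helper are all hypothetical, so check which fields your Nutch index actually stores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class IndexToDocuments {

    /** Hypothetical holder for the title/snippet/url text a clustering engine typically needs. */
    public record RawDocument(String title, String snippet, String url) {}

    /**
     * Converts stored-field maps (one per index hit) into clustering input.
     * ASSUMPTION: the field names "title", "content" and "url" are illustrative,
     * not a guaranteed Nutch index schema.
     */
    public static List<RawDocument> toDocuments(List<Map<String, String>> hits, int maxSnippetChars) {
        List<RawDocument> docs = new ArrayList<>();
        for (Map<String, String> fields : hits) {
            String url = fields.get("url");
            if (url == null || url.isBlank()) {
                continue; // skip hits that cannot be linked back to a page
            }
            String title = fields.getOrDefault("title", "");
            String content = fields.getOrDefault("content", "");
            // Clustering works best on short text, so keep only the start of the page body.
            String snippet = content.length() > maxSnippetChars
                    ? content.substring(0, maxSnippetChars)
                    : content;
            docs.add(new RawDocument(title, snippet, url));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
                Map.of("title", "Nutch", "content", "Open source web crawler", "url", "http://example.org/nutch"),
                Map.of("title", "No URL", "content", "This hit is skipped"));
        for (RawDocument d : toDocuments(hits, 10)) {
            System.out.println(d);
        }
    }
}
```

Hits without a URL are dropped rather than passed on, since a cluster member you cannot link to is of little use in a results page.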

There are several examples you can start from: some of them accept raw data,
others use the Lucene document source.

http://fisheye3.atlassian.com/browse/carrot2/branches/stable/applications/carrot2-examples/src/org/carrot2/examples/clustering

If you need ultimate flexibility, go with the raw-data example:

http://fisheye3.atlassian.com/browse/carrot2/branches/stable/applications/carrot2-examples/src/org/carrot2/examples/clustering/ClusteringDocumentList.java?r=3345
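The raw-data route boils down to building the document list yourself and handing it to an algorithm. This plain-Java sketch shows only the shape of that hand-off. `DocumentClusterer`, `SharedTermClusterer` and `Cluster` are hypothetical stand-ins, and the toy shared-term grouping is NOT one of Carrot2's algorithms (such as Lingo or STC); in real code this call would go through Carrot2 as in the ClusteringDocumentList example linked above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RawDataClustering {

    /** Hypothetical input: the title/snippet/url triple you build from crawl results. */
    public record RawDocument(String title, String snippet, String url) {}

    /** Hypothetical output: a label plus the documents assigned to it. */
    public record Cluster(String label, List<RawDocument> documents) {}

    /** Stand-in for a real clustering engine such as Carrot2; swap in the real call here. */
    public interface DocumentClusterer {
        List<Cluster> cluster(List<RawDocument> documents);
    }

    /**
     * Toy shared-term grouping, NOT a Carrot2 algorithm: every word of 4+ characters
     * that occurs in at least minDocs documents becomes a cluster label.
     */
    public static final class SharedTermClusterer implements DocumentClusterer {
        private final int minDocs;

        public SharedTermClusterer(int minDocs) {
            this.minDocs = minDocs;
        }

        @Override
        public List<Cluster> cluster(List<RawDocument> documents) {
            // TreeMap keeps terms sorted, so the output order is deterministic.
            Map<String, Set<RawDocument>> byTerm = new TreeMap<>();
            for (RawDocument doc : documents) {
                String text = (doc.title() + " " + doc.snippet()).toLowerCase(Locale.ROOT);
                for (String word : text.split("[^a-z0-9]+")) {
                    if (word.length() >= 4) {
                        // A set, so a document counts once per term even if the word repeats.
                        byTerm.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(doc);
                    }
                }
            }
            List<Cluster> clusters = new ArrayList<>();
            byTerm.forEach((term, docs) -> {
                if (docs.size() >= minDocs) {
                    clusters.add(new Cluster(term, List.copyOf(docs)));
                }
            });
            // Largest clusters first; the stable sort keeps ties in alphabetical order.
            clusters.sort(Comparator.comparingInt((Cluster c) -> c.documents().size()).reversed());
            return clusters;
        }
    }

    public static void main(String[] args) {
        List<RawDocument> docs = List.of(
                new RawDocument("Nutch crawler", "Web crawling with Nutch", "http://example.org/1"),
                new RawDocument("Lucene index", "Full-text search library", "http://example.org/2"),
                new RawDocument("Nutch plugins", "Extending Nutch", "http://example.org/3"));
        DocumentClusterer clusterer = new SharedTermClusterer(2);
        for (Cluster c : clusterer.cluster(docs)) {
            System.out.println(c.label() + " -> " + c.documents().size() + " docs");
        }
    }
}
```

Keeping the algorithm behind an interface means the document-building code does not change when the placeholder is replaced with a real Carrot2 call.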

Dawid


Gaurang Patel wrote:
> Hi all,
> 
> Does anyone know how I can use Nutch crawl results for clustering with the
> Carrot2 clustering engine? What I want is different from the Carrot2
> clustering plugin that comes with Nutch. I want to write my own code to
> retrieve a document list from the Nutch crawl results and then supply
> this list to the Carrot2 algorithm.
> 
> Any kind of quick help will be appreciated.
> 
> Regards,
> Gaurang
> 
