mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: Wiki - 'Quick tour of text analysis using the Mahout command line' clarification
Date Tue, 25 Feb 2014 15:18:13 GMT
That's a mistake on the wiki that needs to be corrected. You're right, it should be the similarity output.

Each row would have the 10 most similar docs for every doc.



Sent from my iPhone

> On Feb 25, 2014, at 9:22 AM, Juan José Ramos <jjarmos@gmail.com> wrote:
> 
> In the wiki page: 'Quick tour of text analysis using the Mahout command
> line'.
> 
> https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
> 
> At the very bottom it is said that
> 
>   1. This will generate the 10 most similar docs to each doc in the
>      collection.
> 
>   2. Examine the similarity list:
>      mahout seqdumper -i reuters-matrix/matrix | more
> 
> 
> Instead of reuters-matrix/matrix, shouldn't it be
> reuters-similarity/part-r-00000, since that is the output file of
> rowsimilarity? Or, on the contrary, does the rowsimilarity tool also write
> to reuters-matrix/?
> 
> I would expect that file to contain the 10 most similar documents for every
> document in the Reuters catalogue. Is that correct?
> 
> Many thanks.
> Juanjo.
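Both messages describe the rowsimilarity output as one row per document, listing that document's 10 most similar documents. The Python sketch below illustrates that idea on a toy in-memory matrix. It is not Mahout's implementation (RowSimilarityJob runs as a distributed job and writes SequenceFiles). Cosine similarity and skipping a document's match with itself are assumptions standing in for whatever options the wiki's rowsimilarity command actually passes, and `cosine` and `top_k_similar` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(rows, k=10):
    """For each row index, return up to k (other_index, similarity) pairs,
    most similar first. A row is never compared with itself (assumption)."""
    result = {}
    for i, u in enumerate(rows):
        scores = [(j, cosine(u, v)) for j, v in enumerate(rows) if j != i]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        result[i] = scores[:k]
    return result


# Toy "document vectors": docs 0 and 1 share terms, doc 2 shares none.
docs = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
]
sims = top_k_similar(docs, k=1)
print([j for j, _ in sims[0]])  # → [1]
```

With k=10, as in the wiki, every row keeps up to 10 neighbours; on a collection smaller than 11 documents each row simply keeps all the others.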
