mahout-user mailing list archives

From Juan José Ramos <>
Subject Wiki - 'Quick tour of text analysis using the Mahout command line' clarification
Date Tue, 25 Feb 2014 14:22:59 GMT
In the wiki page 'Quick tour of text analysis using the Mahout command line', the very bottom says:

   1. This will generate the 10 most similar docs to each doc in the

   1. Examine the similarity list:
   mahout seqdumper -i reuters-matrix/matrix | more

Instead of reuters-matrix/matrix, shouldn't it be reuters-similarity/part-r-00000, since that is the output file of rowsimilarity? Or does the rowsimilarity tool, on the contrary, also write to reuters-matrix/?

I would expect that file to contain the 10 most similar documents for every document in the Reuters collection. Is that correct?
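To make the expectation above concrete, here is a minimal in-memory sketch of what "the 10 most similar documents for every document" means. It is not Mahout's implementation: it assumes each document is a sparse term-weight vector, uses cosine similarity (the measure used on the wiki page is not stated here), excludes each document from its own list, and keeps zero scores. The function names `cosine` and `top_k_similar` are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: sparse term-weight vector, term index -> weight.
Vector = Dict[int, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(rows: Dict[int, Vector], k: int = 10) -> Dict[int, List[Tuple[int, float]]]:
    """For each document id, return up to k (other_id, score) pairs, best first.

    Assumption: a document is never listed as similar to itself.
    Ties are broken by the smaller document id so the output is deterministic.
    """
    result: Dict[int, List[Tuple[int, float]]] = {}
    for doc_id, vec in rows.items():
        scores = [
            (other_id, cosine(vec, other_vec))
            for other_id, other_vec in rows.items()
            if other_id != doc_id
        ]
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        result[doc_id] = scores[:k]
    return result


if __name__ == "__main__":
    docs = {
        0: {0: 1.0, 1: 1.0},
        1: {0: 1.0},
        2: {2: 1.0},
    }
    print(top_k_similar(docs, k=1))
```

In this toy input, document 0 and document 1 share a term, so each appears as the other's nearest neighbour; with k=10 every document would list at most 10 others, which is roughly the per-document shape one would expect when dumping the rowsimilarity output.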

Many thanks.
