mahout-user mailing list archives

From Suneel Marthi
Subject Re: Converting to sequence file in mahout
Date Fri, 23 May 2014 08:55:51 GMT
The input needs to be converted to a SequenceFile of vectors before it can be
processed by Mahout's pipeline. This has been asked a few times recently;
search the mail archives for Kevin Moulart's recent posts on how to do it.

The converted vectors are then fed to RowIdJob, which outputs a matrix and a
docIndex. The matrix (which is a DRM) is then fed to RowSimilarityJob.
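
As a rough sketch of the first step only, the snippet below parses
comma-separated rows into arrays of doubles using plain Java. It stops short
of writing the SequenceFile: each array would still have to be wrapped in a
Mahout DenseVector and VectorWritable and appended through Hadoop's
SequenceFile.Writer, which needs Hadoop and Mahout on the classpath. The class
name CsvRowParser, its method names and the first sample row are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: parses CSV rows into double arrays.
public class CsvRowParser {

    /** Parses one comma-separated line into an array of doubles. */
    public static double[] parseRow(String line) {
        String[] fields = line.trim().split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    /** Parses all non-blank lines, preserving input order. */
    public static List<double[]> parseRows(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip empty lines instead of failing on them
            }
            rows.add(parseRow(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative input; the second row is taken from the question.
        List<String> lines = Arrays.asList("0.5,0.25,0.75", "0.129,0.932,0.123");
        List<double[]> rows = parseRows(lines);
        for (int rowId = 0; rowId < rows.size(); rowId++) {
            // Assumption: at this point each row would be wrapped in a
            // DenseVector/VectorWritable and appended to a SequenceFile,
            // with the row id (as text) used as the key.
            System.out.println(rowId + " -> " + Arrays.toString(rows.get(rowId)));
        }
    }
}
```

Keeping the rows in input order means the list index can serve as a stable
row key when the vectors are written out.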

On Fri, May 23, 2014 at 1:31 AM, jamal sasha wrote:

> Hi,
>    I have data where each row is a comma-separated vector,
> stored as plain text:
> 0.123,01433,0.932
> 0.129,0.932,0.123
> I want to run Mahout's rowIdSimilarity module on it, but I am guessing
> the input requirement is different.
> How do I convert these CSV vectors into the format consumed by Mahout's
> rowIdSimilarity module?
> Thanks
