mahout-user mailing list archives

From Vckay <>
Subject Question Regarding Distributed Row Matrix
Date Thu, 05 May 2011 03:54:41 GMT
Hello all,
  I am trying to create a DistributedRowMatrix from my data, which is
currently available as text input, with each line of text supposed to become
one row of the matrix. I am using the Spectral KMeans code as a way of
understanding how DistributedRowMatrix works, and I am somewhat confused.
Specifically: does DistributedRowMatrix require that the SequenceFiles have
the row ID as the key?
(The Spectral KMeans code does this, which is easy because the first word of
each line of its input carries that information. However, as far as I can see,
TextInputFormat only provides a unique byte offset (not necessarily the line
number) as the key, so I can't recover the line number from my data.
Furthermore, suppose I change my data to, say, a bunch of images living in a
flat directory: I am thinking of making the key some combination of the file
number and this byte offset.)
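
To make the byte offset versus line number distinction concrete, here is a
minimal sketch in plain Java (no Hadoop or Mahout classes). It computes the
starting byte offset of each line, which is the kind of key I understand
TextInputFormat to hand to a mapper, and then maps an offset back to a row
index by looking it up in the sorted list of offsets. The class and method
names (LineOffsets, byteOffsets, rowIndexOf) are made up for illustration, and
the lookup assumes all offsets of a file are available in one place, which a
single mapper working on one split would not have.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LineOffsets {

    /** Returns the byte offset at which each line of the text starts. */
    static List<Long> byteOffsets(String text) {
        List<Long> offsets = new ArrayList<>();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        long start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                offsets.add(start);   // record where the finished line began
                start = i + 1;        // the next line begins after the newline
            }
        }
        if (start < bytes.length) {
            offsets.add(start);       // last line without a trailing newline
        }
        return offsets;
    }

    /** Maps a line's byte offset to its 0-based row index (hypothetical helper). */
    static int rowIndexOf(List<Long> offsets, long offset) {
        // Offsets are produced in increasing order, so binary search applies.
        return Collections.binarySearch(offsets, offset);
    }

    public static void main(String[] args) {
        String text = "1.0 2.0 3.0\n4.0 5.0\n6.0 7.0 8.0 9.0\n";
        List<Long> offsets = byteOffsets(text);
        System.out.println(offsets);                  // prints [0, 12, 20]
        System.out.println(rowIndexOf(offsets, 20L)); // prints 2
    }
}
```

The offsets are unique and increasing within a file, but they are not
consecutive integers, so turning them into row indices takes an extra
numbering step of this kind.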

