mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: does anyone use the "row label bindings" stuff in Vector / Matrix?
Date Wed, 02 Nov 2011 15:50:13 GMT
Ah, ok, I was looking at an older source tree.  Then in that case, no
*release* we've had touches them, and nowhere in the codebase does anyone
currently use the bindings, even if it is the case that if you *did* use
them, they would indeed get serialized with the matrix.

Which is why I was asking the question: does anyone use these, or even
remember they really exist?  I do lots of processing with matrices which
happen to have both row and column labels, but it's really not a terribly
big hassle to know you have to hang onto a dictionary somewhere which
translates the ids to labels.  In fact, it's far more often that I find I'm
reusing the same dictionary again and again as I'm exploring the data set,
rebuilding the numeric matrix in different ways.  In this case, the
dictionaries live on HDFS all nice and safe, and I've got a pile of numeric
serialized (DistributedRow-)Matrix instances which all use that dictionary,
but don't replicate it with them.
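
A minimal sketch of the arrangement described above, for illustration only.
This is not Mahout code: `LabelDictionary` and its methods are hypothetical
names, and plain `double[][]` arrays stand in for the serialized matrices.
It shows one pair of dictionaries being shared by several numeric matrices
instead of each matrix carrying its own label bindings:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps labels to integer ids, stored apart from any matrix. */
final class LabelDictionary {
    private final Map<String, Integer> labelToId = new HashMap<>();

    /** Returns the id for a label, assigning the next free id on first use. */
    public int idFor(String label) {
        Integer id = labelToId.get(label);
        if (id == null) {
            id = labelToId.size();
            labelToId.put(label, id);
        }
        return id;
    }

    /** Returns the id for a known label; fails loudly on an unknown one. */
    public int lookup(String label) {
        Integer id = labelToId.get(label);
        if (id == null) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return id;
    }

    /** Label-based cell access on a purely numeric matrix. */
    public static double get(double[][] matrix, LabelDictionary rows,
                             LabelDictionary cols, String rowLabel, String colLabel) {
        return matrix[rows.lookup(rowLabel)][cols.lookup(colLabel)];
    }

    public static void main(String[] args) {
        LabelDictionary rows = new LabelDictionary();
        LabelDictionary cols = new LabelDictionary();
        rows.idFor("doc-a");
        rows.idFor("doc-b");
        cols.idFor("apple");
        cols.idFor("pear");

        // Two different numeric views of the same data, one shared dictionary pair.
        double[][] counts = {{3, 0}, {1, 2}};
        double[][] binary = {{1, 0}, {1, 1}};

        System.out.println(get(counts, rows, cols, "doc-b", "pear")); // 2.0
        System.out.println(get(binary, rows, cols, "doc-b", "pear")); // 1.0
    }
}
```

The matrices stay purely numeric, so rebuilding one in a different way does
not require touching or re-serializing the labels.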

  -jake

On Wed, Nov 2, 2011 at 8:08 AM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Nov 2, 2011, at 10:58 AM, Jake Mannix wrote:
>
> > On Wed, Nov 2, 2011 at 7:34 AM, Grant Ingersoll <gsingers@apache.org> wrote:
> >
> >> What functionality, specifically, are you proposing to remove?
> >
> >
> > I'm suggesting we kill, from Matrix.java and descendants, all of the
> > following methods:
> >
> >  Map<String, Integer> getColumnLabelBindings();
> >  Map<String, Integer> getRowLabelBindings();
> >  void setColumnLabelBindings(Map<String, Integer> bindings);
> >  void setRowLabelBindings(Map<String, Integer> bindings);
> >  double get(String rowLabel, String columnLabel);
> >  void set(String rowLabel, String columnLabel, double value);
> >  void set(String rowLabel, String columnLabel, int row, int column, double value);
> >  void set(String rowLabel, double[] rowData);
> >  void set(String rowLabel, int row, double[] rowData);
> >
> >
> >> I know we had a lot of discussion around some of this stuff way back when
> >> as to how best to do it, but of course, that doesn't mean it has uptake.
> >> If it's on the Matrix, then doesn't it more easily get shipped around via
> >> the Writables vs. requiring the user to do that?   Not sure it is an issue,
> >> but it's one less piece of code someone else has to write.
> >
> >
> > MatrixWritable does not, in fact, serialize the labels along with the
> > matrix, it turns out.  There are two methods for (de-)serializing them
> > separately:
> >
> >  public static void readLabels(DataInput in,
> >                                Map<String, Integer> columnLabelBindings,
> >                                Map<String, Integer> rowLabelBindings) throws IOException;
> >
> >  public static void writeLabelBindings(DataOutput out,
> >                                        Map<String, Integer> columnLabelBindings,
> >                                        Map<String, Integer> rowLabelBindings) throws IOException;
> >
> > but neither of these is used anywhere in the codebase (even in tests).
> >
>
> writeMatrix calls them.  Line 162 of MatrixWritable.
>
>
