mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Taste-GenericItemBasedRecommender
Date Sun, 13 Dec 2009 21:25:40 GMT
On Sun, Dec 13, 2009 at 11:02 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> The issue is when adding a sparse vector to a dense vector (with the dense
> vector being mutated) that the dense vector doesn't know how to use the
> sparse iterator.
>

What do you mean? AbstractVector.plus already knows this in trunk - it
always calls iterateNonZero() on the other vector, as well it should, for
this operation.  The problem with plus() is that it makes a copy, and so
this is bad for doing accumulation.

For accumulation, you want to do assign(Vector v, BinaryFunction f), and
this needs to be done intelligently.
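
To illustrate the difference between a copying plus() and in-place
accumulation, here is a minimal sketch. It does not use Mahout's classes:
a double[] stands in for the dense vector, a Map of index to value stands
in for the sparse vector's non-zero entries, and the class and method
names are made up for this example.

```java
import java.util.Map;

public class AccumulateInPlace {
    /** Copying add: allocates a new dense array on every call. */
    static double[] plus(double[] dense, Map<Integer, Double> sparseNonZeros) {
        double[] result = dense.clone();
        for (Map.Entry<Integer, Double> e : sparseNonZeros.entrySet()) {
            result[e.getKey()] += e.getValue();
        }
        return result;
    }

    /** In-place add: mutates the accumulator and touches only non-zero entries. */
    static void accumulate(double[] dense, Map<Integer, Double> sparseNonZeros) {
        for (Map.Entry<Integer, Double> e : sparseNonZeros.entrySet()) {
            dense[e.getKey()] += e.getValue();
        }
    }
}
```

Both methods walk only the non-zero entries of the sparse argument, but
when summing many sparse vectors into one dense total, the copying version
allocates and fills a full-size array per addition.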


> Following up on Jake's comment, could we have a marker interface that
> indicates a function maps 0 to 0?  The abstract vector and matrix
> implementations could use instanceOf to decide what to do.
>

I tried this when I was still trying to contribute to commons-math, but
realized after several patch iterations that it's actually far easier to
just do this:

    Iterator<Element> e = (f.apply(0) == 0) ? iterateNonZero() : iterateAll();

Even if the method call is expensive, it's only done one extra time.
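
A self-contained sketch of that check, for illustration only. The toy
sparse vector below is not Mahout's class: its name, the map-backed
storage, and the visited counter are assumptions made so the effect of
the check can be observed.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

public class ZeroPreservingAssign {
    /** Toy sparse vector that stores only non-zero entries (hypothetical, not Mahout's). */
    static final class ToySparseVector {
        final int size;
        final Map<Integer, Double> nonZeros = new TreeMap<>();
        int visited = 0; // how many elements the last assign() touched

        ToySparseVector(int size) { this.size = size; }

        double get(int i) { return nonZeros.getOrDefault(i, 0.0); }

        void set(int i, double v) {
            if (v == 0.0) { nonZeros.remove(i); } else { nonZeros.put(i, v); }
        }

        /** Applies f to every element, skipping zeros when f maps 0 to 0. */
        void assign(DoubleUnaryOperator f) {
            visited = 0;
            if (f.applyAsDouble(0.0) == 0.0) {
                // One extra call to f tells us zeros stay zero, so only
                // stored entries need visiting. Copy the keys because
                // set() may remove entries while we loop.
                for (Integer i : new ArrayList<>(nonZeros.keySet())) {
                    set(i, f.applyAsDouble(get(i)));
                    visited++;
                }
            } else {
                // f moves zeros, so every index has to be visited.
                for (int i = 0; i < size; i++) {
                    set(i, f.applyAsDouble(get(i)));
                    visited++;
                }
            }
        }
    }
}
```

With two stored entries in a vector of size 1000, assigning x -> x * 2
touches two elements, while x -> x + 1 has to touch all 1000.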

This is trickier with assign(Vector v, BinaryFunction f), because you have
no way of testing if f.apply(x, 0) == x for all x, and here a marker
interface might be necessary.  I'd also be fine with just checking that the
function is one of Plus or PlusWithScale, because that's a ginormous chunk
of the use case.

If we had these cases (or the marker interface which covered it in general),
then yes, assign(Vector v, BinaryFunction f) could be implemented by
checking which argument of f it was zero preserving w.r.t., and then
choosing iterators appropriately.  Some of this logic could live in
subclasses (i.e., not AbstractVector) because, for example, SparseVectors
based on open-map don't iterate in order, but do have nice random access,
while IntDoublePair vectors have their own specialties.
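
The marker-interface variant could be sketched roughly as follows. Nothing
here is Mahout's actual API: the marker name RightZeroIsIdentity, the local
BinaryFunction, Plus and PlusWithScale definitions, and the double[] plus
Map representation are all assumptions for illustration.

```java
import java.util.Map;

public class MarkerInterfaceAssign {
    interface BinaryFunction { double apply(double x, double y); }

    /** Hypothetical marker: the implementer promises apply(x, 0) == x for all x. */
    interface RightZeroIsIdentity {}

    static final class Plus implements BinaryFunction, RightZeroIsIdentity {
        public double apply(double x, double y) { return x + y; }
    }

    static final class PlusWithScale implements BinaryFunction, RightZeroIsIdentity {
        private final double scale;
        PlusWithScale(double scale) { this.scale = scale; }
        public double apply(double x, double y) { return x + scale * y; }
    }

    /**
     * Sets dense[i] = f(dense[i], other[i]) in place, where other is given
     * by its non-zero entries. Returns the number of elements visited.
     */
    static int assign(double[] dense, Map<Integer, Double> otherNonZeros, BinaryFunction f) {
        if (f instanceof RightZeroIsIdentity) {
            // Zeros in the other vector leave dense[i] unchanged, so only
            // its non-zero entries need visiting.
            for (Map.Entry<Integer, Double> e : otherNonZeros.entrySet()) {
                int i = e.getKey();
                dense[i] = f.apply(dense[i], e.getValue());
            }
            return otherNonZeros.size();
        }
        // No promise about zeros: fall back to visiting every index.
        for (int i = 0; i < dense.length; i++) {
            dense[i] = f.apply(dense[i], otherNonZeros.getOrDefault(i, 0.0));
        }
        return dense.length;
    }
}
```

A function without the marker, such as multiplication, takes the full
iteration path, since a zero in the other vector changes the result there.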

But this is getting far afield - these are important optimizations, but I
have a feeling this isn't what is causing Sean's slowness.  There's
something else I haven't been able to put my finger on...

Maybe we should continue this on mahout-dev?  We're not in user-land
anymore. :)

  -jake
