mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Taste-GenericItemBasedRecommender
Date Fri, 04 Dec 2009 19:42:36 GMT
If you down-sample during conversion from triples, then you never even need
to keep an entire row in memory; after down-sampling it won't matter.
Moreover, in the actual multiplication you only pass around individual
summand elements of (A'A), so little memory is required there either.
Most of the excess space caused by emitting all those single elements is
removed by the combiner, and the reducer removes further elements through
sparsification.  The result is less sparse than the original data, but is
still very sparse.
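The flow above can be sketched in a few lines. This is not Mahout's Java implementation, just a hedged illustration: rows are lists of item ids per user, `sample_cap` is a hypothetical down-sampling knob, and a dict stands in for the combiner/reducer that sums the emitted summand elements of (A'A).

```python
import random
from collections import defaultdict
from itertools import combinations

def cooccurrence(rows, sample_cap=None, seed=0):
    """Accumulate A'A from user rows by summing single summand elements,
    the way a combiner/reducer folds per-pair emissions. `sample_cap`
    down-samples long rows during conversion, so no full dense row is
    ever held in memory. Illustrative sketch, not the Mahout API."""
    rng = random.Random(seed)
    acc = defaultdict(int)  # (item_i, item_j) -> co-occurrence count
    for row in rows:
        if sample_cap is not None and len(row) > sample_cap:
            row = rng.sample(row, sample_cap)  # down-sample during conversion
        items = sorted(set(row))
        for i, j in combinations(items, 2):
            # emit one summand element of (A'A) per co-occurring pair;
            # the dict plays the combiner's role, summing duplicates
            acc[(i, j)] += 1
            acc[(j, i)] += 1
        for i in items:
            acc[(i, i)] += 1
    return acc
```

A real MapReduce job would emit each `(i, j) -> 1` pair from the mapper and let the combiner and reducer do the summation; the memory cost per task stays proportional to one (down-sampled) row.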

In the final form of (A'A) it may even be desirable to limit the number of
non-zero elements in a row (which breaks symmetry without much harm).  You
generally need another MR step to do this, but the effect on final
recommendation run-time cost is significant and the effect on recommendation
quality is nil.  This has the effect (in the Lucene form of the recommender)
of limiting the number of terms in each recommendation document.
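That extra MR step amounts to a per-row top-k truncation. A minimal sketch, again assuming the (item, item) -> count layout from above rather than anything in Mahout:

```python
import heapq

def truncate_rows(cooc, k):
    """Keep only the k largest entries in each row of the co-occurrence
    matrix. This breaks symmetry (an entry may survive in row i but not
    row j), but caps per-item recommendation cost. `cooc` maps
    (row_item, col_item) -> count; a hypothetical layout for illustration."""
    rows = {}
    for (i, j), c in cooc.items():
        rows.setdefault(i, []).append((c, j))
    out = {}
    for i, entries in rows.items():
        # the reducer for this extra pass keeps only the top-k per row
        for c, j in heapq.nlargest(k, entries):
            out[(i, j)] = c
    return out
```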

On Fri, Dec 4, 2009 at 10:04 AM, Jake Mannix <jake.mannix@gmail.com> wrote:

>  sum_i  (v_i cross v_i)
>
> is indeed going to get pretty dense, so the reducer may totally blow
> up if care is not taken - because it's all being done in memory in the
> reducer (ie yes, all of A'A lives in memory just before the reducer
> completes, in my naive impl).
>



-- 
Ted Dunning, CTO
DeepDyve
