mahout-user mailing list archives

From Ted Dunning <>
Subject Re: mapreduce memory issues
Date Wed, 05 May 2010 22:00:36 GMT
Depending on the data, this might be the problem.

It isn't uncommon for there to be some item that all users interact with.
At Veoh, for instance, we had an intro video.  This item will naturally
cooccur with all other items, and that makes its row of the cooccurrence
matrix really big.
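
To make the effect concrete, here is a minimal sketch in plain Python (not
Mahout's job). The toy data, the item name "intro", and the helper
cooccurrence_rows are all assumptions made up for illustration. It counts
cooccurrences per item and compares the row for an item every user touched
with the row for an ordinary item.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence_rows(user_items):
    """Return {item: {other_item: number of users who have both}}."""
    rows = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        # Every ordered pair of distinct items for this user adds one count.
        for a, b in permutations(set(items), 2):
            rows[a][b] += 1
    return rows


# Hypothetical toy data: each of 1000 users saw "intro" plus two items
# that nobody else saw.
user_items = {u: ["intro", f"a{u}", f"b{u}"] for u in range(1000)}

rows = cooccurrence_rows(user_items)
intro_row_size = len(rows["intro"])   # one entry per other item in the data
typical_row_size = len(rows["a0"])    # only "intro" and "b0"
```

In this toy data the "intro" row has an entry for every other item, while
an ordinary row has two entries, so a job that holds one whole row in
memory is dominated by that single row.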

This usually needs to be dealt with by down-sampling the perverse item.
This has good effects in general, but will make some people's heads explode
on philosophical grounds.
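
One way the down-sampling could look is sketched below. This is an
assumption about the approach, not something Mahout provides out of the
box: the function downsample_popular_items, the cap max_users_per_item,
and the toy data are all hypothetical. Any item with more interactions
than the cap keeps only a random sample of them, and everything else is
left alone.

```python
import random
from collections import defaultdict


def downsample_popular_items(pairs, max_users_per_item, seed=0):
    """Keep at most max_users_per_item (user, item) pairs for each item."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    users_by_item = defaultdict(list)
    for user, item in pairs:
        users_by_item[item].append(user)

    kept = []
    for item, users in users_by_item.items():
        if len(users) > max_users_per_item:
            # Only the over-popular items lose interactions.
            users = rng.sample(users, max_users_per_item)
        kept.extend((user, item) for user in users)
    return kept


# Hypothetical toy data: everyone saw "intro", and each user saw one
# item of their own.
pairs = [(u, "intro") for u in range(1000)]
pairs += [(u, f"a{u}") for u in range(1000)]

kept = downsample_popular_items(pairs, max_users_per_item=50)
intro_count = sum(1 for _, item in kept if item == "intro")
```

After this pass the "intro" item cooccurs with the items of at most 50
users instead of all 1000, which bounds the size of its row. The cap of 50
is arbitrary here and would need tuning against real data.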

My current theory is that Tamas has just such an item in his data and that
the memory problem isn't caused by the item counter at all.

On Wed, May 5, 2010 at 2:26 PM, Sean Owen <> wrote:

> Besides
> this little lookup map, the worst thing it does is load a whole row of
> the co-occurrence matrix in memory.
