mahout-user mailing list archives

From Tamas Jambor <>
Subject Re: mapreduce memory issues
Date Wed, 05 May 2010 18:35:11 GMT
I think this must be the issue. But my guess is that it happens regardless of 
cluster size, because I tried changing the maximum map/reduce task 
capacity, and it looks like Hadoop does not create more tasks for this 
job even when more free slots are available.
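
For reference, the capacity settings I changed are the per-TaskTracker slot limits in mapred-site.xml (a sketch using Hadoop 0.20-era property names; the values shown are just examples). Note that raising these only adds free slots; the number of map tasks a job gets is driven by the number of input splits, so more slots alone won't spawn more mappers:

```xml
<!-- mapred-site.xml: per-TaskTracker slot limits (example values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```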

On 05/05/2010 19:11, Sean Owen wrote:
> I think it's UserVectorToCooccurrenceMapper, which keeps a local count
> of how many times each item has been seen. On a small cluster with only a
> few mappers, each mapper sees nearly all items, so each ends up holding a
> count for every item. That's still not terrible, but it could take up a
> fair bit of memory. One easy solution is to cap the map's size and
> periodically throw out low-count entries.
> Just to confirm this is the issue, you could hack in this line:
>    private void countSeen(Vector userVector) {
>      if (indexCounts.size()>  1000000) return;
>      ...
> That's not a real solution, but an easy way to test whether that's
> the problem. If that's it, I can solve this in a more robust way.
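
The capping idea in the hack above can be sketched outside Hadoop. This is a minimal, hypothetical stand-in, not the actual Mahout mapper; the class and the `indexCounts` field are assumptions modeled on the snippet in the quote:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the capped-count idea: once the count map reaches a
// size limit, stop adding entries so memory use stays bounded.
// Hypothetical stand-in for the countSeen() hack quoted above.
public class CappedCounter {

  private static final int MAX_ENTRIES = 1_000_000; // cap from the snippet

  private final Map<Integer, Integer> indexCounts = new HashMap<>();

  // Record one occurrence of each item index, unless the cap is reached.
  void countSeen(int[] itemIndexes) {
    if (indexCounts.size() > MAX_ENTRIES) {
      return; // cap reached: skip counting rather than grow the map
    }
    for (int index : itemIndexes) {
      indexCounts.merge(index, 1, Integer::sum);
    }
  }

  int countFor(int index) {
    return indexCounts.getOrDefault(index, 0);
  }

  public static void main(String[] args) {
    CappedCounter counter = new CappedCounter();
    counter.countSeen(new int[] {1, 2, 2, 3});
    counter.countSeen(new int[] {2});
    System.out.println(counter.countFor(2)); // 3
    System.out.println(counter.countFor(5)); // 0
  }
}
```

A real fix would also evict low-count entries when the cap is hit, rather than silently dropping new counts, which skews item frequencies.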
