mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Naive Bayes training filling up jobcache
Date Tue, 03 Apr 2012 20:00:54 GMT
Which version are you using: bayes.* or naivebayes.*?
------
Robin Anil


On Tue, Apr 3, 2012 at 2:26 PM, Stuart Smith <stu24mail@yahoo.com> wrote:

> Hello all,
>
>   I've got Naive Bayes working pretty well. Now I want to train a much
> bigger model, going from about 100,000 samples in each category to about a
> million.
>
>
> Everything starts ok - then map/reduce workers keep filling up the jobcache,
> and therefore the disk, and everything grinds to a halt.
>
>
> Granted, it may be more of a Hadoop question... but it also seems that
> there's not much you can do about it (posted responses to other people
> include "make sure you have bigger disks" - but I don't...). Also, naive
> bayes is the only task I've run that fills up the jobcache on the
> tasktrackers.. I have 40-50 GB free on the temp dir.. not great, but
> passable.
>
> So, I'm left with wondering:
>
> Is there any tuning I could do to the Naive Bayes Classifier to make it
> use less jobcache space?
>
> Right now, I'm down to running 1 map task on every machine.. even with 5
> it filled up the jobcache. I can also run more, wait for it to fill up &
> crash, then clear the cache out by hand, restart... it recovers and gets
> farther, then crashes, repeat... Not sure which approach is faster at this
> point .. 1 map task per node goes slooow...
>
> Take care,
>   -stu
>
