mahout-user mailing list archives

From Rodolfo Viana <rodolfodelimaviana@gmail.com>
Subject Re: java.lang.OutOfMemoryError with Mahout 0.10 and Spark 1.1.1
Date Tue, 04 Aug 2015 18:58:26 GMT
Thank you, Pat, you were right: when I ran with Spark 1.3.1 and Mahout 0.10,
I didn't get this error.

I'm trying to run Mahout with Spark on inputs of 20M, 50M, 1G, and 10G.
Does anybody have an idea of how many machines with 6G of RAM each I should
configure for Spark to be able to run this experiment?
So far I have configured 3 machines, but I think that will not be enough.



On Tue, Jul 21, 2015 at 1:58 PM, Pat Ferrel <pat@occamsmachete.com> wrote:

> That should be plenty of memory on your executors, but is that where you
> are running low? This may be a low heap on your driver/client code.
>
> Increase driver memory by setting MAHOUT_HEAPSIZE=6g or some such when
> launching the driver. I think the default is 4g. If you are using YARN, the
> answer is more complicated.
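>
> A minimal example, reusing the exact launch command from your message below
> (the 6g value is an illustration; size it to what your driver actually needs):
>
>   MAHOUT_HEAPSIZE=6g bin/mahout spark-itemsimilarity \
>     --master spark://node1:7077 --input filein.txt --output out \
>     --sparkExecutorMem 6g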
>
> The code creates BiMaps for your user and item ids, which will grow with
> the size of your total string storage needs; are your ids very long? With
> the default 4g of driver memory and the latest released 0.10.1 (be sure to
> upgrade!) or the master-0.11.0-snapshot code I wouldn't expect to see this
> problem.
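>
> A rough sketch of the idea, not Mahout's actual internals (I'm assuming a
> Guava-style HashBiMap here purely for illustration):
>
>   import com.google.common.collect.HashBiMap
>
>   // string id <-> internal int index; heap cost grows with total id bytes
>   val ids = HashBiMap.create[String, Integer]()
>   ids.put("user-0001-aaaaaaaa-bbbb-cccc-dddd", 0) // long ids cost more heap
>   val originalId = ids.inverse().get(0)           // int back to string id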
>
> The current master, mahout-0.11.0-snapshot, has better partitioning as
> Dmitriy mentions, but it is built for Spark 1.3.1, so I'm not sure it is
> backward compatible. Some things won't work, but spark-itemsimilarity may
> be OK. Somehow I doubt you are running into a partitioning problem.
>
> On Jul 20, 2015, at 2:04 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
> assuming task memory x number of cores does not exceed ~5g, and the block
> cache manager ratio does not have some really weird setting, the next best
> thing to look at is the initial task split size. I don't think the driver
> in the release you are looking at manages initial off-DFS splits
> satisfactorily (that is, in any way at all). Basically, you may want
> smaller splits, i.e. more tasks than what DFS gives you from the beginning.
> These apps tend to run a bit better when splits do not exceed 100...500k
> non-zero elements.
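>
> A minimal sketch of that idea (illustration only, not the actual driver
> code; the partition count is an arbitrary assumption to tune per input):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val sc = new SparkContext(new SparkConf().setAppName("split-sketch"))
>   // ask for more (hence smaller) initial partitions than the default DFS
>   // splits; tune so each task sees roughly 100k..500k non-zero elements
>   val lines = sc.textFile("filein.txt", minPartitions = 64)
>   println(s"partitions: ${lines.partitions.length}")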
>
> I think Pat has done a stop-gap measure on the current master for that
> (which I don't believe is a truly optimal thing to do, though).
>
> On Mon, Jul 20, 2015 at 1:40 PM, Rodolfo Viana
> <rodolfodelimaviana@gmail.com> wrote:
>
> > I'm trying to run Mahout 0.10 with Spark 1.1.1.
> > I have input files of 8k, 10M, 20M, and 25M.
> >
> > So far I have run with the following configurations:
> >
> > 8k with 1, 2, 3 slaves
> > 10M with 1, 2, 3 slaves
> > 20M with 1, 2, 3 slaves
> >
> > But when I try to run
> > bin/mahout spark-itemsimilarity --master spark://node1:7077 --input
> > filein.txt --output out --sparkExecutorMem 6g
> >
> > with 25M I got this error:
> >
> > java.lang.OutOfMemoryError: Java heap space
> >
> > or
> >
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> >
> > Is that normal? When I was running 20M I didn't get any error, and now
> > I only have 5M more.
> >
> > Any ideas why this is happening?
> >
> > --
> > Rodolfo de Lima Viana
> > Undergraduate in Computer Science at UFCG
> >
>
>


-- 
Rodolfo de Lima Viana
Undergraduate in Computer Science at UFCG
