mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: java.lang.OutOfMemoryError with Mahout 0.10 and Spark 1.1.1
Date Tue, 04 Aug 2015 23:30:24 GMT
More machines won’t help with memory requirements, since those requirements apply to the
client, i.e. the driver code, even if you use Mahout as a library. The amount of storage is
proportional to the total amount needed for your ID strings. How many users and items do you
have, and how long are their ID strings? That total will give you an idea of the minimum for
your client. You will need more to hold the mapped integers and the indexes, but it gives you
a ballpark.

6G is a lot of string storage.
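Pat’s sizing advice can be turned into a quick back-of-envelope calculation. This is only a sketch: the per-entry overhead constants below are assumptions based on typical 64-bit JVM string and map costs, not figures taken from Mahout’s actual dictionary implementation.

```python
# Rough driver-heap estimate for the user/item ID dictionaries.
# Overhead constants are assumptions (typical 64-bit JVM figures),
# not numbers measured from Mahout itself.

def estimate_id_dict_bytes(num_ids, avg_id_chars,
                           char_bytes=2,        # Java strings are UTF-16
                           string_overhead=48,  # String + char[] headers (approx.)
                           entry_overhead=80):  # map entries + boxed ints, both directions
    """Approximate heap needed to map string IDs <-> integers."""
    per_id = avg_id_chars * char_bytes + string_overhead + entry_overhead
    return num_ids * per_id

# Hypothetical example: 5M users and 1M items, 20-character IDs.
users = estimate_id_dict_bytes(5_000_000, 20)
items = estimate_id_dict_bytes(1_000_000, 20)
print(f"~{(users + items) / 1024**3:.1f} GiB for the ID maps alone")
```

With these assumed overheads, even modest ID counts approach a gigabyte of driver heap, which is why long ID strings matter.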

On Aug 4, 2015, at 11:58 AM, Rodolfo Viana <> wrote:

Thank you Pat, you were right, when I run with Spark 1.3.1 with Mahout 0.10
I didn't get this error.

I’m trying to run Mahout with Spark on inputs of 20M, 50M, 1G and 10G.
Does anybody have an idea how many machines with 6 GB of RAM each I should
configure with Spark to be able to run this experiment?
So far I have configured 3 machines, but I think that will not be enough.

On Tue, Jul 21, 2015 at 1:58 PM, Pat Ferrel <> wrote:

> That should be plenty of memory on your executors, but is that where you are
> running low? This may be a low heap on your driver/client code.
> Increase driver memory by setting MAHOUT_HEAPSIZE=6g or some such when
> launching the driver. I think the default is 4g. If you are using Yarn the
> answer is more complicated.
> The code creates BiMaps for your user and item IDs, which will grow with
> the size of your total string storage needs. Are your IDs very long? With
> the default 4g of driver memory and the latest released 0.10.1 (be sure to
> upgrade!) or master-0.11.0-snapshot code I wouldn’t expect to have this
> problem.
> The current master mahout-0.11.0-snapshot has better partitioning as
> Dmitriy mentions but it is built for Spark 1.3.1 so not sure if it is
> backward compatible. Some things won’t work but spark-itemsimilarity may be
> ok. Somehow I doubt you are running into a partitioning problem.
> On Jul 20, 2015, at 2:04 PM, Dmitriy Lyubimov <> wrote:
> assuming task memory x number of cores does not exceed ~5g, and block cache
> manager ratio does not have some really weird setting, the next best thing
> to look at is initial task split size. I don't think the driver in the release
> you are looking at manages initial off-DFS splits satisfactorily (that
> is, in any way at all). Basically, you may want smaller splits, more tasks
> than what DFS gives you from the beginning. These apps tend to run a bit
> better when splits do not exceed 100...500k non-zero elements.
> I think Pat has done some stop-gap measure on current master for that
> (which i don't believe is a true optimal thing to do though).
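Dmitriy’s 100k–500k non-zeros-per-split guideline can be turned into a quick sizing check. This is a sketch only: the assumption of roughly one non-zero interaction per text input line, and the average line length you supply, are illustrative guesses, not anything derived from Mahout.

```python
# Minimum partition count so each split holds <= ~500k non-zeros.
# Assumption (not from Mahout): about one interaction per input line,
# with an average line length supplied by the caller.

def splits_needed(input_bytes, avg_line_bytes, max_nnz_per_split=500_000):
    """Return the minimum partition count for the non-zeros-per-split target."""
    nnz = input_bytes // avg_line_bytes          # ~one non-zero per line
    return max(1, -(-nnz // max_nnz_per_split))  # ceiling division

# Hypothetical example: a 25 MB input with ~25-byte lines (~1M interactions).
print(splits_needed(25 * 1024**2, 25))
```

If the number this suggests is larger than what DFS block size gives you, that is a hint to shrink the input splits or repartition early, per Dmitriy’s advice.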
> On Mon, Jul 20, 2015 at 1:40 PM, Rodolfo Viana <> wrote:
>> I’m trying to run Mahout 0.10 with Spark 1.1.1.
>> I have input files with 8k, 10M, 20M, 25M.
>> So far I run with the following configuration:
>> 8k with 1, 2, 3 slaves
>> 10M with 1, 2, 3 slaves
>> 20M with 1, 2, 3 slaves
>> But when I try to run
>> bin/mahout spark-itemsimilarity --master spark://node1:7077 --input
>> filein.txt --output out --sparkExecutorMem 6g
>> with 25M I got this error:
>> java.lang.OutOfMemoryError: Java heap space
>> or
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>> Is that normal? Because when I was running 20M I didn’t get any error, and
>> now I have 5M more.
>> Any ideas why this is happening?
>> --
>> Rodolfo de Lima Viana
>> Undergraduate in Computer Science at UFCG

Rodolfo de Lima Viana
Undergraduate in Computer Science at UFCG
