spark-user mailing list archives

From Kürşat Kurt <>
Subject RE: java.lang.OutOfMemoryError: Java heap space
Date Sun, 20 Nov 2016 23:30:06 GMT
Hi Yanbo;


I am using 20g for driver memory and 20g for executor memory (64 GB total, 59 GB free).
Is this not enough?

The CSV file has product/productGroupName tuples (e.g. iphone 6s 64gb black -> iphone 6s 64).

When I remove the classes that have only a single tuple, the line count decreases to ~200.000 and the number of class names to ~55.000.

Yes, the average is small, but on the other hand the important product group names
have 20-30 instances each.
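
For reference, the filtering step described above (dropping classes that occur only once) could be sketched roughly as follows. This is only an illustration on plain Scala collections, not the code actually used; the object name, the helper dropRareClasses and the sample rows other than the iphone example are hypothetical.

```scala
object RareClassFilter {
  // Keep only the (product, productGroupName) rows whose group name
  // occurs at least minCount times in the input.
  def dropRareClasses(rows: Seq[(String, String)], minCount: Int): Seq[(String, String)] = {
    // Count how many rows carry each label (second tuple element).
    val counts: Map[String, Int] =
      rows.groupBy(_._2).map { case (label, rs) => label -> rs.size }
    rows.filter { case (_, label) => counts(label) >= minCount }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      ("iphone 6s 64gb black", "iphone 6s 64"),    // example from the thread
      ("iphone 6s 64gb gold", "iphone 6s 64"),     // hypothetical row
      ("some unique product", "some unique group") // hypothetical singleton class
    )
    // With minCount = 2 the singleton class is dropped.
    dropRareClasses(rows, minCount = 2).foreach(println)
  }
}
```

Raising minCount trades coverage of rare product groups for fewer distinct classes, which is the quantity that drives the driver-side memory use.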


From: Yanbo Liang [] 
Sent: Saturday, November 19, 2016 7:43 PM
To: Kürşat Kurt <>
Cc: User <>
Subject: Re: java.lang.OutOfMemoryError: Java heap space


NaiveBayes collects the counting result to the driver, and the result's size depends on the number
of distinct classes, so you will get ~1G of data at the driver in your case.
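
As a rough back-of-the-envelope check of that figure, the sketch below assumes the collected result is held as a dense numClasses x numFeatures matrix of doubles plus one double per class. The feature count is not stated in this thread, so the value used here is purely hypothetical, and the real aggregation may use more or less memory than this estimate.

```scala
object NaiveBayesMemoryEstimate {
  // Bytes for a dense numClasses x numFeatures matrix of 8-byte doubles,
  // plus one 8-byte double per class for the class priors.
  def denseModelBytes(numClasses: Long, numFeatures: Long): Long =
    numClasses * numFeatures * 8L + numClasses * 8L

  def main(args: Array[String]): Unit = {
    val numClasses = 100000L // ~100.000 classes, as reported in the thread
    val numFeatures = 1000L  // hypothetical: the thread does not give this number
    val bytes = denseModelBytes(numClasses, numFeatures)
    println(f"${bytes / 1e9}%.2f GB")
  }
}
```

The estimate grows linearly with both the number of classes and the number of features, so halving either one halves the driver-side footprint under this assumption.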

Could you check the amount of memory available to the driver process? Try to increase

But I think your training dataset, which includes ~300.000 samples with ~100.000 classes, is not
reasonable. It means each class has only three instances on average.





On Fri, Nov 18, 2016 at 4:47 PM, Kürşat Kurt <> wrote:


I am trying to use NaiveBayes for multi-class classification.

I have a predefined multi-class CSV file (~300.000 samples covering ~100.000 classes). When
I try to train (fitting the data after the transformations), I get a “java.lang.OutOfMemoryError:
Java heap space” error.

As you can see below, the OOM is raised while executing this line: val model =

Any suggestions?


