spark-user mailing list archives

From Kürşat Kurt <kur...@kursatkurt.com>
Subject RE: java.lang.OutOfMemoryError: Java heap space
Date Sun, 20 Nov 2016 23:30:06 GMT
Hi Yanbo;

 

I am using 20g of driver memory and 20g of executor memory (the machine has 64 GB in total, 59 GB free).
Is this not enough?

The CSV file contains product -> productGroupName tuples (e.g. iphone 6s 64gb black -> iphone 6s 64).


When I remove the single-tuple sets, the line count decreases to ~200.000 and the number of class names to ~55.000.
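
For illustration, here is a minimal sketch of that filtering step on a plain Scala collection. It assumes "single tuple sets" means product groups that appear on only one line. The object name `DropRareClasses`, the helper `dropRare`, the `minCount` parameter and the sample rows other than the iphone example are hypothetical and not taken from the actual job.

```scala
object DropRareClasses {
  // Keep only rows whose label (second element) occurs at least `minCount` times.
  def dropRare(rows: Seq[(String, String)], minCount: Int): Seq[(String, String)] = {
    // Count how many rows carry each label.
    val counts: Map[String, Int] =
      rows.groupBy(_._2).map { case (label, group) => label -> group.size }
    // Preserve the original row order while dropping rare labels.
    rows.filter { case (_, label) => counts(label) >= minCount }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      ("iphone 6s 64gb black", "iphone 6s 64"),     // example from this mail
      ("iphone 6s 64gb gold", "iphone 6s 64"),      // hypothetical second instance
      ("some unique product", "some unique group")  // hypothetical singleton class
    )
    // Only the two "iphone 6s 64" rows remain.
    dropRare(rows, minCount = 2).foreach(println)
  }
}
```

On a Spark DataFrame the same idea would probably be expressed as a `groupBy` on the label column followed by `count`, a `filter` on that count, and a join back to the original rows. Raising `minCount` is one way to test whether the rare classes are what drives the memory use.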

Yes, if we look at the average it is small, but on the other hand the important product group
names have 20-30 instances.

 

From: Yanbo Liang [mailto:ybliang8@gmail.com] 
Sent: Saturday, November 19, 2016 7:43 PM
To: Kürşat Kurt <kursat@kursatkurt.com>
Cc: User <user@spark.apache.org>
Subject: Re: java.lang.OutOfMemoryError: Java heap space

 

NaiveBayes collects the counting result to the driver, and the result's size depends on the number
of distinct classes, so you will get ~1G of data at the driver in your case.
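
As a back-of-the-envelope illustration of how that size scales, the sketch below assumes the collected result holds one 8-byte Double per (class, feature) pair and ignores JVM object overhead and serialization. The feature count is not stated in this thread, so the value used here is purely hypothetical, as are the names `DriverSizeEstimate` and `denseBytes`.

```scala
object DriverSizeEstimate {
  // Rough size in bytes of a dense per-class feature-sum table.
  // Assumption: one 8-byte Double per (class, feature) pair, no overhead.
  def denseBytes(numClasses: Long, numFeatures: Long): Long =
    numClasses * numFeatures * 8L

  def main(args: Array[String]): Unit = {
    val numClasses = 100000L   // ~100.000 classes, as reported in the thread
    val numFeatures = 1250L    // hypothetical; the thread does not give this number
    val bytes = denseBytes(numClasses, numFeatures)
    println(f"${bytes / 1e9}%.2f GB")
  }
}
```

Under these assumptions the estimate grows linearly in both the number of classes and the number of features, so halving the class count roughly halves what the driver has to hold.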

Could you check how much memory is available to the driver process? Try increasing it.

But I think your training dataset, which includes ~300.000 samples with ~100.000 classes, is not
reasonable. It means each class will have only three instances on average.

 

Thanks

Yanbo

 

On Fri, Nov 18, 2016 at 4:47 PM, Kürşat Kurt <kursat@kursatkurt.com> wrote:

Hi;

I am trying to use NaiveBayes for multi-class classification.

I have a predefined multi-class classification CSV file (~300.000 samples covering ~100.000 classes).
When I try to train (fitting the data after the transformations), I get a
“java.lang.OutOfMemoryError: Java heap space” error.

As you can see below, the OOM is raised while executing this line: val model = pipeline.fit(train)

Any suggestions?

 



 

