spark-user mailing list archives

From Yiannis Gkoufas <johngou...@gmail.com>
Subject Re: DataFrame operation on parquet: GC overhead limit exceeded
Date Wed, 18 Mar 2015 15:42:47 GMT
Hi there, I set the executor memory to 8g, but it didn't help.
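
For reference, a minimal sketch of the change, assuming it was made in conf/spark-defaults.conf alongside the other options quoted below:

spark.executor.memory    8g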

On 18 March 2015 at 13:59, Cheng Lian <lian.cs.zju@gmail.com> wrote:

> You should probably increase executor memory by setting
> "spark.executor.memory".
>
> The full list of available configuration options can be found here:
> http://spark.apache.org/docs/latest/configuration.html
>
> Cheng
>
>
> On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
>
>> Hi there,
>>
>> I was trying out the new DataFrame API with some basic operations on a
>> parquet dataset.
>> I have 7 nodes, each with 12 cores and 8GB of RAM allocated to the
>> worker, in standalone cluster mode.
>> The code is the following:
>>
>> import org.apache.spark.sql.functions.sum
>>
>> val people = sqlContext.parquetFile("/data.parquet")
>> // Aggregate per (name, date) group, then collect 10 rows to the driver.
>> val res = people.groupBy("name", "date")
>>   .agg(sum("power"), sum("supply"))
>>   .take(10)
>> res.foreach(println)
>>
>> The dataset consists of 16 billion entries.
>> The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> My configuration is:
>>
>> spark.serializer org.apache.spark.serializer.KryoSerializer
>> spark.driver.memory    6g
>> spark.executor.extraJavaOptions -XX:+UseCompressedOops
>> spark.shuffle.manager    sort
>>
>> Any idea how I can work around this?
>>
>> Thanks a lot
>>
>
>
