spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: DataFrame operation on parquet: GC overhead limit exceeded
Date Wed, 18 Mar 2015 13:59:45 GMT
You should probably increase executor memory by setting 
"spark.executor.memory".

A full list of available configurations can be found here: 
http://spark.apache.org/docs/latest/configuration.html

Cheng

On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
> Hi there,
>
> I was trying the new DataFrame API with some basic operations on a 
> Parquet dataset.
> I have 7 nodes, each with 12 cores and 8GB of RAM allocated to its 
> worker, in standalone cluster mode.
> The code is the following:
>
> import org.apache.spark.sql.functions._  // for sum() used in the agg
>
> val people = sqlContext.parquetFile("/data.parquet")
> val res = people.groupBy("name", "date").agg(sum("power"), sum("supply")).take(10)
> res.foreach(println)
>
> The dataset consists of 16 billion entries.
> The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> My configuration is:
>
> spark.serializer org.apache.spark.serializer.KryoSerializer
> spark.driver.memory    6g
> spark.executor.extraJavaOptions -XX:+UseCompressedOops
> spark.shuffle.manager    sort
>
> Any idea how I can work around this?
>
> Thanks a lot



