spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: SparkSQL with large result size
Date Mon, 02 May 2016 10:52:27 GMT
How many executors are you running? Does your partitioning scheme ensure the
data is distributed evenly? It is possible that your data is skewed and one of
the executors is failing. Maybe try reducing per-executor memory and
increasing the number of partitions.
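
A minimal sketch of that tuning, assuming a Spark 1.6-era SQLContext named
sqlContext; the executor count, memory size, and partition counts below are
illustrative assumptions, not recommendations:

    // Submit with more, smaller executors (shell flags shown as a comment):
    //   spark-submit --num-executors 20 --executor-memory 4g ...

    // Raise the shuffle partition count so the ORDER BY is spread across
    // more, smaller tasks:
    sqlContext.setConf("spark.sql.shuffle.partitions", "400")

    // Repartitioning the input can also help if the data is skewed on the
    // sort key:
    val df = sqlContext.table("t1").repartition(400)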
On 2 May 2016 14:19, "Buntu Dev" <buntudev@gmail.com> wrote:

> I have a 10g memory limit on the executors and am operating on a Parquet
> dataset with a 70M block size and 200 blocks. I keep hitting the memory
> limits when doing a 'select * from t1 order by c1 limit 1000000' (i.e., 1M
> rows). It works if I limit to, say, 100k. What are the options for saving
> a large dataset without running into memory issues?
>
> Thanks!
>
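
One option, sketched below under the assumption of a Spark 1.6-era SQLContext
(table t1 and column c1 come from the question; the output path is invented):
in Spark of this vintage, 'ORDER BY ... LIMIT' can be executed by collecting
the limited rows on the driver, so instead do the sort on the executors, take
the first 1M rows by numbering them with zipWithIndex, and write straight to
disk:

    // The global sort runs as a shuffle on the executors; nothing is
    // collected on the driver.
    val sorted = sqlContext.sql("SELECT * FROM t1 ORDER BY c1")

    // Take the first 1M rows without LIMIT by numbering the sorted rows.
    val firstN = sorted.rdd.zipWithIndex()
      .filter { case (_, i) => i < 1000000L }
      .map(_._1)

    // Write the result out; the output path is an assumption.
    sqlContext.createDataFrame(firstN, sorted.schema)
      .write.parquet("/tmp/t1_top1m")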
