spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: SparkSQL with large result size
Date Mon, 02 May 2016 16:59:44 GMT
That's my interpretation.
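
For reference, a minimal sketch of lowering the Parquet row-group size at
write time so the data splits into more blocks (the SparkSession name
`spark` and the output path are assumptions, not from this thread):

    // Minimal sketch, assuming Spark 2.x with a SparkSession named `spark`;
    // on 1.x the same property can be set on sc.hadoopConfiguration.
    // parquet.block.size is the Parquet row-group size in bytes: a smaller
    // value yields more row groups, and hence more input splits/tasks on read.
    spark.sparkContext.hadoopConfiguration
      .setInt("parquet.block.size", 32 * 1024 * 1024)  // e.g. 32 MB row groups

    spark.table("t1")
      .write
      .mode("overwrite")
      .parquet("/tmp/t1_small_blocks")                 // placeholder output path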

On Mon, May 2, 2016 at 9:45 AM, Buntu Dev <buntudev@gmail.com> wrote:

> Thanks Ted, I thought the avg. block size was already low, less than the
> usual 128 MB. If I reduce it further via parquet.block.size, that would mean
> more blocks, which should increase the number of tasks/executors. Is that
> the correct way to interpret this?
>
> On Mon, May 2, 2016 at 6:21 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Please consider decreasing block size.
>>
>> Thanks
>>
>> > On May 1, 2016, at 9:19 PM, Buntu Dev <buntudev@gmail.com> wrote:
>> >
>> > I have a 10g memory limit on the executors and am operating on a Parquet
>> > dataset with a block size of 70M and 200 blocks. I keep hitting the memory
>> > limits when doing a 'select * from t1 order by c1 limit 1000000' (i.e., 1M
>> > rows). It works if I limit to, say, 100k. What are the options to save a
>> > large dataset without running into memory issues?
>> >
>> > Thanks!
>>
>
>
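
Regarding the original question of saving a large result without hitting
memory limits: one common workaround is to avoid the global 'order by ...
limit 1000000' (which funnels the limited rows through a single task) and
instead write the sorted data out directly. A rough sketch, with the
SparkSession name `spark` and the output path as placeholders:

    // Rough sketch: write the sorted result straight to storage rather than
    // materializing a multi-million-row LIMIT in one place.
    spark.sql("select * from t1 order by c1")
      .write
      .mode("overwrite")
      .parquet("/tmp/t1_sorted")   // placeholder output path

The trade-off is that the full sorted dataset is written rather than just the
top 1M rows, but no single task has to hold the limited result in memory.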
