spark-user mailing list archives

From Ankit Khettry <justankit2...@gmail.com>
Subject Re: OOM Error
Date Sat, 07 Sep 2019 13:56:02 GMT
Sure folks, will try later today!

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:56 PM Sunil Kalra, <suneel.kalra@gmail.com> wrote:

> Ankit
>
> Can you try reducing the number of cores or increasing memory? With the
> configuration below, each core gets only ~3.5 GB. Otherwise your data may be
> skewed, such that one of the cores is getting too much data for a particular key.
>
> spark.executor.cores 6
> spark.executor.memory 36g
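>
> For example, something along these lines (just an illustration; with the
> default spark.memory.fraction of 0.6 this gives each core roughly 5.4 GB of
> unified memory instead of ~3.6 GB):
>
> spark.executor.cores 4
> spark.executor.memory 36g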
>
> On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh <chris.teoh@gmail.com> wrote:
>
>> It says you have 3811 tasks in earlier stages and you're going down to
>> 2001 partitions; that would make it more memory intensive. I'm guessing the
>> default Spark shuffle partition count was 200, so that would have failed. Go
>> for a higher number, maybe even higher than 3811. What was your shuffle write
>> from stage 7 and shuffle read from stage 8?
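>>
>> To bump it up, something like this should do (4096 is just a starting point
>> to tune from):
>>
>> spark.conf.set("spark.sql.shuffle.partitions", 4096)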
>>
>> On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, <justankit2007@gmail.com>
>> wrote:
>>
>>> Still unable to overcome the error. Attaching some screenshots for
>>> reference.
>>> Following are the configs used:
>>> spark.yarn.max.executor.failures 1000
>>> spark.yarn.driver.memoryOverhead 6g
>>> spark.executor.cores 6
>>> spark.executor.memory 36g
>>> spark.sql.shuffle.partitions 2001
>>> spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true
>>> spark.executor.instances 10
>>> spark.driver.memory 14g
>>> spark.yarn.executor.memoryOverhead 10g
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh <chris.teoh@gmail.com> wrote:
>>>
>>>> You can try that. Also consider processing each partition separately if
>>>> your data is heavily skewed when you partition it.
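>>>>
>>>> For example, roughly along these lines (a sketch only; "part_col" and
>>>> process() are placeholders for your partition column and your group
>>>> by / window logic):
>>>>
>>>> // assumes spark.implicits._ is in scope, e.g. in spark-shell
>>>> val keys = df.select("part_col").distinct.as[String].collect()
>>>> keys.foreach { k =>
>>>>   val slice = df.filter($"part_col" === k)
>>>>   process(slice) // run the heavy aggregation on one slice at a time
>>>> }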
>>>>
>>>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, <justankit2007@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Chris
>>>>>
>>>>> Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
>>>>> 2001. Also, I was wondering if it would help to repartition the data by
>>>>> the fields I am using in the group by and window operations?
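>>>>>
>>>>> i.e. something like this (column names are just placeholders):
>>>>>
>>>>> val prepared = df.repartition(2001, $"user_id", $"event_date")
>>>>> // then run the groupBy / window operations on `prepared`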
>>>>>
>>>>> Best Regards
>>>>> Ankit Khettry
>>>>>
>>>>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh, <chris.teoh@gmail.com> wrote:
>>>>>
>>>>>> Hi Ankit,
>>>>>>
>>>>>> Without looking at the Spark UI and the stages/DAG, I'm guessing
>>>>>> you're running on the default number of Spark shuffle partitions.
>>>>>>
>>>>>> If you're seeing a lot of shuffle spill, you likely have to increase
>>>>>> the number of shuffle partitions to accommodate the huge shuffle size.
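>>>>>>
>>>>>> As a rough back-of-envelope (assuming the shuffle is on the order of
>>>>>> your ~900 GiB of input): over 200 partitions that is ~4.5 GiB per task,
>>>>>> whereas over 4000 partitions it drops to roughly 230 MiB per task.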
>>>>>>
>>>>>> I hope that helps
>>>>>> Chris
>>>>>>
>>>>>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, <justankit2007@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Nope, it's a batch job.
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Ankit Khettry
>>>>>>>
>>>>>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <
>>>>>>> 028upasana820@gmail.com> wrote:
>>>>>>>
>>>>>>>> Is it a streaming job?
>>>>>>>>
>>>>>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry <justankit2007@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have a Spark job that consists of a large number of Window
>>>>>>>>> operations and hence involves large shuffles. I have roughly 900 GiB
>>>>>>>>> of data, although I am using a large enough cluster (10 * m5.4xlarge
>>>>>>>>> instances). I am using the following configurations for the job,
>>>>>>>>> although I have tried various other combinations without any success.
>>>>>>>>>
>>>>>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>>>>>> spark.storage.memoryFraction 0.1
>>>>>>>>> spark.executor.cores 6
>>>>>>>>> spark.executor.memory 36g
>>>>>>>>> spark.memory.offHeap.size 8g
>>>>>>>>> spark.memory.offHeap.enabled true
>>>>>>>>> spark.executor.instances 10
>>>>>>>>> spark.driver.memory 14g
>>>>>>>>> spark.yarn.executor.memoryOverhead 10g
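>>>>>>>>>
>>>>>>>>> (For completeness: the spark-submit equivalent of the above would be
>>>>>>>>> --conf flags, e.g. --conf spark.executor.memory=36g; on Databricks
>>>>>>>>> these go into the cluster's Spark config.)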
>>>>>>>>>
>>>>>>>>> I keep running into the following OOM error:
>>>>>>>>>
>>>>>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>>>>>>>> 16384 bytes of memory, got 0
>>>>>>>>>   at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>>>>>   at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>>>>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>>>>>>>
>>>>>>>>> I see there are a large number of JIRAs in place for similar issues,
>>>>>>>>> and a great many of them are even marked resolved. Can someone guide
>>>>>>>>> me as to how to approach this problem? I am using Databricks Spark
>>>>>>>>> 2.4.1.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Ankit Khettry
>>>>>>>>>
>>>>>>>>
