spark-user mailing list archives

From Ankit Khettry <justankit2...@gmail.com>
Subject Re: OOM Error
Date Sat, 07 Sep 2019 09:56:54 GMT
Still unable to overcome the error. Attaching some screenshots for
reference.
Following are the configs used:
spark.yarn.max.executor.failures 1000
spark.yarn.driver.memoryOverhead 6g
spark.executor.cores 6
spark.executor.memory 36g
spark.sql.shuffle.partitions 2001
spark.memory.offHeap.size 8g
spark.memory.offHeap.enabled true
spark.executor.instances 10
spark.driver.memory 14g
spark.yarn.executor.memoryOverhead 10g
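
A minimal PySpark sketch of how the session-level settings above might be
applied (the app name is hypothetical; the executor/driver memory and YARN
overhead settings generally have to be supplied at launch, e.g. via
spark-submit, so they are omitted here):

    from pyspark.sql import SparkSession

    # Sketch: apply the runtime-tunable settings at session creation.
    # Memory sizes and YARN overheads are normally fixed when the
    # application starts, so only shuffle/off-heap knobs appear here.
    spark = (
        SparkSession.builder
        .appName("window-heavy-job")  # hypothetical
        .config("spark.sql.shuffle.partitions", "2001")
        .config("spark.memory.offHeap.enabled", "true")
        .config("spark.memory.offHeap.size", "8g")
        .getOrCreate()
    )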

Best Regards
Ankit Khettry

On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh <chris.teoh@gmail.com> wrote:

> You can try that. Also consider processing each partition separately if
> your data is heavily skewed when you partition it.
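>
> A literal sketch of that idea in PySpark, assuming a partition column
> named "event_date" and a per-slice handler process_slice(), both
> hypothetical:
>
>     import pyspark.sql.functions as F
>
>     # Process each partition value on its own so a single skewed
>     # slice does not dominate one huge shuffle.
>     for (val,) in df.select("event_date").distinct().collect():
>         process_slice(df.where(F.col("event_date") == val))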
>
> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, <justankit2007@gmail.com>
> wrote:
>
>> Thanks Chris
>>
>> Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
>> 2001. Also, I was wondering: would it help if I repartitioned the data by
>> the fields I am using in the group by and window operations?
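>>
>> Something like the following is what I have in mind (column names
>> hypothetical):
>>
>>     # Repartition by the grouping/window keys before the wide operations.
>>     repartitioned = df.repartition(2001, "account_id", "event_date")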
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh, <chris.teoh@gmail.com> wrote:
>>
>>> Hi Ankit,
>>>
>>> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
>>> running on the default number of Spark shuffle partitions.
>>>
>>> If you're seeing a lot of shuffle spill, you likely have to increase the
>>> number of shuffle partitions to accommodate the huge shuffle size.
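>>>
>>> For example, at runtime (the value is just illustrative):
>>>
>>>     spark.conf.set("spark.sql.shuffle.partitions", "2001")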
>>>
>>> I hope that helps
>>> Chris
>>>
>>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, <justankit2007@gmail.com>
>>> wrote:
>>>
>>>> Nope, it's a batch job.
>>>>
>>>> Best Regards
>>>> Ankit Khettry
>>>>
>>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana820@gmail.com>
>>>> wrote:
>>>>
>>>>> Is it a streaming job?
>>>>>
>>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry <justankit2007@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have a Spark job that consists of a large number of window
>>>>>> operations and hence involves large shuffles. I have roughly 900 GiB
>>>>>> of data, and the cluster should be large enough (10 * m5.4xlarge
>>>>>> instances). I am using the following configurations for the job,
>>>>>> although I have tried various other combinations without any success.
>>>>>>
>>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>>> spark.storage.memoryFraction 0.1
>>>>>> spark.executor.cores 6
>>>>>> spark.executor.memory 36g
>>>>>> spark.memory.offHeap.size 8g
>>>>>> spark.memory.offHeap.enabled true
>>>>>> spark.executor.instances 10
>>>>>> spark.driver.memory 14g
>>>>>> spark.yarn.executor.memoryOverhead 10g
>>>>>>
>>>>>> I keep running into the following OOM error:
>>>>>>
>>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
>>>>>>   at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>>   at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>>>>
>>>>>> I see there are a large number of JIRAs for similar issues, and many
>>>>>> of them are marked resolved. Can someone guide me on how to approach
>>>>>> this problem? I am using Databricks Spark 2.4.1.
>>>>>>
>>>>>> Best Regards
>>>>>> Ankit Khettry
>>>>>>
>>>>>
