spark-user mailing list archives

From: Chris Teoh <chris.t...@gmail.com>
Subject: Re: OOM Error
Date: Sat, 07 Sep 2019 07:35:45 GMT
Hi Ankit,

Without looking at the Spark UI and the stages/DAG, I'm guessing you're
running with the default number of Spark shuffle partitions.

If you're seeing a lot of shuffle spill, you likely need to increase the
number of shuffle partitions to accommodate the large shuffle size.
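
For DataFrame/SQL shuffles (window functions included), the relevant knob
is spark.sql.shuffle.partitions, which defaults to 200; with ~900 GiB
shuffling, that averages roughly 4.5 GiB per task, which will spill
heavily. A minimal sketch, assuming a Scala job; the app name is made up,
and 3600 is only an illustrative starting point (~900 GiB / ~256 MiB per
partition) to tune against your actual shuffle size:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("window-job") // hypothetical app name
      // ~900 GiB shuffle / ~256 MiB target partition size ~= 3600 partitions
      .config("spark.sql.shuffle.partitions", "3600")
      .getOrCreate()

    // Or adjust it at runtime, before the window-heavy stage runs:
    spark.conf.set("spark.sql.shuffle.partitions", "3600")

The same setting can also be passed on the command line with
--conf spark.sql.shuffle.partitions=3600 on spark-submit.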

I hope that helps,
Chris

On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, <justankit2007@gmail.com> wrote:

> Nope, it's a batch job.
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana820@gmail.com>
> wrote:
>
>> Is it a streaming job?
>>
>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry <justankit2007@gmail.com>
>> wrote:
>>
>>> I have a Spark job that consists of a large number of Window operations
>>> and hence involves large shuffles. I have roughly 900 GiB of data, and I
>>> am using a reasonably large cluster (10 * m5.4xlarge instances). I am
>>> using the following configuration for the job, although I have tried
>>> various other combinations without any success.
>>>
>>> spark.yarn.driver.memoryOverhead 6g
>>> spark.storage.memoryFraction 0.1
>>> spark.executor.cores 6
>>> spark.executor.memory 36g
>>> spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true
>>> spark.executor.instances 10
>>> spark.driver.memory 14g
>>> spark.yarn.executor.memoryOverhead 10g
>>>
>>> I keep running into the following OOM error:
>>>
>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
>>>   at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>   at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>
>>> I see there are a large number of JIRAs filed for similar issues, and a
>>> great many of them are even marked resolved.
>>> Can someone guide me on how to approach this problem? I am using
>>> Databricks Spark 2.4.1.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>
