I have a Spark job that consists of a large number of Window operations and hence involves heavy shuffles. The input is roughly 900 GiB of data, and the cluster (10 * m5.4xlarge instances) should be large enough to handle it. I have tried various combinations of configuration settings for the job without any success; a representative setup is sketched below.
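For reference, the job is shaped roughly like this minimal sketch. The column names, paths, and configuration values are placeholders rather than my exact production settings; each Window spec forces a full shuffle on its partition keys, which is where the memory pressure comes from.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("window-heavy-job")
  // Illustrative values; I have tried several combinations of these.
  .config("spark.sql.shuffle.partitions", "2000")
  .config("spark.executor.memory", "16g")
  .config("spark.executor.memoryOverhead", "4g")
  .getOrCreate()

// ~900 GiB of input; the path is a placeholder.
val df = spark.read.parquet("s3://my-bucket/input/")

// Each window spec like this shuffles the full dataset on its partition keys.
val w = Window.partitionBy("customer_id").orderBy("event_time")

val result = df
  .withColumn("running_total", sum("amount").over(w))
  .withColumn("row_rank", row_number().over(w))

result.write.parquet("s3://my-bucket/output/")
```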
I keep running into the following OOM error:
org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
I see that a large number of JIRAs have been filed for similar issues, and many of them are even marked as resolved.
Can someone guide me on how to approach this problem? I am running Spark 2.4.1 on Databricks.