spark-user mailing list archives

From javaguy Java <javagu...@gmail.com>
Subject Re: OutOfMemoryError
Date Tue, 06 Jul 2021 07:09:01 GMT
Hi Sean, thx for the tip.  I'm just running my app via spark-submit on the
CLI, i.e.:

    spark-submit --class X --master local[*] assembly.jar

so I'll now add --driver-memory to the CLI args, i.e.:

    spark-submit --class X --master local[*] --driver-memory 8g assembly.jar

etc.
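
For reference, my understanding is that putting it in conf/spark-defaults.conf
should work as well, since spark-submit reads that file before launching the
driver JVM (a sketch of the entry; path assumes a default Spark layout):

    # conf/spark-defaults.conf
    # Picked up by spark-submit before the driver JVM starts
    spark.driver.memory 8g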

Unless I have this wrong?
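
As a sanity check (a minimal sketch, assuming a Scala app;
Runtime.getRuntime.maxMemory reports the running JVM's heap ceiling), I can
log the driver heap right after startup to confirm the new setting took
effect:

    // Should print roughly the value passed via --driver-memory
    // (e.g. ~8g), rather than the much smaller JVM default.
    val maxHeapMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
    println(s"Driver max heap: $maxHeapMb MB")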

Thx


On Thu, Jul 1, 2021 at 1:43 PM Sean Owen <srowen@gmail.com> wrote:

> You need to set driver memory before the driver starts, on the CLI or
> however you run your app, not in the app itself. By the time the driver
> starts to run your app, its heap is already set.
>
> On Thu, Jul 1, 2021 at 12:10 AM javaguy Java <javaguy44@gmail.com> wrote:
>
>> Hi,
>>
>> I'm getting Java OOM errors even though I'm setting my driver memory to
>> 24g and I'm executing against local[*]
>>
>> I was wondering if anyone can give me any insight.  The server this job
>> is running on has more than enough memory, as does the Spark driver.
>>
>> The final result does write 3 CSV files that are 300MB each, so there's
>> no way it's coming close to the 24g.
>>
>> From the OOM trace, I don't know enough about Spark's internals to tell
>> where this is failing or how I should refactor or change anything.
>>
>> Would appreciate any advice on how I can resolve this.
>>
>> Thx
>>
>>
>> Parameters here:
>>
>> val spark = SparkSession
>>   .builder
>>   .master("local[*]")
>>   .appName("OOM")
>>   .config("spark.driver.host", "localhost")
>>   .config("spark.driver.maxResultSize", "0")
>>   .config("spark.sql.caseSensitive", "false")
>>   .config("spark.sql.adaptive.enabled", "true")
>>   .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
>>   .config("spark.driver.memory", "24g")
>>   .getOrCreate()
>>
>>
>> My OOM errors are below:
>>
>> driver): java.lang.OutOfMemoryError: Java heap space
>> 	at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter$ManualCloseBufferedOutputStream$1.<init>(DiskBlockObjectWriter.scala:109)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:110)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:118)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:245)
>> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)
>> 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>> 	at org.apache.spark.scheduler.Task.run(Task.scala:127)
>> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>> 	at org.apache.spark.executor.Executor$TaskRunner$$Lambda$1792/1058609963.apply(Unknown Source)
>> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> 	at java.lang.Thread.run(Thread.java:748)
>>
>>
>> driver): java.lang.OutOfMemoryError: Java heap space
>> 	at net.jpountz.lz4.LZ4BlockOutputStream.<init>(LZ4BlockOutputStream.java:102)
>> 	at org.apache.spark.io.LZ4CompressionCodec.compressedOutputStream(CompressionCodec.scala:145)
>> 	at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:158)
>> 	at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:133)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:122)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:245)
>> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)
>> 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>> 	at org.apache.spark.scheduler.Task.run(Task.scala:127)
>> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>> 	at org.apache.spark.executor.Executor$TaskRunner$$Lambda$1792/249605067.apply(Unknown Source)
>> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> 	at java.lang.Thread.run(Thread.java:748)
>>
>>
>>
