spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: OutOfMemoryError
Date Tue, 06 Jul 2021 08:27:02 GMT
Personally, rather than setting the parameters in the application code, as here:

val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("OOM")
  .config("spark.driver.host", "localhost")
  .config("spark.driver.maxResultSize", "0")
  .config("spark.sql.caseSensitive", "false")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
  .config("spark.driver.memory", "24g")
  .getOrCreate()

I leave the resource specification to run time:

def spark_session_local(appName):
    return SparkSession.builder \
        .master('local[*]') \
        .appName(appName) \
        .enableHiveSupport() \
        .getOrCreate()



And then pass the parameters at spark-submit time:


    ${SPARK_HOME}/bin/spark-submit \
                --master local \
                --driver-memory 8G \
                --num-executors 1 \
                --executor-cores 2 \
                --conf "spark.scheduler.mode=FIFO" \
                --conf "spark.ui.port=55555" \
                --conf spark.executor.memoryOverhead=3000 \

HTH


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 1 Jul 2021 at 12:44, Sean Owen <srowen@gmail.com> wrote:

> You need to set driver memory before the driver starts, on the CLI or
> however you run your app, not in the app itself. By the time the driver
> starts to run your app, its heap is already set.
>
> On Thu, Jul 1, 2021 at 12:10 AM javaguy Java <javaguy44@gmail.com> wrote:
>
>> Hi,
>>
>> I'm getting Java OOM errors even though I'm setting my driver memory to
>> 24g and I'm executing against local[*]
>>
>> I was wondering if anyone can give me any insight. The server this job is running on has more than enough memory, as does the Spark driver.
>>
>> The final result does write 3 CSV files that are 300MB each, so there's no way it's coming close to the 24g.
>>
>> From the OOM, I don't know enough about the internals of Spark itself to tell where this is failing, or how I should refactor or change anything.
>>
>> Would appreciate any advice on how I can resolve this.
>>
>> Thx
>>
>>
>> Parameters here:
>>
>> val spark = SparkSession
>>   .builder
>>   .master("local[*]")
>>   .appName("OOM")
>>   .config("spark.driver.host", "localhost")
>>   .config("spark.driver.maxResultSize", "0")
>>   .config("spark.sql.caseSensitive", "false")
>>   .config("spark.sql.adaptive.enabled", "true")
>>   .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
>>   .config("spark.driver.memory", "24g")
>>   .getOrCreate()
>>
>>
>> My OOM errors are below:
>>
>> driver): java.lang.OutOfMemoryError: Java heap space
>> 	at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter$ManualCloseBufferedOutputStream$1.<init>(DiskBlockObjectWriter.scala:109)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:110)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:118)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:245)
>> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)
>> 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>> 	at org.apache.spark.scheduler.Task.run(Task.scala:127)
>> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>> 	at org.apache.spark.executor.Executor$TaskRunner$$Lambda$1792/1058609963.apply(Unknown Source)
>> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> 	at java.lang.Thread.run(Thread.java:748)
>> 	
>> 	
>> 	
>> 	
>> driver): java.lang.OutOfMemoryError: Java heap space
>> 	at net.jpountz.lz4.LZ4BlockOutputStream.<init>(LZ4BlockOutputStream.java:102)
>> 	at org.apache.spark.io.LZ4CompressionCodec.compressedOutputStream(CompressionCodec.scala:145)
>> 	at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:158)
>> 	at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:133)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:122)
>> 	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:245)
>> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)
>> 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>> 	at org.apache.spark.scheduler.Task.run(Task.scala:127)
>> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>> 	at org.apache.spark.executor.Executor$TaskRunner$$Lambda$1792/249605067.apply(Unknown Source)
>> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> 	at java.lang.Thread.run(Thread.java:748)
>>
>>
>>
