spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Spark driver getting out of memory
Date Mon, 18 Jul 2016 21:12:14 GMT
Can you please clarify:


   1. In what mode are you running Spark: standalone, yarn-client,
   yarn-cluster, etc.?
   2. You have 4 nodes, with each executor having 10G. How many executors
   do you actually see in the UI (port 4040 by default)?
   3. What is master memory? Are you referring to driver memory? Maybe I am
   misunderstanding this.
   4. The only real correlation I see with driver memory is when you are
   running in local mode, where the worker lives within the JVM process that
   you start with spark-shell etc. In that case driver memory matters.
   However, it appears that you are running in another mode with 4 nodes?

Can you take a snapshot of the Environment tab in the UI and send the
output, please?
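
As a side note on point 3, the MemoryStore lines in the quoted log below
report the driver's block store size in raw bytes. A minimal Python sketch
(not part of the original job; the helper names memory_store_usage and to_gib
are made up for illustration) converts them to readable units:

```python
import re

# MemoryStore line copied from the quoted driver log (line wrapping removed).
LOG_LINE = (
    "16/07/18 14:53:23 INFO MemoryStore: ensureFreeSpace(8248) called with "
    "curMem=41923262, maxMem=2778778828"
)


def memory_store_usage(line):
    """Return (cur_bytes, max_bytes) from an ensureFreeSpace line, or None."""
    match = re.search(r"curMem=(\d+), maxMem=(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def to_gib(num_bytes):
    """Convert a byte count to GiB (1024-based)."""
    return num_bytes / 2**30


cur, cap = memory_store_usage(LOG_LINE)
free = cap - cur
print(f"used:     {cur / 2**20:.1f} MiB")
print(f"capacity: {to_gib(cap):.2f} GiB")
print(f"free:     {to_gib(free):.2f} GiB")
```

The store comes out at roughly 2.6 GiB with about 2.5 GiB free, which agrees
with the "free 2.5 GB" in the log. The store is only a fraction of the JVM
heap, and the exact fraction depends on the Spark version and memory settings,
which the thread does not state, so this figure by itself neither confirms nor
rules out a 5g heap. It does suggest the store was mostly empty when the
OutOfMemoryError was thrown, so cached blocks are probably not what filled the
heap.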

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 18 July 2016 at 11:50, Saurav Sinha <sauravsinha76@gmail.com> wrote:

> I have set --driver-memory 5g. I need to understand whether driver memory
> needs to be increased as the number of partitions increases. What is the
> best ratio of number of partitions to driver memory?
>
> On Mon, Jul 18, 2016 at 4:07 PM, Zhiliang Zhu <zchl.jump@yahoo.com> wrote:
>
>> Try setting --driver-memory xg, where x is as large as can be set.
>>
>>
>> On Monday, July 18, 2016 6:31 PM, Saurav Sinha <sauravsinha76@gmail.com>
>> wrote:
>>
>>
>> Hi,
>>
>> I am running a Spark job.
>>
>> Master memory - 5G
>> Executor memory - 10G (running on 4 nodes)
>>
>> My job is getting killed as the number of partitions increases to 20K.
>>
>> 16/07/18 14:53:13 INFO DAGScheduler: Got job 17 (foreachPartition at WriteToKafka.java:45) with 13524 output partitions (allowLocal=false)
>> 16/07/18 14:53:13 INFO DAGScheduler: Final stage: ResultStage 640(foreachPartition at WriteToKafka.java:45)
>> 16/07/18 14:53:13 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 518, ShuffleMapStage 639)
>> 16/07/18 14:53:23 INFO DAGScheduler: Missing parents: List()
>> 16/07/18 14:53:23 INFO DAGScheduler: Submitting ResultStage 640 (MapPartitionsRDD[271] at map at BuildSolrDocs.java:209), which has no missing parents
>> 16/07/18 14:53:23 INFO MemoryStore: ensureFreeSpace(8248) called with curMem=41923262, maxMem=2778778828
>> 16/07/18 14:53:23 INFO MemoryStore: Block broadcast_90 stored as values in memory (estimated size 8.1 KB, free 2.5 GB)
>> Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66)
>>         at org.apache.spark.util.io.ByteArrayChunkOutputStream.write(ByteArrayChunkOutputStream.scala:55)
>>         at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
>>         at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:273)
>>         at org.apache.spark.io.SnappyOutputStreamWrapper.flush(CompressionCodec.scala:197)
>>         at java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1822)
>>
>>
>> Help needed.
>>
>> --
>> Thanks and Regards,
>>
>> Saurav Sinha
>>
>> Contact: 9742879062
>>
>>
>>
>
>
> --
> Thanks and Regards,
>
> Saurav Sinha
>
> Contact: 9742879062
>
