spark-user mailing list archives

From Raghava Mutharaju <m.vijayaragh...@gmail.com>
Subject Re: Spark driver getting out of memory
Date Sun, 24 Jul 2016 18:53:55 GMT
Saurav,

We have the same issue. Our application runs fine on 32 nodes with 4 cores
each and 256 partitions but gives an OOM on the driver when run on 64 nodes
with 512 partitions. Did you find out the reason for this behavior, or how
the number of partitions relates to driver RAM usage?

Regards,
Raghava.


On Wed, Jul 20, 2016 at 2:08 AM, Saurav Sinha <sauravsinha76@gmail.com>
wrote:

> Hi,
>
> I have set driver memory to 10 GB, and the job ran with intermittent
> failures from which Spark recovered.
>
> But I still want to know whether driver RAM needs to be increased as the
> number of partitions increases, and what the ratio of partitions to RAM
> should be.
>
> @RK: I am using cache on the RDD. Is this the reason for the high RAM
> utilization?
>
> Thanks,
> Saurav Sinha
>
> On Tue, Jul 19, 2016 at 10:14 PM, RK Aduri <rkaduri@collectivei.com>
> wrote:
>
>> Just want to see if this helps.
>>
>> Are you doing heavy collects and persisting the results? If so, you might
>> want to parallelize that collection by converting it to an RDD.
>>
>> Thanks,
>> RK
>>
>> On Tue, Jul 19, 2016 at 12:09 AM, Saurav Sinha <sauravsinha76@gmail.com>
>> wrote:
>>
>>> Hi Mich,
>>>
>>>    1. In what mode are you running Spark: standalone, yarn-client,
>>>    yarn-cluster, etc.?
>>>
>>> Ans: Spark standalone
>>>
>>>    2. You have 4 nodes with each executor having 10G. How many actual
>>>    executors do you see in the UI (port 4040 by default)?
>>>
>>> Ans: There are 4 executors, on which I am using 8 cores each
>>> (--total-executor-cores 32)
>>>
>>>    3. What is master memory? Are you referring to driver memory? Maybe
>>>    I am misunderstanding this.
>>>
>>> Ans: Driver memory is set with --driver-memory 5g
>>>
>>>    4. The only real correlation I see with the driver memory is when
>>>    you are running in local mode, where the worker lives within the JVM
>>>    process that you start with spark-shell etc. In that case driver
>>>    memory matters. However, it appears that you are running in another
>>>    mode with 4 nodes?
>>>
>>> Ans: I am running my job with spark-submit, and on my worker (executor)
>>> nodes there is no OOM issue; it only happens in the driver app.
>>>
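For reference, the spark-submit options referred to in the answers above are spelled --driver-memory and --total-executor-cores. A minimal sketch of assembling such a command line is below. The helper build_spark_submit, the master URL, the main class, and the jar name are all hypothetical placeholders, not taken from this thread; only the memory and core values come from the answers.

```python
import shlex


def build_spark_submit(master_url, main_class, app_jar,
                       driver_memory="5g", executor_memory="10g",
                       total_executor_cores=32):
    """Return a spark-submit argument list for a standalone cluster.

    Illustrative helper only; it is not part of Spark.
    """
    return [
        "spark-submit",
        "--master", master_url,
        "--class", main_class,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--total-executor-cores", str(total_executor_cores),
        app_jar,
    ]


cmd = build_spark_submit(
    master_url="spark://master-host:7077",  # hypothetical master URL
    main_class="com.example.MyJob",         # hypothetical main class
    app_jar="my-job.jar",                   # hypothetical application jar
)

# Print the command as a single, shell-quoted line.
print(shlex.join(cmd))
```

Passing driver_memory="10g" would give the larger setting tried later in the thread.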
>>> Thanks,
>>> Saurav Sinha
>>>
>>> On Tue, Jul 19, 2016 at 2:42 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Can you please clarify:
>>>>
>>>>
>>>>    1. In what mode are you running Spark: standalone, yarn-client,
>>>>    yarn-cluster, etc.?
>>>>    2. You have 4 nodes with each executor having 10G. How many actual
>>>>    executors do you see in the UI (port 4040 by default)?
>>>>    3. What is master memory? Are you referring to driver memory? Maybe
>>>>    I am misunderstanding this.
>>>>    4. The only real correlation I see with the driver memory is when
>>>>    you are running in local mode, where the worker lives within the JVM
>>>>    process that you start with spark-shell etc. In that case driver
>>>>    memory matters. However, it appears that you are running in another
>>>>    mode with 4 nodes?
>>>>
>>>> Can you take a snapshot of the Environment tab in the UI and send the
>>>> output, please?
>>>>
>>>> HTH
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 18 July 2016 at 11:50, Saurav Sinha <sauravsinha76@gmail.com> wrote:
>>>>
>>>>> I have set --driver-memory 5g. I need to understand whether driver
>>>>> memory needs to be increased as the number of partitions increases.
>>>>> What would be the best ratio of number of partitions to driver memory?
>>>>>
>>>>> On Mon, Jul 18, 2016 at 4:07 PM, Zhiliang Zhu <zchl.jump@yahoo.com>
>>>>> wrote:
>>>>>
>>>>>> Try setting --driver-memory xg, where x is as large as can be set.
>>>>>>
>>>>>>
>>>>>> On Monday, July 18, 2016 6:31 PM, Saurav Sinha <
>>>>>> sauravsinha76@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running a Spark job.
>>>>>>
>>>>>> Master memory - 5G
>>>>>> Executor memory - 10G (running on 4 nodes)
>>>>>>
>>>>>> My job is getting killed as the number of partitions increases to 20K.
>>>>>>
>>>>>> 16/07/18 14:53:13 INFO DAGScheduler: Got job 17 (foreachPartition at WriteToKafka.java:45) with 13524 output partitions (allowLocal=false)
>>>>>> 16/07/18 14:53:13 INFO DAGScheduler: Final stage: ResultStage 640(foreachPartition at WriteToKafka.java:45)
>>>>>> 16/07/18 14:53:13 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 518, ShuffleMapStage 639)
>>>>>> 16/07/18 14:53:23 INFO DAGScheduler: Missing parents: List()
>>>>>> 16/07/18 14:53:23 INFO DAGScheduler: Submitting ResultStage 640 (MapPartitionsRDD[271] at map at BuildSolrDocs.java:209), which has no missing parents
>>>>>> 16/07/18 14:53:23 INFO MemoryStore: ensureFreeSpace(8248) called with curMem=41923262, maxMem=2778778828
>>>>>> 16/07/18 14:53:23 INFO MemoryStore: Block broadcast_90 stored as values in memory (estimated size 8.1 KB, free 2.5 GB)
>>>>>> Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
>>>>>>         at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66)
>>>>>>         at org.apache.spark.util.io.ByteArrayChunkOutputStream.write(ByteArrayChunkOutputStream.scala:55)
>>>>>>         at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
>>>>>>         at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:273)
>>>>>>         at org.apache.spark.io.SnappyOutputStreamWrapper.flush(CompressionCodec.scala:197)
>>>>>>         at java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1822)
>>>>>>
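A rough sanity check of the numbers in the log above, offered as a sketch rather than a diagnosis. It pulls curMem, maxMem, and the output partition count out of two of the log entries and relates them to the 32 total executor cores mentioned elsewhere in the thread. The helper parse_log is hypothetical. Note that curMem and maxMem are the MemoryStore's own accounting for stored blocks, not the free JVM heap, so a large "free" figure there does not rule out a heap OOM on the dag-scheduler-event-loop thread.

```python
import re

# Two entries copied from the log above.
LOG = (
    "16/07/18 14:53:13 INFO DAGScheduler: Got job 17 (foreachPartition at "
    "WriteToKafka.java:45) with 13524 output partitions (allowLocal=false)\n"
    "16/07/18 14:53:23 INFO MemoryStore: ensureFreeSpace(8248) called with "
    "curMem=41923262, maxMem=2778778828\n"
)

TOTAL_EXECUTOR_CORES = 32  # from --total-executor-cores 32 in the thread


def parse_log(text):
    """Extract (output partitions, curMem, maxMem) from the log text."""
    partitions = int(re.search(r"with (\d+) output partitions", text).group(1))
    cur_mem = int(re.search(r"curMem=(\d+)", text).group(1))
    max_mem = int(re.search(r"maxMem=(\d+)", text).group(1))
    return partitions, cur_mem, max_mem


partitions, cur_mem, max_mem = parse_log(LOG)

free_bytes = max_mem - cur_mem
free_gib = free_bytes / 2**30        # block-store space still unused
used_pct = 100 * cur_mem / max_mem   # share of the block store in use
tasks_per_core = partitions / TOTAL_EXECUTOR_CORES

print(f"block store free: {free_gib:.2f} GiB ({used_pct:.1f}% used)")
print(f"tasks per core in this stage: {tasks_per_core:.1f}")
```

The free figure comes out at roughly 2.5 GiB, in line with the "free 2.5 GB" reported by the MemoryStore entry, while the final stage schedules several hundred tasks per available core.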
>>>>>>
>>>>>> Help needed.
>>>>>>
>>>>>> --
>>>>>> Thanks and Regards,
>>>>>>
>>>>>> Saurav Sinha
>>>>>>
>>>>>> Contact: 9742879062
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
>



-- 
Regards,
Raghava
http://raghavam.github.io
