spark-user mailing list archives

From giive chen <thegi...@gmail.com>
Subject Re: Understanding Spark Memory distribution
Date Mon, 30 Mar 2015 16:24:32 GMT
Hi Ankur

If you are using standalone mode, your config is wrong. You should use "export
SPARK_DAEMON_MEMORY=xxx" in conf/spark-env.sh. At least, that works on my
Spark 1.3.0 standalone machine.

BTW, SPARK_DRIVER_MEMORY is used in YARN mode, and it looks like standalone
mode doesn't use this setting.

To debug this, run "ps auxw | grep org.apache.spark.deploy.master.[M]aster"
on the master machine. You will see the -Xmx and -Xms options.
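A minimal Python sketch of the same check, assuming you have already captured one line of `ps auxw` output as a string. It pulls out the -Xms/-Xmx heap flags the JVM was started with. The function name `heap_flags` and the sample command line are hypothetical, not taken from this thread.

```python
import re

# Matches JVM heap flags such as -Xms512m or -Xmx24g (digits plus an optional unit suffix).
HEAP_FLAG = re.compile(r"-X(ms|mx)(\d+[kKmMgGtT]?)\b")


def heap_flags(ps_line: str) -> dict:
    """Return {'Xms': ..., 'Xmx': ...} for the heap flags found in a ps output line.

    If a flag appears more than once, the last occurrence is kept.
    """
    flags = {}
    for kind, size in HEAP_FLAG.findall(ps_line):
        flags["X" + kind] = size
    return flags


if __name__ == "__main__":
    # Hypothetical ps line for a standalone Master process.
    sample = ("spark 1234 0.5 1.2 ... java -cp /opt/spark/conf/:/opt/spark/lib/* "
              "-Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 10.0.0.1")
    print(heap_flags(sample))  # {'Xms': '512m', 'Xmx': '512m'}
```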

Wisely Chen






On Mon, Mar 30, 2015 at 3:55 AM, Ankur Srivastava <ankur.srivastava@gmail.com> wrote:

> Hi Wisely,
>
> I am running on Amazon EC2 instances, so I have no reason to doubt the hardware.
> Moreover, my other pipelines run successfully; only this one, which involves
> broadcasting a large object, fails.
>
> My spark-env.sh settings are:
>
> SPARK_MASTER_IP=<MASTER-IP>
>
> SPARK_LOCAL_IP=<LOCAL-IP>
>
> SPARK_DRIVER_MEMORY=24g
>
> SPARK_WORKER_MEMORY=28g
>
> SPARK_EXECUTOR_MEMORY=26g
>
> SPARK_WORKER_CORES=8
>
> My spark-defaults.conf settings are:
>
> spark.eventLog.enabled           true
>
> spark.eventLog.dir               /srv/logs/
>
> spark.serializer                 org.apache.spark.serializer.KryoSerializer
>
> spark.kryo.registrator           com.test.utils.KryoSerializationRegistrator
>
> spark.executor.extraJavaOptions  "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC"
>
> spark.shuffle.consolidateFiles   true
>
> spark.shuffle.manager            sort
>
> spark.shuffle.compress           true
>
> spark.rdd.compress               true
>
> Thanks
> Ankur
>
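A small Python sketch for sanity-checking the spark-env.sh values above. It assumes JVM-style size strings (an integer with an optional k/m/g/t suffix, in binary multiples) and encodes one constraint of standalone mode: an executor can only be placed on a worker whose SPARK_WORKER_MEMORY is at least the requested executor memory. The helper names are illustrative and not part of Spark.

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_mem(value: str) -> int:
    """Convert a JVM-style memory string such as '26g' or '512m' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmgt]?)b?\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognised memory string: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def parse_env(text: str) -> dict:
    """Collect KEY=VALUE assignments from spark-env.sh-style text (optional 'export')."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, val = line.partition("=")
        if sep:
            env[key.strip()] = val.strip().strip('"')
    return env


def executor_fits(env: dict) -> bool:
    """True if the requested executor memory fits inside one worker's memory."""
    return parse_mem(env["SPARK_EXECUTOR_MEMORY"]) <= parse_mem(env["SPARK_WORKER_MEMORY"])


if __name__ == "__main__":
    settings = parse_env("""
SPARK_DRIVER_MEMORY=24g
SPARK_WORKER_MEMORY=28g
SPARK_EXECUTOR_MEMORY=26g
SPARK_WORKER_CORES=8
""")
    print(executor_fits(settings))                       # True: 26g <= 28g
    print(parse_mem(settings["SPARK_EXECUTOR_MEMORY"]))  # bytes requested per executor
```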
> On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen <wiselychen@appier.com> wrote:
>
>> Hi Ankur
>>
>> If your hardware is OK, it looks like a config problem. Can you show me
>> your spark-env.sh or JVM config?
>>
>> Thanks
>>
>> Wisely Chen
>>
>> 2015-03-28 15:39 GMT+08:00 Ankur Srivastava <ankur.srivastava@gmail.com>:
>>
>>> Hi Wisely,
>>> I have 26 GB for the driver, and the master is running on m3.2xlarge machines.
>>>
>>> I see OOM errors on the workers even though they are running with 26 GB of
>>> memory.
>>>
>>> Thanks
>>>
>>> On Fri, Mar 27, 2015, 11:43 PM Wisely Chen <wiselychen@appier.com> wrote:
>>>
>>>> Hi
>>>>
>>>> In a broadcast, Spark collects the whole 3 GB object on the master node
>>>> and then broadcasts it to each slave. It is very common for the master
>>>> node not to have enough memory.
>>>>
>>>> What are your master node settings?
>>>>
>>>> Wisely Chen
>>>>
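A back-of-the-envelope Python sketch of the chunking involved in the TorrentBroadcast implementation seen in the logs below. It assumes the serialized object is split into blocks of `spark.broadcast.blockSize` (4096 KB by default in Spark 1.x) and that each block is named `broadcast_<id>_piece<n>`, as in the `broadcast_5_piece10` log line. The sizes are illustrative only: real pieces are cut from the serialized, usually compressed, bytes, so the actual piece count will differ. The helper names are hypothetical.

```python
import math

KIB = 1024
DEFAULT_BLOCK_SIZE = 4096 * KIB  # spark.broadcast.blockSize default in Spark 1.x (4096 KB)


def piece_count(serialized_bytes: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Number of broadcast pieces needed for a serialized object of the given size."""
    if serialized_bytes <= 0:
        return 0
    return math.ceil(serialized_bytes / block_size)


def piece_ids(broadcast_id: int, serialized_bytes: int, block_size: int = DEFAULT_BLOCK_SIZE):
    """Block names in the broadcast_<id>_piece<n> form seen in the executor log."""
    return [f"broadcast_{broadcast_id}_piece{i}"
            for i in range(piece_count(serialized_bytes, block_size))]


if __name__ == "__main__":
    three_gib = 3 * 1024 ** 3  # illustrative: the object's size before any compression
    print(piece_count(three_gib))       # 768 pieces of 4 MiB each
    print(piece_ids(5, three_gib)[10])  # broadcast_5_piece10
```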
>>>> On Saturday, March 28, 2015, Ankur Srivastava <ankur.srivastava@gmail.com> wrote:
>>>>
>>>>> I have increased "spark.storage.memoryFraction" to 0.4, but I still
>>>>> get OOM errors on the Spark executor nodes.
>>>>>
>>>>>
>>>>> 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10
>>>>>
>>>>> 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms
>>>>>
>>>>> 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called with curMem=2484698683, maxMem=9631778734
>>>>>
>>>>> 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 641.4 MB, free 6.0 GB)
>>>>>
>>>>> 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
>>>>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>>>>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>>         at scala.concurrent.Await$.result(package.scala:107)
>>>>>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>>>>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
>>>>>
>>>>> 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 4007)
>>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
>>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>
>>>>> Thanks
>>>>>
>>>>> Ankur
>>>>>
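The numbers in the MemoryStore log lines above can be checked by hand. This Python sketch assumes that `free` in the "stored as values" line is `maxMem` minus the memory in use once the new block is added (with nothing evicted), and that sizes are printed in binary units (1 MB = 2^20 bytes). Under those assumptions it reproduces the logged 641.4 MB and 6.0 GB. The helper name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values copied from the MemoryStore log lines above.
BLOCK_SIZE = 672_530_208  # ensureFreeSpace(672530208)
CUR_MEM = 2_484_698_683   # curMem before the broadcast value was stored
MAX_MEM = 9_631_778_734   # maxMem, i.e. the block manager's storage capacity


def free_after_put(max_mem: int, cur_mem: int, block: int) -> int:
    """Storage memory left once the block is stored (assumes no eviction happens)."""
    return max_mem - cur_mem - block


if __name__ == "__main__":
    free = free_after_put(MAX_MEM, CUR_MEM, BLOCK_SIZE)
    print(f"block: {BLOCK_SIZE / MIB:.1f} MB")  # block: 641.4 MB
    print(f"free:  {free / GIB:.1f} GB")        # free:  6.0 GB
    print(f"used:  {(CUR_MEM + BLOCK_SIZE) / MAX_MEM:.0%} of the storage region")  # used:  33% of the storage region
```

So right after the broadcast value was stored, the storage region was only about a third full.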
>>>>> On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava <ankur.srivastava@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am running a Spark cluster on EC2 instances of type m3.2xlarge. I
>>>>>> have given my executors 26 GB of memory and all 8 cores. I can see that
>>>>>> in the logs too:
>>>>>>
>>>>>> 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added: app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128 (10.x.y.z:40128) with 8 cores
>>>>>>
>>>>>> I am not caching any RDD, so I have set "spark.storage.memoryFraction"
>>>>>> to 0.2. On the Spark UI, under the Executors tab, I can see Memory Used is 0.0/4.5 GB.
>>>>>>
>>>>>> I am now confused by these logs:
>>>>>>
>>>>>> 15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block manager 10.77.100.196:58407 with 4.5 GB RAM, BlockManagerId(4, 10.x.y.z, 58407)
>>>>>>
>>>>>> I am broadcasting a large object of 3 GB, and after that, when I create
>>>>>> an RDD, I see logs showing this 4.5 GB of memory filling up, and then I
>>>>>> get an OOM.
>>>>>>
>>>>>> How can I make the block manager use more memory?
>>>>>>
>>>>>> Is there any other fine-tuning I need to do for broadcasting large
>>>>>> objects?
>>>>>>
>>>>>> And do broadcast variables use the cache memory or the rest of the heap?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Ankur
>>>>>>
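A minimal Python sketch of how the 4.5 GB block manager figure can arise. It assumes the Spark 1.x rule that the block manager's capacity is the JVM's maximum heap (as reported by `Runtime.getRuntime.maxMemory`) multiplied by `spark.storage.memoryFraction` (default 0.6) and `spark.storage.safetyFraction` (default 0.9). The heap size is not measured here: it is back-calculated from the `maxMem=9631778734` logged later in this thread with a fraction of 0.4, assuming both runs used the same 26g executor heap. The helper names are hypothetical.

```python
GIB = 1024 ** 3


def storage_capacity(max_heap: int, memory_fraction: float = 0.6,
                     safety_fraction: float = 0.9) -> int:
    """Block manager capacity in Spark 1.x: heap * memoryFraction * safetyFraction."""
    return int(max_heap * memory_fraction * safety_fraction)


def heap_from_capacity(capacity: int, memory_fraction: float,
                       safety_fraction: float = 0.9) -> float:
    """Invert storage_capacity to estimate the JVM max heap from a logged maxMem."""
    return capacity / (memory_fraction * safety_fraction)


if __name__ == "__main__":
    # maxMem=9631778734 was logged with spark.storage.memoryFraction=0.4.
    heap = heap_from_capacity(9_631_778_734, memory_fraction=0.4)
    # Runtime.maxMemory is typically somewhat below the -Xmx value requested.
    print(f"estimated max heap: {heap / GIB:.1f} GiB")  # estimated max heap: 24.9 GiB
    for fraction in (0.2, 0.4, 0.6):
        cap = storage_capacity(int(heap), fraction)
        print(f"memoryFraction={fraction}: {cap / GIB:.1f} GB")
```

With a fraction of 0.2 this gives about 4.5 GB, matching the "Registering block manager ... with 4.5 GB RAM" line; the remainder of the heap is what tasks have to work with.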
>>>>>
>>>>>
>>
>
