Hi Wisely,
I have 26gb for driver and the master is running on m3.2xlarge machines.

I see OOM errors on workers and even they are running with 26th of memory.


On Fri, Mar 27, 2015, 11:43 PM Wisely Chen <wiselychen@appier.com> wrote:

In broadcast, spark will collect the whole 3gb object into master node and broadcast to each slaves. It is very common situation that the master node don't have enough memory . 

What is your master node settings? 

Wisely Chen

Ankur Srivastava <ankur.srivastava@gmail.com> 於 2015年3月28日 星期六寫道:

I have increased the "spark.storage.memoryFraction" to 0.4 but I still get OOM errors on Spark Executor nodes

15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10

15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms

15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called with curMem=2484698683, maxMem=9631778734

15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 641.4 MB, free 6.0 GB)

15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts

java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]

        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)

        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

        at scala.concurrent.Await$.result(package.scala:107)

        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)

        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)

15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 4007)

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)

        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)

        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)



On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava <ankur.srivastava@gmail.com> wrote:

Hi All,

I am running a spark cluster on EC2 instances of type: m3.2xlarge. I have given 26gb of memory with all 8 cores to my executors. I can see that in the logs too:

15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added: app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128 (10.x.y.z:40128) with 8 cores

I am not caching any RDD so I have set "spark.storage.memoryFraction" to 0.2. I can see on SparkUI under executors tab Memory used is 0.0/4.5 GB.

I am now confused with these logs?

15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block manager with 4.5 GB RAM, BlockManagerId(4, 10.x.y.z, 58407)

I am broadcasting a large object of 3 gb and after that when I am creating an RDD, I see logs which show this 4.5 GB memory getting full and then I get OOM.

How can I make block manager use more memory?

Is there any other fine tuning I need to do for broadcasting large objects?

And does broadcast variable use cache memory or rest of the heap?