spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Owen <so...@cloudera.com>
Subject Re: Running "beyond memory limits" in ConnectedComponents
Date Thu, 15 Jan 2015 17:10:37 GMT
This is a YARN setting. It just controls how much any container can
reserve, including Spark executors. That is not the problem.

You need Spark to ask for more memory from YARN, on top of the memory that
is requested by --executor-memory. Your output indicates the default of 7%
is too little. For example you can ask for 20GB for executors and ask for
2GB of overhead. Spark will ask for 22GB from YARN. (Of course, YARN needs
to be set to allow containers of at least 22GB!)

On Thu, Jan 15, 2015 at 4:31 PM, Nitin kak <nitinkak001@gmail.com> wrote:

> Thanks for sticking to this thread.
>
> I am guessing what memory my app requests and what Yarn requests on my
> part should be same and is determined by the value of *--executor-memory*
> which I had set to *20G*. Or can the two values be different?
>
> I checked in Yarn configurations(below), so I think that fits well into
> the memory overhead limits.
>
>
> Container Memory Maximum
> yarn.scheduler.maximum-allocation-mb
>  MiBGiB
> Reset to the default value: 64 GiB
> <http://10.1.1.49:7180/cmf/services/108/config#>
> Override Instances
> <http://10.1.1.49:7180/cmf/service/108/roleType/RESOURCEMANAGER/group/yarn-RESOURCEMANAGER-BASE/config/yarn_scheduler_maximum_allocation_mb?wizardMode=false&returnUrl=%2Fcmf%2Fservices%2F108%2Fconfig&filterValue=>
>
> The largest amount of physical memory, in MiB, that can be requested for a
> container.
>
>
>
>
>
> On Thu, Jan 15, 2015 at 10:28 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> Those settings aren't relevant, I think. You're concerned with what
>> your app requests, and what Spark requests of YARN on your behalf. (Of
>> course, you can't request more than what your cluster allows for a
>> YARN container for example, but that doesn't seem to be what is
>> happening here.)
>>
>> You do not want to omit --executor-memory if you need large executor
>> memory heaps, since then you just request the default and that is
>> evidently not enough memory for your app.
>>
>> Look at http://spark.apache.org/docs/latest/running-on-yarn.html and
>> spark.yarn.executor.memoryOverhead  By default it's 7% of your 20G or
>> about 1.4G. You might set this higher to 2G to give more overhead.
>>
>> See the --config property=value syntax documented in
>> http://spark.apache.org/docs/latest/submitting-applications.html
>>
>> On Thu, Jan 15, 2015 at 3:47 AM, Nitin kak <nitinkak001@gmail.com> wrote:
>> > Thanks Sean.
>> >
>> > I guess Cloudera Manager has parameters executor_total_max_heapsize and
>> > worker_max_heapsize which point to the parameters you mentioned above.
>> >
>> > How much should that cushon between the jvm heap size and yarn memory
>> limit
>> > be?
>> >
>> > I tried setting jvm memory to 20g and yarn to 24g, but it gave the same
>> > error as above.
>> >
>> > Then, I removed the "--executor-memory" clause
>> >
>> > spark-submit --class ConnectedComponentsTest --master yarn-cluster
>> > --num-executors 7 --executor-cores 1
>> > target/scala-2.10/connectedcomponentstest_2.10-1.0.jar
>> >
>> > That is not giving GC, Out of memory exception
>> >
>> > 15/01/14 21:20:33 WARN channel.DefaultChannelPipeline: An exception was
>> > thrown by a user handler while handling an exception event ([id:
>> 0x362d65d4,
>> > /10.1.1.33:35463 => /10.1.1.73:43389] EXCEPTION:
>> java.lang.OutOfMemoryError:
>> > GC overhead limit exceeded)
>> > java.lang.OutOfMemoryError: GC overhead limit exceeded
>> >       at java.lang.Object.clone(Native Method)
>> >       at akka.util.CompactByteString$.apply(ByteString.scala:410)
>> >       at akka.util.ByteString$.apply(ByteString.scala:22)
>> >       at
>> >
>> akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
>> >       at
>> >
>> akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
>> >       at
>> >
>> akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
>> >       at
>> >
>> akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:179)
>> >       at
>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>> >       at
>> >
>> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
>> >       at
>> >
>> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
>> >       at
>> >
>> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>> >       at
>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>> >       at
>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>> >       at
>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>> >       at
>> >
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>> >       at
>> >
>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>> >       at
>> >
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>> >       at
>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>> >       at
>> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >       at
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >       at java.lang.Thread.run(Thread.java:745)
>> > 15/01/14 21:20:33 ERROR util.Utils: Uncaught exception in thread
>> > SparkListenerBus
>> > java.lang.OutOfMemoryError: GC overhead limit exceeded
>> >       at
>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:168)
>> >       at
>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:45)
>> >       at
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> >       at
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> >       at scala.collection.immutable.List.foreach(List.scala:318)
>> >       at
>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>> >       at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>> >       at org.json4s.JsonDSL$class.seq2jvalue(JsonDSL.scala:68)
>> >       at org.json4s.JsonDSL$.seq2jvalue(JsonDSL.scala:61)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>> >       at org.json4s.JsonDSL$class.pair2jvalue(JsonDSL.scala:79)
>> >       at org.json4s.JsonDSL$.pair2jvalue(JsonDSL.scala:61)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$.jobStartToJson(JsonProtocol.scala:127)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:59)
>> >       at
>> >
>> org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:92)
>> >       at
>> >
>> org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:118)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
>> >       at
>> >
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >       at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>> >       at scala.Option.foreach(Option.scala:236)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>> > Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: GC
>> > overhead limit exceeded
>> >       at
>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:168)
>> >       at
>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:45)
>> >       at
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> >       at
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> >       at scala.collection.immutable.List.foreach(List.scala:318)
>> >       at
>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>> >       at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>> >       at org.json4s.JsonDSL$class.seq2jvalue(JsonDSL.scala:68)
>> >       at org.json4s.JsonDSL$.seq2jvalue(JsonDSL.scala:61)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>> >       at org.json4s.JsonDSL$class.pair2jvalue(JsonDSL.scala:79)
>> >       at org.json4s.JsonDSL$.pair2jvalue(JsonDSL.scala:61)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$.jobStartToJson(JsonProtocol.scala:127)
>> >       at
>> >
>> org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:59)
>> >       at
>> >
>> org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:92)
>> >       at
>> >
>> org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:118)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
>> >       at
>> >
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >       at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81)
>> >       at
>> >
>> org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>> >       at scala.Option.foreach(Option.scala:236)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>> >       at
>> >
>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>> >
>> >
>> > On Wed, Jan 14, 2015 at 4:44 PM, Sean Owen <sowen@cloudera.com> wrote:
>> >>
>> >> That's not quite what that error means. Spark is not out of memory. It
>> >> means that Spark is using more memory than it asked YARN for. That in
>> >> turn is because the default amount of cushion established between the
>> >> YARN allowed container size and the JVM heap size is too small. See
>> >> spark.yarn.executor.memoryOverhead in
>> >> http://spark.apache.org/docs/latest/running-on-yarn.html
>> >>
>> >> On Wed, Jan 14, 2015 at 9:18 PM, nitinkak001 <nitinkak001@gmail.com>
>> >> wrote:
>> >> > I am trying to run connected components algorithm in Spark. The graph
>> >> > has
>> >> > roughly 28M edges and 3.2M vertices. Here is the code I am using
>> >> >
>> >> >  /val inputFile =
>> >> >
>> "/user/hive/warehouse/spark_poc.db/window_compare_output_text/000000_0"
>> >> >     val conf = new SparkConf().setAppName("ConnectedComponentsTest")
>> >> >     val sc = new SparkContext(conf)
>> >> >     val graph = GraphLoader.edgeListFile(sc, inputFile, true, 7,
>> >> > StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK);
>> >> >     graph.cache();
>> >> >     val cc = graph.connectedComponents();
>> >> >     graph.edges.saveAsTextFile("/user/kakn/output");/
>> >> >
>> >> > and here is the command:
>> >> >
>> >> > /spark-submit --class ConnectedComponentsTest --master yarn-cluster
>> >> > --num-executors 7 --driver-memory 6g --executor-memory 8g
>> >> > --executor-cores 1
>> >> > target/scala-2.10/connectedcomponentstest_2.10-1.0.jar/
>> >> >
>> >> > It runs for about an hour and then fails with below error. *Isnt
>> Spark
>> >> > supposed to spill on disk if the RDDs dont fit into the memory?*
>> >> >
>> >> > Application application_1418082773407_8587 failed 2 times due to AM
>> >> > Container for appattempt_1418082773407_8587_000002 exited with
>> exitCode:
>> >> > -104 due to: Container
>> >> > [pid=19790,containerID=container_1418082773407_8587_02_000001] is
>> >> > running
>> >> > beyond physical memory limits. Current usage: 6.5 GB of 6.5 GB
>> physical
>> >> > memory used; 8.9 GB of 13.6 GB virtual memory used. Killing
>> container.
>> >> > Dump of the process-tree for container_1418082773407_8587_02_000001
:
>> >> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> >> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
>> FULL_CMD_LINE
>> >> > |- 19790 19788 19790 19790 (bash) 0 0 110809088 336 /bin/bash -c
>> >> > /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m
>> >> >
>> >> >
>> -Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp
>> >> > '-Dspark.executor.memory=8g' '-Dspark.eventLog.enabled=true'
>> >> > '-Dspark.yarn.secondary.jars='
>> >> > '-Dspark.app.name=ConnectedComponentsTest'
>> >> >
>> >> >
>> '-Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory'
>> >> > '-Dspark.master=yarn-cluster'
>> >> > org.apache.spark.deploy.yarn.ApplicationMaster
>> >> > --class 'ConnectedComponentsTest' --jar
>> >> >
>> >> >
>> 'file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar'
>> >> > --executor-memory 8192 --executor-cores 1 --num-executors 7 1>
>> >> >
>> >> >
>> /var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stdout
>> >> > 2>
>> >> >
>> >> >
>> /var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stderr
>> >> > |- 19794 19790 19790 19790 (java) 205066 9152 9477726208 1707599
>> >> > /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m
>> >> >
>> >> >
>> -Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp
>> >> > -Dspark.executor.memory=8g -Dspark.eventLog.enabled=true
>> >> > -Dspark.yarn.secondary.jars= -Dspark.app.name
>> =ConnectedComponentsTest
>> >> >
>> >> >
>> -Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory
>> >> > -Dspark.master=yarn-cluster
>> >> > org.apache.spark.deploy.yarn.ApplicationMaster
>> >> > --class ConnectedComponentsTest --jar
>> >> >
>> >> >
>> file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar
>> >> > --executor-memory 8192 --executor-cores 1 --num-executors 7
>> >> > Container killed on request. Exit code is 143
>> >> > Container exited with a non-zero exit code 143
>> >> > .Failing this attempt.. Failing the application.
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> >
>> http://apache-spark-user-list.1001560.n3.nabble.com/Running-beyond-memory-limits-in-ConnectedComponents-tp21139.html
>> >> > Sent from the Apache Spark User List mailing list archive at
>> Nabble.com.
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> > For additional commands, e-mail: user-help@spark.apache.org
>> >> >
>> >
>> >
>>
>
>

Mime
View raw message