spark-user mailing list archives

From <jan.zi...@centrum.cz>
Subject Re: Spark on Yarn probably trying to load all the data to RAM
Date Mon, 03 Nov 2014 19:46:02 GMT
I have 3 datasets; across all of them the average file size is 10-12 kB.
I am able to run my code on the dataset with 70K files, but not on the
datasets with 1.1M and 3.8M files.
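
At that scale the usual suspect is the driver rather than the executors:
textFile() builds at least one partition per input file, so 3.8M small files
mean 3.8M splits held on the driver heap. A minimal sketch of two things worth
trying (the 4g and 1000 values are illustrative guesses, not tested
recommendations):

    # Give the driver more heap at submit time; in yarn-client mode the
    # driver JVM is already running, so this has to be a spark-submit
    # flag rather than a runtime conf:
    #   spark-submit --master yarn-client --driver-memory 4g my_job.py

    # Cut the partition count right after reading; coalesce() merges
    # partitions without a shuffle, though the driver still has to
    # enumerate every file first:
    myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json").coalesce(1000)

sc.wholeTextFiles() may also be worth a look for many-small-files inputs,
since it can pack several files into one partition, but note that it yields
(path, content) pairs rather than lines.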
______________________________________________________________


On Sun, Nov 2, 2014 at 1:35 AM,  <jan.zikes@centrum.cz> wrote:
> Hi,
>
> I am using Spark on Yarn, particularly Spark in Python. I am trying to run:
>
> myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json")

How many files do you have, and what is the average size of each file?
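
A quick way to check that, sketched with boto 2 (the bucket name and prefix
are assumed from the textFile() path above, and credentials are assumed to
come from the environment or the EC2 instance profile):

    # Hedged sketch: count the keys and compute the average object size
    # under the prefix that the glob above would match.
    import boto

    bucket = boto.connect_s3().get_bucket('mybucket')
    count, total = 0, 0
    for key in bucket.list(prefix='files/'):
        count += 1
        total += key.size
    print count, 'files,', total / max(count, 1), 'bytes on average'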

> myrdd.getNumPartitions()
>
> Unfortunately it seems that Spark tries to load everything into RAM, or at
> least after a while of running this everything slows down and then I get
> the errors in the log below. Everything works fine for datasets smaller
> than RAM, but I would expect Spark to handle this without storing everything
> in RAM. So I would like to ask whether I'm missing some setting in Spark on
> Yarn?
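
For context, getNumPartitions() is not lazy about the file listing: the
driver has to compute an input split for every matched key before it can
return, and textFile() creates at least one split per file. A
back-of-the-envelope sketch of the scaling (the per-split overhead is purely
an assumption, not a measured figure):

    # Rough driver-heap estimate; ~1 KB of split metadata per file is
    # an assumed figure for illustration only.
    files = 3.8e6                  # the largest dataset in this thread
    bytes_per_split = 1024         # assumption
    print '%.1f GB of split metadata (rough)' % (files * bytes_per_split / 1024.0 ** 3)

Even at that assumed rate the result (~3.6 GB) is well past the ~1.3 GB heap
visible in the Full GC lines below, which would explain a driver-side OOM.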
>
>
> Thank you in advance for any help.
>
>
> 14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [sparkDriver-akka.actor.default-dispatcher-375] shutting down
> ActorSystem [sparkDriver]
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> 14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down
> ActorSystem [sparkDriver]
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> 11744,575: [Full GC 1194515K->1192839K(1365504K), 2,2367150 secs]
>
> 11746,814: [Full GC 1194507K->1193186K(1365504K), 2,1788150 secs]
>
> 11748,995: [Full GC 1194507K->1193278K(1365504K), 1,3511480 secs]
>
> 11750,347: [Full GC 1194507K->1193263K(1365504K), 2,2735350 secs]
>
> 11752,622: [Full GC 1194506K->1193192K(1365504K), 1,2700110 secs]
>
> Traceback (most recent call last):
>
>   File "<stdin>", line 1, in <module>
>
>   File "/home/hadoop/spark/python/pyspark/rdd.py", line 391, in
> getNumPartitions
>
>     return self._jrdd.partitions().size()
>
>   File
> "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
> line 538, in __call__
>
>   File
> "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line
> 300, in get_return_value
>
> 14/11/01 22:07:07 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile
> at NativeMethodAccessorImpl.java:-2
>
> py4j.protocol.Py4JJavaError: An error occurred while calling o112.partitions.
>
> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>
>
>>>> 11753,896: [Full GC 1194506K->947839K(1365504K), 2,1483780 secs]
>
> 14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator:
> Shutting down remote daemon.
>
> 14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down
> ActorSystem [sparkDriver]
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> 14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [sparkDriver-akka.actor.default-dispatcher-309] shutting down
> ActorSystem [sparkDriver]
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> 14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator:
> Remote daemon shut down; proceeding with flushing remote transports.
>
> 14/11/01 22:07:09 INFO Remoting: Remoting shut down
>
> 14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator:
> Remoting shut down.
>
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing
> ReceivingConnection to
> ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
>
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection
> to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
>
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection
> to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
>
> 14/11/01 22:07:09 INFO network.ConnectionManager: Key not valid ?
> sun.nio.ch.SelectionKeyImpl@5ca1c790
>
> 14/11/01 22:07:09 INFO network.ConnectionManager: key already cancelled ?
> sun.nio.ch.SelectionKeyImpl@5ca1c790
>
> java.nio.channels.CancelledKeyException
>
> at
> org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
>
> at
> org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
>
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection
> to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
>
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing
> ReceivingConnection to
> ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
>
> 14/11/01 22:07:09 ERROR network.ConnectionManager: Corresponding
> SendingConnection to
> ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768) not
> found
>
> 14/11/01 22:07:10 ERROR cluster.YarnClientSchedulerBackend: Yarn application
> already ended: FINISHED
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/metrics/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/static,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/executors/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/executors,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/environment/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/environment,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/storage/rdd,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/storage/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/storage,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/stages/pool/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/stages/pool,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/stages/stage/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/stages/stage,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/stages/json,null}
>
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped
> o.e.j.s.ServletContextHandler{/stages,null}
>
> 14/11/01 22:07:10 INFO ui.SparkUI: Stopped Spark web UI at
> http://ip-172-31-20-69.us-west-2.compute.internal:4040
>
> 14/11/01 22:07:10 INFO scheduler.DAGScheduler: Stopping DAGScheduler
>
> 14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all
> executors
>
> 14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Stopped
>
> 14/11/01 22:07:11 ERROR spark.MapOutputTrackerMaster: Error communicating
> with MapOutputTracker
>
> akka.pattern.AskTimeoutException:
> Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had
> already been terminated.
>
> at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
>
> at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
>
> at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
>
> at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
>
> at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
>
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
>
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
>
> Exception in thread "Yarn Application State Checker"
> org.apache.spark.SparkException: Error communicating with MapOutputTracker
>
> at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
>
> at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
>
> at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
>
> at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
>
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
>
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
>
> Caused by: akka.pattern.AskTimeoutException:
> Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had
> already been terminated.
>
> at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
>
> at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
>
> ... 5 more
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org



