spark-user mailing list archives

From: <jan.zi...@centrum.cz>
Subject: Spark on Yarn probably trying to load all the data to RAM
Date: Sun, 02 Nov 2014 09:35:10 GMT
Hi,

I am using Spark on YARN, specifically Spark in Python (PySpark). I am trying to run:

myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json")
myrdd.getNumPartitions()

Unfortunately, it seems that Spark tries to load everything into RAM, or at least that after
running for a while everything slows down and I start getting the errors in the log below. Everything
works fine for datasets smaller than RAM, but I would expect Spark to handle larger ones without
holding everything in RAM. Am I missing some setting in Spark on YARN?
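
In case it is relevant, below is a minimal sketch of the kind of configuration I have been
guessing at. The app name and the 2g/4g values are just illustrative guesses on my part, not
settings I know to be correct, and as far as I understand the driver memory cannot be changed
from inside the program in yarn-client mode, since the driver JVM is already running by then:

# Sketch only: the values here are my own guesses, not known-good settings.
# Driver memory has to be passed at launch in yarn-client mode, e.g.:
#   pyspark --driver-memory 4g --executor-memory 2g
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("s3-json-partitions")     # hypothetical name
        .set("spark.executor.memory", "2g"))  # guessed value
sc = SparkContext(conf=conf)

myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json")
print(myrdd.getNumPartitions())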


Thank you in advance for any help.


14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-375] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
11744,575: [Full GC 1194515K->1192839K(1365504K), 2,2367150 secs]
11746,814: [Full GC 1194507K->1193186K(1365504K), 2,1788150 secs]
11748,995: [Full GC 1194507K->1193278K(1365504K), 1,3511480 secs]
11750,347: [Full GC 1194507K->1193263K(1365504K), 2,2735350 secs]
11752,622: [Full GC 1194506K->1193192K(1365504K), 1,2700110 secs]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 391, in getNumPartitions
    return self._jrdd.partitions().size()
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538,
in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in
get_return_value
py4j.protocol.Py4JJavaError14/11/01 22:07:07 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile
at NativeMethodAccessorImpl.java:-2
: An error occurred while calling o112.partitions.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
 
>>> 11753,896: [Full GC 1194506K->947839K(1365504K), 2,1483780 secs]
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-309] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/11/01 22:07:09 INFO Remoting: Remoting shut down
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@5ca1c790
14/11/01 22:07:09 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@5ca1c790
java.nio.channels.CancelledKeyException
 at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
 at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
14/11/01 22:07:09 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768) not found
14/11/01 22:07:10 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FINISHED
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/11/01 22:07:10 INFO ui.SparkUI: Stopped Spark web UI at http://ip-172-31-20-69.us-west-2.compute.internal:4040
14/11/01 22:07:10 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Stopped
14/11/01 22:07:11 ERROR spark.MapOutputTrackerMaster: Error communicating with MapOutputTracker
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had already been terminated.
 at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
 at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
 at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
 at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
 at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
 at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
 at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
Exception in thread "Yarn Application State Checker" org.apache.spark.SparkException: Error communicating with MapOutputTracker
 at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
 at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
 at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
 at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
 at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
 at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had already been terminated.
 at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
 at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
 ... 5 more 
 

