spark-user mailing list archives

From David Aspegren <david.aspeg...@gmail.com>
Subject Spark 2.4.3 on Kubernetes Client mode fails
Date Sun, 26 May 2019 11:41:50 GMT
Hi,

I can run the job successfully in cluster mode, but in client mode it won't
run.

My driver runs in a pod created from the jupyter/pyspark-notebook image,
which I assume is sufficient.

I execute:

bin/spark-submit \
    --master k8s://https://192.168.99.100:8443 \
    --deploy-mode client \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=spark:spark-docker \
    --conf spark.driver.host=jupyter-pyspark.default.svc.cluster.local \
    --conf spark.driver.port=9888 \
    --class org.apache.spark.examples.SparkPi \
    --name spark-pi \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 10
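A minimal sketch for checking which driver ports the command above actually pins: it collects the --conf key=value pairs into a Python dictionary. The helper conf_pairs is hypothetical, and the command is simply pasted in as a string. Only spark.driver.port is fixed here. Spark also has a separate spark.driver.blockManager.port setting (it defaults to spark.blockManager.port, which is random unless set), and the command does not set it.

```python
import shlex

# The spark-submit command from the message, pasted in as a string.
COMMAND = r"""
bin/spark-submit \
    --master k8s://https://192.168.99.100:8443 \
    --deploy-mode client \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=spark:spark-docker \
    --conf spark.driver.host=jupyter-pyspark.default.svc.cluster.local \
    --conf spark.driver.port=9888 \
    --class org.apache.spark.examples.SparkPi \
    --name spark-pi \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 10
"""


def conf_pairs(command: str) -> dict:
    """Hypothetical helper: collect every '--conf key=value' argument into a dict."""
    # Drop backslash-newline continuations the way a POSIX shell would,
    # then split into shell words.
    tokens = shlex.split(command.replace("\\\n", " "))
    confs = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "--conf":
            key, _, val = value.partition("=")
            confs[key] = val
    return confs


confs = conf_pairs(COMMAND)
print(confs["spark.driver.port"])                    # 9888
print("spark.driver.blockManager.port" in confs)     # False
```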

I then look at the logs of the created executor pod. All seems well: it can
connect back to the driver pod on port 9888, but then it fails with:

19/05/26 11:35:38 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Connecting to jupyter-pyspark.default.svc.cluster.local/10.111.230.250:37615 timed out (120000 ms)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:114)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
        at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:124)
        at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:98)
        at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:757)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:162)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:151)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:231)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
        at org.apache.spark.scheduler.Task.run(Task.scala:121)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
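The key detail in this trace is the port. The executor reached the driver on 9888, but the failing call is a broadcast block fetch (TorrentBroadcast.readBlocks calling BlockManager.getRemoteBytes), and it targets port 37615 on the driver host, which appears to be the driver's block manager port rather than spark.driver.port. A minimal Python sketch, assuming the log text is available as a string, that pulls the host, IP, port, and timeout out of that line and compares the port with the configured one. The regex, parse_timeout, and DRIVER_PORT are illustrative names, not Spark APIs.

```python
import re

# Matches "Connecting to HOST/IP:PORT timed out (N ms)", tolerating the
# line wrapping that mail archives introduce between the words.
TIMEOUT_RE = re.compile(
    r"Connecting to\s+(?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)"
    r"\s+timed out\s+\((?P<ms>\d+) ms\)"
)


def parse_timeout(log_text):
    """Return (host, ip, port, timeout_ms) for the first connect timeout, or None."""
    m = TIMEOUT_RE.search(log_text)
    if m is None:
        return None
    return m["host"], m["ip"], int(m["port"]), int(m["ms"])


# The exception line from the executor log, as wrapped in the message.
LOG = """java.io.IOException: Connecting to
jupyter-pyspark.default.svc.cluster.local/10.111.230.250:37615 timed out
(120000 ms)"""

DRIVER_PORT = 9888  # spark.driver.port from the spark-submit command

host, ip, port, ms = parse_timeout(LOG)
print(port, port == DRIVER_PORT)  # 37615 False
```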



What does this mean?

Thanks in advance
David
