spark-user mailing list archives

From Luca Toscano <>
Subject Spark 2.4.4, RPC encryption and Python
Date Thu, 16 Jan 2020 08:16:37 GMT
Hi everybody,

I am currently testing Spark 2.4.4 with the following new settings:

spark.authenticate                        true
spark.network.crypto.enabled              true
spark.network.crypto.keyLength            256
spark.io.encryption.keygen.algorithm      HmacSHA256
spark.io.encryption.enabled               true
spark.network.crypto.keyFactoryAlgorithm  PBKDF2WithHmacSHA256
spark.io.encryption.keySizeBits           256
spark.network.crypto.saslFallback         false

I use dynamic allocation, and the Spark shuffle service is configured
correctly in YARN. I added the following two options to yarn-site.xml:



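For reference, the yarn-site.xml properties that Spark's security
documentation describes for enabling authentication and AES-based
encryption on the external shuffle service look like the following (a
sketch based on the docs, not necessarily the exact two options used
here):

```xml
<!-- Sketch based on Spark's security docs; may differ from the exact
     two options referenced above. -->
<property>
  <name>spark.authenticate</name>
  <value>true</value>
</property>
<property>
  <name>spark.network.crypto.enabled</name>
  <value>true</value>
</property>
```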
This works well for all the Scala-based entry points (spark2-shell,
spark-submit, etc.), but not for PySpark, where I see the following
warnings repeated over and over:

20/01/14 10:23:50 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:
Attempted to request executors before the AM has registered!
20/01/14 10:23:50 WARN ExecutorAllocationManager: Unable to reach the
cluster manager to request 1 total executors!
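One way to check whether dynamic allocation itself is part of the
problem (just a diagnostic idea, not something I have verified) is to
rerun the PySpark job with it disabled, e.g.:

```
spark.dynamicAllocation.enabled  false
spark.executor.instances         2
```

If the bootstrap timeout still shows up, the allocation warnings are
only a symptom.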

The culprit seems to be the option "",
since without it everything works fine.

At first I thought that it was a YARN resource allocation problem, but
then I checked and the cluster has plenty of capacity. After digging a
bit more into YARN's container logs, I discovered that the problem
seems to be the Application Master not being able to contact the
Driver in time (this is client mode, of course):

20/01/14 09:45:21 INFO ApplicationMaster: ApplicationAttemptId:
20/01/14 09:45:21 INFO YarnRMClient: Registering the ApplicationMaster
20/01/14 09:45:52 ERROR TransportClientFactory: Exception while
bootstrapping client after 30120 ms
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Timeout waiting for task.
        at org.spark_project.guava.base.Throwables.propagate(
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
        at org.apache.spark.rpc.netty.Outbox$$anon$
        at org.apache.spark.rpc.netty.Outbox$$anon$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(
        at org.spark_project.guava.util.concurrent.AbstractFuture.get(
        ... 11 more
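The failure mode in that trace can be sketched with a toy client/server
pair (purely illustrative, none of this is Spark code): a client that
enforces a fixed deadline on the whole handshake times out when the
peer is slow to answer, just as TransportClientFactory gives up on the
bootstrap after its timeout elapses.

```python
import socket
import threading
import time


def slow_server(host="127.0.0.1"):
    # Toy stand-in for a driver that is slow to complete the auth
    # handshake; it answers only after the client's deadline has passed.
    srv = socket.socket()
    srv.bind((host, 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        time.sleep(2.0)          # respond *after* the client's deadline
        conn.sendall(b"auth-ok")
        conn.close()
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def bootstrap(port, timeout_s):
    # Toy stand-in for the client bootstrap step: the whole handshake
    # must finish within a fixed timeout or the attempt is abandoned.
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as c:
        c.settimeout(timeout_s)
        return c.recv(7)  # blocks waiting for the server's reply


port = slow_server()
try:
    bootstrap(port, timeout_s=0.5)
    outcome = "ok"
except socket.timeout:
    # Analogous to "Exception while bootstrapping client after 30120 ms"
    outcome = "timeout"
print(outcome)
```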

The strange part is that sometimes the timeout occurs and sometimes it
doesn't. I checked the code related to the above stack trace and ended
up here:

The "" option seems to help, even if it is not advertised in the docs
as far as I can see (the 30s mentioned in the exception is definitely
triggered by this setting, though). What I am wondering is where/what
I should check to debug this further, since it seems to be a
Python-only problem that doesn't affect Scala. I didn't find any
outstanding bug reports, so given that 2.4.4 is very recent I thought
I'd report it here and ask for advice :)

Thanks in advance!

