spark-user mailing list archives

From Marcelo Vanzin <van...@cloudera.com.INVALID>
Subject Re: RPC timeout error for AES based encryption between driver and executor
Date Tue, 26 Mar 2019 15:40:15 GMT
I don't think "spark.authenticate" works properly with k8s in 2.4
(which would make it impossible to enable encryption since it requires
authentication). I'm pretty sure I fixed it in master, though.
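
For anyone trying to reproduce this, here is a rough sketch (not taken from
the original report) of how the settings quoted below could be turned into
spark-submit --conf arguments, together with a check for the dependency noted
above: encryption requires authentication. The helper names to_submit_args and
check_encryption_prereqs are made up for illustration, and the secret stays a
placeholder.

```python
import shlex

# Settings copied from the quoted message; the secret is a placeholder.
AES_RPC_SETTINGS = {
    "spark.authenticate": "true",
    "spark.authenticate.secret": "<some-value>",
    "spark.network.crypto.enabled": "true",
    "spark.network.crypto.keyLength": "256",
    "spark.network.crypto.saslFallback": "false",
}


def to_submit_args(settings):
    """Turn a settings dict into a flat list of --conf key=value arguments."""
    args = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def check_encryption_prereqs(settings):
    """Return a list of problems; AES RPC encryption needs authentication."""
    problems = []
    if settings.get("spark.network.crypto.enabled") == "true":
        if settings.get("spark.authenticate") != "true":
            problems.append(
                "spark.network.crypto.enabled requires spark.authenticate=true"
            )
    return problems


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection (hypothetical usage).
    print(shlex.join(["spark-submit", *to_submit_args(AES_RPC_SETTINGS)]))
```

This only assembles and sanity-checks the options; it does not talk to a
cluster, so it cannot tell whether authentication actually takes effect on k8s.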

On Tue, Mar 26, 2019 at 2:29 AM Sinha, Breeta (Nokia - IN/Bangalore)
<breeta.sinha@nokia.com> wrote:
>
> Hi All,
>
>
>
> We are trying to enable RPC encryption between driver and executor. Currently we're working
> on Spark 2.4 on Kubernetes.
>
>
>
> According to the Apache Spark security documentation (https://spark.apache.org/docs/latest/security.html),
> Spark supports AES-based encryption for RPC connections. There is also support for
> SASL-based encryption, although it should be considered deprecated.
>
>
>
> Setting spark.network.crypto.enabled to true enables AES-based RPC encryption.
>
> However, when we enable AES-based encryption between driver and executor, the logs show
> very sporadic behaviour in the communication between driver and executor.
>
>
>
> The following are the options and values we used for enabling encryption:
>
>
>
> spark.authenticate true
>
> spark.authenticate.secret <some-value>
>
> spark.network.crypto.enabled true
>
> spark.network.crypto.keyLength 256
>
> spark.network.crypto.saslFallback false
>
>
>
> A snippet of the executor log is provided below:-
>
> Exception in thread "main" 19/02/26 07:27:08 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
>
> Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 in 120 seconds
>
>
>
> However, the driver log shows no error, nor any message from the executor, for the
> same timestamp.
>
>
>
> We also tried increasing spark.network.timeout, but no luck.
>
>
>
> The issue is sporadic. We noted the following:
>
> 1) Sometimes, enabling AES encryption works completely fine.
>
> 2) Sometimes, enabling AES encryption works fine for around 10 consecutive spark-submits,
> but the next spark-submit hangs with the above-mentioned error in the executor log.
>
> 3) At other times, enabling AES encryption does not work at all: Spark keeps spawning
> more than 50 executors, each of which fails with the above-mentioned error.
>
> Setting spark.network.crypto.saslFallback to true didn't help either.
>
>
>
> Things work fine when we enable SASL encryption, that is, when we set only the following
> parameters:
>
> spark.authenticate true
>
> spark.authenticate.secret <some-value>
>
>
>
> I have attached the log file containing the detailed error message. Please let us know if
> any configuration is missing or if anyone has faced the same issue.
>
>
>
> Any leads would be highly appreciated!!
>
>
>
> Kind Regards,
>
> Breeta Sinha
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org



-- 
Marcelo


