spark-user mailing list archives

From "Sun, Keith" <ai...@ebay.com>
Subject RE: A bug in spark or hadoop RPC with kerberos authentication?
Date Wed, 23 Aug 2017 11:01:10 GMT
Finally found the root cause and raised a bug report at https://issues.apache.org/jira/browse/SPARK-21819

Thanks very much.
Keith

From: Sun, Keith
Sent: 22 August 2017 8:48
To: user@spark.apache.org
Subject: A bug in spark or hadoop RPC with kerberos authentication?

Hello,

I ran into a very weird issue which, while easy to reproduce, kept me stuck for more than a day.
I suspect it may be an issue/bug related to the class loader.
Can you help confirm the root cause?

I want to specify a customized set of Hadoop configuration files instead of those on the class
path (we have several Hadoop clusters, all secured with Kerberos, and I want to support a
different configuration per cluster).
The code and the resulting error are below.


The workaround I found is to place a core-site.xml with the two properties below on the class
path.
By checking the RPC code under org.apache.hadoop.ipc.RPC, I suspect the RPC code may not
see the UGI class in the same classloader.
So UGI is initialized with the default value from the classpath, which is simple authentication;
the server only accepts TOKEN or KERBEROS, hence the rejection in the error below.

core-site.xml with the security setup on the classpath:
<configuration>
    <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
    </property>
    <property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
    </property>

</configuration>
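
For reference, a minimal sketch of the programmatic equivalent of this workaround (the property
names are the two shown above; UserGroupInformation.setConfiguration and isSecurityEnabled are
standard Hadoop APIs; whether this avoids the suspected classloader problem is exactly what is
in question here):

      Configuration conf = new Configuration(false);
      conf.set("hadoop.security.authentication", "kerberos");
      conf.set("hadoop.security.authorization", "true");

      // Tell UGI to use this configuration rather than whatever
      // core-site.xml happens to be on the class path.
      UserGroupInformation.setConfiguration(conf);

      // If UGI was already initialized from a classpath config with
      // simple auth, or is resolved through a different classloader,
      // this may still print false.
      System.out.println("security enabled: " + UserGroupInformation.isSecurityEnabled());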

------------error------------------
2673 [main] DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil 
- DataTransferProtocol using SaslPropertiesResolver, configured QOP dfs.data.transfer.protection
= privacy, configured class dfs.data.transfer.saslproperties.resolver.class = class org.apache.hadoop.security.WhitelistBasedResolver
2696 [main] DEBUG org.apache.hadoop.service.AbstractService  - Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl
entered state INITED
2744 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - PrivilegedAction as:xxxxx@xxxxxxxCOM
(auth:KERBEROS) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136) //
2746 [main] DEBUG org.apache.hadoop.yarn.ipc.YarnRPC  - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2746 [main] DEBUG org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC  - Creating a HadoopYarnProtoRpc
proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
2801 [main] DEBUG org.apache.hadoop.ipc.Client  - getting client out of cache: org.apache.hadoop.ipc.Client@748fe51d
2981 [main] DEBUG org.apache.hadoop.service.AbstractService  - Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl
is started
3004 [main] DEBUG org.apache.hadoop.ipc.Client  - The ping interval is 60000 ms.
3005 [main] DEBUG org.apache.hadoop.ipc.Client  - Connecting to yarn-rm-1/xxxxx:8032
3019 [IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032 from xxxxx@xxxxxx] DEBUG
org.apache.hadoop.ipc.Client  - IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032
from xxxxx@xxxxxx: starting, having connections 1
3020 [IPC Parameter Sending Thread #0] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (2012095985)
connection to yarn-rm-1/xxxxx:8032 from xxxxx@xxxxxx sending #0
3025 [IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032 from xxxxx@xxxxxx] DEBUG
org.apache.hadoop.ipc.Client  - IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032
from xxxxx@xxxxxx got value #-1
3026 [IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032 from xxxxx@xxxxxx] DEBUG
org.apache.hadoop.ipc.Client  - closing ipc connection to yarn-rm-1/xxxxx:8032: SIMPLE authentication
is not enabled.  Available:[TOKEN, KERBEROS]
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1131)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)
---------------code-------------------

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

      // Build a Hadoop Configuration purely from our own files,
      // ignoring any *-site.xml on the class path.
      Configuration hc = new Configuration(false);

      hc.addResource("myconf/yarn-site.xml");
      hc.addResource("myconf/core-site.xml");
      hc.addResource("myconf/hdfs-site.xml");
      hc.addResource("myconf/hive-site.xml");

      SparkConf sc = new SparkConf(true);
      // Copy the config into the SparkConf, since there is no xml on the
      // classpath except the *-default.xml files from the Hadoop jars.
      hc.forEach(entry -> {
          if (entry.getKey().startsWith("hive")) {
              sc.set(entry.getKey(), entry.getValue());
          } else {
              sc.set("spark.hadoop." + entry.getKey(), entry.getValue());
          }
      });

      UserGroupInformation.setConfiguration(hc);
      // Principal and Keytab hold the Kerberos principal name and the
      // path to its keytab file.
      UserGroupInformation.loginUserFromKeytab(Principal, Keytab);

      System.out.println("####spark-conf######");
      System.out.println(sc.toDebugString());

      SparkSession sparkSession = SparkSession
            .builder()
            .master("yarn-client") // "yarn-client", "local"
            .config(sc)
            .appName(SparkEAZDebug.class.getName())
            .enableHiveSupport()
            .getOrCreate();
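
For debugging, it may also help to print what UGI actually ended up with right after the keytab
login (a minimal sketch using standard UserGroupInformation accessors; not part of the original
program):

      // Confirm the login really produced a Kerberos-authenticated user
      // before the SparkSession is created.
      UserGroupInformation ugi = UserGroupInformation.getLoginUser();
      System.out.println("login user: " + ugi.getUserName());
      System.out.println("auth method: " + ugi.getAuthenticationMethod());
      System.out.println("kerberos credentials: " + ugi.hasKerberosCredentials());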

Thanks very much.
Keith
