spark-user mailing list archives

From Ajay Chander <>
Subject Spark_Jdbc_Hive
Date Mon, 03 Oct 2016 13:56:29 GMT
Hi Everyone,

First of all, let me explain what I am trying to do; I apologize for the
lengthy mail.

1) Programmatically connect to a remote, Kerberos-secured Hadoop cluster (CDH
5.7) from my local machine.

   - Once connected, I want to read the data from a remote Hive table into a
   Spark DataFrame.
   - Once the data is loaded into my local DataFrame, I would like to apply
   some transformations and run some tests.

I know that we can use spark-shell from an edge node to do these things, but
I am trying to find a way to do it from my IDE.

My Local Environment (Windows 7):
I am using the IntelliJ IDE, Maven as the build tool, and Java.

Things that I have got working,

   - Since the cluster is secured with Kerberos, I had to use a keytab
   file to authenticate, like below:


Configuration conf = new Configuration();
// standard property telling Hadoop's UserGroupInformation to use Kerberos
conf.set("hadoop.security.authentication", "kerberos");
// log in from the keytab (principal and keytab path below are placeholders)
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("user@INTERNAL.DOMAIN.COM", "/path/to/user.keytab");




   - Now that I am authenticated to the cluster with the keytab, I first
   tried a pure JDBC call (not the Spark API) to see whether I could read
   the data. Yes, I was able to read the data successfully this way:

String driverName = "org.apache.hive.jdbc.HiveDriver";
String url = "jdbc:hive2://;principal=hive/_HOST@INTERNAL.DOMAIN.COM;saslQop=auth-conf";
Class.forName(driverName);
Connection con = DriverManager.getConnection(url);
String query = "select * from test.test_data limit 10";
Statement stmt = con.createStatement();
System.out.println("Executing Query...");
ResultSet rs = stmt.executeQuery(query);
while ( {
    String emp_name = rs.getString("emp_name");
    System.out.println("Employee Name: " + emp_name);
}

Here is the Hive JDBC driver dependency in my pom.xml (hive-jdbc 1.1.0):

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.1.0</version>
</dependency>


   - Now that the JDBC connection to the secured cluster is working fine,
   the next step is to use the Spark API to read the Hive table into a
   DataFrame. I use Spark 1.6. I tried the below:

// Trying to use a JDBC connection to Hive through Spark 1.6 and hive-jdbc 1.1.0
String JDBC_DB_URL = "jdbc:hive2://;principal=hive/_HOST@INTERNAL.DOMAIN.COM;saslQop=auth-conf"; // same URL as above
Map<String, String> options = new HashMap<String, String>();
options.put("driver", "org.apache.hive.jdbc.HiveDriver");
options.put("url", JDBC_DB_URL);
options.put("dbtable", "test.test_data");
DataFrame jdbcDF ="jdbc").options(options).load();

Now I came across the below error:

Exception in thread "main" java.sql.SQLException: Method not supported
	at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:139)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
	at Dev_Cluster_Test.main(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at com.intellij.rt.execution.application.AppMain.main(

Then I looked at the Spark code base: JDBCRDD.resolveTable (JDBCRDD.scala:139)
probes each column's metadata and calls ResultSetMetaData.isSigned, which lands
in the hive-jdbc code base, where HiveResultSetMetaData.isSigned is not
implemented and simply throws SQLException("Method not supported").

Thus the error.
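The failure can be reproduced without a cluster or the real driver. Below is a minimal sketch using only the JDK: the proxy stands in for hive-jdbc's HiveResultSetMetaData (whose isSigned just throws), and resolveColumn mimics the per-column metadata probe that Spark's JDBCRDD.resolveTable performs. The class and method names here are mine, for illustration only.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class MethodNotSupportedDemo {

    // Stand-in mimicking hive-jdbc's HiveResultSetMetaData: isSigned (and
    // most other metadata calls) just throw "Method not supported".
    static ResultSetMetaData hiveLikeMetaData() {
        InvocationHandler h = (proxy, method, args) -> {
            if (method.getName().equals("getColumnCount")) {
                return 1;
            }
            // isSigned and everything else behave like the unimplemented driver
            throw new SQLException("Method not supported");
        };
        return (ResultSetMetaData) Proxy.newProxyInstance(
                ResultSetMetaData.class.getClassLoader(),
                new Class<?>[] { ResultSetMetaData.class }, h);
    }

    // Roughly the shape of Spark's per-column probe in JDBCRDD.resolveTable:
    // it asks the driver whether the column type is signed.
    static String resolveColumn(ResultSetMetaData md) {
        try {
            return "signed=" + md.isSigned(1);
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "Method not supported"
        System.out.println(resolveColumn(hiveLikeMetaData()));
    }
}
```

So any JDBC driver whose metadata class throws from isSigned will fail Spark's jdbc source at table-resolution time, before a single row is read.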

Then I looked at the Spark 2.0.0 API, which results in the same error,
"Method not supported".
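I have not found a supported switch in Spark for this, but the general shape of a workaround (all names below are mine, not a real Spark or Hive API) is a shim between the caller and the driver: a wrapping ResultSetMetaData that intercepts the unsupported isSigned call and substitutes a safe default (treating columns as signed). For Spark to pick this up, the same idea would have to live in a wrapping JDBC driver or a patched hive-jdbc.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class SignedDefaultShim {

    // Stand-in for a hive-jdbc-style ResultSetMetaData: every call throws.
    static ResultSetMetaData throwingMetaData() {
        return (ResultSetMetaData) Proxy.newProxyInstance(
                ResultSetMetaData.class.getClassLoader(),
                new Class<?>[] { ResultSetMetaData.class },
                (proxy, method, args) -> {
                    throw new SQLException("Method not supported");
                });
    }

    // Wraps a driver's ResultSetMetaData so that an unsupported isSigned
    // falls back to "true" instead of failing the caller; all other calls
    // (and all other failures) pass through untouched.
    static ResultSetMetaData withSignedDefault(ResultSetMetaData inner) {
        return (ResultSetMetaData) Proxy.newProxyInstance(
                ResultSetMetaData.class.getClassLoader(),
                new Class<?>[] { ResultSetMetaData.class },
                (proxy, method, args) -> {
                    try {
                        return method.invoke(inner, args);
                    } catch (InvocationTargetException e) {
                        if (method.getName().equals("isSigned")
                                && e.getCause() instanceof SQLException) {
                            return true; // assumption: numeric columns are signed
                        }
                        throw e.getCause();
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        ResultSetMetaData md = withSignedDefault(throwingMetaData());
        // prints "isSigned(1) = true" instead of throwing
        System.out.println("isSigned(1) = " + md.isSigned(1));
    }
}
```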

Can anyone please shed some light on this and tell me if I am missing
anything here? I appreciate your time. Thank you.


