spark-user mailing list archives

From Evert Lammerts <evert.lamme...@gmail.com>
Subject Re: Querying registered RDD (AsTable) using JDBC
Date Fri, 19 Dec 2014 17:09:58 GMT
Yes you can, using a HiveContext, a metastore and the thriftserver. The
metastore persists your SchemaRDD's table metadata; a HiveContext,
initialised with the metastore's connection details, can read from and write
to the metastore; and the thriftserver uses the metastore to serve the
registered tables over JDBC.

Using MySQL as an example backend for the metastore:

1. Install MySQL
2. Create a database: CREATE database hive_metastore CHARSET latin1;
3. Create a metastore user: GRANT ALL ON hive_metastore.* TO metastore_user
IDENTIFIED BY 'password';
4. Create a hive-site.xml in your Spark conf dir: see
http://pastebin.com/VXcmJWdX for an example (a minimal sketch also follows
these steps)
5. Download the mysql jdbc driver from
http://dev.mysql.com/downloads/connector/j/
6. Start the spark-shell with the mysql driver on the classpath: $
./bin/spark-shell --driver-class-path mysql-connector-java-5.1.34-bin.jar
7. Register the table using something like the following (a fuller sketch
follows these steps):
> val sqlct = new org.apache.spark.sql.hive.HiveContext(sc)
> sqlct.setConf("hive.metastore.warehouse.dir",
"/some/path/to/store/tables") // only if you're running locally, i.e. not using HDFS
> ... // create your SchemaRDD using sqlct
> rdd.saveAsTable("mytable")
8. Start the thriftserver (which provides the JDBC
connection): $ ./sbin/start-thriftserver.sh --driver-class-path
mysql-connector-java-5.1.34-bin.jar --conf
hive.metastore.warehouse.dir=/some/path/to/store/tables
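
For step 4, a minimal hive-site.xml along the lines of the pastebin example,
matching the database, user and password from steps 2 and 3, might look like
this (adjust the host and credentials for your setup):

  <configuration>
    <!-- JDBC URL of the MySQL database created in step 2 -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost/hive_metastore</value>
    </property>
    <!-- MySQL driver from step 5 -->
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- Credentials from step 3 -->
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>metastore_user</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>password</value>
    </property>
  </configuration>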
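
And a fuller spark-shell sketch for step 7; the Person case class and the
sample rows are hypothetical, just to make the example self-contained:

  // hypothetical schema for illustration
  case class Person(name: String, age: Int)

  val sqlct = new org.apache.spark.sql.hive.HiveContext(sc)
  sqlct.setConf("hive.metastore.warehouse.dir", "/some/path/to/store/tables")
  // implicitly converts RDDs of case classes into SchemaRDDs
  import sqlct.createSchemaRDD

  val people = sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25)))
  people.saveAsTable("people")  // persists the table via the metastore
  sqlct.sql("SELECT name FROM people WHERE age > 26").collect().foreach(println)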

Something like that should do it. Now you can connect from, for example,
beeline:

$ ./bin/beeline
> !connect jdbc:hive2://localhost:10000
> show tables;
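
Or, since the point is serving JDBC queries to external components, from any
plain JDBC client. A minimal Scala sketch, assuming the Hive JDBC driver
(org.apache.hive.jdbc.HiveDriver) and its dependencies are on the client's
classpath, and that "mytable" is the table saved in step 7:

  import java.sql.DriverManager

  object ThriftClient {
    def main(args: Array[String]): Unit = {
      // The thriftserver speaks the HiveServer2 protocol, so the
      // HiveServer2 JDBC driver works against it
      Class.forName("org.apache.hive.jdbc.HiveDriver")
      val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
      try {
        val rs = conn.createStatement().executeQuery("SELECT * FROM mytable LIMIT 10")
        while (rs.next()) {
          println(rs.getString(1))  // print the first column of each row
        }
      } finally {
        conn.close()
      }
    }
  }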

This is a good guide to setting up the metastore, regardless of your
distribution:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html



On Fri Dec 19 2014 at 5:34:49 PM shahab <shahab.mokari@gmail.com> wrote:

> Hi,
>
> Sorry for repeating the same question, just wanted to clarify the issue:
>
> Is it possible to expose an RDD (or SchemaRDD) to external components
> (outside Spark) so it can be queried over JDBC? (My goal is not to place
> the RDD back in a database, but to use this cached RDD to serve JDBC
> queries.)
>
> best,
>
> /shahab
>
