Chris,

I have a question about your setup. Does it allow the same kind of usage with Cassandra/HBase data sources? That is, can I create a table that links to Cassandra/HBase and can be used from Spark SQL? I ask because I see the Cassandra connector package included in your script.
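
For reference, the kind of thing I have in mind is the connector's Spark SQL integration, roughly along these lines (the keyspace/table names below are just placeholders, and the option names may differ by connector version):

CREATE TEMPORARY TABLE impressions_cassandra
USING org.apache.spark.sql.cassandra
OPTIONS (
  keyspace "my_keyspace",
  table "impressions"
);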

Thanks,
Ben

On Dec 25, 2015, at 6:41 AM, Chris Fregly <chris@fregly.com> wrote:

Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs to be on the Java system classpath, per this troubleshooting section in the Spark SQL programming guide.

Here is an example hive-thrift-server start script from my Spark-based reference pipeline project.  Here is an example script that decorates the out-of-the-box spark-sql command to use the MySQL JDBC driver.
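
That spark-sql wrapper boils down to roughly the following (the driver path is just an example from my environment; point it at your local copy):

#!/bin/bash
# Decorate the stock spark-sql CLI so the MySQL JDBC driver is shipped with the job.
MYSQL_JDBC_JAR=/usr/share/java/mysql-connector-java.jar
exec $SPARK_HOME/bin/spark-sql --jars $MYSQL_JDBC_JAR "$@"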

These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined here and here and includes the path to the local MySQL JDBC driver. This approach is described here in the Spark docs, under the advanced spark-submit options.
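
Roughly, the relevant pieces look like this (paths are examples from my setup):

# define the jars list once, e.g. in an env setup script
export MYSQL_JDBC_JAR=/usr/share/java/mysql-connector-java.jar
export SPARK_SUBMIT_JARS=$MYSQL_JDBC_JAR   # comma-separated if more than one jar

# the hive-thrift-server start script then passes it through via --jars
$SPARK_HOME/sbin/start-thriftserver.sh --jars $SPARK_SUBMIT_JARS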

Any jar specified with --jars will be copied to each worker node in the cluster, specifically into the work directory of each SparkContext, for isolation purposes.

Cleanup of these jars on the worker nodes is handled by YARN automatically, and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
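
Note that for Spark Standalone the cleanup also has to be enabled explicitly; a minimal sketch of the worker-side settings, e.g. in spark-env.sh on each worker (the TTL value is just an example, in seconds):

# periodically clean up old application work dirs (including jars shipped via --jars)
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=604800"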

The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, but I couldn't get that to work for whatever reason, so I'm sticking with the --jars approach used in my examples.

On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuild11@gmail.com> wrote:
Stephen,

Let me confirm: I just need to propagate the settings I put in spark-defaults.conf to all the worker nodes? And do I need to do the same with the PostgreSQL driver jar file? If so, is there a way to have it read from HDFS rather than copying it out to the cluster manually?
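
For example, would something like this work, with the driver jar sitting in HDFS instead of being copied to every node (the path is just an example)?

$SPARK_HOME/sbin/start-thriftserver.sh \
  --jars hdfs:///libs/postgresql-9.3-1104.jdbc41.jar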

Thanks for your help,
Ben


On Tuesday, December 22, 2015, Stephen Boesch <javadba@gmail.com> wrote:
Hi Benjamin, yes, adding it to the Thrift server is why the CREATE TABLE works. But querying is performed by the workers, so you need to add the driver to the classpath of all worker nodes for reads to work.

2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuild11@gmail.com>:
Hi Stephen,

I forgot to mention that I added the lines below to spark-defaults.conf on the node running the Spark SQL Thrift JDBC/ODBC server, and then restarted it.

spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

I read in another thread that this would work. I was able to create the table and could see it in my SHOW TABLES list. But when I try to query the table, I get the same error. It looks like I’m getting close.

Are there any other things that I have to do that you can think of?

Thanks,
Ben


On Dec 22, 2015, at 6:25 PM, Stephen Boesch <javadba@gmail.com> wrote:

The Postgres JDBC driver needs to be added to the classpath of your Spark workers. You can search for how to do that (there are multiple ways).
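
One common way, for example, is to pass the jar via --jars when starting the Thrift server (adjust the path to wherever the driver jar lives):

$SPARK_HOME/sbin/start-thriftserver.sh \
  --jars /usr/share/java/postgresql-9.3-1104.jdbc41.jar

Another is to set spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf, as long as the jar actually exists at that path on every node.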

2015-12-22 17:22 GMT-08:00 b2k70 <bbuild11@gmail.com>:
I see in the Spark SQL documentation that a temporary table can be created that points directly to a remote PostgreSQL table.

CREATE TEMPORARY TABLE <table_name>
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
  dbtable "impressions"
);

When I run this against our PostgreSQL server, I get the following error.

Error: java.sql.SQLException: No suitable driver found for
jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name> (state=,code=0)

Can someone help me understand why this is?

Thanks, Ben



--

Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA