Configuring JDBC drivers with Spark is a bit tricky because the JDBC driver needs to be on the Java system classpath, per the troubleshooting section of the Spark SQL programming guide.

My Spark-based reference pipeline project includes an example hive-thrift-server start script, as well as an example script that decorates the out-of-the-box spark-sql command to use the MySQL JDBC driver.

These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined elsewhere in the project and includes the path to the local MySQL JDBC driver. This approach is described in the part of the Spark docs that covers the advanced spark-submit options.

Any jar specified with --jars is copied to each worker node in the cluster, specifically into the working directory for each SparkContext, for isolation purposes.
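
As a rough sketch of the --jars approach (this is not the actual script from the reference project), the small Python helper below assembles a spark-submit command line. The helper name build_spark_submit_command, the jar path, and the application file name are all hypothetical placeholders; only the --master, --jars, and --driver-class-path flags come from spark-submit itself.

```python
import shlex


def build_spark_submit_command(app, jars, driver_class_path=None, master=None):
    """Assemble a spark-submit command line as a list of arguments.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    cmd = ["spark-submit"]
    if master:
        cmd += ["--master", master]
    if jars:
        # --jars expects one comma-separated list, not repeated flags.
        cmd += ["--jars", ",".join(jars)]
    if driver_class_path:
        # Extra classpath entries for the driver JVM.
        cmd += ["--driver-class-path", driver_class_path]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Placeholder path: point this at wherever the JDBC driver jar lives.
    jdbc_jar = "/path/to/mysql-connector-java.jar"
    cmd = build_spark_submit_command(
        app="my_app.py",
        jars=[jdbc_jar],
        driver_class_path=jdbc_jar,
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command rather than running it makes it easy to check the flags before handing them to a start script.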

Cleanup of these jars on the worker nodes is handled by YARN automatically, and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
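
For Spark Standalone, the cleanup properties are typically passed to the worker JVM as system properties through SPARK_WORKER_OPTS. The sketch below renders such a string; the helper name worker_cleanup_opts and the one-hour TTL in the usage line are hypothetical, and it assumes the related spark.worker.cleanup.enabled and spark.worker.cleanup.interval properties are set alongside spark.worker.cleanup.appDataTtl.

```python
def worker_cleanup_opts(app_data_ttl_seconds, interval_seconds=1800):
    """Render -D flags for Spark Standalone worker cleanup.

    Hypothetical helper for illustration. The returned string is meant to
    be assigned to SPARK_WORKER_OPTS (for example in conf/spark-env.sh).
    """
    props = [
        # Cleanup is assumed to be off unless explicitly enabled.
        ("spark.worker.cleanup.enabled", "true"),
        # How often, in seconds, the worker checks for old app directories.
        ("spark.worker.cleanup.interval", str(interval_seconds)),
        # How long, in seconds, each application's work directory is kept.
        ("spark.worker.cleanup.appDataTtl", str(app_data_ttl_seconds)),
    ]
    return " ".join("-D{}={}".format(key, value) for key, value in props)


if __name__ == "__main__":
    # Example: keep application work directories (including jars) for 1 hour.
    print('export SPARK_WORKER_OPTS="{}"'.format(worker_cleanup_opts(3600)))
```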

The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, but I couldn't get that to work, so I'm sticking with the --jars approach used in my examples.

On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim wrote:

Let me confirm. I just need to propagate these settings I put in spark-defaults.conf to all the worker nodes? Do I need to do the same with the PostgreSQL driver jar file too? If so, is there a way to have it read from HDFS rather than copying it out to the cluster manually?

Thanks for your help,

On Tuesday, December 22, 2015, Stephen Boesch wrote:
Hi Benjamin, yes, adding the driver to the Thrift server makes the CREATE TABLE work. But querying is performed by the workers, so you need to add the driver to the classpath of all nodes for reads to work.

2015-12-22 18:35 GMT-08:00 Benjamin Kim:
Hi Stephen,

I forgot to mention that I added these lines below to spark-defaults.conf on the node running the Spark SQL Thrift JDBC/ODBC server, and then restarted it.


I read in another thread that this would work. I was able to create the table and could see it in my SHOW TABLES list. But, when I try to query the table, I get the same error. It looks like I’m getting close.

Are there any other things that I have to do that you can think of?


On Dec 22, 2015, at 6:25 PM, Stephen Boesch wrote:

The PostgreSQL JDBC driver needs to be added to the classpath of your Spark workers. You can search for how to do that (there are multiple ways).

2015-12-22 17:22 GMT-08:00 b2k70:
I see in the Spark SQL documentation that a temporary table can be created
directly on top of a remote PostgreSQL table.

CREATE TEMPORARY TABLE <table_name>
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
  dbtable "impressions"
);

When I run this against our PostgreSQL server, I get the following error.

Error: java.sql.SQLException: No suitable driver found for
jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name> (state=,code=0)

Can someone help me understand why this is?

Thanks, Ben


Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA