spark-user mailing list archives

From Benjamin Kim <bbuil...@gmail.com>
Subject Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?
Date Fri, 25 Dec 2015 20:54:14 GMT
Hi Chris,

I did what you did. It works for me now! Thanks for your help.

Have a Merry Christmas!

Cheers,
Ben


> On Dec 25, 2015, at 6:41 AM, Chris Fregly <chris@fregly.com> wrote:
> 
> Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs to be on
> the Java System Classpath, per this troubleshooting section
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#troubleshooting>
> in the Spark SQL programming guide.
> 
> Here <https://github.com/fluxcapacitor/pipeline/blob/master/bin/start-hive-thriftserver.sh>
> is an example hive-thrift-server start script from my Spark-based reference pipeline project.
> Here <https://github.com/fluxcapacitor/pipeline/blob/master/bin/pipeline-spark-sql.sh>
> is an example script that decorates the out-of-the-box spark-sql command to use the MySQL
> JDBC driver.
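> 
> For illustration, the decorated command boils down to something like this (the jar path below
> is just a placeholder, not the exact path used by the pipeline scripts):
> 
> spark-sql --jars /path/to/mysql-connector-java.jar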
> 
> These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined here
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L144>
> and here <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L87>
> and includes the path to the local MySQL JDBC driver.  This approach is described in the
> advanced dependency management section of the spark-submit docs
> <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>.
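> 
> Roughly, the pattern looks like this (paths are placeholders, not the exact contents of those
> scripts):
> 
> export SPARK_SUBMIT_JARS=/path/to/mysql-connector-java.jar
> $SPARK_HOME/sbin/start-thriftserver.sh --jars $SPARK_SUBMIT_JARS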
> 
> Any jar specified with --jars will be copied to each worker node in the cluster - specifically
> into the work directory of each SparkContext, for isolation purposes.
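> 
> (On a Spark Standalone worker, for example, the shipped jar typically ends up under the worker's
> work directory, something like $SPARK_HOME/work/app-<timestamp>-<appid>/<executor-id>/.)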
> 
> Cleanup of these jars on the worker nodes is handled automatically by YARN, and by Spark
> Standalone per the spark.worker.cleanup.appDataTtl config param.
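> 
> For Standalone mode, that cleanup is configured via spark-env.sh on the workers, along these
> lines (the TTL value here is just an example, in seconds):
> 
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400"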
> 
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, but I couldn't
> get that to work for whatever reason, so I'm sticking with the --jars approach used in my examples.
> 
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuild11@gmail.com> wrote:
> Stephen,
> 
> Let me confirm: I just need to propagate these settings I put in spark-defaults.conf
> to all the worker nodes? Do I need to do the same with the PostgreSQL driver jar file too?
> If so, is there a way to have it read from HDFS rather than copying it out to the cluster manually?

> 
> Thanks for your help,
> Ben
> 
> 
> On Tuesday, December 22, 2015, Stephen Boesch <javadba@gmail.com> wrote:
> Hi Benjamin, yes, adding it to the Thrift server makes the CREATE TABLE work. But
> querying is performed by the workers, so you need to add it to the classpath of all nodes
> for reads to work.
> 
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuild11@gmail.com>:
> Hi Stephen,
> 
> I forgot to mention that I added the lines below to spark-defaults.conf on the node
> running the Spark SQL Thrift JDBC/ODBC Server. Then I restarted it.
> 
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> 
> I read in another thread that this would work. I was able to create the table and could
> see it in my SHOW TABLES list. But when I try to query the table, I get the same error. It
> looks like I'm getting close.
> 
> Are there any other things that I have to do that you can think of?
> 
> Thanks,
> Ben
> 
> 
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <javadba@gmail.com> wrote:
>> 
>> The postgres JDBC driver needs to be added to the classpath of your Spark workers.
>> You can do a search for how to do that (multiple ways).
>> 
>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuild11@gmail.com>:
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly on top of a remote PostgreSQL table.
>> 
>> CREATE TEMPORARY TABLE <table_name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>> url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
>> dbtable "impressions"
>> );
>> When I run this against our PostgreSQL server, I get the following error.
>> 
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name> (state=,code=0)
>> 
>> Can someone help me understand why this is?
>> 
>> Thanks, Ben
>> 
>> 
>> 
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>> 
>> 
> 
> -- 
> 
> Chris Fregly
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com
