From: Chris Fregly
Date: Fri, 25 Dec 2015 09:41:06 -0500
Subject: Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?
To: Benjamin Kim
Cc: Stephen Boesch, user@spark.apache.org

Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs to be on the Java system classpath, per this troubleshooting section in the Spark SQL programming guide.
Here is an example hive-thrift-server start script from my Spark-based reference pipeline project. And here is an example script that decorates the out-of-the-box spark-sql command to use the MySQL JDBC driver:
https://github.com/fluxcapacitor/pipeline/blob/master/bin/pipeline-spark-sql.sh

These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined here and here and includes the path to the local MySQL JDBC driver. This approach is described here in the Spark docs covering the advanced spark-submit options.
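
As a rough illustration of that approach (a sketch only, not my exact scripts; the connector jar path and version are placeholders), passing the driver via --jars looks like this:

    # ship the JDBC driver jar to the driver and all workers
    $SPARK_HOME/sbin/start-thriftserver.sh \
      --jars /path/to/mysql-connector-java-5.1.38-bin.jar

    # the same flag works for the spark-sql CLI
    $SPARK_HOME/bin/spark-sql \
      --jars /path/to/mysql-connector-java-5.1.38-bin.jar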

Any jar specified with --jars will be passed to each worker node in the cluster - specifically into the work directory for each SparkContext, for isolation purposes.

Cleanup of these jars on the worker nodes is handled by YARN automatically, and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
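
For Spark Standalone, note that this cleanup is opt-in and configured on each worker - a sketch, assuming the defaults documented for these properties:

    # spark-env.sh on each standalone worker
    # enable periodic cleanup of old application work dirs;
    # appDataTtl is in seconds (604800 = 7 days, the default)
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=604800"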

The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, but I couldn't get this to work for whatever reason, so I'm sticking to the --jars approach used in my examples.
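
For the PostgreSQL driver discussed in this thread, the equivalent would be something like this (a sketch, reusing the jar path from Ben's config quoted below):

    $SPARK_HOME/sbin/start-thriftserver.sh \
      --jars /usr/share/java/postgresql-9.3-1104.jdbc41.jar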

On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuild11@gmail.com> wrote:
> Stephen,
>
> Let me confirm: I just need to propagate these settings I put in
> spark-defaults.conf to all the worker nodes? Do I need to do the same with
> the PostgreSQL driver jar file too? If so, is there a way to have it read
> from HDFS rather than copying it out to the cluster manually?
>
> Thanks for your help,
> Ben


> On Tuesday, December 22, 2015, Stephen Boesch <javadba@gmail.com> wrote:
>
>> Hi Benjamin, yes, by adding it to the thrift server, the CREATE TABLE
>> would work. But querying is performed by the workers, so you need to add
>> it to the classpath of all nodes for reads to work.

>> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuild11@gmail.com>:
>>
>>> Hi Stephen,
>>>
>>> I forgot to mention that I added these lines below to spark-defaults.conf
>>> on the node with the Spark SQL Thrift JDBC/ODBC Server running on it.
>>> Then, I restarted it.
>>>
>>> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>>
>>> I read in another thread that this would work. I was able to create the
>>> table and could see it in my SHOW TABLES list. But when I try to query the
>>> table, I get the same error. It looks like I'm getting close.
>>>
>>> Are there any other things that I have to do that you can think of?
>>>
>>> Thanks,
>>> Ben


>>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <javadba@gmail.com> wrote:
>>>
>>> The postgres JDBC driver needs to be added to the classpath of your
>>> Spark workers. You can do a search for how to do that (multiple ways).
>>>
>>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuild11@gmail.com>:
>>>
>>>> I see in the Spark SQL documentation that a temporary table can be
>>>> created directly on top of a remote PostgreSQL table:

>>>> CREATE TEMPORARY TABLE <table_name>
>>>> USING org.apache.spark.sql.jdbc
>>>> OPTIONS (
>>>>   url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
>>>>   dbtable "impressions"
>>>> );
>>>>
>>>> When I run this against our PostgreSQL server, I get the following error:
>>>>
>>>> Error: java.sql.SQLException: No suitable driver found for
>>>> jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name> (state=,code=0)

>>>> Can someone help me understand why this is?
>>>>
>>>> Thanks, Ben
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--

Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com