Subject: Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?
From: Benjamin Kim <bbuild11@gmail.com>
Date: Sat, 26 Dec 2015 13:12:45 -0800
To: Chris Fregly <chris@fregly.com>
Cc: Stephen Boesch <javadba@gmail.com>, user <user@spark.apache.org>

Chris,

I have a question about your setup. Does it allow the same usage of Cassandra/HBase data sources? Can I create a table that links to one of them and can be used by Spark SQL? I ask because I see the Cassandra connector package included in your script. Something like the sketch below is what I have in mind.
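A hedged sketch of what I mean, using the connector's SQL data source (the package coordinates, connection host, keyspace, and table names below are my own illustrative assumptions, not from your scripts):

    # Sketch only: register a Cassandra-backed table for Spark SQL via the connector.
    # Package version, host, keyspace, and table are illustrative assumptions.
    $SPARK_HOME/bin/spark-sql \
      --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M3 \
      --conf spark.cassandra.connection.host=127.0.0.1 \
      -e 'CREATE TEMPORARY TABLE impressions_c
          USING org.apache.spark.sql.cassandra
          OPTIONS (keyspace "ks", table "impressions")'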

Thanks,
Ben

On Dec 25, 2015, at 6:41 AM, Chris Fregly <chris@fregly.com> wrote:

Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs to be on the Java System Classpath per this troubleshooting section in the Spark SQL programming guide.

Here is an example hive-thrift-server start script from my Spark-based reference pipeline project. Here is an example script that decorates the out-of-the-box spark-sql command to use the MySQL JDBC driver.

These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined here and here and includes the path to the local MySQL JDBC driver. This approach is described here in the Spark docs that describe the advanced spark-submit options.
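A sketch of that pattern, not the actual scripts (the jar path, driver version, and master URL below are illustrative assumptions):

    # Sketch: start the Thrift JDBC/ODBC server with the JDBC driver on --jars.
    # Jar path and master URL are assumptions for illustration.
    export SPARK_SUBMIT_JARS=/usr/share/java/mysql-connector-java-5.1.38-bin.jar

    $SPARK_HOME/sbin/start-thriftserver.sh \
      --master spark://master:7077 \
      --jars "$SPARK_SUBMIT_JARS"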

Any jar specified with --jars will be passed to each worker node in the cluster - specifically into the work directory for each SparkContext, for isolation purposes.
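If you're curious, you can see this on a Standalone worker: each application gets its own work subdirectory, and the shipped jars land inside it. A sketch (the app and executor ids vary per run):

    # Sketch: inspect an application's work directory on a Standalone worker node
    ls "$SPARK_HOME"/work/app-*/0/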

Cleanup of these jars on the worker nodes is handled by YARN automatically, and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
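In Standalone mode that cleanup has to be enabled on each worker; a sketch of what that looks like (604800 seconds is the documented 7-day default for the TTL):

    # spark-env.sh on each Standalone worker -- a sketch; values are the documented defaults
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=604800"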

The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, but I couldn't get this to work for whatever reason, so I'm sticking to the --jars approach used in my examples.
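For reference, the guide's suggestion amounts to something like the line below before launching (the jar path is an illustrative assumption); note that SPARK_CLASSPATH has been deprecated for a while, which may be related:

    # The SPARK_CLASSPATH variant I couldn't get working -- sketch only
    export SPARK_CLASSPATH=/usr/share/java/mysql-connector-java-5.1.38-bin.jar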

On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuild11@gmail.com> wrote:
Stephen,

Let me confirm: I just need to propagate these settings I put in spark-defaults.conf to all the worker nodes? Do I need to do the same with the PostgreSQL driver jar file too? If so, is there a way to have it read from HDFS rather than copying it out to the cluster manually?

Thanks for your help,
Ben


On Tuesday, December 22, 2015, Stephen Boesch <javadba@gmail.com> wrote:
Hi Benjamin, yes, by adding it to the thrift server, the create table will work. But querying is performed by the workers, so you need to add the driver to the classpath of all nodes for reads to work.
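A rough sketch of that manual step (worker hostnames are placeholders; the jar path is the one from your config below):

    # Sketch: copy the Postgres driver jar to every worker node, then restart the workers.
    # Hostnames are assumptions for illustration.
    for h in worker1 worker2 worker3; do
      scp /usr/share/java/postgresql-9.3-1104.jdbc41.jar "$h":/usr/share/java/
    done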

2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuild11@gmail.com>:
Hi Stephen,

I forgot to mention that I added the lines below to spark-defaults.conf on the node with the Spark SQL Thrift JDBC/ODBC Server running on it. Then, I restarted it.

spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

I read in another thread that this would work. I was able to create the table and could see it in my SHOW TABLES list. But when I try to query the table, I get the same error. It looks like I'm getting close.

Are there any other things that I have to do that you can think of?

Thanks,
Ben


On Dec 22, 2015, at 6:25 PM, Stephen Boesch <javadba@gmail.com> wrote:

The Postgres JDBC driver needs to be added to the classpath of your Spark workers. You can do a search for how to do that (there are multiple ways); one is sketched below.
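One such way, sketched here (the launcher location is an assumption; the jar path matches the one used elsewhere in this thread):

    # Sketch: launch spark-sql with the Postgres driver on --jars so workers receive it
    $SPARK_HOME/bin/spark-sql \
      --jars /usr/share/java/postgresql-9.3-1104.jdbc41.jar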

2015-12-22 17:22 GMT-08:00 b2k70 <bbuild11@gmail.com>:
I see in the Spark SQL documentation that a temporary table can be created directly on top of a remote PostgreSQL table.

CREATE TEMPORARY TABLE <table_name>
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
dbtable "impressions"
);
When I run this against our PostgreSQL server, I get the following error.

Error: java.sql.SQLException: No suitable driver found for
jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name> (state=,code=0)

Can someone help me understand why this is?

Thanks, Ben



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.








--

Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
