spark-user mailing list archives

From Mohammed Guller <moham...@glassbeam.com>
Subject RE: using a database connection pool to write data into an RDBMS from a Spark application
Date Fri, 20 Feb 2015 18:04:20 GMT
It looks like spark.files.userClassPathFirst gives precedence to user libraries only on the
worker nodes. Is there something similar to achieve the same behavior on the master? 

BTW, I am running Spark in standalone mode.

Mohammed


-----Original Message-----
From: Sean Owen [mailto:sowen@cloudera.com] 
Sent: Friday, February 20, 2015 9:42 AM
To: Mohammed Guller
Cc: Kelvin Chu; user@spark.apache.org
Subject: Re: using a database connection pool to write data into an RDBMS from a Spark application

Have a look at spark.yarn.user.classpath.first and spark.files.userClassPathFirst for a possible
way to give your copy of the libs precedence.
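A minimal sketch of setting the property when building the SparkConf yourself; spark.files.userClassPathFirst was an experimental setting in this era and applied to the executors:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: ask Spark to prefer jars shipped with the application over its
// own copies when resolving classes on the executors.
val conf = new SparkConf()
  .setAppName("postgres-pool-example")
  .set("spark.files.userClassPathFirst", "true")

val sc = new SparkContext(conf)
```

The same property can instead be passed to spark-submit with `--conf`, so the application code does not hard-code it.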

On Fri, Feb 20, 2015 at 5:20 PM, Mohammed Guller <mohammed@glassbeam.com> wrote:
> Sean,
> I know that Class.forName is not required since Java 1.4 :-) It was just a desperate
> attempt to make sure that the Postgres driver is getting loaded. Since
> Class.forName("org.postgresql.Driver") is not throwing an exception, I assume that the
> driver is available in the classpath. Is that not true?
>
> I did some more troubleshooting and here is what I found:
> 1) The hive libraries used by Spark use BoneCP 0.7.1
> 2) When Spark master is started, it initializes BoneCP, which will not 
> load any database driver at that point (that makes sense)
> 3) When my application initializes BoneCP, it thinks it is already initialized and does
> not load the Postgres driver (this is a known bug in 0.7.1). The bug is fixed in the
> BoneCP 0.8.0 release.
>
> So I linked my app against the BoneCP 0.8.0 release, but when I run it with spark-submit,
> Spark continues to use BoneCP 0.7.1. How do I override that behavior? How do I make the
> spark-submit script unload BoneCP 0.7.1 and load BoneCP 0.8.0? I tried the --jars and
> --driver-class-path flags, but they didn't help.
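One workaround for this kind of version clash (not discussed in the thread itself) is to shade the newer BoneCP into a private package so it cannot collide with the copy Spark ships. A build.sbt sketch, assuming the sbt-assembly plugin; the shaded package name is illustrative:

```scala
// build.sbt sketch (assumes sbt-assembly is enabled for this project).
// Relocating BoneCP 0.8.0 under a private package means the application's
// classes reference the shaded copy, while Spark's Hive libraries keep
// using their bundled 0.7.1 untouched.
libraryDependencies += "com.jolbox" % "bonecp" % "0.8.0.RELEASE"

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.jolbox.**" -> "shaded.com.jolbox.@1").inAll
)
```

After rebuilding the uber jar, both versions can coexist because they no longer share class names.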
>
> Thanks,
> Mohammed
>
>
> -----Original Message-----
> From: Sean Owen [mailto:sowen@cloudera.com]
> Sent: Friday, February 20, 2015 2:06 AM
> To: Mohammed Guller
> Cc: Kelvin Chu; user@spark.apache.org
> Subject: Re: using a database connection pool to write data into an 
> RDBMS from a Spark application
>
> Although I don't know if it's related, the Class.forName() method of loading drivers is
> very old. You should be using DataSource and javax.sql; this has been the usual practice
> since about Java 1.4.
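A sketch of the DataSource-based approach Sean describes, using BoneCP's javax.sql.DataSource implementation; the host, database, and credentials are placeholders:

```scala
import javax.sql.DataSource
import com.jolbox.bonecp.BoneCPDataSource

// Sketch: obtain connections through a javax.sql.DataSource rather than
// Class.forName + DriverManager. Connection details are placeholders.
val ds: DataSource = {
  val pool = new BoneCPDataSource()
  pool.setJdbcUrl("jdbc:postgresql://hostname:5432/dbname")
  pool.setUsername("user")
  pool.setPassword("secret")
  pool
}

val conn = ds.getConnection()
try {
  // ... run statements against conn ...
} finally conn.close()
```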
>
> Why do you say a different driver is being loaded? that's not the error here.
>
> Try instantiating the driver directly to test whether it's available in the classpath.
> Otherwise you would have to check whether the jar exists, the class exists in it, and
> it's really on your classpath.
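One way to run that check is to ask DriverManager directly whether any registered driver accepts the URL; a false result reproduces the "No suitable driver found" symptom exactly (the URL below is a placeholder):

```scala
import java.sql.{DriverManager, SQLException}

// Returns true if some registered JDBC driver accepts this URL.
// DriverManager.getDriver throws SQLException ("No suitable driver")
// when no registered driver recognizes the URL scheme.
def driverAvailable(url: String): Boolean =
  try { DriverManager.getDriver(url); true }
  catch { case _: SQLException => false }

// Example (placeholder URL):
// driverAvailable("jdbc:postgresql://hostname:5432/dbname")
```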
>
> On Fri, Feb 20, 2015 at 5:27 AM, Mohammed Guller <mohammed@glassbeam.com> wrote:
>> Hi Kelvin,
>>
>>
>>
>> Yes. I am creating an uber jar with the Postgres driver included, but
>> nevertheless tried both the --jars and --driver-class-path flags. It didn't help.
>>
>>
>>
>> Interestingly, I can’t use BoneCP even in the driver program when I 
>> run my application with spark-submit. I am getting the same exception 
>> when the application initializes BoneCP before creating SparkContext.
>> It looks like Spark is loading a different version of the Postgres
>> JDBC driver than the one I am linking against.
>>
>>
>>
>> Mohammed
>>
>>
>>
>> From: Kelvin Chu [mailto:2dot7kelvin@gmail.com]
>> Sent: Thursday, February 19, 2015 7:56 PM
>> To: Mohammed Guller
>> Cc: user@spark.apache.org
>> Subject: Re: using a database connection pool to write data into an 
>> RDBMS from a Spark application
>>
>>
>>
>> Hi Mohammed,
>>
>>
>>
>> Did you use --jars to specify your JDBC driver when you submitted your job?
>> Take a look at this link:
>> http://spark.apache.org/docs/1.2.0/submitting-applications.html
>>
>>
>>
>> Hope this helps!
>>
>>
>>
>> Kelvin
>>
>>
>>
>> On Thu, Feb 19, 2015 at 7:24 PM, Mohammed Guller 
>> <mohammed@glassbeam.com>
>> wrote:
>>
>> Hi –
>>
>> I am trying to use BoneCP (a database connection pooling library) to 
>> write data from my Spark application to an RDBMS. The database 
>> inserts are inside a foreachPartition code block. I am getting this 
>> exception when the code tries to insert data using BoneCP:
>>
>>
>>
>> java.sql.SQLException: No suitable driver found for 
>> jdbc:postgresql://hostname:5432/dbname
>>
>>
>>
>> I tried explicitly loading the Postgres driver on the worker nodes by 
>> adding the following line inside the foreachPartition code block:
>>
>>
>>
>> Class.forName("org.postgresql.Driver")
>>
>>
>>
>> It didn’t help.
>>
>>
>>
>> Has anybody been able to get a database connection pool library to work
>> with Spark? If you have, can you please share the steps?
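For reference, the pattern that usually works is to obtain one connection per partition inside foreachPartition, so nothing connection-related is serialized from the driver to the workers. A sketch, assuming an RDD of (Int, String) pairs; the table, columns, and credentials are invented for illustration:

```scala
import java.sql.DriverManager

// Sketch: the connection (or a pool, if one is lazily initialized per JVM)
// is created only inside the closure, on the worker. DriverManager is used
// here for brevity; a pooling library would be fetched the same way.
rdd.foreachPartition { rows: Iterator[(Int, String)] =>
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://hostname:5432/dbname", "user", "secret")
  try {
    val stmt = conn.prepareStatement("INSERT INTO events (id, name) VALUES (?, ?)")
    for ((id, name) <- rows) {
      stmt.setInt(1, id)
      stmt.setString(2, name)
      stmt.addBatch()
    }
    stmt.executeBatch()
  } finally conn.close()
}
```

The key point is that the connection is opened and closed on the worker, once per partition, rather than once per row or once on the driver.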
>>
>>
>>
>> Thanks,
>>
>> Mohammed
>>
>>
>>
>>