spark-user mailing list archives

From <>
Subject RE: JdbcRDD usage
Date Wed, 06 Nov 2013 20:17:20 GMT
Dell - Internal Use - Confidential
Very helpful.


From: Josh Rosen []
Sent: Wednesday, November 06, 2013 11:44 AM
Subject: Re: JdbcRDD usage

JavaSparkContext is just a thin wrapper over SparkContext that exposes Java-friendly methods. You can access the underlying SparkContext instance by calling .sc() on your JavaSparkContext.

You may have to do a bit of extra work to instantiate JdbcRDD from Java, such as explicitly
passing a ClassManifest for your mapRow function.  The Java API internals guide describes
some of the steps involved in this:

After constructing the JdbcRDD[T], you can wrap it into a JavaRDD[T] by calling new JavaRDD(myJdbcRDD, yourClassManifest).

Ideally, we'd have a Java-friendly API for this, but in the meantime it's still possible to
use it from Java with a few of these extra steps.
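Pieced together, a rough Java sketch of those extra steps might look like the following. This assumes a pre-1.0 Spark built against a Scala version that still uses ClassManifest; the JDBC URL, table, query, and bounds are made up for illustration, and in a real job the function wrappers would also need to be Serializable:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.JdbcRDD;

import scala.reflect.ClassManifest;
import scala.reflect.ClassManifest$;
import scala.runtime.AbstractFunction0;
import scala.runtime.AbstractFunction1;

public class JdbcRddFromJava {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local", "jdbc-example");
    // JavaSparkContext is a thin wrapper; unwrap the underlying Scala SparkContext.
    SparkContext sc = jsc.sc();

    // A Scala Function0 that opens a connection; the RDD closes it itself.
    AbstractFunction0<Connection> getConnection = new AbstractFunction0<Connection>() {
      public Connection apply() {
        try {
          return DriverManager.getConnection("jdbc:derby:memory:testdb");
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    };

    // A Scala Function1 mapping each ResultSet row to a value.
    AbstractFunction1<ResultSet, Integer> mapRow = new AbstractFunction1<ResultSet, Integer>() {
      public Integer apply(ResultSet rs) {
        try {
          return rs.getInt(1);
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    };

    // Java can't supply Scala's implicit ClassManifest, so pass it explicitly.
    ClassManifest<Integer> manifest = ClassManifest$.MODULE$.fromClass(Integer.class);

    JdbcRDD<Integer> rdd = new JdbcRDD<Integer>(
        sc, getConnection,
        "SELECT id FROM my_table WHERE ? <= id AND id <= ?",
        1L, 100L, 3, mapRow, manifest);

    // Wrap the Scala RDD into a JavaRDD for the Java-friendly methods.
    JavaRDD<Integer> javaRdd = new JavaRDD<Integer>(rdd, manifest);
    System.out.println(javaRdd.count());
  }
}
```

On later Scala versions the manifest plumbing goes through ClassTag instead of ClassManifest, so the exact incantation depends on what Spark was built against.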

On Wed, Nov 6, 2013 at 11:29 AM, <> wrote:


Since I am working on a Java code base, to use JdbcRDD I need to first create a SparkContext sc
and then initialize JavaSparkContext(sc).

Is there any code that would allow me to create a SparkContext from a JavaSparkContext?

Is there any sample Java code that I can use to create a Scala Seq<String> from a String[]? I need
to create the SparkContext passing my app jars as a Seq<String>.
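For reference, one way to build a Scala Seq<String> from a String[] on the Java side is via JavaConverters. This is a sketch assuming scala-library is on the classpath (on newer Scala versions, scala.jdk.CollectionConverters plays the same role); the jar names are hypothetical:

```java
import java.util.Arrays;

import scala.collection.JavaConverters;
import scala.collection.Seq;

public class SeqFromArray {
  public static void main(String[] args) {
    String[] jars = { "myapp.jar", "mydeps.jar" };  // hypothetical jar names
    // java.util.List -> Scala Buffer (view) -> Seq
    Seq<String> jarSeq =
        JavaConverters.asScalaBufferConverter(Arrays.asList(jars)).asScala().toSeq();
    System.out.println(jarSeq.length());
  }
}
```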


From: Reynold Xin [<>]
Sent: Wednesday, November 06, 2013 12:13 AM
Subject: Re: JdbcRDD usage

The RDD actually takes care of closing the JDBC connection at the end of the iterator. See
the code here:

The explicit close you saw in the JDBCSuite is to close the test program's own connection
for the insert statement (not for the JdbcRDD).
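The pattern described here, releasing the resource once the iterator is exhausted, can be sketched in plain Java. This is an illustration of the idea only, not Spark's actual code:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseOnExhaustion {
  // Wrap an iterator so that a cleanup action runs once, when it is exhausted.
  static <T> Iterator<T> closingIterator(final Iterator<T> inner, final Runnable close) {
    return new Iterator<T>() {
      private boolean closed = false;

      public boolean hasNext() {
        boolean more = inner.hasNext();
        if (!more && !closed) {
          closed = true;
          close.run();  // e.g. connection.close() in the JdbcRDD case
        }
        return more;
      }

      public T next() {
        return inner.next();
      }
    };
  }

  public static void main(String[] args) {
    AtomicBoolean connectionClosed = new AtomicBoolean(false);
    Iterator<Integer> it = closingIterator(
        Arrays.asList(1, 2, 3).iterator(),
        () -> connectionClosed.set(true));

    int sum = 0;
    while (it.hasNext()) {
      sum += it.next();
    }
    System.out.println(sum);                    // 6
    System.out.println(connectionClosed.get()); // true
  }
}
```

Because the cleanup fires only after the last row is consumed, the RDD's connection needs no explicit close from user code; the suite's explicit close is for a different connection it opened itself.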

On Tue, Nov 5, 2013 at 3:13 PM, <> wrote:

I need to access JDBC from my Java Spark code, and am thinking of using JdbcRDD as noted in

I have these questions:
When does the RDD decide to close the connection?

... getConnection
a function that returns an open Connection. The RDD takes care of closing the connection.

Is there any setting that tells Spark to keep JdbcRDD connections open for the next query, instead
of opening a new one for the same JDBC source?

Also per checking

I am seeing that it invokes an explicit close for the connection in the after { } block. If the RDD takes
care of closing the connection, then why do we have to explicitly invoke DriverManager.getConnection("jdbc:derby:;shutdown=true")?

