spark-user mailing list archives

From <Hussam_Jar...@Dell.com>
Subject RE: JdbcRDD usage
Date Wed, 06 Nov 2013 20:17:20 GMT
Very helpful.

Thanks,
Hussam

From: Josh Rosen [mailto:rosenville@gmail.com]
Sent: Wednesday, November 06, 2013 11:44 AM
To: user@spark.incubator.apache.org
Subject: Re: JdbcRDD usage

JavaSparkContext is just a thin wrapper over SparkContext that exposes Java-friendly methods.
 You can access the underlying SparkContext instance by calling .sc() on your JavaSparkContext.
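
For example, a minimal sketch (the master URL and app name below are just placeholders):

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;

public class GetUnderlyingContext {
  public static void main(String[] args) {
    // Build the Java-friendly wrapper first.
    JavaSparkContext jsc = new JavaSparkContext("local", "jdbc-rdd-example");

    // The wrapped Scala SparkContext is exposed via sc().
    SparkContext sc = jsc.sc();

    System.out.println("App name: " + sc.appName());
    jsc.stop();
  }
}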

You may have to do a bit of extra work to instantiate JdbcRDD from Java, such as explicitly
passing a ClassManifest for your mapRow function.  The Java API internals guide describes
some of the steps involved in this:  https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals

After constructing the JdbcRDD[T], you can wrap it into a JavaRDD[T] by calling new JavaRDD(myJdbcRDD,
itsClassManifest).
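
Roughly, those steps put together from Java might look something like this untested sketch against Spark 0.8 / Scala 2.9 (the Derby URL, table, query, bounds, and class names here are placeholders, and the exact ClassManifest lookup may differ across Scala versions):

import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.JdbcRDD;

import scala.reflect.ClassManifest;
import scala.reflect.ClassManifest$;
import scala.runtime.AbstractFunction0;
import scala.runtime.AbstractFunction1;

public class JdbcRddFromJava {

  // getConnection: () => Connection. Serializable because it is shipped to the workers.
  static class GetConnection extends AbstractFunction0<Connection> implements Serializable {
    @Override public Connection apply() {
      try {
        return DriverManager.getConnection("jdbc:derby:target/sampledb");  // placeholder URL
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
  }

  // mapRow: ResultSet => String. Also shipped to the workers.
  static class MapRow extends AbstractFunction1<ResultSet, String> implements Serializable {
    @Override public String apply(ResultSet rs) {
      try {
        return rs.getString(1);
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
  }

  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local", "jdbc-rdd-example");
    SparkContext sc = jsc.sc();

    // ClassManifest for the row type produced by mapRow (String here).
    ClassManifest<String> cm = ClassManifest$.MODULE$.fromClass(String.class);

    // JdbcRDD(sc, getConnection, sql, lowerBound, upperBound, numPartitions, mapRow),
    // plus the implicit ClassManifest passed explicitly from Java.
    JdbcRDD<String> jdbcRdd = new JdbcRDD<String>(
        sc,
        new GetConnection(),
        "SELECT DATA FROM FOO WHERE ID >= ? AND ID <= ?",  // the two ?'s are the partition bounds
        1L, 100L, 3,
        new MapRow(),
        cm);

    // Wrap it into a JavaRDD so the rest of the Java code can use the Java API.
    JavaRDD<String> rows = new JavaRDD<String>(jdbcRdd, cm);
    System.out.println(rows.count());

    jsc.stop();
  }
}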

Ideally, we'd have a Java-friendly API for this, but in the meantime it's still possible to
use it from Java with a few of these extra steps.

On Wed, Nov 6, 2013 at 11:29 AM, <Hussam_Jarada@dell.com> wrote:

Cool.

Since I am working on a Java code base, to use JdbcRDD I need to first create a SparkContext sc and then initialize JavaSparkContext(sc).

Is there any code that would allow me to create a SparkContext from a JavaSparkContext?

Is there any sample Java code that I can use to create a Scala Seq<String> from a String[]? I need to create the SparkContext passing my app jars as a Seq<String>.
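
One way to do that conversion, for reference, is something along these lines (assuming scala.collection.JavaConversions from the Scala standard library; the jar path is a placeholder):

import java.util.Arrays;
import scala.collection.JavaConversions;
import scala.collection.Seq;

public class JarsAsSeq {
  public static void main(String[] args) {
    String[] jars = { "target/myapp.jar" };  // placeholder jar path

    // asScalaBuffer gives a mutable Buffer, which is itself a Seq;
    // toSeq just makes the Seq[String] typing explicit.
    Seq<String> jarSeq = JavaConversions.asScalaBuffer(Arrays.asList(jars)).toSeq();

    System.out.println(jarSeq);
  }
}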

Thanks,
Hussam

From: Reynold Xin [mailto:rxin@apache.org]
Sent: Wednesday, November 06, 2013 12:13 AM
To: user@spark.incubator.apache.org
Subject: Re: JdbcRDD usage

The RDD actually takes care of closing the jdbc connection at the end of the iterator. See
the code here: https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala#L107

The explicit close you saw in JdbcRDDSuite is to close the test program's own connection
for the insert statement (not for the JdbcRDD).
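
In other words, the division of responsibility looks roughly like this (a paraphrased sketch, not the actual suite code; the Derby URL and table are placeholders, and the embedded Derby driver is assumed to be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectionOwnership {
  public static void main(String[] args) throws Exception {
    // 1. A connection the test/application opens itself (for CREATE TABLE / INSERT).
    //    The application is responsible for closing this one.
    Connection setup = DriverManager.getConnection("jdbc:derby:target/sampledb;create=true");
    setup.createStatement().executeUpdate("CREATE TABLE FOO(ID INT, DATA VARCHAR(32))");
    setup.close();

    // 2. Connections opened through JdbcRDD's getConnection function are closed by
    //    JdbcRDD itself once each partition's iterator is exhausted, so no explicit
    //    close is needed in the application code.

    // 3. The shutdown connection in the suite's after { } block stops the embedded
    //    Derby engine; it is unrelated to JdbcRDD's own connections.
    try {
      DriverManager.getConnection("jdbc:derby:;shutdown=true");
    } catch (SQLException e) {
      // Derby signals a successful shutdown by throwing an SQLException.
    }
  }
}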

On Tue, Nov 5, 2013 at 3:13 PM, <Hussam_Jarada@dell.com> wrote:
Hi,

I need to access JDBC from my Java Spark code, and I am thinking of using JdbcRDD as noted in http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/rdd/JdbcRDD.html

I have these questions:
When does the RDD decide to close the connection?

The docs say for getConnection: "a function that returns an open Connection. The RDD takes care of closing the connection."

Is there any setting that tells Spark to keep JdbcRDD connections open for the next query, instead
of opening a new one for the same JDBC source?

Also, checking
https://github.com/apache/incubator-spark/blob/branch-0.8/core/src/test/scala/org/apache/spark/rdd/JdbcRDDSuite.scala

I see that it explicitly closes the connection in the after { } block. If the RDD takes care of closing the connection, then why do we have to explicitly invoke DriverManager.getConnection("jdbc:derby:;shutdown=true")?

Thanks,
Hussam
