spark-user mailing list archives

From Yana Kadiyska <yana.kadiy...@gmail.com>
Subject Re: DataFrame insertIntoJDBC parallelism while writing data into a DB table
Date Wed, 17 Jun 2015 01:20:45 GMT
When all else fails look at the source ;)

Looks like createJDBCTable is deprecated, but otherwise goes to the same
implementation as insertIntoJDBC...
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala

You can also look at DataFrameWriter in the same package. It looks like all
of that code eventually writes via JDBCWriteDetails in
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/jdbc.scala

If I'm reading this correctly, you'll get simultaneous writes from each
partition, but the inserts don't appear to be batched otherwise (in case you
were thinking of bulk inserts).
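
To make that reading concrete, here is a rough sketch of the write pattern as
I understand it. This is not Spark code: it uses Python's built-in sqlite3
module as a stand-in for a JDBC database, and the names write_partition,
write_dataframe and demo are hypothetical, made up for illustration. The
assumption it models is that each partition is handled by its own task, which
opens its own connection and issues one INSERT per row.

```python
import os
import sqlite3
import tempfile


def write_partition(db_path, table, rows):
    """Stand-in for one task: own connection, one INSERT per row, one commit."""
    conn = sqlite3.connect(db_path)
    try:
        statements = 0
        for row in rows:
            # One INSERT statement per row; nothing is grouped into a bulk insert.
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
            statements += 1
        conn.commit()
        return statements
    finally:
        conn.close()


def write_dataframe(db_path, table, partitions):
    """Stand-in for the driver: one write_partition call per partition.

    Spark would run these tasks at the same time on the executors; this
    sketch runs them one after another to stay simple.
    """
    return [write_partition(db_path, table, rows) for rows in partitions]


def demo():
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
        conn.commit()
        conn.close()

        # Three "partitions" of different sizes.
        partitions = [
            [(1, "a"), (2, "b")],
            [(3, "c")],
            [(4, "d"), (5, "e"), (6, "f")],
        ]
        per_partition = write_dataframe(db_path, "people", partitions)

        conn = sqlite3.connect(db_path)
        total = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
        conn.close()
        return per_partition, total
    finally:
        os.remove(db_path)


if __name__ == "__main__":
    print(demo())
```

If this model is right, the number of concurrent writers follows the number
of partitions in the DataFrame, not the number of input files, so
repartitioning before the write would be the way to control it.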

On Mon, Jun 15, 2015 at 1:20 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello list,
>
> The method *insertIntoJDBC(url: String, table: String, overwrite:
> Boolean)* provided by Spark DataFrame allows us to copy a DataFrame into
> a JDBC DB table. Similar functionality is provided by the
> *createJDBCTable(url: String, table: String, allowExisting: Boolean)*
> method. But if you look at the docs, they say that *createJDBCTable* runs a
> *bunch of Insert statements* in order to copy the data, while the docs for
> *insertIntoJDBC* don't have any such statement.
>
> Could someone please shed some light on this? How exactly does data get
> inserted by the *insertIntoJDBC* method?
>
> And if it is the same as the *createJDBCTable* method, then what exactly
> does *bunch of Insert statements* mean? What are the criteria for deciding
> the number of *inserts/bunch*? How are these bunches generated?
>
> *An example* could be creating a DataFrame by reading all the files
> stored in a given directory. If I just do *DataFrame.save()*, it'll
> create the same number of output files as the input files. What'll happen
> in case of *DataFrame.df.insertIntoJDBC()*?
>
> I'm really sorry to pester you with questions, but I could not get much
> help by Googling this.
>
> Thank you so much for your valuable time. I really appreciate it.
>
> Tariq, Mohammad
> about.me/mti
>
>
