spark-user mailing list archives

From Vida Ha <vid...@gmail.com>
Subject Re: Save an RDD to a SQL Database
Date Thu, 07 Aug 2014 18:56:24 GMT
That's a good idea: write to files first and then load. Thanks.


On Thu, Aug 7, 2014 at 11:26 AM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Isn't sqoop export meant for that?
>
>
> http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1
> On Aug 7, 2014 7:59 PM, "Nicholas Chammas" <nicholas.chammas@gmail.com>
> wrote:
>
>> Vida,
>>
>> What kind of database are you trying to write to?
>>
>> For example, I found that for loading into Redshift, by far the easiest
>> thing to do was to save my output from Spark as a CSV to S3, and then load
>> it from there into Redshift. This is not as slow as you might think, because Spark
>> can write the output in parallel to S3, and Redshift, too, can load data
>> from multiple files in parallel
>> <http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html>
>> .
>>
>> Nick
>>
>>
>> On Thu, Aug 7, 2014 at 1:52 PM, Vida Ha <vida@databricks.com> wrote:
>>
>>> The use case I was thinking of was outputting calculations made in Spark
>>> into a SQL database for the presentation layer to access.  So in other
>>> words, having a Spark backend in Java that writes to a SQL database and
>>> then having a Rails front-end that can display the data nicely.
>>>
>>>
>>> On Thu, Aug 7, 2014 at 8:42 AM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> On Thu, Aug 7, 2014 at 11:25 AM, Cheng Lian <lian.cs.zju@gmail.com>
>>>> wrote:
>>>>
>>>>> Maybe a little off topic, but would you mind sharing your motivation
>>>>> for saving the RDD into an SQL DB?
>>>>
>>>>
>>>> Many possible reasons (Vida, please chime in with yours!):
>>>>
>>>>    - You have an existing database you want to load new data into so
>>>>    everything's together.
>>>>    - You want very low query latency, which you can probably get with
>>>>    Spark SQL but currently not with the ease you can get it from your average
>>>>    DBMS.
>>>>    - Tooling around traditional DBMSs is currently much more mature
>>>>    than tooling around Spark SQL, especially in the JDBC area.
>>>>
>>>> Nick
>>>>
>>>
>>>
>>
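
The "write to files first, then load" approach discussed in this thread can be sketched roughly as follows. This is not Spark code and not anything posted by the participants; it is a minimal, standard-library-only illustration that imitates the pattern of writing one CSV part file per partition in parallel. The function names (write_partition, write_partitions), the part-file naming, and the sample rows are all hypothetical. In a real job Spark would write the part files itself, and the output location would be something like an S3 prefix rather than a local temp directory.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_partition(out_dir, index, rows):
    """Write one partition of rows to its own CSV part file (hypothetical naming)."""
    path = Path(out_dir) / f"part-{index:05d}.csv"
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


def write_partitions(rows, out_dir, num_partitions=4):
    """Split rows round-robin and write the partitions concurrently."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        futures = [
            pool.submit(write_partition, out_dir, i, part)
            for i, part in enumerate(partitions)
        ]
        # Collect results in partition order so the returned paths are sorted.
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Made-up sample data standing in for the results of a Spark computation.
    rows = [(i, f"user_{i}", i * 1.5) for i in range(10)]
    out_dir = tempfile.mkdtemp(prefix="spark_export_")
    for path in write_partitions(rows, out_dir, num_partitions=3):
        print(path)
```

Splitting the output into several files is what lets the database side load in parallel as well: a bulk loader (for Redshift, a single COPY pointed at the common prefix) can read the part files concurrently instead of receiving rows one INSERT at a time.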
