spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: update specifc rows to DB using sqlContext
Date Thu, 11 Aug 2016 11:14:03 GMT
In that case one alternative would be to save the new table on HDFS, load
it into a staging table in MySQL with some simple ETL, and update the
original table from the staging table.

The whole thing can be done in a shell script.
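As a sketch of that flow (a minimal shell example; the HDFS path, file
format, database name and staging-table name are assumptions, with the ms
table and its columns taken from the thread below):

```shell
#!/bin/sh
# Staging-table flow: HDFS -> local file -> MySQL staging table -> UPDATE.
# All paths, database and table names here are illustrative assumptions.
HDFS_DIR=/tmp/ms_results            # directory Spark wrote the new rows to
LOCAL_FILE=/tmp/ms_results.csv

# 1. Pull the Spark output off HDFS (assumes CSV output):
# hadoop fs -getmerge "$HDFS_DIR" "$LOCAL_FILE"

# 2. Load into a staging table, then update the base table from it:
SQL="LOAD DATA LOCAL INFILE '$LOCAL_FILE' INTO TABLE ms_staging
FIELDS TERMINATED BY ',' (TimeSeriesID, NumOfSamples);
UPDATE ms JOIN ms_staging USING (TimeSeriesID)
SET ms.NumOfSamples = ms_staging.NumOfSamples;"

# 3. Run it (uncomment with real connection details):
# mysql --local-infile=1 mydb -e "$SQL"
echo "$SQL"
```

The UPDATE ... JOIN keeps the whole correction set-based on the MySQL side,
so no row-by-row round trips are needed.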

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 11 August 2016 at 11:52, sujeet jog <sujeet.jog@gmail.com> wrote:

> I read the table via Spark SQL and perform some ML activity on the data;
> the result is to update some specific columns with the ML-improved
> values, hence I do not have an option to do the whole operation in MySQL.
>
>
> Thanks,
> Sujeet
>
> On Thu, Aug 11, 2016 at 3:29 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> OK, it is clearer now.
>>
>> So you are using Spark as the query tool on an RDBMS table, reading the
>> table via JDBC and writing back to update certain records?
>>
>> I have not done this myself, but I suspect the issue would be whether a
>> Spark write will commit the transaction and maintain ACID compliance
>> (locking the rows etc.).
>>
>> I know it cannot do this to a Hive transactional table.
>>
>> Any reason why you are not doing the whole operation in MySQL itself?
>>
>> HTH
>>
>>
>>
>> On 11 August 2016 at 10:46, sujeet jog <sujeet.jog@gmail.com> wrote:
>>
>>> 1) Using a MySQL DB.
>>> 2) Will be inserting/updating/overwriting the same table.
>>> 3) I want to update a specific column in a record; the data is read via
>>> Spark SQL.
>>>
>>> On the table below, which is read via Spark SQL, I would like to update
>>> the NumOfSamples column.
>>>
>>> Consider DF as the DataFrame which holds the records, registered as the
>>> temporary table MS.
>>>
>>> spark.sqlContext.write.format("jdbc").option("url", url
>>> ).option("dbtable", "update ms  set NumOfSamples = 20 where 'TimeSeriesID =
>>> '1000'" As MS ).save
>>>
>>> I believe updating a record via Spark SQL is not supported; is the only
>>> workaround to open a JDBC connection without using the Spark APIs and
>>> do a direct update?
>>>
>>> Sample Ex : -
>>>
>>> mysql> show columns from ms;
>>> +--------------+-------------+------+-----+---------+-------+
>>> | Field        | Type        | Null | Key | Default | Extra |
>>> +--------------+-------------+------+-----+---------+-------+
>>> | TimeSeriesID | varchar(20) | YES  |     | NULL    |       |
>>> | NumOfSamples | int(11)     | YES  |     | NULL    |       |
>>> +--------------+-------------+------+-----+---------+-------+
>>>
>>>
>>> Thanks,
>>> Sujeet
>>>
>>>
>>>
>>> On Tue, Aug 9, 2016 at 6:31 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>>    1. What is the underlying DB (say Hive etc.)?
>>>>    2. Is the table transactional, or are you going to insert/overwrite
>>>>    the same table?
>>>>    3. Can you do all this in the database itself, assuming it is an
>>>>    RDBMS?
>>>>    4. Can you provide the SQL or pseudo-code for such an update?
>>>>
>>>>
>>>> HTH
>>>>
>>>>
>>>> On 9 August 2016 at 13:39, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is it possible to update certain column records in the DB from Spark?
>>>>>
>>>>> For example, I have 10 rows with 3 columns which are read via Spark
>>>>> SQL,
>>>>>
>>>>> and I want to update specific column entries and write back to the DB.
>>>>> Since RDDs are immutable I believe this would be difficult; is there a
>>>>> workaround?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Sujeet
>>>>>
>>>>
>>>>
>>>
>>
>
