spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Incremental Updates and custom SQL via JDBC
Date Thu, 25 Aug 2016 16:22:55 GMT
As far as I can tell, Spark does not support updates to ORC tables.

This is because Spark would need to send heartbeats to the Hive metastore
and maintain them throughout the DML transaction (deletes and updates
here), and that is not implemented.

By the same token, if you have performed DML on an ORC table in Hive itself
and ended up with delta files, Spark won't be able to read the ORC data
until compaction (rolling the delta files into the main files) is complete!
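
One way around the missing DELETE is to avoid changing the table in place:
work out which rows of old_5sek_agg survive, add the new rows, and write the
result out as a fresh table. The sketch below shows only that selection
logic, in plain Python with no Spark dependency. The Value column, the
sample rows and the helper names rows_to_keep and merged are assumptions
made up for illustration; only the table names and the Sec column come from
the thread. In Spark the same filter would be written as a SELECT that
excludes the matching Sec values, followed by a write to a new table.

```python
# Plain-Python stand-ins for the two registered temp tables.
# Only the Sec column is from the thread; Value and the rows are illustrative.
old_5sek_agg = [
    {"Sec": 1, "Value": 10.0},
    {"Sec": 2, "Value": 20.0},
    {"Sec": 3, "Value": 30.0},
]
new_5sek_agg = [
    {"Sec": 2, "Value": 21.5},
    {"Sec": 4, "Value": 40.0},
]


def rows_to_keep(old_rows, new_rows, key_col="Sec"):
    """Return the old rows whose key is absent from new_rows (an anti-join).

    This selects the complement of what
    DELETE FROM old WHERE Sec IN (SELECT Sec FROM new) would remove.
    """
    new_keys = {row[key_col] for row in new_rows}
    return [row for row in old_rows if row[key_col] not in new_keys]


def merged(old_rows, new_rows, key_col="Sec"):
    """Surviving old rows plus every new row, sorted by the key column."""
    combined = rows_to_keep(old_rows, new_rows, key_col) + list(new_rows)
    return sorted(combined, key=lambda row: row[key_col])


kept = rows_to_keep(old_5sek_agg, new_5sek_agg)
result = merged(old_5sek_agg, new_5sek_agg)
```

Here kept holds the old rows that the DELETE would have left behind, and
result is what the rewritten table would contain. Nothing is modified in
place, so no transactional table is required for this step.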

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 25 August 2016 at 00:54, Sascha Schmied <sascha.schmied@outlook.com>
wrote:

> Thank you for your answer.
>
> I’m using an ORC transactional table right now, but I’m not tied to it.
> When I send an SQL statement like the following, where old_5sek_agg and
> new_5sek_agg are registered temp tables, I get an exception in Spark.
> The same happens without the subselect.
>
> sqlContext.sql("DELETE FROM old_5sek_agg WHERE Sec in (SELECT Sec FROM
> new_5sek_agg)")
>
> When I execute the statement directly in the Hive view in Ambari, I get
> no exception; in fact I get a success message, but the targeted rows are
> not deleted (or updated, in the case of an UPDATE statement).
>
> I’m not familiar with your op_type and op_time approach and couldn’t find
> any useful resources from a quick Google search, but it sounds promising.
> Unfortunately your answer seems to be cut off in the middle of your
> example. Would you really update the value of those two additional
> columns, and if so, how would you do this when it’s not an ORC
> transactional table?
>
> Thanks again!
>
> On 25.08.2016 at 01:24, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
> Dr Mich Talebzadeh
>
>
>
