spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Update DF record with delta data in spark
Date Sun, 02 Apr 2017 15:21:53 GMT
If you trust that your delta file is correct, then this might be the way forward. Just keep
in mind that you can sometimes have several delta files in parallel, and you need to apply
them in the correct order, otherwise a deleted row might reappear. Things get messier if a
delta file cannot be loaded while new deltas keep arriving: you have to wait until the
faulty delta file can be loaded before applying the others, and so on. Delta files are
usually messy, require much more testing effort, and one has to think carefully about
whether this is worth it.
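
For what it's worth, the delete-then-union step described in the quoted message maps onto a
left anti join followed by a union in the DataFrame API. A minimal sketch in Scala, assuming
Spark 2.x and the toy data from the message below (the applyDelta helper and the DataFrame
names are just illustrative):

import org.apache.spark.sql.{DataFrame, SparkSession}

object ApplyDeltaExample {

  // Drop base rows whose composite key (name, number) also appears in the delta
  // (a left anti join keeps only base rows with no matching key), then append every
  // delta row, which covers both updated and brand-new keys.
  def applyDelta(base: DataFrame, delta: DataFrame): DataFrame =
    base.join(delta, Seq("name", "number"), "left_anti").union(delta)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ApplyDeltaExample").getOrCreate()
    import spark.implicits._

    val oldDf   = Seq(("Test1", 1, 10000), ("Test2", 2, 10000)).toDF("name", "number", "salary")
    val deltaDf = Seq(("Test1", 1, 40000), ("Test3", 3, 20000)).toDF("name", "number", "salary")

    applyDelta(oldDf, deltaDf).show()
    // Expected rows (order may differ): Test2/2/10000, Test1/1/40000, Test3/3/20000

    // With several delta files, fold them in oldest-first order so the latest value
    // for each key wins and deleted rows are not resurrected.
    val deltasInOrder: Seq[DataFrame] = Seq(deltaDf) // in reality: all deltas, sorted
    val result = deltasInOrder.foldLeft(oldDf)(applyDelta)
    result.show()
  }
}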

> On 2. Apr 2017, at 15:57, Selvam Raman <selmna@gmail.com> wrote:
> 
> Hi,
> 
> Table 1: (old file)
> 
> name    number   salary
> Test1   1        10000
> Test2   2        10000
> 
> Table 2: (delta file)
> 
> name    number   salary
> Test1   1        40000
> Test3   3        20000
> 
> 
> I do not have a date stamp field in these tables; the composite key is the name and number
> fields.
> 
> Expected Result
> 
> name    number   salary
> Test1   1        40000
> Test2   2        10000
> Test3   3        20000
> 
> 
> Current approach:
> 
> 1) Delete the rows in table1 whose composite key matches a composite key in table2.
> 2) Union table1 and table2 to get the updated result.
> 
> 
> Is this the right approach? Is there any other way to achieve it?
> 
> -- 
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
