spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Roland Johann <roland.joh...@phenetic.io.INVALID>
Subject Re: Delta with intelligent upsett
Date Fri, 01 Nov 2019 06:57:05 GMT
If the dataset contains a column like changed_at/created_at you can use this as watermark and
filter out rows that have changed_at/created_at before the watermark. 

Best Regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobil: +49 172 365 26 46
Mail: roland.johann@phenetic.io
Web: phenetic.io

Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann



> Am 01.11.2019 um 07:52 schrieb Gourav Sengupta <gourav.sengupta@gmail.com>:
> 
> should not a where clause on the partition field help with that? I am obviously missing
something in the question.
> 
> Regards,
> Gourav
> 
> On Thu, Oct 31, 2019 at 9:15 PM ayan guha <guha.ayan@gmail.com <mailto:guha.ayan@gmail.com>>
wrote:
> 
> Hi
> 
> we have a scenario where we have a large table  ie 5-6B records. The table is repository
of data from past N years. It is possible that some updates take place on the data and thus
er are using Delta table. 
> 
> As part of the business process we know updates can happen only within M years of past
records where M is much smaller than N. Eg the table can hold 20 yrs of data but we know updates
can happen only for last year not before that. 
> 
> Is there some way to indicate this additional intelligence to Delta so it can look into
only last years data while running a merge or update? It seems to be an obvious performance
booster. 
> 
> Any thoughts?
> -- 
> Best Regards,
> Ayan Guha
> -- 
> Best Regards,
> Ayan Guha


Mime
View raw message