spark-user mailing list archives

From Burak Yavuz <brk...@gmail.com>
Subject Re: Delta with intelligent upsert
Date Sun, 03 Nov 2019 04:57:24 GMT
You can just add the target partitioning filter to your MERGE or UPDATE
condition, e.g.

MERGE INTO target
USING source
ON target.key = source.key AND target.year = year(current_date())
...
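
A rough sketch of the same idea through the Delta Lake Python API, in case that is handier than SQL. The helper names `merge_condition` and `upsert_recent`, the `years_back` parameter, and the `key` / `year` column names are illustrative assumptions rather than anything taken from the actual tables. The year list is computed on the driver, so the partition filter reaches Delta as plain literals.

```python
from datetime import date


def merge_condition(key="key", partition_col="year", years_back=0, today=None):
    """Build a MERGE condition that also restricts the target partitions.

    Hypothetical helper: assumes the target is partitioned by an integer
    year column and that updates only touch the last `years_back` + 1 years.
    """
    today = today or date.today()
    years = [str(today.year - i) for i in range(years_back + 1)]
    return (
        f"target.{key} = source.{key} "
        f"AND target.{partition_col} IN ({', '.join(years)})"
    )


def upsert_recent(spark, target_path, source_df, condition):
    """Run the merge with the Delta Lake Python API (needs delta-spark)."""
    # Imported lazily so merge_condition() can be used without Spark installed.
    from delta.tables import DeltaTable

    (
        DeltaTable.forPath(spark, target_path)
        .alias("target")
        .merge(source_df.alias("source"), condition)
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```

For example, `merge_condition(years_back=1, today=date(2019, 11, 3))` gives `target.key = source.key AND target.year IN (2019, 2018)`. One thing to watch: a source row whose key only exists in an older partition will not match this condition, so with an insert clause it would be inserted as a new row. The sketch assumes the source has already been limited to the recent years.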

Best,
Burak

On Thu, Oct 31, 2019, 10:15 PM ayan guha <guha.ayan@gmail.com> wrote:

>
> Hi
>
> We have a scenario with a large table, i.e. 5-6B records. The table
> is a repository of data from the past N years. Some updates may
> take place on the data, and thus we are using a Delta table.
>
> As part of the business process, we know updates can happen only within the
> most recent M years of records, where M is much smaller than N. E.g., the
> table can hold 20 years of data, but we know updates can happen only for the
> last year, not before that.
>
> Is there some way to pass this additional knowledge to Delta so that it
> looks only at last year's data while running a merge or update? It
> seems to be an obvious performance booster.
>
> Any thoughts?
> --
> Best Regards,
> Ayan Guha
>
