spark-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Update / Delete records in Parquet
Date Tue, 23 Apr 2019 03:56:51 GMT
Hello Jason, thank you for your reply. My use case is this: the first time, I do a
full load with transformations/aggregations/joins and write the result to Parquet
(as staging). From then on, my source is MSSQL Server, and I want to pull only the
records that were changed / updated and apply those changes to the Parquet data as
well, if possible without side effects.
https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/work-with-change-tracking-sql-server?view=sql-server-2017
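
To make the intended merge concrete, here is a minimal sketch in plain Python rather than Spark. It assumes each changed record carries a primary key and an operation code of I, U, or D (the values SQL Server change tracking reports in SYS_CHANGE_OPERATION). The names apply_changes, op, id, and amount are hypothetical and only for illustration. Since Parquet files are immutable, the merged result would have to be written out as a new copy of the staging data rather than patched in place.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def apply_changes(snapshot: Iterable[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Return a new snapshot with inserts, updates, and deletes applied by key.

    The input snapshot is not modified; the caller writes the result out
    as a fresh copy of the staging data.
    """
    # Index the existing staging rows by primary key.
    merged = {row[key]: row for row in snapshot}
    for change in changes:
        op = change["op"]  # hypothetical column holding I / U / D
        pk = change[key]
        if op == "D":
            # Delete: drop the row if it exists.
            merged.pop(pk, None)
        elif op in ("I", "U"):
            # Insert or update: the changed row replaces any existing one.
            merged[pk] = {col: val for col, val in change.items() if col != "op"}
        else:
            raise ValueError(f"unknown change operation: {op!r}")
    # Return rows in key order so the output is deterministic.
    return [merged[pk] for pk in sorted(merged)]


# Hypothetical staging data from the initial full load.
staging = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
# Hypothetical changed records pulled from the source since the last load.
changes = [
    {"op": "U", "id": 2, "amount": 25},
    {"op": "D", "id": 3},
    {"op": "I", "id": 4, "amount": 40},
]
result = apply_changes(staging, changes)
```

With this sample data, result keeps id 1 unchanged, carries the updated amount for id 2, drops id 3, and adds id 4. The staging list itself is left untouched, which mirrors the write-a-new-copy approach that immutable files require.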

On Tue, Apr 23, 2019 at 3:02 AM Jason Nerothin <jasonnerothin@gmail.com>
wrote:

> Hi Chetan,
>
> Do you have to use Parquet?
>
> It just feels like it might be the wrong sink for a high-frequency change
> scenario.
>
> What are you trying to accomplish?
>
> Thanks,
> Jason
>
> On Mon, Apr 22, 2019 at 2:09 PM Chetan Khatri <chetan.opensource@gmail.com>
> wrote:
>
>> Hello All,
>>
>> If I am doing an incremental (delta) load and would like to update / delete
>> records in Parquet: I understand that Parquet files are immutable, so records
>> can't be deleted / updated in place; in theory only append / overwrite can be
>> done. But I have seen utility tools that claim to support this.
>>
>> https://github.com/Factual/parquet-rewriter
>>
>> Please shed some light on this.
>>
>> Thanks
>>
>
>
> --
> Thanks,
> Jason
>
