spark-dev mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Need help for Delta.io
Date Sat, 11 May 2019 06:23:28 GMT
Any thoughts, please?

On Fri, May 10, 2019 at 2:22 AM Chetan Khatri <chetan.opensource@gmail.com>
wrote:

> Hello All,
>
> I need your help / suggestions.
>
> I am using Spark 2.3.1 with the HDP 2.6.1 distribution. Let me describe my
> use case, so you can see how people are trying to use Delta.
> My source is an MSSQL Server (OLTP) database, and I currently land the data
> on HDFS in Parquet and Avro formats. I would now like to do incremental
> (delta) loads, so I am using CT (Change Tracking, ref.
> https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/enable-and-disable-change-tracking-sql-server?view=sql-server-2017)
> to get the primary keys of updated and deleted records, and with those I
> pull only the records that were updated or deleted. I would then like to
> update / delete the corresponding data in Parquet. Currently I am doing a
> full load, which I would like to avoid.
>
> Could you please suggest the best approach?
>
> HDP does not ship Spark 2.4.2, and I can't change the infrastructure. Is
> there any way to use Delta.io on Spark 2.3.1? I have an existing codebase,
> written over the last year and a half in Scala 2.11, which I don't want to
> break by moving to Scala 2.12.
>
> I don't need versioning or a transaction log on the Parquet data, so if
> anything else fits my use case, please do advise.
>
> Thank you.
>
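
For reference, the update / delete step described above can be approximated on plain Spark 2.3.x without Delta by rewriting the dataset: drop every base row whose primary key appears in the Change Tracking result, then append the fresh versions of the updated rows. This is only a minimal sketch under stated assumptions, not a recommended design. The object name, the `id` key column, and all paths are hypothetical, and the three inputs are assumed to be already extracted, with the upserts sharing the base schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object IncrementalParquetMerge {

  /** Returns `base` with stale rows removed and current rows from `upserts` added.
    * `upserts` must have the same columns as `base`; `deletedKeys` needs only `keyCol`.
    */
  def applyChanges(base: DataFrame,
                   upserts: DataFrame,
                   deletedKeys: DataFrame,
                   keyCol: String): DataFrame = {
    // Keys whose old version must disappear: both updated and deleted records.
    val staleKeys = upserts.select(keyCol)
      .union(deletedKeys.select(keyCol))
      .distinct()

    // A left_anti join keeps only the base rows that have no match in staleKeys.
    val untouched = base.join(staleKeys, Seq(keyCol), "left_anti")

    // unionByName (available since Spark 2.3.0) aligns columns by name, not position.
    untouched.unionByName(upserts)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("incremental-parquet-merge").getOrCreate()

    // Hypothetical locations; substitute the real HDFS paths.
    val base        = spark.read.parquet("/data/customers/current")
    val upserts     = spark.read.parquet("/staging/customers/upserts")
    val deletedKeys = spark.read.parquet("/staging/customers/deleted_keys")

    val merged = applyChanges(base, upserts, deletedKeys, "id")

    // Write to a new directory: Spark cannot overwrite a path it is still reading from.
    merged.write.mode("overwrite").parquet("/data/customers/next")

    spark.stop()
  }
}
```

This still rewrites the whole dataset on the Spark side, so it only avoids the full pull from SQL Server. If the data is partitioned, the same function could presumably be applied per affected partition to limit the rewrite, and readers would need to be switched to the new directory once the write succeeds.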
