spark-dev mailing list archives

From Chetan Khatri
Subject Re: Need help for
Date Sat, 11 May 2019 06:23:28 GMT
Any thoughts, please?

On Fri, May 10, 2019 at 2:22 AM Chetan Khatri wrote:

> Hello all,
> I need your help and suggestions.
> I am using Spark 2.3.1 with the HDP 2.6.1 distribution. I will describe my
> use case so you can see where people are trying to use Delta.
> My use case: the source is an MSSQL Server (OLTP) database, and the data lands
> on HDFS, currently in Parquet and Avro formats. I would now like to do an
> incremental (delta) load, so I am using CT (Change Tracking Ref.
> to get the primary keys of updated and deleted records, and with those keys I
> pull only the records that were updated or deleted. I would now like to apply
> those updates and deletes to the Parquet data. Currently I am doing a full
> load, which I would like to avoid.
> Could you please suggest the best approach?
> HDP doesn't have Spark 2.4.2 available, so I can't change the
> infrastructure. Is there any way to use it on Spark 2.3.1? I have an
> existing codebase, written over the last year and a half in Scala 2.11,
> which I also don't want to break by moving to Scala 2.12.
> I don't need versioning or a transaction log on the Parquet data, so if
> anything else fits my use case, please do advise.
> Thank you.
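
For illustration only, the update / delete step described in the quoted message could be sketched with plain Spark 2.3 DataFrame operations, without Delta. This is a minimal sketch under stated assumptions, not a recommendation from the thread: the object name, the `applyChanges` helper, the `order_id` key column and every path are hypothetical, the Change Tracking extraction itself is not shown, and the upserts are assumed to have the same schema as the existing Parquet data. Parquet files are immutable, so the result is written as a new copy of the data rather than updated in place.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object IncrementalParquetMerge {

  /** Drops stale versions of changed rows from `base` and appends the fresh rows.
    * Assumes `upserts` has the same columns as `base`, and that `deletedKeys`
    * contains at least the key column. */
  def applyChanges(base: DataFrame,
                   upserts: DataFrame,
                   deletedKeys: DataFrame,
                   key: String): DataFrame = {
    // Keys whose existing rows must go: updated rows and deleted rows alike.
    val staleKeys = upserts.select(key).union(deletedKeys.select(key)).distinct()

    // A left_anti join keeps only the base rows whose key is NOT in staleKeys.
    val untouched = base.join(staleKeys, Seq(key), "left_anti")

    // unionByName (available since Spark 2.3.0) matches columns by name,
    // so a changed column order after the join does not misalign the data.
    untouched.unionByName(upserts)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("incremental-parquet-merge").getOrCreate()

    // Hypothetical locations; replace with the real HDFS paths.
    val base        = spark.read.parquet("/data/orders/current")
    val upserts     = spark.read.parquet("/data/orders/ct_upserts")
    val deletedKeys = spark.read.parquet("/data/orders/ct_deleted_keys")

    // Write to a separate directory instead of overwriting the path being read.
    applyChanges(base, upserts, deletedKeys, "order_id")
      .write.mode("overwrite").parquet("/data/orders/next")

    spark.stop()
  }
}
```

This still rewrites the whole dataset on each run; what it avoids is re-pulling every row from SQL Server. If the data is partitioned, the same function could be applied only to the partitions that contain changed keys.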