spark-user mailing list archives

From Mahender Sarangam <mahender.bigd...@outlook.com>
Subject Delta Logic in Spark
Date Sat, 17 Nov 2018 11:23:49 GMT
Hi,

We have a daily data pull that brings in almost 50 GB of data from an upstream system, and we
use Spark SQL to process it. At the end of the job we insert the 50 GB into a Hive target table.
We then copy the whole Hive target table to SQL Server, into a SQL staging table, and run a merge
from the staging table against the final SQL target table so that only modified or new records
are inserted into the SQL target table.

This process is time consuming because most of the time is spent copying data from Blob to SQL.
Instead of copying the whole data set from the cluster to SQL Server and implementing the merge
logic in SQL, we would like to implement the merge logic in Spark SQL, move only the delta to SQL,
and merge that against the final SQL target table. This would reduce network and I/O cost.

Has anyone implemented this kind of delta-difference logic in Spark / Spark SQL?
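
To make the question concrete, here is a rough sketch of the kind of delta step I have in mind. It is only an illustration under several assumptions: a copy of the previous load is still available in Hive, the key columns are never NULL, and every table and column name below is hypothetical. The helper just builds Spark SQL text; it keeps a row when its key is missing from the previous load (new) or when any compared column differs under the null-safe `<=>` operator (modified).

```python
def build_delta_query(new_table, prev_table, key_cols, value_cols):
    """Build Spark SQL that selects rows of new_table that are new or
    changed relative to prev_table.

    Assumptions (not from the original setup): both tables share the same
    schema, key_cols uniquely identify a row, and key values are not NULL.
    """
    if not key_cols:
        raise ValueError("at least one key column is required")

    # Match rows of the two loads on the business key.
    join_cond = " AND ".join(f"n.{k} = p.{k}" for k in key_cols)

    # No match on the previous side means the row is new.
    predicates = [f"p.{key_cols[0]} IS NULL"]

    # <=> is Spark SQL's null-safe equality, so NULL vs NULL counts as equal.
    predicates += [f"NOT (n.{c} <=> p.{c})" for c in value_cols]

    return (
        f"SELECT n.* FROM {new_table} n "
        f"LEFT JOIN {prev_table} p ON {join_cond} "
        f"WHERE {' OR '.join(predicates)}"
    )


# Hypothetical table and column names, for illustration only.
query = build_delta_query(
    "hive_target_today", "hive_target_prev", ["id"], ["amount", "status"]
)
print(query)

# In a Spark job the text could then be run with spark.sql(query) and only
# the resulting rows written to the SQL staging table, for example through
# the DataFrame JDBC writer, before the final merge in SQL.
```

With something like this, only the new or modified rows would cross the network, and the merge on the SQL side would work on a much smaller staging table. Deleted rows are not covered by this sketch.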
