spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Spark SaveMode
Date Sat, 20 Jul 2019 05:40:56 GMT
This is not an issue with Spark but with the underlying database. The primary key constraint
has a purpose, and silently ignoring it would defeat that purpose.
To handle your use case you would need to make several decisions, which may imply that you
don't simply want to insert if the row does not exist. Maybe you want to do an upsert, and how
do you want to take deleted data into account?
You could use a MERGE statement in Oracle to achieve what you have in mind. In Spark you would
need to fetch the existing data from the Oracle database and then merge it with the new data
in Spark, depending on your requirements.
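A minimal sketch of the "fetch and merge in Spark" approach, assuming a placeholder Oracle
table MY_TABLE with primary key column ID and a placeholder JDBC URL (none of these names are
from the original message):

    import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("cassandra-to-oracle").getOrCreate()

    // newData: the rows read from Cassandra (placeholder)
    val newData: DataFrame = ???

    // Read only the existing primary keys from Oracle over JDBC
    val existingKeys = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//host:1521/service") // placeholder URL
      .option("dbtable", "(SELECT ID FROM MY_TABLE)")
      .option("user", "user")
      .option("password", "password")
      .load()

    // "insert if not exists": keep only rows whose key is not already in Oracle
    val toInsert = newData.join(existingKeys, Seq("ID"), "left_anti")

    toInsert.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//host:1521/service") // placeholder URL
      .option("dbtable", "MY_TABLE")
      .option("user", "user")
      .option("password", "password")
      .mode(SaveMode.Append)
      .save()

Note that this is not atomic: rows inserted by other writers between the read and the write
can still violate the constraint, which is why a MERGE on the Oracle side is the more robust
option.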

> Am 20.07.2019 um 06:34 schrieb Richard <fifistorm123@gmail.com>:
> 
> Any reason why Spark's SaveMode doesn't have a mode that ignores Primary Key/Unique
> constraint violations?
> 
> Let's say I'm using Spark to migrate some data from Cassandra to Oracle. I want the insert
> operation to be "ignore if primary key exists" instead of failing the whole batch.
> 
> Thanks, 
> Richard 
> 

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

