spark-user mailing list archives

From Nicholas Gustafson <njgustaf...@gmail.com>
Subject Re: How can I use pyspark to upsert one row without replacing entire table
Date Wed, 12 Aug 2020 17:26:08 GMT
The delta docs have examples of upserting:

https://docs.delta.io/0.4.0/delta-update.html#upsert-into-a-table-using-merge
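
The merge pattern from that page can be sketched in PySpark roughly as follows. This is only a minimal sketch, assuming the delta-spark package is installed and the target is already stored as a Delta table. The function names, the `table_path` argument, and the aliases `t` and `u` are made up for illustration. Note that this merges into a Delta table; it does not write to a Sybase table over JDBC.

```python
def merge_condition(key_columns, target="t", source="u"):
    """Build the condition that matches target rows to incoming rows."""
    return " AND ".join(f"{target}.{c} = {source}.{c}" for c in key_columns)


def upsert_into_delta(spark, table_path, updates_df, key_columns):
    """Merge updates_df into the Delta table at table_path (hypothetical path)."""
    # Imported here so merge_condition can be used without delta-spark installed.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, table_path)
    (
        target.alias("t")
        .merge(updates_df.alias("u"), merge_condition(key_columns))
        .whenMatchedUpdateAll()     # ids already in the table: take the new values
        .whenNotMatchedInsertAll()  # ids not in the table yet: insert as new rows
        .execute()
    )
```

For the table in the question, `updates_df` would be a one-row DataFrame holding `(2, "Michael")` and `key_columns` would be `["id"]`.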

> On Aug 12, 2020, at 08:31, Siavash Namvar <snsina@gmail.com> wrote:
> 
> 
> Thanks Sean,
> 
> Do you have any URL or reference that shows how to upsert in Spark? I need to update
> a Sybase db
> 
>> On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <srowen@gmail.com> wrote:
>> It's not so much Spark but the data format, whether it supports
>> upserts. Parquet, CSV, JSON, etc would not.
>> That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.
>> 
>> On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <snsina@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I have a use case where I read data from a db table and need to update a few rows
>> > based on primary key without replacing the entire table.
>> >
>> > For instance, if I have the following 3 rows:
>> >
>> > -------------------
>> > id | fname
>> > -------------------
>> >  1 | john
>> > -------------------
>> >  2 | Steve
>> > -------------------
>> >  3 | Jack
>> > -------------------
>> >
>> > And I would like to update the row with id=2 from Steve to Michael without replacing
>> > the entire table, so that the output looks like:
>> >
>> > -------------------
>> > id | fname
>> > -------------------
>> >  1 | john
>> > -------------------
>> >  2 | Michael
>> > -------------------
>> >  3 | Jack
>> > -------------------
>> >
>> > Keep in mind the actual db table is huge and the database is old, so I cannot read
>> > and replace the entire table.
>> >
>> > Thanks
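
For a plain relational target like the one in the quoted question, the row-level effect can also be sketched with ordinary SQL issued per key: update the row, and insert only if nothing matched. The sketch below uses Python's built-in sqlite3 purely as a stand-in, since it runs anywhere. It is not Sybase-specific, and the table name `people` and the function `upsert_row` are made up for illustration.

```python
import sqlite3


def upsert_row(conn, row_id, fname):
    """Update the row with this id; insert it only if no such row exists."""
    cur = conn.execute(
        "UPDATE people SET fname = ? WHERE id = ?", (fname, row_id)
    )
    if cur.rowcount == 0:
        # No existing row had this primary key, so add one.
        conn.execute(
            "INSERT INTO people (id, fname) VALUES (?, ?)", (row_id, fname)
        )
    conn.commit()


# In-memory table with the three rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, fname TEXT)")
conn.executemany(
    "INSERT INTO people (id, fname) VALUES (?, ?)",
    [(1, "john"), (2, "Steve"), (3, "Jack")],
)

upsert_row(conn, 2, "Michael")  # touches only the row with id=2
rows = conn.execute("SELECT id, fname FROM people ORDER BY id").fetchall()
```

Only the row with id=2 is written; the other rows are never read or rewritten.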
