spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Any advice how to do this usecase in spark sql ?
Date Wed, 14 Aug 2019 05:08:17 GMT
Have you tried to join both datasets, filter accordingly, and then write the full dataset back to
your filesystem?
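
A minimal sketch of that approach, assuming Structured Streaming with foreachBatch; the path, Kafka settings, and column handling are placeholders you would adapt to your schema:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("lookup-upsert-sketch").getOrCreate()

val lookupPath = "hdfs:///data/lookup.parquet"        // hypothetical path

// dataframe2: the Kafka stream, reduced to the join key.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical broker
  .option("subscribe", "some-topic")                  // hypothetical topic
  .load()
  .selectExpr("CAST(value AS STRING) AS columnX")     // adapt to your schema

// Per micro-batch: re-read the lookup so it stays refreshable, merge, and
// write the full dataset back out.
def mergeBatch(batch: DataFrame, batchId: Long): Unit = {
  val lookup = spark.read.parquet(lookupPath)         // dataframe1
  // Keys that appear in the batch take the batch's version ("replace");
  // keys only in the lookup are kept ("add as a new record").
  val lookupOnly = lookup.join(batch, lookup("column1") === batch("columnX"), "left_anti")
  val merged = batch.select(col("columnX").as("column1"))
    .unionByName(lookupOnly.select("column1"))
  merged.write.mode("overwrite").parquet(lookupPath + ".merged") // don't overwrite the input in place
}

stream.writeStream.foreachBatch(mergeBatch _).start().awaitTermination()

Re-reading the Parquet file inside foreachBatch is what makes the lookup refreshable between batches.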
Alternatively, work with a NoSQL database that you update by key (e.g., it sounds like a key/value
store could be useful for you).
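
For that route, a sketch of the per-batch write, reusing the batch DataFrame from the sketch above. Jedis is only an example client (an assumption on my side); any store with put-by-key semantics behaves the same:

import redis.clients.jedis.Jedis

// Inside the same foreachBatch: a put-by-key is replace-or-add in one step.
batch.select("columnX").rdd.foreachPartition { rows =>
  val jedis = new Jedis("redis-host", 6379)            // hypothetical host/port
  rows.foreach(r => jedis.set(r.getString(0), "seen")) // value payload is illustrative
  jedis.close()
}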

However, you may also need to do more, depending on your use case.

> On 14 Aug 2019, at 05:08, Shyam P <shyamabigdata@gmail.com> wrote:
> 
> Hi,
> Any advice on how to do this in Spark SQL?
> 
> I have a scenario as below:
> 
> dataframe1 = loaded from an HDFS Parquet file.
> 
> dataframe2 = read from a Kafka stream.
> 
> If the column1 value of dataframe1 exists among the columnX values of dataframe2, then I need
> to replace the column1 value of dataframe1.
> 
> Otherwise, add the column1 value of dataframe1 to dataframe2 as a new record.
> 
> 
> 
> In a sense, I need to implement a lookup dataframe which is refreshable.
> 
> For more information, please check:
> 
> https://stackoverflow.com/questions/57479581/how-to-do-this-scenario-in-spark-streaming?noredirect=1#comment101437596_57479581
> 
> Let me know if you need more info.
> 
> Thanks
