spark-user mailing list archives

From Artemis User <arte...@dtechspace.com>
Subject Re: Use case advice
Date Sat, 09 Jan 2021 17:17:21 GMT
Could you please clarify what you mean by 1)? The driver is only 
responsible for submitting the Spark job, not for executing it.

-- ND

On 1/9/21 9:35 AM, András Kolbert wrote:
> Hi,
> I would like to get your advice on my use case.
> I have a few spark streaming applications where I need to keep 
> updating a dataframe after each batch. Each batch probably affects a 
> small fraction of the dataframe (5k out of 200k records).
>
> The options I have been considering so far:
> 1) keep dataframe on the driver, and update that after each batch
> 2) keep dataframe distributed, and use checkpointing to mitigate lineage
>
> I solved previous use cases with option 2, but I am not sure it is 
> optimal, as checkpointing is relatively expensive. I also 
> considered HBase or some sort of fast in-memory store, 
> but that is currently not in my stack.
>
> Curious to hear your thoughts
>
> Andras
>
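
For reference, option 1 from the quoted message could look roughly like the
sketch below. It is only an illustration under stated assumptions: the state
is held as a plain Python dict on the driver, rows are hypothetical
(key, value) pairs, and apply_batch is a made-up helper name. The Spark
wiring (collecting each batch's changed rows to the driver) is left out.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (record key, latest value).
Record = Tuple[str, float]


def apply_batch(state: Dict[str, float], batch: Iterable[Record]) -> Dict[str, float]:
    """Upsert one batch of rows into the driver-side state, in place."""
    for key, value in batch:
        # Existing keys are overwritten, unseen keys are added.
        state[key] = value
    return state


# Sizes taken from the thread: about 200k records, about 5k touched per batch.
state = {f"id{i}": 0.0 for i in range(200_000)}
batch = [(f"id{i}", 1.0) for i in range(5_000)]
apply_batch(state, batch)
```

At this size the whole state fits comfortably in driver memory and there is no
lineage to truncate, but the state is lost if the driver restarts unless it is
persisted separately.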
