spark-user mailing list archives

From: Arti Pande
Subject: Refreshing Data in Spark Memory (DataFrames)
Date: Fri, 13 Nov 2020 17:41:54 GMT

In financial systems, some reference data is updated very frequently.
Suppose a Spark job that runs for 6-7 hours uses such data. Most likely
the job reads the data once at the beginning, keeps it in memory as a
DataFrame, and continues to use that copy for the rest of the run. If
another system updates the reference data in the meantime, the Spark
job's in-memory copy (the DataFrame) goes out of sync with the source.

Is there a way to refresh that reference data in Spark memory (the
DataFrame) while the job is still running?

This seems to be a very common scenario. Is there a solution / workaround
for this?
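
To make the scenario concrete, here is a minimal sketch of one possible
workaround: a small holder that re-runs a loader once its copy is older
than a chosen age. This is an illustration under assumptions, not an
established Spark feature. The class name RefreshingReference, the
loader callable, max_age_seconds and the clock parameter are all
hypothetical, and the approach only helps if the job is organised as
repeated batches that each ask the holder for the current data, not as
one long-running action.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RefreshingReference(Generic[T]):
    """Hypothetical holder: reloads reference data once it is too old."""

    def __init__(
        self,
        loader: Callable[[], T],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader            # e.g. a function that re-reads the source
        self._max_age = max_age_seconds  # how stale the copy is allowed to get
        self._clock = clock              # injectable so the logic can be tested
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        """Return the held data, reloading it first if it has expired."""
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._max_age
        if expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

In a Spark job the loader could be something like
`lambda: spark.read.parquet(reference_path)`, where `reference_path` is
a placeholder, and the driver would call `get()` before each batch. If
the old DataFrame was cached, it would presumably also need
`unpersist()` before being replaced.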

Thanks & regards,
Arti Pande
