In financial systems, reference data is often updated frequently. If a Spark job that runs for 6 to 7 hours uses such data, it will most likely read the data once at the beginning, keep it in memory as a DataFrame, and continue running for the remaining hours. If another system updates the reference data in the meantime, the Spark job's in-memory copy (the DataFrame) goes out of sync.

Is there a way to refresh that reference data in Spark's memory (the DataFrame) while the job is still running?

This seems to be a very common scenario. Is there a solution / workaround for this?
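
To make the scenario concrete, here is a minimal sketch in plain Python (deliberately not Spark code) of the staleness described above and of the kind of age-based reload being asked about. Every name in it (`reference_store`, `load_reference`, `RefreshingReference`, the `EURUSD` rate and the 60-second limit) is hypothetical and only for illustration. Whether Spark offers an equivalent for a DataFrame is exactly the open question.

```python
import time
from typing import Callable, Dict

# Hypothetical stand-in for the external reference store that another system updates.
reference_store: Dict[str, float] = {"EURUSD": 1.10}


def load_reference() -> Dict[str, float]:
    """Return a point-in-time copy, as a one-off read at job start would."""
    return dict(reference_store)


class RefreshingReference:
    """Holds a snapshot and reloads it once it is older than max_age_seconds."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, float]],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._max_age = max_age_seconds
        self._clock = clock
        self._snapshot = loader()
        self._loaded_at = clock()

    def get(self) -> Dict[str, float]:
        # Reload only when the held snapshot has exceeded its allowed age.
        if self._clock() - self._loaded_at >= self._max_age:
            self._snapshot = self._loader()
            self._loaded_at = self._clock()
        return self._snapshot


# A fake clock (assumption, for a deterministic demo) instead of real waiting.
fake_now = [0.0]

# Pattern 1: read once at job start and hold the copy for the whole run.
snapshot = load_reference()

# Pattern 2: hold the copy behind an age check.
holder = RefreshingReference(load_reference, max_age_seconds=60.0,
                             clock=lambda: fake_now[0])

# Another system updates the reference data while the job is running.
reference_store["EURUSD"] = 1.25

stale_rate = snapshot["EURUSD"]               # still 1.10: the copy is out of sync
rate_before_expiry = holder.get()["EURUSD"]   # still 1.10: snapshot not old enough yet

fake_now[0] = 61.0                            # pretend 61 seconds have passed
fresh_rate = holder.get()["EURUSD"]           # 1.25: reloaded after exceeding max age
```

The one-off `snapshot` never sees the update, while the holder picks it up on the first read after the age limit. The trade-off in such a scheme is between how stale the data may become and how often the source is re-read.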

Thanks & regards,
Arti Pande