spark-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Access to live data of cached dataFrame
Date Fri, 17 May 2019 18:57:10 GMT
A cached DataFrame isn't supposed to change, by definition.
You can re-read the data each time, or consider setting up a streaming
source on the table, which provides a result that updates as new data
arrives.
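A minimal sketch of the streaming approach, assuming the Delta Lake connector is on the classpath and writing the aggregation to an in-memory table (the path and the names `eventCounts` / `live_event_counts` are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("live-counts")
  .getOrCreate()

// readStream picks up new files as they land in /data,
// instead of caching a static snapshot
val eventCounts = spark.readStream
  .format("delta")
  .load("/data")
  .groupBy(col("event_hour"))
  .count()

// Complete mode re-emits the full aggregation on every update,
// so the in-memory table always reflects the latest data
val query = eventCounts.writeStream
  .format("memory")
  .queryName("live_event_counts")
  .outputMode("complete")
  .start()

// spark.sql("select * from live_event_counts") then returns live results
```

Querying `live_event_counts` gives you current counts without ever calling unpersist().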

On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos <tomas.bartalos@gmail.com> wrote:
>
> Hello,
>
> I have a cached dataframe:
>
> spark.read.format("delta").load("/data").groupBy(col("event_hour")).count.cache
>
> I would like to access the "live" data for this data frame without deleting the cache
> (using unpersist()). Whatever I do, I always get the cached data on subsequent queries.
> Even adding a new column to the query doesn't help:
>
> spark.read.format("delta").load("/data").groupBy(col("event_hour")).count.withColumn("dummy",
> lit("dummy"))
>
>
> I'm able to work around this using a cached SQL view, but I couldn't find a pure
> DataFrame solution.
>
> Thank you,
> Tomas

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

