spark-user mailing list archives

From Sandip Mehta <sandip.mehta....@gmail.com>
Subject Re: [Structured Streaming] Reuse computation result
Date Thu, 01 Feb 2018 12:06:34 GMT
You can use the persist() or cache() operation on the DataFrame.
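As the original question notes, persist()/cache() cannot be applied directly to a streaming DataFrame. One way to get the reuse the poster asks for is foreachBatch, added in later Spark versions (2.4+): within each micro-batch the batch DataFrame is static, so it can be persisted and both aggregations reuse the cached map/filter result. A minimal sketch, assuming a Spark 2.4+ session and a rate source stand-in for the real stream (source name and column expressions here are illustrative, not from the thread):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("reuse-example").getOrCreate()
import spark.implicits._

// Stand-in for the poster's dataframe.map().filter() pipeline.
val stream = spark.readStream.format("rate").load()
val df = stream.select(($"value" * 2).as("value")).filter($"value" > 10)

val query = df.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.persist()                            // cache once per micro-batch
  val total = batchDF.agg(sum("value")).first  // agg 1
  val cnt   = batchDF.count()                  // agg 2, reuses the cached data
  println(s"batch $batchId: sum=$total, count=$cnt")
  batchDF.unpersist()
}.start()
```

Both aggregations run against the cached micro-batch, so the upstream map/filter executes only once per batch rather than once per query.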

On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng <nezhazheng@gmail.com> wrote:

> Hi all,
>
> I have a scenario like this:
>
> val df = dataframe.map(...).filter(...)
> // agg 1
> val query1 = df.agg(sum("value")).writeStream.start()
> // agg 2
> val query2 = df.agg(count("*")).writeStream.start()
>
> With Spark Streaming, we can call persist() on an RDD to reuse the
> computation result: when persist() is called after filter(), the
> map().filter() operators run only once.
> With Structured Streaming, we can't apply persist() directly on a
> DataFrame, so query1 and query2 will not reuse the result after
> filter(), and map/filter run twice. Is there a way to solve this?
>
> Regards,
>
> Shu li Zheng
>
>
