spark-user mailing list archives

From "lucas.gary@gmail.com" <lucas.g...@gmail.com>
Subject Re: How can i remove the need for calling cache
Date Tue, 01 Aug 2017 18:23:04 GMT
Hi Jeff, that looks sane to me.  Do you have additional details?
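
For the save-and-reload workaround you ask about, here is a minimal sketch,
assuming the expensive RDD holds serializable values and that writing it out
once is acceptable. The path, the stand-in transformations, and the bodies of
op1 and op2 are all hypothetical; only saveAsObjectFile and objectFile are
real Spark calls. The reloaded RDD's lineage starts at the saved files, so the
earlier transformations are not recomputed, although each action does re-read
the files.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SaveAndReload {
  // Hypothetical stand-ins for op1 and op2 from the question.
  def op1(x: Int): Int = x + 1
  def op2(x: Int): Int = x * 2

  def main(args: Array[String]): Unit = {
    // The master URL is assumed to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("save-and-reload"))
    try {
      // Stand-in for "a whole bunch of transformations on an RDD".
      val myrdd: RDD[Int] = sc.parallelize(1 to 1000).map(_ * 3).filter(_ % 2 == 0)

      // Assumed output location; the directory must not already exist.
      val path = args.headOption.getOrElse("hdfs:///tmp/myrdd-materialized")

      // Runs the transformations once and writes the result out.
      myrdd.saveAsObjectFile(path)

      // The lineage of this RDD begins at the saved files.
      val reloaded: RDD[Int] = sc.objectFile[Int](path)

      val result1 = reloaded.map(op1)
      val result2 = reloaded.map(op2)

      // Each action reads the saved files, not the original pipeline.
      println(result1.count())
      println(result2.count())
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off is one extra write plus one read per downstream action, in
exchange for not holding anything in executor memory.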

On 1 August 2017 at 11:05, jeff saremi <jeffsaremi@hotmail.com> wrote:

> Calling cache/persist fails all our jobs (I have posted two threads on
> this).
>
> And we're giving up hope of finding a solution.
> So I'd like to find a workaround for that:
>
> If I save an RDD to hdfs and read it back, can I use it in more than one
> operation?
>
> Example: (using cache)
> // do a whole bunch of transformations on an RDD
>
> myrdd.cache()
>
> val result1 = myrdd.map(op1(_))
>
> val result2 = myrdd.map(op2(_))
>
> // In the above I am assuming that a call to cache will prevent all
> // previous transformations from being calculated twice
>
> I'd like to somehow get result1 and result2 without duplicating work. How
> can I do that?
>
> thanks
>
> Jeff
>
