spark-user mailing list archives

From jeff saremi <jeffsar...@hotmail.com>
Subject How can i remove the need for calling cache
Date Tue, 01 Aug 2017 18:05:37 GMT
Calling cache/persist fails all our jobs (I have posted two threads on this).

We're giving up hope of finding a solution,
so I'd like to find a workaround instead:

If I save an RDD to HDFS and read it back, can I use it in more than one operation?

Example: (using cache)
// do a whole bunch of transformations on an RDD

myrdd.cache()

val result1 = myrdd.map(op1(_))

val result2 = myrdd.map(op2(_))

// in the above I am assuming that a call to cache will prevent all the previous
// transformations from being calculated twice
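
For comparison, the save-and-read-back variant asked about above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested fix: the HDFS path, the Int element type, the placeholder transformation chain, and the bodies of op1 and op2 are all hypothetical. It uses saveAsObjectFile and SparkContext.objectFile, which rely on Java serialization of the elements.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object HdfsRoundTrip {
  // Hypothetical stand-ins for op1 and op2 from the example above.
  def op1(x: Int): Int = x + 1
  def op2(x: Int): Int = x * 2

  def main(args: Array[String]): Unit = {
    // The master URL is assumed to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-round-trip"))

    // Placeholder for the "whole bunch of transformations".
    val myrdd: RDD[Int] = sc.parallelize(1 to 100).map(_ * 3)

    // Hypothetical output path; the directory must not already exist.
    val path = "hdfs:///tmp/myrdd-saved"

    // saveAsObjectFile is an action, so the transformations run once here.
    myrdd.saveAsObjectFile(path)

    // The reloaded RDD's lineage starts at the saved files,
    // not at the original transformations.
    val reloaded: RDD[Int] = sc.objectFile[Int](path)

    val result1 = reloaded.map(op1)
    val result2 = reloaded.map(op2)

    // Each action re-reads the files from HDFS but does not redo the upstream work.
    println(result1.count())
    println(result2.count())

    sc.stop()
  }
}
```

The trade-off in this sketch is an extra write to and two reads from HDFS in place of keeping the data in executor memory or local disk.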


I'd like to somehow get result1 and result2 without duplicating work. How can I do that?

thanks

Jeff
