spark-user mailing list archives

From Yadid Ayzenberg <ya...@media.mit.edu>
Subject RDD cache question
Date Sun, 01 Dec 2013 02:01:17 GMT



Hi All,

I'm trying to implement the following and would like to know where I should be calling
RDD.cache():

Suppose I have a group of RDDs, RDD1 to RDDn, as input.

1. create a single RDD_total = RDD1.union(RDD2)..union(RDDn)

2. for i = 0 to x:    RDD_total = RDD_total.map (some map function());

3. return RDD_total.
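To make the question concrete: in Spark, transformations such as union and map are lazy, cache() only marks an RDD for storage once an action first computes it, and an action on an uncached RDD recomputes it from its lineage. The sketch below is a toy, pure-Python model of that behaviour, not Spark's API: the ToyRDD class and the build/run helpers are hypothetical, as are the inputs (three RDDs of two integers each), x = 3 and the +1 map function. It counts how often the map functions run when two actions are called on RDD_total, with and without caching the final result.

```python
# Toy, pure-Python model of lazy RDD lineage (hypothetical, NOT the Spark API).
# It only counts how many times the map functions actually run.

class ToyRDD:
    map_calls = 0  # total applications of any map function

    def __init__(self, compute):
        self._compute = compute  # zero-argument function that rebuilds this RDD's data
        self._marked = False     # set by cache(); data is kept after the first action
        self._data = None        # materialized data, once cached

    @classmethod
    def from_list(cls, items):
        return cls(lambda: list(items))

    def union(self, other):
        # Lazy: nothing runs until an action is called.
        return ToyRDD(lambda: self.collect() + other.collect())

    def map(self, f):
        def compute():
            out = []
            for x in self.collect():
                ToyRDD.map_calls += 1
                out.append(f(x))
            return out
        return ToyRDD(compute)

    def cache(self):
        self._marked = True  # only marks the RDD; nothing is computed here
        return self

    def collect(self):
        # An "action": reuse cached data if present, otherwise recompute the lineage.
        if self._data is not None:
            return self._data
        data = self._compute()
        if self._marked:
            self._data = data
        return data


def build(inputs, x, f, cache_result):
    total = inputs[0]
    for rdd in inputs[1:]:
        total = total.union(rdd)   # step 1: RDD_total = RDD1.union(RDD2)...
    for _ in range(x):
        total = total.map(f)       # step 2: lazy, only extends the lineage
    if cache_result:
        total.cache()              # mark the RDD that will be reused
    return total                   # step 3


def run(cache_result):
    ToyRDD.map_calls = 0
    inputs = [ToyRDD.from_list([1, 2]), ToyRDD.from_list([3, 4]), ToyRDD.from_list([5, 6])]
    total = build(inputs, x=3, f=lambda v: v + 1, cache_result=cache_result)
    first = total.collect()   # first action
    second = total.collect()  # second action on the same RDD
    return first, second, ToyRDD.map_calls


print("map calls without cache:", run(cache_result=False)[2])  # map calls without cache: 36
print("map calls with cache:", run(cache_result=True)[2])      # map calls with cache: 18
```

In this toy model, marking only the final RDD_total is enough when that result is reused: without the cache, the second action reruns all x map steps over every element, while with it the second action reads the stored data.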

I think that I should cache RDD_total in order to optimize the iterations. Should I just be calling
RDD_total.cache() at the end of each iteration, or should I be performing something more
elaborate:


RDD_temp = RDD_total.map (some map function());
RDD_total.unpersist();
RDD_total = RDD_temp.cache();
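One detail matters for the ordering above: because cache() is lazy, unpersisting the old RDD before the new one has been computed by an action means the new RDD has to rebuild its whole lineage when it is finally computed. A commonly used ordering is to cache the new RDD, run an action to materialize it, and only then unpersist the old one. If no action runs inside the loop (as in step 2 above), nothing is materialized until the final action, and cache()/unpersist() only mark or unmark RDDs. The sketch below compares both orderings in a toy, pure-Python model (hypothetical ToyRDD class and iterate helper, not Spark's API), assuming the loop body also runs one action per iteration, for example a hypothetical convergence check.

```python
# Toy, pure-Python model of lazy RDDs (hypothetical, NOT the Spark API), used to
# count map-function calls under two cache/unpersist orderings.

class ToyRDD:
    map_calls = 0  # total applications of any map function

    def __init__(self, compute):
        self._compute = compute  # rebuilds this RDD's data from its parent
        self._marked = False     # set by cache(); data is kept after the next action
        self._data = None

    def map(self, f):
        def compute():
            out = []
            for x in self.collect():
                ToyRDD.map_calls += 1
                out.append(f(x))
            return out
        return ToyRDD(compute)

    def cache(self):
        self._marked = True  # lazy: only marks the RDD
        return self

    def unpersist(self):
        self._marked = False
        self._data = None    # drop any stored data
        return self

    def collect(self):
        # An "action": reuse stored data, otherwise recompute from the lineage.
        if self._data is not None:
            return self._data
        data = self._compute()
        if self._marked:
            self._data = data
        return data


def iterate(order, iterations=3):
    ToyRDD.map_calls = 0
    total = ToyRDD(lambda: [1, 2, 3, 4, 5, 6])
    for _ in range(iterations):
        if order == "unpersist_first":   # the ordering from the message
            temp = total.map(lambda v: v + 1)
            total.unpersist()            # old data dropped before temp is computed
            total = temp.cache()
            total.collect()              # hypothetical per-iteration action
        else:                            # cache, materialize, then unpersist
            temp = total.map(lambda v: v + 1).cache()
            temp.collect()               # hypothetical per-iteration action
            total.unpersist()            # safe: temp no longer needs its parent
            total = temp
    return total.collect(), ToyRDD.map_calls


print("unpersist first:", iterate("unpersist_first")[1])                  # unpersist first: 36
print("cache, action, unpersist:", iterate("cache_then_unpersist")[1])    # cache, action, unpersist: 18
```

With unpersist first, every iteration recomputes all earlier map steps, so the cost grows with the iteration count; with the second ordering each iteration applies its map function once.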



Thanks,
Yadid






