spark-user mailing list archives

From Yadid Ayzenberg <>
Subject RDD cache question
Date Sun, 01 Dec 2013 02:01:17 GMT

Hi All,

I'm trying to implement the following and would like to know in which places I should be calling cache().

Suppose I have a group of RDDs : RDD1 to RDDn as input.

1. create a single RDD_total = RDD1.union(RDD2)...union(RDDn)

2. for i = 0 to x:    RDD_total = (some map function());

3. return RDD_total.
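
To make the steps above concrete, here is a rough Scala sketch of the pattern I have in mind. The element type (Int), the step function, and the names IterativeMapSketch and run are placeholders for illustration, not my actual code, and I am assuming the loop runs x times. It deliberately contains no cache() calls, since where to put them is exactly what I am asking.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeMapSketch {
  // Hypothetical stand-in for "some map function".
  def step(v: Int): Int = v + 1

  def run(sc: SparkContext, inputs: Seq[RDD[Int]], x: Int): RDD[Int] = {
    // Step 1: union all input RDDs into a single RDD.
    var total: RDD[Int] = sc.union(inputs)

    // Step 2: apply the map function x times, rebinding total each time.
    // Each map only extends the lineage; nothing is computed until an action runs.
    for (_ <- 0 until x) {
      total = total.map(step)
    }

    // Step 3: return the final RDD.
    total
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("iterative-map-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val inputs = Seq(sc.parallelize(1 to 3), sc.parallelize(4 to 6))
      println(run(sc, inputs, 2).collect().mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```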

I think that I should cache RDD_total in order to optimize the iterations. Should I just be calling
RDD_total.cache() at the end of each iteration? Or should I be performing something more like the following:

RDD_temp = (some map function());
RDD_total = RDD_temp.cache();

