spark-user mailing list archives

From Pengcheng YIN <pcyin1...@gmail.com>
Subject does calling cache()/persist() on a RDD trigger its immediate evaluation?
Date Sun, 04 Jan 2015 07:23:16 GMT
Hi Pro,

I have a question about calling cache()/persist() on an RDD. All RDDs in Spark are lazily
evaluated, but does calling cache()/persist() on an RDD trigger its immediate evaluation?

My Spark app looks something like this:

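var rdd = sc.textFile().map()    // arguments elided; `var` because rdd is reassigned below
rdd.persist()
var done = false                 // Scala has no built-in `break`, so a flag ends the loop
while (!done) {
    val count = rdd.filter().count()
    if (count == 0) {
        done = true
    } else {
        val newRdd = /* some code that uses `rdd` several times and produces a new RDD */
        rdd.unpersist()
        rdd = newRdd.persist()
    }
}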

In each iteration, I persist `rdd`, unpersist it at the end of the iteration, and replace
`rdd` with the persisted `newRdd`. My concern is that if an RDD is not evaluated and cached
at the moment persist() is called, I need to move the persist()/unpersist() calls to make
the loop more efficient.

Thanks,
Pengcheng
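
The question above can be checked empirically. The following is a minimal sketch, not taken from the original message: persist() only marks an RDD with a storage level, and the partitions are computed and cached when the first action runs. The sketch counts how often the map function executes. The object name `PersistIsLazy`, the helper `evaluationCounts`, the local master, and the use of `longAccumulator` (available from Spark 2.0 onward) are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PersistIsLazy {
  /** Returns how many times the map function has run after persist(),
    * after the first count(), and after the second count().
    * Hypothetical helper, written only for this illustration. */
  def evaluationCounts(sc: SparkContext): (Long, Long, Long) = {
    // Incremented once per element each time the map function executes.
    val evaluations = sc.longAccumulator("evaluations")

    val rdd = sc.parallelize(1 to 100).map { x =>
      evaluations.add(1)
      x * 2
    }

    rdd.persist()                    // only marks the RDD; no job is launched
    val afterPersist = evaluations.sum

    rdd.count()                      // first action: computes and caches the partitions
    val afterFirstCount = evaluations.sum

    rdd.count()                      // second action: reads the cached partitions
    val afterSecondCount = evaluations.sum

    rdd.unpersist()
    (afterPersist, afterFirstCount, afterSecondCount)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative choices.
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist-is-lazy")
    val sc = new SparkContext(conf)
    try {
      val (afterPersist, afterFirst, afterSecond) = evaluationCounts(sc)
      println(s"after persist():    $afterPersist")
      println(s"after first count:  $afterFirst")
      println(s"after second count: $afterSecond")
    } finally {
      sc.stop()
    }
  }
}
```

In a local run with no task retries and enough memory to hold the cache, the counter should stay at zero after persist(), reach 100 after the first count(), and not grow on the second. For the loop in the message this suggests that `newRdd` is not yet cached when `rdd.unpersist()` runs, so evaluating `newRdd` in the next iteration may recompute `rdd` from its lineage. One possible reordering is to run an action on the persisted `newRdd` (for example the `count` the loop already needs) before calling `rdd.unpersist()`.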



