spark-user mailing list archives

From Andrea Esposito <and1...@gmail.com>
Subject Re: Incredible slow iterative computation
Date Tue, 06 May 2014 09:54:02 GMT
Thanks all for helping.
Following Earthson's tip, I resolved the issue. I have to report, though, that
if you materialize the RDD first and only afterwards try to checkpoint it, the
checkpoint never takes effect:

newRdd = oldRdd.map(myFun).persist(myStorageLevel)
newRdd.foreach(x => myFunLogic(x)) // materialized here for other reasons
...
if (condition) { // only after this do I checkpoint
  newRdd.checkpoint
  newRdd.isCheckpointed // false here
  newRdd.foreach(x => {}) // force evaluation
  newRdd.isCheckpointed // still false here
}
oldRdd.unpersist(true)
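
For reference, this is the ordering that does trigger the checkpoint. Note
that a checkpoint directory must have been been set on the SparkContext
beforehand; the setCheckpointDir call and its path below are illustrative
additions (the snippets in this thread assume it is already configured), and
myFun/myStorageLevel are the same placeholders as above:

sc.setCheckpointDir("/some/hdfs/or/local/dir") // must be set before checkpointing
newRdd = oldRdd.map(myFun).persist(myStorageLevel)
newRdd.checkpoint // mark BEFORE the first action on newRdd
newRdd.foreach(x => {}) // the first action materializes and checkpoints
newRdd.isCheckpointed // true here
oldRdd.unpersist(true)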


2014-05-06 3:35 GMT+02:00 Earthson <Earthson.Lu@gmail.com>:

> checkpoint seems to just add a checkpoint mark? You need to run an action
> after marking it. I have tried it with success :)
>
> newRdd = oldRdd.map(myFun).persist(myStorageLevel)
> newRdd.checkpoint // <<checkpoint here
> newRdd.isCheckpointed // false here
> newRdd.foreach(x => {}) // Force evaluation
> newRdd.isCheckpointed // true here
> oldRdd.unpersist(true)
>
>
> ~~~~~~~~
>
> If you create a new broadcast object for each step of the iteration, the
> broadcasts will eat up all of the memory. You may need to set
> "spark.cleaner.ttl" to a small enough value.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Incredible-slow-iterative-computation-tp4204p5407.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
