spark-user mailing list archives

From Andrea Esposito <and1...@gmail.com>
Subject Re: cache not work as expected for iteration?
Date Sun, 04 May 2014 08:47:54 GMT
Maybe you don't have enough memory to hold the current RDD together with all
the past ones?
RDDs that are cached or persisted have to be unpersisted explicitly; there is
no auto-unpersist (maybe that will change in the 1.0 release?).
Also be careful: calling cache() or persist() does not mean the RDD gets
materialised. It is only marked for caching and is computed when an action
runs on it.
I personally find this usage pattern the simplest:

> val mwzNew = mwz.mapPartitions(...).cache() // cache() is persist() at the
> // default storage level, so chaining both is redundant
> mwzNew.count() // or mwzNew.foreach(x => {}); forces evaluation so that the
> // new RDD is actually materialised
> mwz.unpersist() // drop the old, no longer used RDD from memory and disk
>
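For an iterative job like the one in the question, the same three steps can be repeated in a loop. The following is only a minimal sketch under assumptions: the names IterativeCacheSketch, step and run are made up for illustration, the input is a dummy RDD[Int], and step is a placeholder for the real update (it is not the LDA code linked below).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeCacheSketch {
  // Hypothetical stand-in for one iteration's update; the real
  // mapPartitions(...) logic would go here.
  def step(rdd: RDD[Int]): RDD[Int] =
    rdd.mapPartitions(iter => iter.map(_ + 1))

  // Runs `iterations` steps, keeping only the latest RDD cached.
  def run(sc: SparkContext, iterations: Int): RDD[Int] = {
    var current: RDD[Int] = sc.parallelize(1 to 1000).cache()
    current.count() // action: materialises the initial RDD in the cache

    for (_ <- 1 to iterations) {
      val next = step(current).cache() // only marks `next` for caching
      next.count()                     // action: computes and caches `next`
      current.unpersist()              // the old RDD is no longer needed
      current = next
    }
    current
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone test run.
    val conf = new SparkConf().setAppName("iterative-cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val result = run(sc, iterations = 5)
      println(s"first element: ${result.first()}")
      // getPersistentRDDs lists the RDDs still marked as persistent.
      println(s"persistent RDDs: ${sc.getPersistentRDDs.size}")
    } finally {
      sc.stop()
    }
  }
}
```

The order matters: if the old RDD were unpersisted before the action on the new one, the new RDD would have to recompute it from its lineage.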




2014-05-04 5:16 GMT+02:00 Earthson <Earthson.Lu@gmail.com>:

> I'm using Spark for an LDA implementation. I need to cache the RDD for the
> next step of Gibbs sampling; once the new result is cached, the previous
> cache could be dropped. Something like an LRU cache should evict the previous
> cache because it is never used again, but the caching behaves confusingly:
>
> Here is the code:)
> <
> https://github.com/Earthson/sparklda/blob/master/src/main/scala/net/earthson/nlp/lda/lda.scala#L99
> >
>
> <
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache1.png
> >
>
> <
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache2.png
> >
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/cache-not-work-as-expected-for-iteration-tp5292.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
