spark-user mailing list archives

From Xuefeng Wu <ben...@gmail.com>
Subject Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?
Date Mon, 19 Jan 2015 15:24:47 GMT
I think it's always calculated twice. Could you provide a demo case in which
RDD1 is calculated only once?
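A toy model in plain Python makes the reasoning concrete. It is not Spark itself (`ToyRDD` and `build_dag` are hypothetical names), and it ignores partitions, stages and shuffles. It only assumes, as the question does, that RDD1 is neither persisted nor checkpointed. Each node pulls its parents' data by recomputing them, so RDD1, which feeds both RDD2 and RDD3 in the diagram from the question, runs once per path unless it is persisted.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD: a compute function plus parent nodes (not Spark)."""

    def __init__(self, name: str, fn: Callable[..., List[int]], *parents: "ToyRDD"):
        self.name = name
        self.fn = fn
        self.parents = parents
        self.compute_count = 0          # how many times fn actually ran
        self.persisted = False
        self._cache: Optional[List[int]] = None

    def persist(self) -> "ToyRDD":
        self.persisted = True
        return self

    def iterator(self) -> List[int]:
        # A persisted node serves its cached result after the first computation.
        if self.persisted and self._cache is not None:
            return self._cache
        inputs = [p.iterator() for p in self.parents]   # walk the lineage recursively
        self.compute_count += 1
        result = self.fn(*inputs)
        if self.persisted:
            self._cache = result
        return result


def build_dag(persist_rdd1: bool):
    """Builds the DAG from the question: RDD1 feeds RDD2 and RDD3, both feed RDD4."""
    rdd1 = ToyRDD("RDD1", lambda: [1, 2, 3])            # the "expensive" source
    if persist_rdd1:
        rdd1.persist()
    rdd2 = ToyRDD("RDD2", lambda xs: [x * 10 for x in xs], rdd1)
    rdd3 = ToyRDD("RDD3", lambda xs: [x + 1 for x in xs], rdd1)
    rdd4 = ToyRDD("RDD4", lambda a, b: [x + y for x, y in zip(a, b)], rdd2, rdd3)
    return rdd1, rdd4


for persist in (False, True):
    rdd1, rdd4 = build_dag(persist)
    rdd4.iterator()          # the single "Action!"
    print(f"persist={persist}: RDD1 computed {rdd1.compute_count} time(s)")
# persist=False: RDD1 computed 2 time(s)
# persist=True: RDD1 computed 1 time(s)
```

In this model the count is deterministic: two without persistence, one with it.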

On Sat, Jan 17, 2015 at 2:37 AM, Peng Cheng <pc175@uow.edu.au> wrote:

> I'm talking about RDD1 (not persisted or checkpointed) in this situation:
>
> ...(somewhere) -> RDD1 -> RDD2
>                   |       |
>                   V       V
>                   RDD3 -> RDD4 -> Action!
>
> In my experience, whether RDD1 gets recalculated is unpredictable: sometimes
> it is computed once, sometimes twice. When computing this RDD is expensive
> (e.g. it involves calling a RESTful service that charges me money), this
> forces me to persist RDD1, which takes extra memory; and since the Action!
> doesn't always happen, I don't know when to unpersist it to free that memory.
>
> A related problem might be in SQLContext.jsonRDD(), since the source JSON
> RDD is used twice (once for schema inference, once for reading the data).
> This almost guarantees that the source RDD is calculated twice. Has this
> problem been addressed so far?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/If-an-RDD-appeared-twice-in-a-DAG-of-which-calculation-is-triggered-by-a-single-action-will-this-RDD-tp21192.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>

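On the question of when to unpersist: one way to make this deterministic is to scope the persistence to the code that may run the action and release it in a `finally`, so the cache is freed whether the action runs, is skipped, or throws. The sketch below uses a plain-Python stand-in (`CachedSource` and `persisted` are hypothetical names, not Spark APIs); with a real RDD the same shape would wrap `rdd1.persist()` and `rdd1.unpersist()`.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional


class CachedSource:
    """Hypothetical stand-in for an expensive, persistable RDD (not a Spark API)."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute
        self._cache: Optional[List[int]] = None
        self.persisted = False
        self.compute_count = 0   # how many times the expensive computation ran

    def persist(self) -> None:
        self.persisted = True

    def unpersist(self) -> None:
        self.persisted = False
        self._cache = None       # drop the cached data so its memory can be reclaimed

    def get(self) -> List[int]:
        if self.persisted and self._cache is not None:
            return self._cache
        self.compute_count += 1
        data = self._compute()
        if self.persisted:
            self._cache = data
        return data


@contextmanager
def persisted(src: CachedSource) -> Iterator[CachedSource]:
    """Persist for the duration of a with-block; always unpersist on exit, even on error."""
    src.persist()
    try:
        yield src
    finally:
        src.unpersist()


rdd1 = CachedSource(lambda: [1, 2, 3])
with persisted(rdd1):
    # Two consumers of the same data share one computation while persisted.
    total = sum(rdd1.get()) + len(rdd1.get())
print(total, rdd1.compute_count, rdd1.persisted)   # 9 1 False
```

The design choice is to tie the cache lifetime to a lexical scope rather than to whether the action happens, which answers "when to unpersist" without tracking the action's outcome.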

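On the `SQLContext.jsonRDD()` point: the double evaluation comes from running schema inference as a separate pass before the data read. The toy below (plain Python, not Spark; `CountingSource`, `infer_schema` and `read_json` are hypothetical names) counts passes over an un-persisted source to show that supplying the schema up front removes the inference pass. In Spark 1.x, `SQLContext.jsonRDD` also has an overload that accepts an explicit `StructType` schema, which should play the same role; persisting the source RDD before the call is the other option.

```python
import json
from typing import Dict, List, Optional, Tuple


class CountingSource:
    """Hypothetical stand-in for an un-persisted source RDD: every scan recomputes it."""

    def __init__(self, lines: List[str]):
        self.lines = lines
        self.passes = 0          # number of full scans over the source

    def scan(self) -> List[str]:
        self.passes += 1
        return list(self.lines)


def infer_schema(lines: List[str]) -> Dict[str, str]:
    """Collects every field name seen, with the Python type name of its first value."""
    schema: Dict[str, str] = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, type(value).__name__)
    return schema


def read_json(source: CountingSource,
              schema: Optional[Dict[str, str]] = None) -> Tuple[Dict[str, str], List[dict]]:
    """Two scans when the schema must be inferred, one scan when it is supplied."""
    if schema is None:
        schema = infer_schema(source.scan())      # pass 1: schema inference
    rows = []
    for line in source.scan():                    # data-read pass
        record = json.loads(line)
        rows.append({key: record.get(key) for key in schema})
    return schema, rows


src = CountingSource(['{"a": 1}', '{"a": 2, "b": "x"}'])
read_json(src)
print(src.passes)            # 2: inference + read

src = CountingSource(['{"a": 1}', '{"a": 2, "b": "x"}'])
read_json(src, schema={"a": "int", "b": "str"})
print(src.passes)            # 1: read only
```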
-- 

~Yours respectfully, Xuefeng Wu/吴雪峰
