spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Intermediate stage will be cached automatically?
Date Wed, 17 Jun 2015 12:45:13 GMT
It's not cached per se: for example, you will not see it in the Storage tab of
the UI. However, Spark has already computed the output of the first (shuffle
map) stage and still has it (as shuffle files on local disk), so the next
count call can skip that stage and should be very fast.
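A minimal sketch of that behaviour, assuming Spark is on the classpath and a `local[2]` master; the object `ShuffleReuseSketch` and the helper `buildData` are hypothetical names. It builds the same RDD as in the question, shows that it is not persisted (storage level `NONE`, not listed in `sc.getPersistentRDDs`), and then runs `count()` twice. In the Jobs tab of the UI, the second job should list its map stage as skipped, because the shuffle output written by the first job is reused rather than recomputed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch only: assumes Spark on the classpath and a local master.
object ShuffleReuseSketch {
  // Same two-stage lineage as in the question: a map stage, then a shuffle.
  def buildData(sc: SparkContext): RDD[(Int, Int)] =
    sc.parallelize(1 to 10, 2).map(e => (e % 2, 2)).reduceByKey(_ + _, 2)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shuffle-reuse-sketch").setMaster("local[2]"))
    try {
      val data = buildData(sc)

      // No persist()/cache() call: storage level is NONE and the RDD is not
      // registered as persistent, so nothing appears in the Storage tab.
      println(data.getStorageLevel == StorageLevel.NONE) // true
      println(sc.getPersistentRDDs.contains(data.id))    // false

      // The lineage contains a ShuffledRDD, i.e. a stage boundary.
      println(data.toDebugString)

      // First job: runs the map stage (writes shuffle files) and the reduce stage.
      println(data.count()) // 2
      // Second job: the map stage's shuffle output is still available, so the
      // scheduler skips that stage and only runs the reduce stage.
      println(data.count()) // 2
    } finally {
      sc.stop()
    }
  }
}
```

The skipped stage is reuse of shuffle output by the scheduler, not an RDD cache, which is why the Storage tab stays empty.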


Best
Ayan

On Wed, Jun 17, 2015 at 10:21 PM, Mark Tse <Mark.Tse@d2l.com> wrote:

>  I think
> https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
> might shed some light on the behaviour you’re seeing.
>
> Mark
>
> *From:* canan chen [mailto:ccnfdu@gmail.com]
> *Sent:* June-17-15 5:57 AM
> *To:* spark users
> *Subject:* Intermediate stage will be cached automatically?
>
> Here's a simple Spark example in which I call RDD#count twice. The first
> call runs 2 stages, but the second one only needs 1 stage. It seems the
> first stage is cached. Is that true? Is there any flag with which I can
> control whether the intermediate stage is cached?
>
>
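>     // `sc` is the SparkContext (e.g. the one predefined in spark-shell)
>     val data = sc.parallelize(1 to 10, 2).map(e => (e % 2, 2)).reduceByKey(_ + _, 2)
>     println(data.count()) // first job: 2 stages
>     println(data.count()) // second job: only 1 stage runs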
>
>
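On controlling caching: the control the RDD API exposes here is explicit persistence through persist()/cache() and unpersist(), as described in the persistence section of the programming guide linked above. A minimal sketch, assuming Spark is on the classpath and a `local[2]` master; `ExplicitPersistSketch` and `buildData` are hypothetical names. Once persisted, the reduced RDD is registered with the context and, after the first action, shows up in the Storage tab; unpersist() removes it again.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch only: assumes Spark on the classpath and a local master.
object ExplicitPersistSketch {
  // Same RDD as in the question.
  def buildData(sc: SparkContext): RDD[(Int, Int)] =
    sc.parallelize(1 to 10, 2).map(e => (e % 2, 2)).reduceByKey(_ + _, 2)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("explicit-persist-sketch").setMaster("local[2]"))
    try {
      // Opt in: mark the reduced RDD for in-memory caching.
      val data = buildData(sc).persist(StorageLevel.MEMORY_ONLY)
      println(data.getStorageLevel == StorageLevel.MEMORY_ONLY) // true
      println(sc.getPersistentRDDs.contains(data.id))           // true

      println(data.count()) // 2, computes and caches the partitions
      println(data.count()) // 2, reads the cached partitions

      // Opt out again: drop the cached blocks and the persistence mark.
      data.unpersist()
      println(data.getStorageLevel == StorageLevel.NONE)        // true
      println(sc.getPersistentRDDs.contains(data.id))           // false
    } finally {
      sc.stop()
    }
  }
}
```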


-- 
Best Regards,
Ayan Guha
