spark-user mailing list archives

From Mark Tse <>
Subject RE: Intermediate stage will be cached automatically?
Date Wed, 17 Jun 2015 12:21:41 GMT
I think might shed
some light on the behaviour you’re seeing.


From: canan chen []
Sent: June-17-15 5:57 AM
To: spark users
Subject: Intermediate stage will be cached automatically?

Here's a simple Spark example in which I call RDD#count twice. The first call runs
2 stages, but the second only needs 1 stage. It seems the first stage is cached. Is that true?
Is there a flag I can use to control whether the intermediate stage is cached?

    val data = sc.parallelize(1 to 10, 2).map(e => (e % 2, 2)).reduceByKey(_ + _, 2)
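A self-contained sketch of the example, including the two count calls described above. It assumes Spark 1.3 or later is on the classpath (so reduceByKey needs no extra implicit import) and runs in local mode; the object name CountTwice and the run() helper are hypothetical. The 2-vs-1 stage difference is consistent with shuffle-file reuse rather than RDD caching: the Spark programming guide notes that Spark automatically persists some intermediate data in shuffle operations such as reduceByKey, even without a call to persist. In the web UI, the map-side stage of the second job is typically listed as skipped.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone app; object and method names are illustrative.
object CountTwice {
  // Runs the pipeline from the question, counts it twice, and returns
  // (first count, second count, collected result).
  def run(): (Long, Long, Map[Int, Int]) = {
    // Local mode with 2 worker threads; no cluster needed.
    val conf = new SparkConf().setAppName("count-twice").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // 2 input partitions -> map each element to (key, 2) -> shuffle into 2 partitions.
      val data = sc.parallelize(1 to 10, 2)
        .map(e => (e % 2, 2))
        .reduceByKey(_ + _, 2)

      // Job 1: runs the map-side (shuffle map) stage, which writes
      // shuffle files, followed by the result stage.
      val first = data.count()

      // Job 2: the shuffle output from job 1 is still available, so the
      // scheduler can skip the map-side stage and run only the result stage.
      val second = data.count()

      // Print the lineage; the shuffle boundary shows up as an indentation change.
      println(data.toDebugString)

      (first, second, data.collect().toMap)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (first, second, result) = run()
    println(s"first=$first second=$second result=$result")
  }
}
```

Calling data.cache() before the first count is the explicit way to keep the reduced RDD itself in memory. The shuffle-output reuse sketched here happens without it.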