spark-user mailing list archives

From canan chen <ccn...@gmail.com>
Subject Intermediate stage will be cached automatically?
Date Wed, 17 Jun 2015 09:56:44 GMT
Here is a simple Spark example in which I call RDD#count twice. The first
call runs 2 stages, but the second needs only 1. It seems the first stage
is cached. Is that true? Is there a flag I can use to control whether the
intermediate stage is cached?


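    // Assumes `sc` is an existing SparkContext, as in spark-shell.
    val data = sc.parallelize(1 to 10, 2).map(e => (e % 2, 2)).reduceByKey(_ + _, 2)
    println(data.count())  // first call: 2 stages run
    println(data.count())  // second call: only 1 stage runs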
