spark-user mailing list archives

From Sujee Maniyam <>
Subject persisting RDD in memory
Date Fri, 01 Aug 2014 17:59:07 GMT
Hi all,
I have a scenario of a web application submitting multiple jobs to Spark.
These jobs may be operating on the same RDD.

Is it possible to cache() the RDD during one call so that all subsequent calls can use the cached RDD?

Basically, during one invocation:
   val rdd1 = sparkContext1.textFile(file1).cache()

And during another invocation:
    val rdd2 = sparkContext2.textFile(file1).cache()

(Note that the Spark contexts are different, but the file is the same.)

Will the same file be loaded again in the second Spark context, or will there be only one cached copy (since RDDs are immutable)?
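For reference, here is a minimal sketch of the pattern where a cached RDD is reused across multiple jobs that run in the same SparkContext (the app name, file path, and variable names are illustrative, and a local master is assumed for simplicity):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: multiple jobs sharing one cached RDD within a SINGLE SparkContext.
// "file1" is a placeholder path, as in the snippets above.
object CachedRddSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cached-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // cache() is lazy: the file is read and the partitions are materialized
    // in memory on the first action, then reused by later jobs on this rdd.
    val rdd = sc.textFile("file1").cache()

    val lineCount = rdd.count()                            // first action: reads the file, populates the cache
    val wordCount = rdd.flatMap(_.split("\\s+")).count()   // second job: served from the cached partitions

    println(s"lines=$lineCount words=$wordCount")
    sc.stop()
  }
}
```

Whether two *separate* SparkContexts can share such a cache is exactly the question posed here; the sketch only shows the single-context case.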

Sujee Maniyam
