spark-user mailing list archives

From PengWeiPRC <>
Subject What does Spark cache() actually do?
Date Thu, 15 May 2014 23:09:40 GMT
Hi there,

I was wondering if someone could explain to me how the cache() function works
in Spark in these cases:

(1) Suppose I have a huge file, say 1 TB, which cannot be stored entirely in
memory. What will happen if I try to create an RDD of this huge file and
cache it?

(2) If Spark can handle this, it must be storing only part of the data. Which
part of the data is kept in memory, and in particular, does new data evict
old data from memory the way a cache normally works?

(3) What would happen if I load one RDD and cache it, then another and cache
it too, and so on? Will the new RDDs evict the old RDDs already cached in
memory?
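To make questions (2) and (3) concrete, here is my mental model of what might be happening: a toy cache in Python that evicts the least-recently-used blocks once a capacity limit is exceeded. This is only a sketch of the behavior I am asking about, not anything taken from Spark itself; the class name, the capacity limit, and the LRU policy are all my own assumptions.

```python
from collections import OrderedDict

# Toy model of the behavior in question: a fixed-size block store that
# evicts the least-recently-used entries when full. Whether Spark's
# cache() actually behaves like this is exactly what I am asking.
class ToyBlockCache:
    def __init__(self, capacity):
        self.capacity = capacity      # max number of blocks held in memory
        self.blocks = OrderedDict()   # block id -> data, least recent first

    def put(self, block_id, data):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
        self.blocks[block_id] = data
        while len(self.blocks) > self.capacity:
            # Drop the oldest block to make room for the new one.
            self.blocks.popitem(last=False)

    def get(self, block_id):
        if block_id not in self.blocks:
            return None               # would need to be recomputed or re-read
        self.blocks.move_to_end(block_id)  # mark as recently used
        return self.blocks[block_id]

# Scenario (3): caching blocks of a second RDD pushes out the first RDD's
# blocks, if eviction really is LRU.
cache = ToyBlockCache(capacity=2)
cache.put("rdd1-part0", "...")
cache.put("rdd1-part1", "...")
cache.put("rdd2-part0", "...")   # evicts rdd1-part0 under LRU
print(sorted(cache.blocks))      # ['rdd1-part1', 'rdd2-part0']
```

Is this roughly what Spark does per block/partition, or does it behave differently?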

Thanks very much.
