spark-user mailing list archives

From charles li <>
Subject rdd cache priority
Date Fri, 05 Feb 2016 03:15:09 GMT
Say I have two RDDs, RDD1 and RDD2.

Both are 20 GB in memory.

I cache both of them in memory using RDD1.cache() and RDD2.cache().

Then, in the later steps of my app, I never use RDD1 again but use RDD2
many times.

Here is my question:

If there is only 40 GB of memory in my cluster, and I have another RDD,
RDD3, which is also 20 GB, what happens if I cache RDD3 using RDD3.cache()?

As the documentation says, cache() uses the default storage level,
MEMORY_ONLY. As I understand it, that means RDD3 is not guaranteed to be
cached and may instead be recomputed every time it is used.

I am a little confused: will Spark remove RDD1 for me and put RDD3 in
memory instead?

Or is there any concept like a "priority cache" in Spark?
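
To make the scenario concrete, here is a toy model in plain Python of the behaviour I am hoping for. It is not Spark code: ToyLruCache and its methods are hypothetical names I made up, and the least-recently-used eviction rule is only my assumption about how such a priority could work. Whether Spark actually behaves this way is what I am asking.

```python
from collections import OrderedDict


class ToyLruCache:
    """Toy size-bounded cache with LRU eviction (hypothetical, not Spark)."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        # name -> size in GB, ordered from least to most recently used
        self.entries = OrderedDict()

    def used_gb(self):
        return sum(self.entries.values())

    def touch(self, name):
        # Mark an entry as most recently used, if it is still cached.
        if name in self.entries:
            self.entries.move_to_end(name)

    def put(self, name, size_gb):
        # Returns the names evicted to make room for the new entry.
        evicted = []
        if size_gb > self.capacity_gb:
            return evicted  # too big to cache at all; nothing is evicted
        self.entries.pop(name, None)  # re-caching replaces the old entry
        while self.used_gb() + size_gb > self.capacity_gb:
            old_name, _ = self.entries.popitem(last=False)
            evicted.append(old_name)
        self.entries[name] = size_gb
        return evicted


cache = ToyLruCache(capacity_gb=40)
cache.put("RDD1", 20)
cache.put("RDD2", 20)
for _ in range(5):
    cache.touch("RDD2")  # RDD2 is reused many times, RDD1 never
evicted = cache.put("RDD3", 20)
print(evicted, list(cache.entries))
```

In this toy model, caching RDD3 pushes out RDD1 (the entry that was never reused) and keeps RDD2.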

Many thanks.

a spark lover, a quant, a developer and a good man.
