I am not 100% sure of the root cause, but if you need RDD caching then look at Apache Ignite or similar.
Thank you for getting back to me. If this is not possible then perhaps you can help me with the root problem that caused me to ask this question.
Basically I have a job where I'm loading and persisting an RDD and then running queries against it. The problem I'm having is that even though there is plenty of free memory, the RDD does not fully persist on the first pass. It only becomes fully persisted after I have run several queries against it, which means the first four or five queries I run are extremely slow.
Is there any way I can make sure that the entire RDD ends up in memory the first time I load it?
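
One thing that may be worth checking here: Spark only caches a partition when an action actually computes it, so early queries that touch only some partitions (for example `take` or `first`) leave the rest uncached. Whether that is the cause in this thread is not established. The sketch below assumes a small stand-in RDD built with `parallelize` (hypothetical, not the poster's data), forces every partition through `count()` right after `persist`, and then reads back how many partitions were cached via `SparkContext.getRDDStorageInfo` (a developer API).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object MaterializeCache {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone sketch.
    val conf = new SparkConf().setAppName("materialize-cache-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in data; replace with the RDD that is actually being queried.
      val rdd = sc.parallelize(1 to 1000000, numSlices = 8).map(i => (i, i.toString))

      // persist() only marks the RDD; nothing is cached until an action runs.
      rdd.persist(StorageLevel.MEMORY_ONLY)

      // count() computes every partition, so each one gets a chance to be cached.
      rdd.count()

      // Report how much of this RDD the block manager is holding.
      sc.getRDDStorageInfo.filter(_.id == rdd.id).foreach { info =>
        println(s"${info.numCachedPartitions} of ${info.numPartitions} partitions cached, " +
          s"memSize=${info.memSize} bytes, diskSize=${info.diskSize} bytes")
      }
    } finally {
      sc.stop()
    }
  }
}
```

If fewer partitions are cached than exist after the `count()`, the storage memory available to the executors is likely smaller than the deserialized size of the data, and with `MEMORY_ONLY` the partitions that do not fit are recomputed on each query.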
---------- Forwarded message ----------
From: Takeshi Yamamuro <email@example.com>
Date: Thu, Mar 24, 2016 at 5:19 PM
Subject: Re: Forcing data from disk to memory
To: Daniel Imberman <firstname.lastname@example.org>
There is no direct way to do this; you need to unpersist the cached data and
then re-cache it as MEMORY_ONLY.
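
A minimal sketch of the unpersist-then-re-cache step described above, assuming the data was first persisted as `MEMORY_AND_DISK` (the thread does not say which level was used) and using a hypothetical stand-in RDD. Spark refuses to change the storage level of an RDD that already has one, which is why the `unpersist` call has to come first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RecacheInMemory {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone sketch.
    val conf = new SparkConf().setAppName("recache-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in data; replace with the RDD that is already cached in the real job.
      val rdd = sc.parallelize(1 to 1000000, numSlices = 8).map(i => (i, i.toString))

      // Assumed starting point: partitions may live in memory or on disk.
      rdd.persist(StorageLevel.MEMORY_AND_DISK)
      rdd.count()

      // Drop the existing cached blocks and wait until they are removed.
      rdd.unpersist(blocking = true)

      // Assign the new level, then run an action over every partition
      // so the data is recomputed and stored in memory.
      rdd.persist(StorageLevel.MEMORY_ONLY)
      rdd.count()
    } finally {
      sc.stop()
    }
  }
}
```

Note that the second `count()` recomputes the RDD from its lineage, so this costs one full pass over the source data; it does not move blocks from disk to memory in place.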