I'm not 100% sure what you want to do, but how about caching the whole dataset up front and then querying it?

On Fri, Mar 25, 2016 at 12:22 AM, Daniel Imberman <daniel.imberman@gmail.com> wrote:
Hi Takeshi,

Thank you for getting back to me. If this is not possible then perhaps you can help me with the root problem that caused me to ask this question.

Basically I have a job where I load and persist an RDD and then run queries against it. The problem I'm having is that even though there is plenty of space in memory, the RDD does not fully persist. After I've run several queries against it the RDD is fully persisted, but this means the first four or five queries I run are extremely slow.

Is there any way I can make sure that the entire RDD ends up in memory the first time I load it?
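[Editor's note: one common workaround, not stated in the thread itself, relies on the fact that Spark caches partitions lazily, on first computation. Running a full action such as count() immediately after persist() forces every partition to be materialized before the real queries run. A minimal sketch, assuming a Spark shell with a SparkContext `sc` in scope; the input path is a placeholder:]

```scala
import org.apache.spark.storage.StorageLevel

// Placeholder input; persist() only marks the RDD for caching.
val data = sc.textFile("hdfs:///path/to/input")
  .persist(StorageLevel.MEMORY_ONLY)

// A full action touches every partition, so the whole RDD is
// materialized into the cache before any real query runs.
data.count()

// Subsequent queries now read from the cache instead of recomputing.
val errors = data.filter(_.contains("ERROR")).count()
```

Whether every partition actually fits is still governed by the executors' storage memory; with MEMORY_ONLY, partitions that don't fit are simply not cached.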

Thank you

On Thu, Mar 24, 2016 at 1:21 AM Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:
just re-sent,

---------- Forwarded message ----------
From: Takeshi Yamamuro <linguin.m.s@gmail.com>
Date: Thu, Mar 24, 2016 at 5:19 PM
Subject: Re: Forcing data from disk to memory
To: Daniel Imberman <daniel.imberman@gmail.com>


There is no direct approach; you need to unpersist the cached data, then
re-cache it as MEMORY_ONLY.
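[Editor's note: a sketch of the unpersist-then-re-cache step described above, assuming `rdd` is the RDD currently persisted at MEMORY_AND_DISK:]

```scala
import org.apache.spark.storage.StorageLevel

// Drop the existing MEMORY_AND_DISK copy; blocking = true waits
// until all cached blocks are actually removed.
rdd.unpersist(blocking = true)

// Re-cache as MEMORY_ONLY and materialize with an action so the
// partitions are recomputed into memory right away.
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()
```

Note that the partitions are recomputed from the lineage (or re-read from the source) during the count, so this trades one slow pass for fast subsequent queries.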

// maropu

On Thu, Mar 24, 2016 at 8:22 AM, Daniel Imberman <daniel.imberman@gmail.com> wrote:
Hi all,

So I have a question about persistence. Let's say I have an RDD that's
persisted MEMORY_AND_DISK, and I know that enough memory has now been freed
that the on-disk partitions could fit in memory. Is it possible to tell
Spark to re-evaluate the cached RDD's storage and move that data into memory?

Thank you

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Forcing-data-from-disk-to-memory-tp26585.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Takeshi Yamamuro