spark-user mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: What should happen if we try to cache more data than the cluster can hold in memory?
Date Fri, 01 Aug 2014 17:17:46 GMT
On Fri, Aug 1, 2014 at 12:39 PM, Sean Owen <sowen@cloudera.com> wrote:

> Isn't this your worker running out of its memory for computations,
> rather than for caching RDDs?
I’m not sure how to interpret the stack trace, but let’s say that’s true.
I’m seeing this even with something as simple as a = sc.textFile().cache()
followed by a.count(). Spark shouldn’t need that much memory for that kind
of work, no?
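
For concreteness, here’s roughly what I’m running (a minimal sketch; the
input path is just a placeholder, not my actual data):

    # Minimal repro sketch: cache a text file and count its lines.
    # The path below is a placeholder, not the real input.
    from pyspark import SparkContext

    sc = SparkContext(appName="cache-count-repro")

    a = sc.textFile("hdfs:///path/to/input.txt").cache()
    print(a.count())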

> then the answer is that you should tell
> it to use less memory for caching.
I can try that. That’s done by changing spark.storage.memoryFraction, right?
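
If so, something like this should do it (just a sketch; the 0.4 value is an
example, not a recommendation):

    # Lower the cache fraction from the 0.6 default so more of the heap is
    # left for execution; 0.4 here is purely illustrative.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("cache-count-repro")
            .set("spark.storage.memoryFraction", "0.4"))
    sc = SparkContext(conf=conf)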

This still seems strange though. The default fraction of the JVM heap left
for non-cache activity (1 - 0.6 = 40%; see
<http://spark.apache.org/docs/latest/configuration.html#execution-behavior>)
should be plenty for just counting elements. I’m using m1.xlarge nodes that
have 15GB of memory apiece.
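
Back-of-the-envelope (the executor heap size below is an assumption; I
haven’t checked what the executor heap is actually set to on these nodes):

    # Rough arithmetic, assuming an executor heap of about 12 GB on an
    # m1.xlarge (the actual spark.executor.memory value is an assumption).
    heap_gb = 12.0
    storage_fraction = 0.6    # default spark.storage.memoryFraction
    non_cache_gb = heap_gb * (1 - storage_fraction)
    print(non_cache_gb)       # roughly 4.8 GB left for non-cache work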

Nick
