spark-user mailing list archives

From Nicholas Chammas <>
Subject Re: What should happen if we try to cache more data than the cluster can hold in memory?
Date Fri, 01 Aug 2014 17:17:46 GMT
On Fri, Aug 1, 2014 at 12:39 PM, Sean Owen <> wrote:

> Isn't this your worker running out of its memory for computations,
> rather than for caching RDDs?

I’m not sure how to interpret the stack trace, but let’s say that’s true.
I’m even seeing this with a simple a = sc.textFile().cache() followed by
a.count(). Spark shouldn’t need that much memory for this kind of work, no?
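(For reference, a minimal sketch of that scenario in PySpark 1.x, with one possible workaround: persisting at a storage level that spills to disk instead of the default MEMORY_ONLY. The input path and app name here are hypothetical, not from this thread.)

```python
# Sketch, not the code from this thread: cache a text file and count it.
# With the default MEMORY_ONLY level, blocks that don't fit in the cache
# are dropped; MEMORY_AND_DISK spills them to disk instead, which can
# relieve memory pressure when the data is larger than the cache.
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="cache-test")          # hypothetical app name
a = sc.textFile("hdfs:///some/large/input")      # hypothetical path
a.persist(StorageLevel.MEMORY_AND_DISK)          # spill to disk when full
print(a.count())
```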

> then the answer is that you should tell
> it to use less memory for caching.

I can try that. That’s done by changing, right?

This still seems strange though. The default fraction of the JVM heap left
for non-cache activity (1 - 0.6 = 40%) should be plenty for just counting
elements. I’m using m1.xlarge nodes that have 15 GB of memory apiece.
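(To spell out that arithmetic: the 0.6 is Spark 1.x's default storage fraction. The heap size below is an assumption for illustration, supposing roughly 14 GB of the node's 15 GB goes to the executor JVM.)

```python
# Sketch of how a Spark 1.x executor heap is split between the RDD cache
# and working memory. The heap size is an assumption, not from this post.
heap_gb = 14.0           # assumed executor JVM heap on a 15 GB m1.xlarge
storage_fraction = 0.6   # Spark 1.x default storage fraction

cache_gb = heap_gb * storage_fraction          # reserved for cached blocks
working_gb = heap_gb * (1 - storage_fraction)  # left for computation

print(f"cache: {cache_gb:.1f} GB, working: {working_gb:.1f} GB")
```

With the defaults, 40% of a ~14 GB heap is still several gigabytes of working memory, which is why a plain count running out of memory looks surprising.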

