spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christopher Nguyen <>
Subject Re: Not caching rdds, setting
Date Sat, 09 Nov 2013 00:40:27 GMT
Grega, the way to think about this setting is that it sets the maximum
amount of memory Spark is allowed to use for caching RDDs before it must
expire or spill them to disk. Spark in principle knows at all times how
many RDDs are kept in memory and their total sizes, so it can for example
persist then free older RDDs when it's allocating space for new RDDs, when
this limit is hit.

There's otherwise no partitioning of the heap to reserve for RDDs vs.
"normal" objects. The entire heap is still managed by the JVM, accessible
to your client code. Of course you do want to be careful with and minimize
your own memory use to avoid OOMEs.

Sent while mobile. Pls excuse typos etc.
On Nov 8, 2013 8:02 AM, "Grega Kešpret" <> wrote:

> Hi,
> The docs say: Fraction of Java heap to use for Spark's memory cache. This
> should not be larger than the "old" generation of objects in the JVM, which
> by default is given 2/3 of the heap, but you can increase it if you
> configure your own old generation size.
> if we are not caching any RDDs, does it mean that we only have
> 1-memoryFraction heap available for "normal" JVM objects? Would it make
> sense then to set memoryFraction to 0?
> Thanks,
> Grega
> --
> [image: Inline image 1]
> *Grega Kešpret*
> Analytics engineer
> Celtra — Rich Media Mobile Advertising
> <> | @celtramobile<>

View raw message