spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Off Heap (Tungsten) Memory Usage / Management ?
Date Thu, 22 Sep 2016 17:46:16 GMT
This is probably the best way to manage it.

On Thu, Sep 22, 2016 at 6:42 PM, Josh Rosen <joshrosen@databricks.com>
wrote:

> Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped
> at spark.memory.offHeap.size bytes. This is purposely specified as an
> absolute size rather than as a percentage of the heap size in order to
> allow end users to tune Spark so that its overall memory consumption stays
> within container memory limits.
>
> To use your example of a 3GB YARN container, you could configure Spark so
> that its maximum heap size plus spark.memory.offHeap.size is smaller than
> 3GB (minus some overhead fudge-factor).
>
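[Josh's sizing advice can be sketched as a bit of arithmetic plus a submit command. The concrete split between heap, off-heap, and overhead below is illustrative, not from the thread:]

```shell
# Illustrative sizing for a 3 GB (3072 MB) YARN container; the split is
# an assumption, not a rule.
CONTAINER_MB=3072
OVERHEAD_MB=384        # fudge factor for JVM internals, thread stacks, etc.
OFFHEAP_MB=1024        # will become spark.memory.offHeap.size
HEAP_MB=$((CONTAINER_MB - OVERHEAD_MB - OFFHEAP_MB))
echo "heap=${HEAP_MB}MB offheap=${OFFHEAP_MB}MB overhead=${OVERHEAD_MB}MB"

# The resulting submit command would look roughly like (sketch):
# spark-submit \
#   --master yarn \
#   --executor-memory "${HEAP_MB}m" \
#   --conf spark.memory.offHeap.enabled=true \
#   --conf spark.memory.offHeap.size=$((OFFHEAP_MB * 1024 * 1024)) \
#   my-app.jar
```

[With these numbers, heap (1664 MB) + off-heap (1024 MB) + overhead (384 MB) exactly fills the 3072 MB container; note that spark.memory.offHeap.size is given in bytes, as Josh says above.]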
> On Thu, Sep 22, 2016 at 7:56 AM Sean Owen <sowen@cloudera.com> wrote:
>
>> It's looking at the whole process's memory usage, and doesn't care
>> whether the memory is used by the heap or not within the JVM. Of
>> course, allocating memory off-heap still counts against you at the OS
>> level.
>>
>> On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel
>> <msegel_hadoop@hotmail.com> wrote:
>> > Thanks for the response Sean.
>> >
>> > But how does YARN know about the off-heap memory usage?
>> > That’s the piece that I’m missing.
>> >
>> > Thx again,
>> >
>> > -Mike
>> >
>> >> On Sep 21, 2016, at 10:09 PM, Sean Owen <sowen@cloudera.com> wrote:
>> >>
>> >> No, Xmx only controls the maximum size of on-heap allocated memory.
>> >> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
>> >> when it can be released).
>> >>
>> >> The answer is that YARN will kill the process because it's using more
>> >> memory than it asked for. A JVM is always going to use a little
>> >> off-heap memory by itself, so setting a max heap size of 2GB means the
>> >> JVM process may use a bit more than 2GB of memory. With an off-heap
>> >> intensive app like Spark it can be a lot more.
>> >>
>> >> There's a built-in 10% overhead, so that if you ask for a 3GB executor
>> >> it will ask for 3.3GB from YARN. You can increase the overhead.
>> >>
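[Sean's 10% figure can be sketched numerically. Assuming the default YARN-mode formula of max(384 MB, 10% of executor memory) for spark.yarn.executor.memoryOverhead, the request works out as:]

```shell
# Sketch of the memory YARN is asked for per executor, assuming the
# default overhead formula max(384 MB, 10% of executor memory).
EXECUTOR_MB=3072
OVERHEAD_MB=$((EXECUTOR_MB / 10))
if [ "$OVERHEAD_MB" -lt 384 ]; then
  OVERHEAD_MB=384              # the floor kicks in for small executors
fi
REQUEST_MB=$((EXECUTOR_MB + OVERHEAD_MB))
echo "YARN is asked for ${REQUEST_MB} MB"
```

[For a 3072 MB executor the 384 MB floor dominates (3456 MB requested); for larger executors the 10% term takes over, e.g. 8192 MB gives 8192 + 819 = 9011 MB. The overhead can be raised via spark.yarn.executor.memoryOverhead.]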
>> >> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke <jornfranke@gmail.com>
>> wrote:
>> >>> All off-heap memory is still managed by the JVM process. If you
>> >>> limit the memory of this process then you limit the memory. I think
>> >>> the memory of the JVM process could be limited via the Xms/Xmx
>> >>> parameters of the JVM. These can be configured via the Spark options
>> >>> for YARN (be aware that they are different in cluster and client
>> >>> mode), but I recommend using the Spark options for the off-heap
>> >>> maximum.
>> >>>
>> >>> https://spark.apache.org/docs/latest/running-on-yarn.html
>> >>>
>> >>>
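[The options Jörn refers to can also be collected in spark-defaults.conf; this is a sketch with placeholder values, not a recommended configuration:]

```
# spark-defaults.conf (illustrative values)
spark.executor.memory                 2g           # executor JVM heap (-Xmx)
spark.yarn.executor.memoryOverhead    512          # MB added to the YARN request
spark.memory.offHeap.enabled          true
spark.memory.offHeap.size             1073741824   # bytes (1 GB)
# In cluster mode the driver runs in a YARN container too, so
# spark.driver.memory and spark.yarn.driver.memoryOverhead matter there.
```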
>> >>> On 21 Sep 2016, at 22:02, Michael Segel <msegel_hadoop@hotmail.com>
>> wrote:
>> >>>
>> >>> I’ve asked this question a couple of times of a friend who didn’t
>> >>> know the answer… so I thought I would try here.
>> >>>
>> >>>
>> >>> Suppose we launch a job on a cluster (YARN) and we have set up the
>> >>> containers to be 3GB in size.
>> >>>
>> >>>
>> >>> What does that 3GB represent?
>> >>>
>> >>> I mean, what happens if we end up using 2-3GB of off-heap storage
>> >>> via Tungsten?
>> >>> What will Spark do?
>> >>> Will it try to honor the container’s limits and throw an exception,
>> >>> or will it allow my job to grab that amount of memory and exceed
>> >>> YARN’s expectations since it’s off heap?
>> >>>
>> >>> Thx
>> >>>
>> >>> -Mike
>> >>>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>
