spark-user mailing list archives

From Gene Pang <gene.p...@gmail.com>
Subject Re: Spark 2.x OFF_HEAP persistence
Date Wed, 04 Jan 2017 21:20:19 GMT
Hi Vin,

As of Spark 2.x, OFF_HEAP no longer interfaces directly with an external
block store. The previous tight dependency was restrictive and reduced
flexibility. It looks like the new version allocates direct byte buffers
from the executor's off-heap memory and does not use any external system
for data storage. I am not aware of a way to connect the new version of
OFF_HEAP to Alluxio.
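
To make the new behavior concrete, here is a minimal PySpark sketch of
persisting an RDD with the Spark 2.x OFF_HEAP level. My understanding is
that the executor's off-heap pool has to be enabled and sized through
spark.memory.offHeap.enabled and spark.memory.offHeap.size for this to be
useful. The helper names (off_heap_conf, cache_off_heap), the demo data, and
the size you pass in are made up for illustration, and the second function
assumes PySpark is installed.

```python
def off_heap_conf(size_bytes):
    """Build the settings that give each executor an off-heap memory pool."""
    if size_bytes <= 0:
        raise ValueError("off-heap size must be a positive number of bytes")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": str(size_bytes),
    }


def cache_off_heap(app_name, size_bytes):
    """Persist a small demo RDD with StorageLevel.OFF_HEAP and return its sum."""
    # Imported here so off_heap_conf() stays usable without PySpark installed.
    from pyspark import SparkConf, SparkContext, StorageLevel

    conf = SparkConf().setAppName(app_name)
    for key, value in off_heap_conf(size_bytes).items():
        conf.set(key, value)

    sc = SparkContext(conf=conf)
    try:
        # No external block store is involved; the data stays inside the
        # executors of this one application.
        rdd = sc.parallelize(range(1000)).persist(StorageLevel.OFF_HEAP)
        return rdd.sum()  # the first action materializes the cached blocks
    finally:
        sc.stop()
```

Note that nothing in this configuration points at Alluxio or any other
external system, which is the difference from the 1.6 settings.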

You can get benefits similar to the old OFF_HEAP <-> Tachyon mode, plus
additional ones such as a unified namespace
<http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html>
or sharing in-memory data across applications, by using the Alluxio
filesystem API <http://www.alluxio.org/docs/master/en/File-System-API.html>.
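
As a rough sketch of that alternative from the Spark side: one common route
is to read and write alluxio:// paths through Alluxio's Hadoop-compatible
client rather than calling the native Java API directly. This assumes the
Alluxio client jar is on the Spark classpath and that the master listens on
the default port 19998. The helper names, the host name, and the paths are
hypothetical.

```python
def alluxio_uri(master_host, path, port=19998):
    """Build an alluxio:// URI; 19998 is assumed to be the master's port."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /shared/data")
    return "alluxio://{}:{}{}".format(master_host, port, path)


def write_then_read(sc, records, uri):
    """Write records to Alluxio from one RDD, then read them back as another.

    sc is an existing SparkContext. The write fails if the target path
    already exists, as with any saveAsTextFile destination.
    """
    sc.parallelize(records).saveAsTextFile(uri)
    # Any other Spark application pointed at the same URI can read this
    # data too, which is the cross-application sharing mentioned above.
    return sc.textFile(uri)
```

For example, alluxio_uri("alluxio-master", "/shared/data") gives a path that
a second application could pass to sc.textFile() to read the same data.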

I hope this helps!

Thanks,
Gene

On Wed, Jan 4, 2017 at 10:50 AM, Vin J <winjoshi3@gmail.com> wrote:

> Until Spark 1.6 I see there were specific properties to configure, such as
> the external block store master URL (spark.externalBlockStore.url), to
> use the OFF_HEAP storage level, which made it clear that an external
> Tachyon-type block store was required/used for OFF_HEAP storage.
>
> Can someone clarify how this has changed in Spark 2.x? I do not see
> config settings anymore that point Spark to an external block store
> like Tachyon (now Alluxio). Or am I missing something?
>
> I understand there are ways to use Alluxio with Spark, but how about
> OFF_HEAP storage - can Spark 2.x OFF_HEAP RDD persistence still exploit
> Alluxio/external block store? Any pointers to design decisions/Spark JIRAs
> related to this will also help.
>
> Thanks,
> Vin.
>
