spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Imran Rashid <im...@therashids.com>
Subject off-heap RDDs
Date Sun, 25 Aug 2013 22:26:02 GMT
Hi,

I was wondering if anyone has thought about putting cached data in an
RDD into off-heap memory, eg. w/ direct byte buffers.  For really
long-lived RDDs that use a lot of memory, this seems like a huge
improvement, since all the memory is now totally ignored during GC.
(and reading data from direct byte buffers is potentially faster as
well, buts thats just a nice bonus).

The easiest thing to do is to store memory-serialized RDDs in direct
byte buffers, but I guess we could also store the serialized RDD on
disk and use a memory mapped file.  Serializing into off-heap buffers
is a really simple patch, I just changed a few lines (I haven't done
any real tests w/ it yet, though).  But I dont' really have a ton of
experience w/ off-heap memory, so I thought I would ask what others
think of the idea, if it makes sense or if there are any gotchas I
should be aware of, etc.

thanks,
Imran

Mime
View raw message