spark-dev mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: on shark, is tachyon less efficient than memory_only cache strategy ?
Date Tue, 08 Jul 2014 16:58:16 GMT
Shark's in-memory format is already serialized (it's compressed and
column-based).
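
A minimal Python sketch of what "already serialized, compressed and column-based" can mean in practice. This is not Shark's actual format: the `encode_column` and `decode_column` helpers, the 32-bit integer layout, and the use of zlib are all assumptions chosen for illustration. The point it shows is that the cached data is a byte blob per column, so the decode step is needed wherever the blob is stored.

```python
import struct
import zlib

def encode_column(values):
    """Pack a column of 32-bit ints contiguously, then compress the bytes."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return zlib.compress(raw)

def decode_column(blob):
    """Decompress and unpack a column; required wherever the blob lives."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))

rows = [(1, 30), (2, 41), (3, 30)]      # hypothetical (id, age) records
ids, ages = zip(*rows)                  # pivot rows into columns
cached = {"id": encode_column(ids), "age": encode_column(ages)}

# A query that touches only `age` decodes just that one column.
ages_back = decode_column(cached["age"])
```

Because the cache already holds bytes rather than live objects, moving those bytes off-heap does not by itself add a new serialization step.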


On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan <mridul@gmail.com>
wrote:

> You are ignoring serde costs :-)
>
> - Mridul
>
> On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson <ilikerps@gmail.com> wrote:
> > Tachyon should only be marginally less performant than memory_only, because
> > we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer
> > the data over a pipe from Tachyon; we can directly read from the buffers in
> > the same way that Shark reads from its in-memory columnar format.
> >
> >
> >
> > On Tue, Jul 8, 2014 at 1:18 AM, qingyang li <liqingyang1985@gmail.com>
> > wrote:
> >
> >> Hi, when I create a table, I can set the cache strategy using shark.cache.
> >> I think "shark.cache=memory_only" means the data is managed by Spark and
> >> lives in the same JVM as the executor, while "shark.cache=tachyon" means
> >> the data is managed by Tachyon, which is off-heap, so the data is not in
> >> the same JVM as the executor and Spark will load it from Tachyon for each
> >> SQL query. So, is Tachyon less efficient than the memory_only cache
> >> strategy? If yes, can we let Spark load all the data from Tachyon once for
> >> all SQL queries, if I want to use the Tachyon cache strategy because
> >> Tachyon is more HA than memory_only?
> >>
>
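
The mmap point made in the thread can be sketched in Python as follows. This is an illustration under assumptions, not Tachyon's or Shark's code: the temporary file stands in for a file on a ramdisk, and the `sum_column` helper and the native 32-bit integer layout are hypothetical. It shows a reader mapping the file and iterating over the values in place, rather than receiving a copy of the bytes over a pipe or socket.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical stand-in for a column file stored on a ramdisk.
values = list(range(1000))
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    # Native byte order and size, to match memoryview.cast("i") below.
    f.write(struct.pack(f"{len(values)}i", *values))

def sum_column(path):
    """Map the file read-only and sum its ints straight from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            # The views expose the mapped pages directly; no bulk copy is made.
            with memoryview(buf) as raw, raw.cast("i") as view:
                return sum(view)

total = sum_column(path)
```

When the backing file sits on a ramdisk, the mapped pages are already in memory, which is why the read path can be close to that of an in-process buffer.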
