spark-dev mailing list archives

From qingyang li <liqingyang1...@gmail.com>
Subject Re: on shark, is tachyon less efficient than memory_only cache strategy ?
Date Thu, 10 Jul 2014 02:40:40 GMT
Could I set a cache policy so that Spark loads data from Tachyon only once
for all SQL queries? For example, by using CacheAllPolicy, FIFOCachePolicy,
or LRUCachePolicy. I have tried all three policies, but none of them helped.
I think that if Spark always reloads the data for each SQL query, it will
hurt query speed: each query will take longer than when the data is managed
by Spark itself.




2014-07-09 1:19 GMT+08:00 Haoyuan Li <haoyuan.li@gmail.com>:

> Yes. For Shark, the two modes, "shark.cache=tachyon" and
> "shark.cache=memory", have the same ser/de overhead. In Tachyon mode, Shark
> loads data from outside of the process, with the following benefits:
>
>
>    - In-memory data sharing across multiple Shark instances (i.e. stronger
>    isolation)
>    - Instant recovery of in-memory tables
>    - Reduced heap size => faster GC in Shark
>    - If the table is larger than the memory size, only the hot columns will
>    be cached in memory
>
> from http://tachyon-project.org/master/Running-Shark-on-Tachyon.html and
> https://github.com/amplab/shark/wiki/Running-Shark-with-Tachyon
>
> Haoyuan
>
>
> On Tue, Jul 8, 2014 at 9:58 AM, Aaron Davidson <ilikerps@gmail.com> wrote:
>
> > Shark's in-memory format is already serialized (it's compressed and
> > column-based).
> >
> >
> > On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan <mridul@gmail.com>
> > wrote:
> >
> > > You are ignoring serde costs :-)
> > >
> > > - Mridul
> > >
> > > On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson <ilikerps@gmail.com>
> > > wrote:
> > > > Tachyon should only be marginally less performant than memory_only,
> > > > because we mmap the data from Tachyon's ramdisk. We do not have to,
> > > > say, transfer the data over a pipe from Tachyon; we can directly read
> > > > from the buffers in the same way that Shark reads from its in-memory
> > > > columnar format.
> > > >
> > > >
> > > >
> > > > On Tue, Jul 8, 2014 at 1:18 AM, qingyang li <liqingyang1985@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi, when I create a table, I can specify the cache strategy using
> > > >> shark.cache. I think "shark.cache=memory_only" means the data is
> > > >> managed by Spark and lives in the same JVM as the executor, while
> > > >> "shark.cache=tachyon" means the data is managed by Tachyon, which is
> > > >> off heap, so the data is not in the same JVM as the executor and
> > > >> Spark will load it from Tachyon for each SQL query. So, is Tachyon
> > > >> less efficient than the memory_only cache strategy?
> > > >> If yes, can we let Spark load all the data from Tachyon once for all
> > > >> SQL queries? I want to use the Tachyon cache strategy because Tachyon
> > > >> is more highly available (HA) than memory_only.
> > > >>
> > >
> >
>
>
>
> --
> Haoyuan Li
> AMPLab, EECS, UC Berkeley
> http://www.cs.berkeley.edu/~haoyuan/
>
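
Aaron's point above, that mmap-ing a file on a ramdisk lets a reader work on the bytes in place instead of receiving them over a pipe, can be illustrated with a minimal sketch. This is plain Python, not Shark or Tachyon code; the function names and the flat int32 "column" file layout are hypothetical stand-ins for a cached columnar block. Pointing the path at a ramdisk mount (for example /dev/shm on Linux) would approximate the setup described in the thread.

```python
import mmap
import struct


def write_column(path, values):
    """Write a column of native 32-bit ints as a flat binary file.

    Hypothetical stand-in for a cached columnar block.
    """
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}i", *values))


def sum_column_mmap(path):
    """Map the file read-only and sum the ints directly from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            # Reinterpret the mapped bytes as native ints without copying
            # the file contents into a separate buffer first.
            with memoryview(buf) as raw, raw.cast("i") as column:
                return sum(column)
```

Each call maps the file again, but the pages stay resident in the ramdisk, so repeated queries do not re-read the data from disk or stream it through another process.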
