spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: better compression codecs for shuffle blocks?
Date Mon, 14 Jul 2014 23:08:48 GMT
Copying Jon here since he worked on the lzf library at Ning.

Jon - any comments on this topic?


On Mon, Jul 14, 2014 at 3:54 PM, Matei Zaharia <matei.zaharia@gmail.com>
wrote:

> You can actually turn off shuffle compression by setting
> spark.shuffle.compress to false. Try that out; there will still be some
> buffers for the various OutputStreams, but they should be smaller.
>
> Matei
>
> On Jul 14, 2014, at 3:30 PM, Stephen Haberman <stephen.haberman@gmail.com>
> wrote:
>
> >
> > Just a comment from the peanut gallery, but these buffers are a real
> > PITA for us as well. Probably 75% of our non-user-error job failures
> > are related to them.
> >
> > Just naively, what about not doing compression on the fly? E.g. during
> > the shuffle just write straight to disk, uncompressed?
> >
> > For us, we always have plenty of disk space, and if you're concerned
> > about network transmission, you could add a separate compress step
> > after the blocks have been written to disk, but before being sent over
> > the wire.
> >
> > Granted, IANAE, so perhaps this is a bad idea; either way, awesome to
> > see work in this area!
> >
> > - Stephen
> >
>
>
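For reference, Matei's suggestion can be applied either per-job or cluster-wide; a minimal sketch (the application class and jar name below are illustrative placeholders, not from the thread):

```shell
# Disable shuffle compression for a single job via spark-submit
spark-submit \
  --conf spark.shuffle.compress=false \
  --class com.example.MyJob \
  myjob.jar   # hypothetical application jar

# Or set it for all jobs in conf/spark-defaults.conf:
# spark.shuffle.compress   false
```

With this set, shuffle outputs are written to disk uncompressed, trading disk space and network bandwidth for the per-stream compression buffer memory discussed above.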
