spark-dev mailing list archives

From Stephen Haberman
Subject Re: better compression codecs for shuffle blocks?
Date Mon, 14 Jul 2014 22:30:27 GMT

Just a comment from the peanut gallery, but these buffers are a real
PITA for us as well. Probably 75% of our non-user-error job failures
are related to them.

Just naively, what about not doing compression on the fly? E.g. during
the shuffle just write straight to disk, uncompressed?

For us, we always have plenty of disk space, and if you're concerned
about network transmission, you could add a separate compress step
after the blocks have been written to disk, but before being sent over
the wire.
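
To make the two-step idea concrete, here is a rough sketch in Python rather than Spark's own Scala. Nothing here is a Spark API: the function names, the file layout, and the choice of gzip are all assumptions made for illustration. The point is only that the write path holds no codec state, and that compression happens later as a streaming pass over a finished file.

```python
import gzip
import os
import shutil


def write_block_uncompressed(path, records):
    """Hypothetical shuffle-write step: raw bytes go straight to disk,
    so no compression buffer is allocated while the block is open."""
    with open(path, "wb") as out:
        for record in records:
            out.write(record)


def compress_block_for_transfer(src_path, dst_path, chunk_size=64 * 1024):
    """Hypothetical pre-send step: stream a finished block through gzip
    in fixed-size chunks. Returns the compressed size in bytes."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return os.path.getsize(dst_path)
```

Since the second step runs on blocks that are already closed, it could in principle handle them one at a time, which is the trade being suggested: extra disk space and an extra pass over the data in exchange for not holding a compressor per open block.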

Granted, IANAE, so perhaps this is a bad idea; either way, awesome to
see work in this area!

- Stephen
