spark-user mailing list archives

From Sadhan Sood <sadhan.s...@gmail.com>
Subject Re: Cache sparkSql data without uncompressing it in memory
Date Thu, 13 Nov 2014 17:27:45 GMT
Thanks, Cheng. Just one more question: does that mean we still need enough
memory in the cluster to uncompress the data before it can be compressed
again, or does it just read the raw data as is?

On Wed, Nov 12, 2014 at 10:05 PM, Cheng Lian <lian.cs.zju@gmail.com> wrote:

>  Currently there’s no way to cache the compressed sequence file directly.
> Spark SQL uses an in-memory columnar format when caching table rows, so it
> must read all the raw data and convert it into columnar format. However,
> you can enable in-memory columnar compression by setting
> spark.sql.inMemoryColumnarStorage.compressed to true. This property is
> already set to true by default in the master branch and branch-1.2.
>
> On 11/13/14 7:16 AM, Sadhan Sood wrote:
>
>   While caching data from our Hive tables, which store data in
> compressed sequence file format, we noticed that the data gets
> uncompressed in memory when it is cached. Is there a way to turn this off
> and cache the compressed data as is?
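
A toy Python model may help picture the caching behavior described in the reply above. It is a sketch under simplifying assumptions, not Spark's implementation: all function names are hypothetical, the `compressed` flag merely stands in for `spark.sql.inMemoryColumnarStorage.compressed`, and run-length encoding is used only as one example of a lightweight per-column scheme. What it illustrates is that the cache is built from already-decoded rows, so the sequence file's own compression is not carried over; any compression is applied afterwards, column by column.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]
Encoded = Tuple[str, List[Any]]


def rows_to_columns(rows: List[Row], names: List[str]) -> Dict[str, List[Any]]:
    """Transpose row-oriented records into one value list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def cache_as_columns(rows: List[Row], names: List[str],
                     compressed: bool = True) -> Dict[str, Encoded]:
    # Hypothetical toy, not Spark code. `compressed` stands in for
    # spark.sql.inMemoryColumnarStorage.compressed.
    # The input rows are already fully decoded: columns are built from them.
    columns = rows_to_columns(rows, names)
    cached: Dict[str, Encoded] = {}
    for name, values in columns.items():
        runs = run_length_encode(values)
        # Toy heuristic: keep RLE only if it stores fewer items than the
        # plain column (each run costs two items: the value and its count).
        if compressed and 2 * len(runs) < len(values):
            cached[name] = ("rle", runs)
        else:
            cached[name] = ("plain", values)
    return cached


def decode_column(encoded: Encoded) -> List[Any]:
    """Expand a cached column back into its plain values."""
    kind, payload = encoded
    if kind == "rle":
        return [value for value, count in payload for _ in range(count)]
    return list(payload)


if __name__ == "__main__":
    rows = [("US", 1), ("US", 2), ("US", 3), ("US", 4), ("FR", 5)]
    cached = cache_as_columns(rows, ["country", "id"])
    print(cached["country"])  # ('rle', [('US', 4), ('FR', 1)])
    print(cached["id"])       # ('plain', [1, 2, 3, 4, 5])
```

In this toy, the repetitive `country` column shrinks while the unique `id` column is stored plainly, which is why columnar compression tends to pay off most on low-cardinality or sorted columns.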
