spark-user mailing list archives

From Sadhan Sood <sadhan.s...@gmail.com>
Subject Re: does spark sql support columnar compression with encoding when caching tables
Date Fri, 19 Dec 2014 22:17:29 GMT
Hey Michael,

Thank you for clarifying that. Is Tachyon the right way to get compressed
data in memory, or should we explore adding compression to the cached
data? I ask because our uncompressed data set is currently too big to fit
in memory. I see a benefit to Tachyon beyond storing compressed data in
memory: we wouldn't have to create a separate table to cache some
partitions, e.g. 'cache table table_cached as select * from table where
date = 201412XX', which is what we are doing right now.


On Thu, Dec 18, 2014 at 6:46 PM, Michael Armbrust <michael@databricks.com>
wrote:
>
> There is only column level encoding (run length encoding, delta encoding,
> dictionary encoding) and no generic compression.
>
> On Thu, Dec 18, 2014 at 12:07 PM, Sadhan Sood <sadhan.sood@gmail.com>
> wrote:
>>
>> Hi All,
>>
>> I'm wondering whether, when caching a table backed by LZO-compressed
>> Parquet data, Spark also compresses it (using lzo/gzip/snappy) along with
>> the column-level encoding, or only does the column-level encoding, when
>> "spark.sql.inMemoryColumnarStorage.compressed" is set to true. I ask
>> because when I try to cache the data, the memory being used is almost as
>> much as the uncompressed size of the data.
>>
>> Thanks!
>>
>
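
To make the distinction in the quoted reply concrete, here is a minimal pure-Python sketch of the three column-level encodings it names (run-length, delta, dictionary) next to a generic byte-level compressor. It is a toy illustration and not Spark's implementation; the function names and the sample `paths` column are hypothetical, and zlib stands in for "generic compression" only because it ships with Python.

```python
import zlib
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def dictionary_encode(values):
    """Replace each value with an integer index into a table of distinct values."""
    table = {}
    codes = []
    for value in values:
        # setdefault assigns the next free index the first time a value is seen.
        codes.append(table.setdefault(value, len(table)))
    return list(table), codes


def delta_encode(values):
    """Keep the first value, then the difference between each pair of neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


# A hypothetical column where every value is distinct but the values share
# long substrings. Run-length and dictionary encoding find nothing to
# collapse here, while a generic compressor can exploit the shared bytes.
paths = ["/user/%d/profile" % i for i in range(1000)]
raw = "\n".join(paths).encode("utf-8")

runs = run_length_encode(paths)             # one run per row: no savings
distinct, codes = dictionary_encode(paths)  # table as large as the column
compressed = zlib.compress(raw)             # generic byte-level compression
```

On a column like `paths`, the encodings leave the data at roughly its original size, which would be consistent with cached memory use staying close to the uncompressed size when a table is dominated by high-cardinality columns.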
