spark-user mailing list archives

From Sadhan Sood <sadhan.s...@gmail.com>
Subject Re: does spark sql support columnar compression with encoding when caching tables
Date Mon, 22 Dec 2014 21:26:35 GMT
Thanks Cheng, Michael - that was super helpful.

On Sun, Dec 21, 2014 at 7:27 AM, Cheng Lian <lian.cs.zju@gmail.com> wrote:

>  I'd like to add that the compression schemes built into the in-memory
> columnar storage only support primitive columns (int, string, etc.); complex
> types like array, map and struct are not supported.
>
>
> On 12/20/14 6:17 AM, Sadhan Sood wrote:
>
>  Hey Michael,
>
> Thank you for clarifying that. Is Tachyon the right way to get compressed
> data in memory, or should we explore the option of adding compression to
> cached data? I ask because our uncompressed data set is too big to fit in
> memory right now. I see the benefit of Tachyon not just in storing
> compressed data in memory: we also wouldn't have to create a separate table
> for caching some partitions, like 'cache table table_cached as select * from
> table where date = 201412XX', which is what we are doing right now.
>
>
> On Thu, Dec 18, 2014 at 6:46 PM, Michael Armbrust <michael@databricks.com>
> wrote:
>>
>> There is only column level encoding (run length encoding, delta encoding,
>> dictionary encoding) and no generic compression.
>>
>> On Thu, Dec 18, 2014 at 12:07 PM, Sadhan Sood <sadhan.sood@gmail.com>
>> wrote:
>>>
>>> Hi All,
>>>
>>>  I'm wondering whether, when caching a table backed by LZO-compressed
>>> Parquet data, Spark also compresses it (using lzo/gzip/snappy) along with
>>> column level encoding, or just does the column level encoding, when
>>> "spark.sql.inMemoryColumnarStorage.compressed" is set to true. I ask
>>> because when I try to cache the data, I notice the memory being used is
>>> almost as much as the uncompressed size of the data.
>>>
>>>  Thanks!
>>>
>>
>
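
For anyone reading this thread later, here is a rough pure-Python sketch of the three column-level encodings Michael names (run length, dictionary, delta). It is only an illustration of the ideas, not Spark's implementation; the function names and sample values are made up for this example. It also hints at why a cached table can stay close to its uncompressed size: these encodings only help when a column has repeated or slowly changing values.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    return [[value, sum(1 for _ in run)] for value, run in groupby(values)]


def dictionary_encode(values):
    """Replace each value with an integer index into a table of distinct values."""
    table = {}
    codes = []
    for value in values:
        # setdefault assigns the next free index the first time a value is seen
        codes.append(table.setdefault(value, len(table)))
    return list(table), codes


def delta_encode(values):
    """Keep the first value, then store only the difference from the previous one."""
    if not values:
        return []
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


# Hypothetical sample columns, chosen only to show the effect.
dates = ["2014-12-01"] * 4 + ["2014-12-02"] * 2
countries = ["us", "uk", "us", "us"]
timestamps = [100, 101, 103, 106]
unique_ids = ["a", "b", "c"]

rle_dates = run_length_encode(dates)            # 6 values become 2 runs
dict_countries = dictionary_encode(countries)   # 2 distinct strings + small codes
delta_timestamps = delta_encode(timestamps)     # small differences after the first
rle_unique = run_length_encode(unique_ids)      # no repeats, so nothing is saved
```

A column of mostly distinct values (like `unique_ids` above) ends up with as many runs as it had values, so the encoded form is no smaller than the input. A general-purpose codec such as lzo, gzip or snappy works on raw bytes rather than on column values, which is the "generic compression" the cache does not apply.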
