kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Pirz <james.p...@gmail.com>
Subject Re: RLE encoding limitation
Date Tue, 09 Aug 2016 22:13:22 GMT
Thanks Todd for your prompt reply !
I will check Bit_Shuffle encoding.

w.r.t the fix, I will be happy to try that. I realized that the initial
shift in PutValue() :

v &= (1ULL << num_bits) - 1;

potentially makes the results to be incorrect when applied to the buffered
values longer than 32 bits (simply turns them to become zero).

(If it is on top of your head) can you please give me a bit of context for
"some of the edge cases" that are pointed out in the TODO comment (in the
BitWriter::PutValue() )?

Thanks.

On Tue, Aug 9, 2016 at 2:34 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi James,
>
> Yes, that's correct. Unfortunately right now the RLE encoding code doesn't
> support int64.
>
> I'd suggest trying the "BIT_SHUFFLE" encoding, which can get you a lot of
> the same benefits as RLE and does work on int64.
>
> Since it seems you're good at checking out the code, I'd also be happy to
> review a patch to fix this if you want to give it a try :)
>
> -Todd
>
> On Tue, Aug 9, 2016 at 2:19 PM, James Pirz <james.pirz@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to use RLE-encoding in Kudu with fixed bit-width values. I am
>> specifically dealing with uint64 values which are supposed to be 64-bit
>> long. I realized that the code under BitWriter (which is used in flushing
>> RLE runs) only handles values up to 32-bit values:
>>
>> inline void BitWriter::PutValue(uint64_t v, int num_bits) {
>>         // TODO: revisit this limit if necessary (can be raised to 64 by
>> fixing some edge cases)
>>         DCHECK_LE(num_bits, 32);
>> ...
>>
>>
>> Can you please verify if indeed such a limitation exists in Kudu for RLE
>> encoding (i.e. it can not be applied to fixed size values longer than 32
>> bits) and if yes is there a work around for that ?
>> (I also checked RLE code in Impala and parquet-cpp which share the RLE
>> code and they seem to have the same limitation).
>>
>> Thanks
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Mime
View raw message