hbase-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Re: Column types - smaller the better?
Date Sat, 27 Dec 2008 18:17:34 GMT
Thanks Andy, that helps a lot.

Best wishes,

Tim


On Sat, Dec 27, 2008 at 7:06 PM, Andrew Purtell <apurtell@apache.org> wrote:
> Hi Tim,
>
> All data in a table for a given column family will be stored
> together on disk. Depending on your DFS blocksize, they will
> be fetched from disk in increments of 64MB (Hadoop default)
> or 8MB (HBase recommended value), etc. It stands to reason
> that the more values you can pack into a block, the more
> efficient your scans will be. I would not expect much
> benefit for random read usage patterns.
>
> Taking that to a logical conclusion, you may want to enable
> block compression for the given table and column family or
> families. However, at this time enabling compression is not
> recommended. It is not well tested and may contribute to out
> of memory conditions under high load.
>
> Also, smaller values will require fewer bytes to transport
> from the regionserver to the client via RPC.
>
> Another question I would ask myself is the following: Would
> the compact representation levy a tax on client side
> processing? If so, will it take back any gains achieved at
> disk or RPC?
>
> Hope that helps,
>
>   - Andy
>
>> From: tim robertson <timrobertson100@gmail.com>
>> Subject: Column types - smaller the better?
>> To: hbase-user@hadoop.apache.org
>> Date: Saturday, December 27, 2008, 9:33 AM
>> Hi all,
>>
>> Beginner question, but does it make sense to use the
>> smallest data type you can in HBase?
>>
>> Is there much performance gain over say 1 Billion records
>> saving new Integer(1) instead of new
>> String("observation") ?
>>
>> I am proposing to parse one column family into a new
>> "parsed values" family, which would be these integer
>> style types.  If my guess is correct then there will be
>> more rows in one region (correct terminology?) and
>> therefore less shuffling around and faster scanning.  Or
>> am I way off the mark?
>>
>> Cheers,
>>
>> Tim
>
>
>
>
>
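
To put rough numbers on the question in this thread, here is a minimal sketch in plain Java. It does not use the HBase client API; it only compares the serialized size of a 4-byte integer with the UTF-8 bytes of "observation" and scales the difference to 1 billion values. The class name, the LABELS lookup table and the helper methods are hypothetical, chosen for illustration. It counts value bytes only: each stored cell also carries its row key, column name and timestamp, so the saving as a fraction of total storage will be smaller than these figures suggest.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from HBase.
class ValueSizeSketch {
    // Assumed code table: the integer code is an index into this array.
    static final String[] LABELS = {"unknown", "observation", "specimen"};

    // Serialize an int as 4 big-endian bytes.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Read the 4-byte big-endian int back.
    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Serialize a string as UTF-8 bytes.
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Value bytes saved across all records when the smaller encoding is used.
    static long bytesSaved(long records, int largeSize, int smallSize) {
        return records * (long) (largeSize - smallSize);
    }

    public static void main(String[] args) {
        byte[] asInt = encodeInt(1);
        byte[] asString = encodeString("observation");

        System.out.println("int value bytes:    " + asInt.length);
        System.out.println("string value bytes: " + asString.length);
        System.out.println("value bytes saved over 1 billion records: "
                + bytesSaved(1_000_000_000L, asString.length, asInt.length));

        // Client-side cost of the compact form: one decode plus one array lookup.
        System.out.println("decoded label: " + LABELS[decodeInt(asInt)]);
    }
}
```

The last line bears on the client-side "tax" Andy asks about: with a small fixed code table, turning the integer back into its label is a single array lookup, which is cheap. A larger or frequently changing table would shift that trade-off.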
