hbase-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Re: Column types - smaller the better?
Date Sat, 27 Dec 2008 18:17:34 GMT
Thanks Andy, that helps a lot.

Best wishes,


On Sat, Dec 27, 2008 at 7:06 PM, Andrew Purtell <apurtell@apache.org> wrote:
> Hi Tim,
> All data in a table for a given column family will be stored
> together on disk. Depending on your DFS blocksize, they will
> be fetched from disk in increments of 64MB (Hadoop default)
> or 8MB (HBase recommended value), etc. It stands to reason
> that the more values you can pack into a block, the more
> efficient your scans will be. I would not expect much
> benefit for random read usage patterns.
> Taking that to a logical conclusion, you may want to enable
> block compression for the given table and column family or
> families. However at this time enabling compression is not
> recommended. It is not well tested and may contribute to out
> of memory conditions under high load.
> Also, smaller values will require fewer bytes to transport
> from the regionserver to the client via RPC.
> Another question I would ask myself is the following: Would
> the compact representation levy a tax on client side
> processing? If so, will it take back any gains achieved at
> disk or RPC?
> Hope that helps,
>   - Andy
>> From: tim robertson <timrobertson100@gmail.com>
>> Subject: Column types - smaller the better?
>> To: hbase-user@hadoop.apache.org
>> Date: Saturday, December 27, 2008, 9:33 AM
>> Hi all,
>> Beginner question, but does it make sense to use the
>> smallest data type you can in HBase?
>> Is there much performance gain over say 1 Billion records
>> saving new Integer(1) instead of new
>> String("observation") ?
>> I am proposing to parse one column family into a new
>> "parsed values" family, which would be these integer
>> style types.  If my guess is
>> correct then there will be more rows in one region (correct
>> terminology?) and therefore less shuffling around and
>> faster scanning.  Or am I way off the mark?
>> Cheers,
>> Tim
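
A rough way to see the size difference discussed in this thread is the minimal Java sketch below, which uses only the standard library. The class name, the helper methods, and the 40-byte per-cell overhead are illustrative assumptions, not HBase APIs or measured figures. HBase stores values as byte arrays, and each stored cell also carries its key (row, column, timestamp), so the real gain depends on how large that key is relative to the value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of HBase.
class ValueSizeSketch {

    // Encode an int as a fixed-width 4-byte big-endian array.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode the 4-byte array back to an int (the client-side step).
    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Encode a string as UTF-8 bytes; length varies with the text.
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Estimate how many cells fit in one block, given an assumed
    // per-cell overhead (key bytes and so on) plus the value size.
    static long valuesPerBlock(long blockBytes, int perCellOverhead, int valueBytes) {
        return blockBytes / (perCellOverhead + valueBytes);
    }

    public static void main(String[] args) {
        byte[] asInt = encodeInt(1);
        byte[] asString = encodeString("observation");

        long blockBytes = 64L * 1024 * 1024; // 64MB block size from the thread
        int assumedOverhead = 40;            // assumption: placeholder, not measured

        System.out.println("int value bytes:    " + asInt.length);
        System.out.println("string value bytes: " + asString.length);
        System.out.println("cells per block (int):    "
                + valuesPerBlock(blockBytes, assumedOverhead, asInt.length));
        System.out.println("cells per block (string): "
                + valuesPerBlock(blockBytes, assumedOverhead, asString.length));
        System.out.println("round trip: " + decodeInt(asInt));
    }
}
```

Changing assumedOverhead shows how quickly the benefit shrinks when keys are long compared with the values, which is worth checking against real row and column names before restructuring a table.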
