hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Column types - smaller the better?
Date Sat, 27 Dec 2008 18:06:31 GMT
Hi Tim,

All data in a table for a given column family is stored
together on disk. Depending on your DFS blocksize, it will
be fetched from disk in increments of 64MB (Hadoop default)
or 8MB (HBase recommended value), etc. It stands to reason
that the more values you can pack into a block, the more
efficient your scans will be. I would not expect much
benefit for random read usage patterns, since a random read
still has to fetch a block to return a single value.
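
To put rough numbers on the packing argument, here is a minimal sketch in plain Java (no HBase classes). The class and method names are hypothetical, and it deliberately ignores the row key, column name, and timestamp that are stored alongside every value, so it overstates the real gain; treat it as an upper bound rather than a measurement.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any HBase API.
final class ValueSizeSketch {

    // Encode an int as 4 big-endian bytes.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Encode a string as its UTF-8 bytes.
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Whole values that fit in one block, counting value bytes only
    // (assumption: per-cell key overhead is ignored).
    static long valuesPerBlock(long blockBytes, int valueBytes) {
        return blockBytes / valueBytes;
    }

    public static void main(String[] args) {
        long blockBytes = 64L * 1024 * 1024; // 64MB, the Hadoop default mentioned above
        int intLen = encodeInt(1).length;
        int strLen = encodeString("observation").length;

        System.out.println("int value bytes:    " + intLen);
        System.out.println("string value bytes: " + strLen);
        System.out.println("ints per block:     " + valuesPerBlock(blockBytes, intLen));
        System.out.println("strings per block:  " + valuesPerBlock(blockBytes, strLen));
    }
}
```

Because the per-cell key bytes are the same in both cases, the ratio you see on disk will be smaller than the ratio of the value sizes alone.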

Taking that to a logical conclusion, you may want to enable
block compression for the given table and column family or
families. However, at this time enabling compression is not
recommended. It is not well tested and may contribute to
out-of-memory conditions under high load.

Also, smaller values will require fewer bytes to transport
from the regionserver to the client via RPC. 

Another question I would ask myself is the following: Would
the compact representation levy a tax on client side
processing? If so, will it take back any gains achieved at
disk or RPC? 
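
As one way to picture that client-side tax, here is a small sketch of a code table that maps labels to integer codes before a write and back again after a read. Everything in it is an assumption for illustration: the class name, the label list, and the choice of a 4-byte big-endian code are all hypothetical, and it uses no HBase classes.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side code table; labels other than "observation" are made up.
final class CodedValueSketch {

    private static final String[] LABELS = {"observation", "specimen", "fossil"};
    private static final Map<String, Integer> CODES = new HashMap<>();

    static {
        for (int i = 0; i < LABELS.length; i++) {
            CODES.put(LABELS[i], i);
        }
    }

    // Work done before every write: look up the code and pack it into 4 bytes.
    static byte[] encode(String label) {
        Integer code = CODES.get(label);
        if (code == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        return ByteBuffer.allocate(Integer.BYTES).putInt(code).array();
    }

    // Work done after every read: unpack the code and map it back to the label.
    static String decode(byte[] stored) {
        int code = ByteBuffer.wrap(stored).getInt();
        if (code < 0 || code >= LABELS.length) {
            throw new IllegalArgumentException("unknown code: " + code);
        }
        return LABELS[code];
    }

    public static void main(String[] args) {
        byte[] stored = encode("observation");
        System.out.println(stored.length + " bytes stored, decodes to " + decode(stored));
    }
}
```

A lookup like this is cheap, but the table has to be kept in sync across every client that reads the column, which is a maintenance cost to weigh along with the CPU cost.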

Hope that helps,

   - Andy

> From: tim robertson <timrobertson100@gmail.com>
> Subject: Column types - smaller the better?
> To: hbase-user@hadoop.apache.org
> Date: Saturday, December 27, 2008, 9:33 AM
> Hi all,
> Beginner question, but does it make sense to use the
> smallest data type you can in HBase?
> Is there much performance gain over say 1 Billion records
> saving new Integer(1) instead of new
> String("observation") ?
> I am proposing to parse one column family into a new
> "parsed values" family, which would be these integer
> style types.  If my guess is
> correct then there will be more rows in one region (correct
> terminology?) and therefore less shuffling around and
> faster scanning.  Or am I way off the mark?
> Cheers,
> Tim

