hbase-user mailing list archives

From Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>
Subject HBase minimum block size for sequential access
Date Tue, 27 Jul 2010 05:41:07 GMT
I found the following snippet in the HFile javadocs and have some questions about it.
The recommendation is a minimum block size between 8KB and 1MB, with larger sizes preferred
for sequential access.  Our data are time series (high resolution, sampled at 125Hz).  The
typical access pattern reads subsets of the data, anywhere from 37k points to millions of points.

Should I be setting this to 1MB?  Would even larger values be a good idea (i.e. greater than
1MB)?  What are the tradeoffs for larger values?

From the HFile javadocs:

Minimum block size. We recommend a setting of minimum block size between 8KB to 1MB for general
usage. Larger block size is preferred if files are primarily for sequential access. However,
it would lead to inefficient random access (because there are more data to decompress). Smaller
blocks are good for random access, but require more memory to hold the block index, and may
be slower to create (because we must flush the compressor stream at the conclusion of each
data block, which leads to an FS I/O flush). Further, due to the internal caching in Compression
codec, the smallest possible block size would be around 20KB-30KB.
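
To make the tradeoff in the javadoc excerpt concrete, here is a rough back-of-the-envelope sketch of how many blocks (and therefore block index entries) a scan would touch at different block sizes. The figure of 64 bytes per stored point is purely an assumption for illustration, not a measured value, and the class and method names are hypothetical; substitute the real on-disk size per point.

```java
public class BlockSizeEstimate {
    // ASSUMPTION: average on-disk bytes per stored point (key + value + overhead).
    // This is an illustrative guess, not a figure from HBase or from real data.
    static final long BYTES_PER_POINT = 64;

    // Approximate number of bytes a scan over `points` samples must read.
    static long scanBytes(long points) {
        return points * BYTES_PER_POINT;
    }

    // Number of blocks needed to hold the scanned range (ceiling division).
    // Ignores alignment of the range to block boundaries and compression.
    static long blocksTouched(long points, long blockSizeBytes) {
        long bytes = scanBytes(points);
        return (bytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long[] blockSizes = {8L * 1024, 64L * 1024, 1024L * 1024};
        long[] scans = {37_000L, 1_000_000L};
        for (long points : scans) {
            for (long blockSize : blockSizes) {
                System.out.printf("points=%d blockSize=%dKB blocks=%d%n",
                        points, blockSize / 1024, blocksTouched(points, blockSize));
            }
        }
    }
}
```

Under that assumption, even the smallest scan (37k points, roughly 2.3MB) spans a few 1MB blocks, so little is read and decompressed unnecessarily, while an 8KB block size would mean hundreds of blocks and index entries for the same range.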


