hbase-user mailing list archives

From Aditya Karanth A <Aditya_Kara...@mindtree.com>
Subject HBase region size config
Date Tue, 28 Jun 2011 07:47:16 GMT


  We have been using Hadoop (HDFS) in our project to store some critical information. This
information is stored as zip files of about 3-5 MB each. The number of these files will grow
to more than a billion, occupying more than 1 petabyte of storage.
We are aware of the “too many small files” problem in HDFS and have therefore considered moving
to HBase to store these files, for the following reasons:
1. Indexed reads. Also, this information is archived data, which will not be read very often.
2. The regions managed by HBase would help ensure that we don’t end up with too many files
on the DFS.
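As a rough illustration of why a billion small files strains HDFS: the NameNode keeps every file and block as an object in memory. The sketch below assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per object (an estimate, not a measured figure for this cluster) and one block per file, since each 3-5 MB zip is smaller than an HDFS block. The function name `namenode_heap_bytes` is hypothetical.

```python
# Rough NameNode heap estimate for storing each zip as its own HDFS file.
# Assumptions: ~150 bytes of NameNode heap per namespace object (a common
# rule of thumb) and one block per file, because 3-5 MB < the HDFS block size.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Approximate heap used by the file objects plus their block objects."""
    objects = num_files * (1 + blocks_per_file)  # one file object + its blocks
    return objects * BYTES_PER_OBJECT


heap = namenode_heap_bytes(1_000_000_000)
print(f"{heap / 1e9:.0f} GB of NameNode heap")  # 2e9 objects x 150 B = 300 GB
```

Under these assumptions, a billion files implies a NameNode heap in the hundreds of gigabytes, which is the core of the "too many small files" concern.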

To move from an HDFS cluster to an HBase cluster, we are considering the following setup.
We would like someone to validate it and let us know of better configurations, if any:

1. The setup would have a 20-node HBase cluster.
2. The HBase region size would be 256 MB.
3. Each DataNode would have at least 32 TB (terabytes) of disk space. (We may add more DataNodes
to accommodate > 1 PB.)
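A quick capacity check of the proposed setup can be sketched as below. It assumes the HDFS default replication factor of 3, decimal units (1 TB = 10**12 bytes), and no headroom for HBase compactions or temporary files, so real requirements would be higher. The helper names are hypothetical.

```python
# Compare the proposed cluster's raw disk with the raw disk 1 PB of data needs.
# Assumptions: HDFS replication factor 3 (the default), decimal units,
# and no headroom reserved for compactions, temp space or growth.
import math

TB = 10**12
PB = 10**15


def raw_needed(data_bytes: int, replication: int = 3) -> int:
    """Raw disk consumed once every block is replicated."""
    return data_bytes * replication


def nodes_needed(data_bytes: int, disk_per_node: int, replication: int = 3) -> int:
    """Minimum DataNode count to hold the replicated data."""
    return math.ceil(raw_needed(data_bytes, replication) / disk_per_node)


raw_available = 20 * 32 * TB
print(raw_available // TB, "TB raw available")              # 640
print(raw_needed(1 * PB) // TB, "TB raw needed")             # 3000
print(nodes_needed(1 * PB, 32 * TB), "nodes at 32 TB each")  # 94
```

On these assumptions, 20 nodes at 32 TB each provide about 640 TB of raw disk. That is well short of the roughly 3 PB needed to hold 1 PB with three replicas, so the "add more DataNodes" note carries most of the weight.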

> The question here is: with a region size of 256 MB, will we still have a "too many small
files" problem in Hadoop, given the number of regions HBase may start generating?
What is the optimum region size for the above scenario?
Given that we may not be accessing the HBase cluster in a highly concurrent environment,
can we increase the region size?
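To get a feel for how region size drives region count, here is a sketch for 1 PB of table data. It assumes every region sits at the configured maximum size (after a split, the daughter regions start at roughly half that, so real counts run higher), decimal units, data spread evenly over the 20 RegionServers, and one column family. The function name `region_count` is hypothetical.

```python
# Approximate region counts for 1 PB of table data at several region sizes.
# Assumptions: regions at the configured maximum size, decimal units,
# even spread over 20 RegionServers, one column family.
MB = 10**6
GB = 10**9
PB = 10**15
SERVERS = 20


def region_count(data_bytes: int, region_bytes: int) -> int:
    """Number of regions needed, using integer ceiling division."""
    return -(-data_bytes // region_bytes)


for size in (256 * MB, 1 * GB, 4 * GB, 10 * GB):
    regions = region_count(1 * PB, size)
    per_server = regions / SERVERS
    print(f"{size / GB:>6.2f} GB regions: {regions:>9,} total, "
          f"{per_server:>9,.0f} per server")
```

Since each region keeps at least one store file per column family, HDFS would see millions of files at 256 MB instead of a billion. The per-server column is probably the more pressing figure, though: hundreds of thousands of regions per RegionServer is far beyond what HBase deployments typically run, which suggests that larger regions are worth considering.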

> I have heard that the larger the region, the longer region splitting takes and the slower
reads are. Is this true?
(I have not been able to experiment with all of this in our environments yet, but if anyone
has been there and done that, it would be good to know.)

> Is it better to have a smaller cluster with more disk space per node, or a larger cluster
with less disk space per node?
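One way to frame the node-count trade-off is to look at how much data must be re-replicated when a single node fails. The sketch assumes 3 PB of raw storage (1 PB of data at replication 3), decimal units, and nodes filled evenly. The disk sizes are illustrative, not recommendations, and `failure_impact` is a hypothetical helper.

```python
# For a fixed amount of raw storage, compare node counts and how much data
# must be re-replicated when a single node fails.
# Assumptions: 3 PB raw (1 PB x replication 3), decimal units, even fill;
# disk sizes below are illustrative only.
import math

TB = 10**12
RAW_NEEDED = 3 * 10**15


def failure_impact(disk_per_node: int) -> tuple[int, float, float]:
    """Return (node count, bytes held per node, fraction of cluster per node)."""
    nodes = math.ceil(RAW_NEEDED / disk_per_node)
    data_per_node = RAW_NEEDED / nodes  # bytes to re-replicate if one node dies
    share = 1 / nodes                   # fraction of the cluster lost
    return nodes, data_per_node, share


for disk_tb in (12, 32, 64):
    nodes, lost, share = failure_impact(disk_tb * TB)
    print(f"{disk_tb} TB/node: {nodes} nodes, ~{lost / TB:.1f} TB to "
          f"re-replicate, {share:.1%} of cluster")
```

Denser nodes mean fewer machines to run, but each failure removes a larger share of the cluster and leaves more data to copy back to full replication. More, smaller nodes spread both that recovery work and the RegionServer load more thinly.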

Any help appreciated.

Thanks & Regards,


