hbase-user mailing list archives

From Arun Patel <arunp.bigd...@gmail.com>
Subject Re: Regions and Rowkeys
Date Tue, 12 May 2015 12:48:39 GMT
Thank you.  This helps.

So, when I pre-split regions with the command below, does SPLITALGO create
the rowkey boundaries for each region?

create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

I am failing to understand HexStringSplit.  As per the documentation, the
format of a HexStringSplit region boundary is the ASCII representation of an
MD5 checksum, or any other uniformly distributed hexadecimal value.

My question is: MD5 checksum of what?
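
To make the boundary format concrete, here is a rough sketch of how evenly
spaced hexadecimal split points could be computed for the command above. It
is not the HBase implementation: the function name is hypothetical, and the
8-character key space running from 00000000 to ffffffff is an assumption.
Boundaries like these only spread load evenly if the row keys themselves
begin with uniformly distributed hex characters, for example a hex-encoded
MD5 digest of whatever identifies the row.

```python
def hex_split_points(num_regions, width=8):
    """Return num_regions - 1 evenly spaced hex boundaries (hypothetical helper).

    Assumes the key space is every hex string of `width` characters,
    i.e. 00000000 .. ffffffff when width is 8.
    """
    if num_regions < 2:
        raise ValueError("need at least 2 regions to have a boundary")
    key_space = 16 ** width              # number of distinct keys of this width
    step = key_space // num_regions      # size of each region's slice
    # Region i starts at i * step; the first region starts at the empty key,
    # so only boundaries 1 .. num_regions - 1 are needed.
    return [format(i * step, "0{}x".format(width)) for i in range(1, num_regions)]


if __name__ == "__main__":
    for boundary in hex_split_points(15):
        print(boundary)
```

With NUMREGIONS => 15 this yields 14 boundaries, so 15 regions in total.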


On Mon, May 11, 2015 at 8:57 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> On Mon, May 11, 2015 at 3:38 PM, Arun Patel <arunp.bigdata@gmail.com>
> wrote:
> > 1) I have a 10 node HBase cluster.  When I create a table in HBase,
> > how many regions will be allocated by default?
> In HBase, the number of region servers is orthogonal to table partitions.
> These two operational details are related but managed independently.
> > I looked at the HBase Master UI and it seems regions are not allocated to
> > all the region servers by default.  How can I allocate regions to all
> > region servers?
> HBase will evenly balance the regions of all tables it's hosting across all
> region servers in the cluster. If you have fewer regions than region
> servers, some servers will have no regions to host.
> > Basically, this distributes the data in a better way if I am using a
> > salted key. My requirement is to distribute the data across the cluster
> > using salted keys.  But is having few regions a constraint?
> >
> You're moving in the right direction. The next step would be to split your
> table according to some prefix value, presumably related to your "salting"
> choice. This will depend on what value you're prepending to the row keys
> and the cardinality of those values. Apache Phoenix does this, for example,
> with a fixed byte prefix and one pre-split per salt-byte value (i.e., 0,
> 1, 2, 3, ... 15).
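
A minimal sketch of the salt-byte idea described above, for illustration
only. The function names are hypothetical, the bucket count of 16 is taken
from the 0..15 example, and CRC32 is just a stand-in hash; it is not what
Phoenix actually uses.

```python
import zlib


def salted_key(row_key, buckets=16):
    """Prepend a one-byte salt derived from the row key (hypothetical helper)."""
    # CRC32 is an assumption used for illustration, not Phoenix's hash.
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


def salt_split_points(buckets=16):
    """One region boundary per salt value except the first.

    The first region starts at the empty key, so `buckets` regions
    need only buckets - 1 boundaries.
    """
    return [bytes([b]) for b in range(1, buckets)]


if __name__ == "__main__":
    print(salted_key(b"user-42"))
    print(len(salt_split_points()))
```

Because the salt is derived from the key itself, a reader can recompute it
for a point lookup, while a range scan has to fan out across every bucket.
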
> > 2) How does the rowkey-to-region mapping work?  In Cassandra, we have a
> > concept of assigning a token range to each node.  A rowkey is assigned to
> > a node based on the token range.  How does this work in HBase?
> HBase is ordered and range-partitioned. Basically, your row keys are sorted
> and region boundaries are determined at points within that range. So if you
> have rows 'a' - 'z', HBase will define regions as contiguous segments of
> this range, 'a' - 'f', and 'g' - 'k' for example. The range of a region is
> dictated primarily by the amount of data contained therein. When a region
> becomes too big, it will be split in half and two child regions are created
> (i.e., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f'). Once a region splits,
> the children are independent and can be moved to other region servers.
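
The range lookup and split described above can be sketched as a binary
search over sorted region start keys. This is a toy model rather than HBase
code: the region layout and both function names are hypothetical, and each
region is assumed to cover keys from its start key up to, but not
including, the next region's start key.

```python
import bisect

# Hypothetical layout: three regions starting at "", "g" and "l".
REGION_START_KEYS = [b"", b"g", b"l"]


def find_region(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose key range contains row_key."""
    # bisect_right counts the start keys <= row_key; the last one owns the row.
    return bisect.bisect_right(start_keys, row_key) - 1


def split_region(start_keys, split_key):
    """Return a new start-key list with one more boundary at split_key."""
    if split_key in start_keys:
        raise ValueError("split key is already a region boundary")
    new_keys = list(start_keys)
    bisect.insort(new_keys, split_key)   # keeps the list sorted
    return new_keys


if __name__ == "__main__":
    print(find_region(b"c"))                       # falls in the first region
    after_split = split_region(REGION_START_KEYS, b"d")
    print(find_region(b"e", after_split))          # now in the new child region
```
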
> I explain a bit of this and more in my talk "HBase for Architects". I link
> to a video from my blog [0]. As Michael mentioned, there's more detail
> published in our book [1], as well as in our other books [2], [3].
> Welcome to HBase ;)
> -n
> [0]: http://www.n10k.com/blog/hbase-for-architects-redux/
> [1]: https://hbase.apache.org/book.html#regions.arch
> [2]: http://www.manning.com/dimidukkhurana/
> [3]: http://shop.oreilly.com/product/0636920033943.do
