hbase-user mailing list archives

From Andra Adams <Andra.Ad...@sun.com>
Subject Merging and getRowKeyAtOrBefore
Date Fri, 27 Jun 2008 00:30:04 GMT
Hi,

I've been looking through the HBase code and I was wondering if I could 
get some clarification on two points.


1. Why doesn't HRegion's static merge method check that the two regions 
specified are adjacent?

As far as I can tell, HRegion's merge method is called from the Merge 
tool, which gets its region names from command-line arguments.  As far 
as I can see, merging non-adjacent regions would break many of the 
assertions that HBase depends on, yet every call to HRegion's merge 
method results in a merged region.  So why is the caller of the Merge 
tool trusted to ensure that the regions it names on the command line 
are adjacent?  (Admittedly, the adjacency check could be quite expensive 
computationally, since it would involve a complete scan of all regions 
in the "parent" META table (either .META. or -ROOT-) to ensure that no 
region in the "daughter" table (either a user table or .META.) has a 
start key between the end key of one of the regions being merged and 
the start key of the other.)
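
To make the check described above concrete, here is a minimal sketch of 
the scan.  It is not HBase code: the class and method names are 
hypothetical, keys are plain Strings rather than HBase's own key type, 
and the special case of an empty end key (last region of a table) is 
ignored.

```java
import java.util.Arrays;
import java.util.List;

public class MergeAdjacencySketch {
    /**
     * Returns true if no region start key falls in the half-open range
     * [leftEndKey, rightStartKey), i.e. no other region sits between the
     * two regions being merged. All names here are hypothetical.
     */
    public static boolean noRegionBetween(String leftEndKey,
                                          String rightStartKey,
                                          List<String> allStartKeys) {
        // Full scan over every region start key listed in the parent table.
        for (String start : allStartKeys) {
            boolean atOrAfterLeftEnd = start.compareTo(leftEndKey) >= 0;
            boolean beforeRightStart = start.compareTo(rightStartKey) < 0;
            if (atOrAfterLeftEnd && beforeRightStart) {
                return false; // another region lies between the two
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Four regions: ["", g), [g, m), [m, t), [t, end of table)
        List<String> startKeys = Arrays.asList("", "g", "m", "t");
        // Merging [g, m) with [m, t): nothing in between.
        System.out.println(noRegionBetween("m", "m", startKeys)); // true
        // Merging ["", g) with [t, end): [g, m) and [m, t) are in between.
        System.out.println(noRegionBetween("g", "t", startKeys)); // false
    }
}
```

The cost is one pass over every region entry of the table, which is the 
expense mentioned in the parenthetical above.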


2.  Can I get an overview of the algorithm used to determine the best 
candidate key in HStore's getRowKeyAtOrBefore (including Memcache's 
internalGetRowKeyAtOrBefore, and HStore's rowAtOrBeforeFromMapFile)?

I'm having trouble figuring out why HStore's getFull method looks 
through the memcache (mc), snapshot and store files in reverse 
chronological order (i.e. mc, then snapshot, then store files), while 
getRowKeyAtOrBefore looks through the store files, then the mc, then 
the snapshot (in apparently no chronological order...?).  Why does 
getFull create a map of deletes (with older entries checking this map 
before inserting their values in the results map), while 
getRowKeyAtOrBefore opts to remove entries from the results map if a 
delete is found at a later time?
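
For reference, the getFull style as described above can be sketched 
roughly as follows.  This is only an illustration of the idea, not the 
HStore implementation: the class, the Cell type and the helper methods 
are hypothetical, timestamps are left out (position in the newest-first 
list stands in for them), a set of deleted columns stands in for the map 
of deletes, and each source is assumed to hold at most one cell per 
column.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class NewestFirstSketch {
    /** Hypothetical cell: either a value for a column or a delete marker. */
    public static final class Cell {
        final String column;
        final String value;
        final boolean delete;
        Cell(String column, String value, boolean delete) {
            this.column = column;
            this.value = value;
            this.delete = delete;
        }
    }

    public static Cell put(String column, String value) {
        return new Cell(column, value, false);
    }

    public static Cell del(String column) {
        return new Cell(column, null, true);
    }

    /** Sources must be ordered newest first: mc, snapshot, store files. */
    public static Map<String, String> getFull(List<List<Cell>> newestFirst) {
        Set<String> deletes = new HashSet<>();
        Map<String, String> results = new TreeMap<>();
        for (List<Cell> source : newestFirst) {
            for (Cell c : source) {
                // A newer delete or a newer value already settled this column.
                if (deletes.contains(c.column) || results.containsKey(c.column)) {
                    continue;
                }
                if (c.delete) {
                    deletes.add(c.column);
                } else {
                    results.put(c.column, c.value);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Cell> mc = Arrays.asList(del("a"), put("c", "3"));
        List<Cell> snapshot = Arrays.asList(put("a", "old"), put("b", "1"));
        List<Cell> storeFile = Arrays.asList(put("b", "0"), put("c", "2"));
        System.out.println(getFull(Arrays.asList(mc, snapshot, storeFile)));
        // prints {b=1, c=3}
    }
}
```

Because the sources are visited newest first, nothing ever has to be 
removed from the results map, which is the contrast with the other 
method that the question is about.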

Aside from the difference in style between getFull and 
getRowKeyAtOrBefore, I'm also wondering why the discovery of a deleted 
value sometimes removes that key from the candidateKeys map, and other 
times is simply ignored.  (It could be that I'm missing some of the 
concepts behind the algorithm.)


Thanks,
Andra

andra.adams@sun.com

