hbase-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject Questions about HBase
Date Wed, 05 Jun 2013 02:15:24 GMT

I have a few small questions regarding HBase. I've searched the forum but
couldn't find clear answers, so I'm asking them here:

   1. Does minor compaction remove HFiles in which all entries are past
   their TTL, or does only major compaction do that? I found this JIRA:
   https://issues.apache.org/jira/browse/HBASE-5199 but I don't know whether
   the compaction being discussed there is minor or major.
   2. Is there a way of configuring major compaction to compact only files
   older than a certain time, or to compact all the files except the latest
   few? We basically want to use the time-based filtering optimization in
   HBase to get the latest additions to the table, and since major compaction
   bunches everything into one file, it would defeat the optimization.
   3. Is there a way to warm up the bloom filter and block index cache for
   a table? This is for a case where I always want the bloom filters and index
   to be all in memory, but not the data blocks themselves.
   4. This one is related to what I read in the bloom filter section of
   HBase: The Definitive Guide:
   "Given a random row key you are looking for, it is very likely that this
   key will fall in between two block start keys. The only way for HBase to
   figure out if the key actually exists is by loading the block and scanning
   it to find the key."
   The above excerpt seems to imply that the search for a key inside a
   block is linear, and I feel I must be reading it wrong. I would expect the
   search within the block to be a binary search.
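
To make question 4 concrete, here is a minimal Python sketch of the two
readings I have in mind. It is only a toy model under my own assumptions, not
HBase's actual implementation: each block is a sorted list of row keys, the
block index holds just the first key of every block, and all names (blocks,
block_index, find_block, contains_linear, contains_binary) are made up for
illustration.

```python
from bisect import bisect_right

# Toy model (assumption): a "block" is a sorted list of row keys, and the
# block index stores only the start key of each block.
blocks = [
    ["apple", "banana", "cherry"],
    ["grape", "kiwi", "lemon"],
    ["peach", "plum", "quince"],
]
block_index = [block[0] for block in blocks]


def find_block(row_key):
    """Binary-search the index for the only block that could hold row_key."""
    pos = bisect_right(block_index, row_key) - 1
    return pos if pos >= 0 else None


def contains_linear(row_key):
    """Reading 1: load the candidate block and walk its keys in order."""
    pos = find_block(row_key)
    if pos is None:
        return False
    for key in blocks[pos]:
        if key == row_key:
            return True
        if key > row_key:  # keys are sorted, so the key cannot appear later
            return False
    return False


def contains_binary(row_key):
    """Reading 2: load the candidate block and binary-search inside it."""
    pos = find_block(row_key)
    if pos is None:
        return False
    block = blocks[pos]
    i = bisect_right(block, row_key) - 1
    return i >= 0 and block[i] == row_key
```

In this model the key "honey" falls between the start keys "grape" and
"peach", so the index alone cannot say whether it exists; either reading has
to load the second block to find out. My question is which of the two
in-block searches is closer to what HBase really does.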

Thanks in advance,


*P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | pankaj@brightroll.com

Pankaj Gupta | Software Engineer

*BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com

United States | Canada | United Kingdom | Germany

