hbase-user mailing list archives

From: Adrien Mogenet <adrien.moge...@gmail.com>
Subject: Minor compactions and impact of number of HFiles within a Store
Date: Sat, 08 Dec 2012 19:19:58 GMT

Hi there,

I am about to tune major/minor compaction behavior, and I'm wondering what
exactly the downsides are of keeping many HFiles (let's say between 3 and
20) within a single region, given that there are only a few regions
(~10) per RS.

My 2 cents:
- The OS and HBase have to handle more file descriptors
- A random GET potentially has to search several files (but I have
set up bloom filters)
- The combined index and bloom filter size is a bit larger than with a
single file
- Rewriting the data into a new HFile might improve data locality
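
To put a rough number on the GET point, here is a minimal sketch in Python. It is a back-of-the-envelope model, not HBase code: it assumes the requested row lives in exactly one HFile, that every other HFile is skipped unless its bloom filter returns a false positive, and that false positives are independent across files. The function name and the 1% false-positive rate are illustrative assumptions.

```python
def expected_files_probed(n_files, bloom_fpr):
    """Expected number of HFiles actually read for one random GET.

    Assumptions (hypothetical model, not HBase internals):
      - the key is present in exactly one of the n_files HFiles;
      - each remaining file is read only on a bloom false positive;
      - false positives are independent, with probability bloom_fpr.
    """
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    if not 0.0 <= bloom_fpr <= 1.0:
        raise ValueError("bloom_fpr must be between 0 and 1")
    # One file holds the key; each of the others costs bloom_fpr on average.
    return 1.0 + (n_files - 1) * bloom_fpr


if __name__ == "__main__":
    for n in (3, 20):
        with_bloom = expected_files_probed(n, 0.01)  # assumed 1% false-positive rate
        without_bloom = expected_files_probed(n, 1.0)  # no bloom: every file is read
        print(f"{n} files: {with_bloom:.2f} with bloom, {without_bloom:.2f} without")
```

Under these assumptions, going from 3 to 20 HFiles moves the expected number of files read per GET from about 1.02 to about 1.19 with bloom filters, versus 3 to 20 without them. The model ignores block cache hits, scans, and rows with cells spread over several files.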

And my questions:
- How critical can this be?
- Do minor compactions help reduce major compaction time? (E.g., for
the same data volume, is it faster to merge 3 files than 20 files?)
- Considering I have 100% data locality, compaction will generate lots of
disk I/O reading the HFiles, but is the network layer "blocking" anything
when writing the new HFile and spreading its HDFS blocks across
DataNodes?
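
To make the 3-versus-20 question concrete, here is a minimal sketch of a k-way merge over sorted runs in Python. It only models the CPU side of a merge and is not how HBase implements compaction. `merge_runs` and `approx_comparisons` are hypothetical helper names, and the `log2(k)` comparisons-per-cell figure is the usual heap-merge approximation, not a measured value.

```python
import heapq
import math


def merge_runs(runs):
    """Merge already-sorted runs into one sorted list (k-way heap merge)."""
    return list(heapq.merge(*runs))


def approx_comparisons(total_cells, n_files):
    """Rough comparison count for a heap-based k-way merge.

    Assumes about log2(k) comparisons per emitted cell; a single
    input file needs no merging at all.
    """
    if n_files <= 1:
        return 0.0
    return total_cells * math.log2(n_files)


if __name__ == "__main__":
    # Tiny stand-ins for sorted HFiles.
    print(merge_runs([[1, 4, 9], [2, 3, 8], [0, 5, 7]]))

    total_cells = 1_000_000  # same data volume in both cases (assumed figure)
    for k in (3, 20):
        print(k, "files:", round(approx_comparisons(total_cells, k)), "comparisons")
```

In this model the bytes read and written are identical for the same data volume, and only the per-cell comparison cost grows, roughly with `log2` of the number of input files. Whether that difference is visible in practice depends on how I/O-bound the compaction is, which this sketch does not capture.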

Adrien Mogenet
