hbase-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Major compactions and OS cache
Date Wed, 16 Feb 2011 12:30:35 GMT

Over on http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html I saw 
this bit:

"The most important factor is that HBase is not restarted frequently and that it 
performs house keeping on a regular basis. These so called compactions rewrite 
files as new data is added over time. All files in HDFS once written are 
immutable (for all sorts of reasons). Because of that, data is written into 
new files and as their number grows HBase compacts them into another set of 
new, consolidated files. And here is the kicker: HDFS is smart enough to put 
the data where it is needed!"

... and I always wondered what this does to the OS cache.  In some applications 
(non-HBase stuff, say full-text search), the OS cache plays a crucial role in 
how the system performs.  If you have to hit the disk too much, you're in 
trouble, so one of the things you avoid is making big changes to index files on 
disk in order to avoid invalidating data that's been nicely cached by the OS.

However, with HBase, and especially with major compactions, what happens to the 
OS cache?  The compacted data ends up in brand-new files, so whatever the OS had 
cached for the old files is all gone, right?
Do people find this problematic?
Or does the OS cache not play such a significant role in systems running HBase 
because the data that needs to be accessed is much bigger than the OS cache 
could ever be, so that even with the OS cache full and hot, other data would 
still have to be read from disk anyway?
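
To put rough numbers on that last question, here is a back-of-the-envelope 
sketch in Python.  The sizes are made up for illustration (not measurements from 
any real cluster), the function name is hypothetical, and it assumes reads are 
spread uniformly over the data, which real workloads rarely are:

```python
def best_case_hit_ratio(cache_bytes: float, data_bytes: float) -> float:
    """Upper bound on the fraction of uniformly random reads that a fully
    warm OS cache could serve without touching disk."""
    if data_bytes <= 0:
        raise ValueError("data_bytes must be positive")
    # The cache cannot hold more than the data set itself, so cap at 1.0.
    return min(1.0, cache_bytes / data_bytes)


GIB = 1024 ** 3

# Hypothetical node: 16 GiB of RAM left for the OS cache, 1 TiB of store files.
ratio = best_case_hit_ratio(16 * GIB, 1024 * GIB)
print(f"best-case cache hit ratio: {ratio:.1%}")
```

With those assumed sizes, fewer than 2 in 100 reads could come from the OS 
cache even when it is fully warm.  A skewed access pattern (a small hot set) 
would change the picture considerably, and that is the case where losing the 
cache after a major compaction would hurt.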

