hbase-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)
Date Sun, 02 Feb 2014 04:10:20 GMT
RE: HDFS Compression... that is interesting -- I didn't think HBase forced
any HDFS-specific operations (other than short-circuit reads, which are
configurable on/off)?

... So how is the compression encoding implemented, and how do other file
systems handle it?  I don't think compression is specifically part of the
FileSystem API.
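[As far as I know, HBase compresses each HFile block with the configured codec before the bytes ever reach the Hadoop FileSystem output stream, so the filesystem only sees opaque bytes and needs no compression support of its own. A minimal sketch of that idea, using Python's gzip as a stand-in for the real codec (not HBase code):]

```python
import gzip
import io

def write_block(stream, block_bytes):
    """Compress one 'HFile block' and append it to a generic byte stream.
    The stream never sees uncompressed data, so nothing
    filesystem-specific is required."""
    compressed = gzip.compress(block_bytes)
    stream.write(len(compressed).to_bytes(4, "big"))  # on-disk size prefix
    stream.write(compressed)

def read_block(stream):
    """Read one compressed block back and decompress it on the client side."""
    size = int.from_bytes(stream.read(4), "big")
    return gzip.decompress(stream.read(size))

buf = io.BytesIO()  # stands in for any FileSystem stream
write_block(buf, b"row1/cf:q/value" * 100)
buf.seek(0)
assert read_block(buf) == b"row1/cf:q/value" * 100
```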

On Sat, Feb 1, 2014 at 11:06 PM, lars hofhansl <larsh@apache.org> wrote:

> HBase always loads the whole block and then seeks forward in that block
> until it finds the KV it is looking for (there is no indexing inside the
> block).
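[Lars's point above can be sketched as follows. This is an illustrative model, not HBase code: a block is treated as a sorted list of key/value pairs, and with no per-KV index inside the block a Get has to scan forward from the start.]

```python
def get_from_block(block, target_key):
    """Model of an in-block lookup: 'block' is a list of (key, value)
    pairs sorted by key. There is no index inside the block, so we
    seek forward linearly until we reach or pass the target key."""
    for key, value in block:
        if key == target_key:
            return value
        if key > target_key:   # passed it: the key is not in this block
            return None
    return None

block = [(b"row1", b"a"), (b"row3", b"b"), (b"row7", b"c")]
assert get_from_block(block, b"row3") == b"b"
assert get_from_block(block, b"row4") is None
```

[This is why a smaller BLOCKSIZE can help point reads: less data is loaded and scanned per lookup, at the cost of a larger block index.]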
> Also note that HBase has compression and block encoding. These are
> different. Compression compresses the files on disk (at the HDFS level) and
> not in memory, so it does not help with your cache size. Encoding is
> applied at the HBase block level and is retained in the block cache.
> I'm really curious as to what kind of improvement you see with a smaller
> block size. Remember that after you change BLOCKSIZE you need to issue a
> major compaction so that the data is rewritten into smaller blocks.
> We should really document this stuff better.
> -- Lars
> ________________________________
>  From: Jan Schellenberger <leipzig3@gmail.com>
> To: user@hbase.apache.org
> Sent: Friday, January 31, 2014 10:31 PM
> Subject: RE: Slow Get Performance (or how many disk I/O does it take for
> one non-cached read?)
> A lot of useful information here...
> I disabled bloom filters
> I changed to gz compression (compressed files significantly)
> I'm now seeing about *80 gets/sec/server*, which is a pretty good
> improvement.
> Since I estimate that the server is capable of about 300-350 hard disk
> operations/second, that's about 4 hard disk operations/get.
> I will experiment with the BLOCKSIZE next.  Unfortunately upgrading our
> system to a newer HBase/Hadoop is tricky for various IT/regulation reasons
> but I'll ask to upgrade.  From what I see, even Cloudera 4.5.0 still comes
> with HBase 0.94.6.
> I also restarted the regionservers and am now getting
> blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%.
> So conceivably, I could be hitting the:
> root index (cache hit)
> block index (cache hit)
> load on average 2 blocks to get data (cache misses most likely as my total
> heap space is 1/7 the compressed dataset)
> That would be about 52% cache hit overall, and if each data access requires
> 2 hard drive reads (data + checksum) then that would explain my throughput.
> It still seems high but probably within the realm of reason.
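[Jan's back-of-the-envelope model above can be checked with a little arithmetic. The numbers below are the post's own assumptions (2 index hits, 2 data-block misses, 2 disk reads per miss, ~300-350 IOPS), not measured values:]

```python
# Per Get: root index and intermediate block index are cache hits,
# ~2 data blocks are cache misses (heap is ~1/7 of the compressed dataset).
cache_hits = 2
cache_misses = 2
hit_ratio = cache_hits / (cache_hits + cache_misses)
assert hit_ratio == 0.50   # close to the observed blockCacheHitRatio of ~51%

# Each missed block costs 2 disk reads (data + checksum),
# so one Get needs about 4 disk operations.
disk_ops_per_get = cache_misses * 2
server_iops = 320          # midpoint of the 300-350 ops/sec estimate
gets_per_sec = server_iops / disk_ops_per_get
assert gets_per_sec == 80.0  # matches the observed ~80 gets/sec/server
```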
> Does HBase always read a full block (the 64k HFile block, not the HDFS
> block) at a time or can it just jump to a particular location within the
> block?
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055564.html
> Sent from the HBase User mailing list archive at Nabble.com.

Jay Vyas
