hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Understanding HBase random reads
Date Tue, 05 Jul 2016 05:10:09 GMT
On Mon, Jul 4, 2016 at 6:49 AM, Robert James <srobertjames@gmail.com> wrote:

> I'd like to understand HBase block reads better.  Assume my HBase
> block is 64KB and my HDFS block is 64MB.
> I've read that HBase can just do a random read of the 64KB block,
> without reading the 64MB HDFS block.

That's right.

> Given that HDFS doesn't support
> random reads within a block, how is that possible?

It does support reading at an explicit offset. See [1] and the pread method
that follows.
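Positioned reads are what make this possible: the client asks for bytes at an explicit offset instead of streaming from the start of the block. A minimal sketch of the idea using plain java.nio, with a local FileChannel standing in for the HDFS block file; HDFS's FSDataInputStream exposes the analogous read(long position, byte[] buffer, int offset, int length) through its PositionedReadable interface:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreadDemo {
    public static void main(String[] args) throws IOException {
        // Scratch file standing in for an HDFS block file on a DataNode.
        Path file = Files.createTempFile("hdfs-block", ".dat");
        byte[] data = new byte[1 << 20]; // 1 MB is enough to demonstrate
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        Files.write(file, data);

        // A positioned read: fetch one 64 KB "HBase block" at an explicit
        // offset without reading anything before it.
        long offset = 512 * 1024;          // start of the HBase block
        ByteBuffer block = ByteBuffer.allocate(64 * 1024);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (block.hasRemaining()) {
                int n = ch.read(block, offset + block.position());
                if (n < 0) break;          // EOF
            }
        }
        System.out.println("first byte at offset " + offset + ": "
                + (block.get(0) & 0xff)); // expect (512*1024) % 251
        Files.delete(file);
    }
}
```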

> Or does HBase somehow short circuit and go directly to OS, bypassing
> HDFS because it knows HDFS internals?

There is also a 'short circuit' read facility, yes, that makes the read
less costly if the block is local [2].
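For reference, short-circuit local reads are switched on with two HDFS client/DataNode properties (the socket path below is a common convention, not a requirement; pick one your DataNode can create):

```xml
<!-- hdfs-site.xml, visible to both the DataNodes and the HBase
     RegionServers' client configuration. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```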

> Depending on the above: Aside from HBase block compression, should I
> use HDFS block compression? If HDFS compression prevents HBase from
> doing a random read, I most certainly do _not_ want to use it.  But if
> HBase can't do a random read to HDFS, then I want to use HDFS block
> compression, because you can compress a 64 MB block much better than a
> 64 KB block.

I've not played with it, but my guess is that HDFS compression would be
transparent to HBase, and that a seek to a particular offset would
require decompressing all of the HDFS block up to that read offset.

You could enable HBase compression instead; then the HBase blocks themselves will be compressed.
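HBase compression is set per column family. A sketch in the HBase shell (table and family names are placeholders; SNAPPY assumes the native Snappy libraries are installed on the RegionServers):

```
# On table creation:
create 't1', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
# Or on an existing table:
alter 't1', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
```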

Regarding 'much better' compression, which compressor are you thinking of?
When I looked last, a long time ago admittedly, the likes of gzip worked on
chunks considerably smaller than an HDFS block.
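That point is easy to check: DEFLATE (the algorithm behind gzip) only looks back through a 32 KB window, so compressing one huge block rarely beats compressing 64 KB chunks by much. A small experiment with java.util.zip.Deflater, using synthetic data as a stand-in for real cells:

```java
import java.util.Random;
import java.util.zip.Deflater;

public class ChunkCompressDemo {
    // Compress data in fixed-size chunks, resetting the compressor per
    // chunk, and return the total compressed size in bytes.
    static long compressedSize(byte[] data, int chunkSize) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION);
        byte[] out = new byte[chunkSize + 1024];
        long total = 0;
        for (int pos = 0; pos < data.length; pos += chunkSize) {
            int len = Math.min(chunkSize, data.length - pos);
            def.reset();
            def.setInput(data, pos, len);
            def.finish();
            while (!def.finished()) total += def.deflate(out);
        }
        def.end();
        return total;
    }

    public static void main(String[] args) {
        // 8 MB of mildly repetitive synthetic data.
        byte[] data = new byte[8 << 20];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) data[i] = (byte) rnd.nextInt(16);

        long small = compressedSize(data, 64 * 1024);   // 64 KB chunks
        long large = compressedSize(data, data.length); // one big chunk
        System.out.println("64 KB chunks: " + small + " bytes");
        System.out.println("one chunk:    " + large + " bytes");
        // The big chunk saves only the per-chunk stream overhead plus any
        // matches that straddle a 64 KB boundary; both are small.
        System.out.println("one big chunk no worse: " + (large <= small));
    }
}
```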


