hbase-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: Disk Seeks and Column families
Date Sat, 21 Jan 2012 12:47:58 GMT
2012/1/21 Praveen Sripati <praveensripati@gmail.com>:
> Hi,
> 1) According to this URL (1), HBase performs well for two or three
> column families. Why is it so?

First, each column family is stored in a separate location, so, as stated in
'6.2.1. Cardinality of ColumnFamilies', such a schema design can leave a
small column family scattered across many small pieces, and aggregating
over it will be slow.
Second, when a region splits, all of its column families split too;
with a large number of them this can be inefficient.
Third, there is the number of memstores. Each column family
has its own memstore, so with more families you are more likely to hit
forced flushes and blocked writes.
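
To make the first and third points concrete, here is a rough toy model in
Python, not HBase code. It assumes a flush is triggered by the combined
memstore size of the region and that every family is flushed together; the
Region class, the family names "big" and "small", and the 1000-byte
threshold are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy region: one memstore per column family, all flushed together."""
    families: list
    flush_threshold: int  # hypothetical region-wide limit, in bytes
    memstores: dict = field(default_factory=dict)
    flushed_files: list = field(default_factory=list)  # (family, size) per file

    def __post_init__(self):
        self.memstores = {cf: 0 for cf in self.families}

    def put(self, family, nbytes):
        # Buffer the write, then flush once the region total hits the limit.
        self.memstores[family] += nbytes
        if sum(self.memstores.values()) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Every family with buffered data writes its own file,
        # however little data it holds.
        for cf, size in self.memstores.items():
            if size > 0:
                self.flushed_files.append((cf, size))
            self.memstores[cf] = 0


region = Region(families=["big", "small"], flush_threshold=1000)
for _ in range(50):
    region.put("big", 99)   # the family carrying the bulk of the data
    region.put("small", 1)  # a sparse family dragged along on each flush

small_files = [size for cf, size in region.flushed_files if cf == "small"]
print(len(region.flushed_files), small_files)
```

In this model the "small" family ends up with as many files as the "big"
one, each holding only a few bytes, which is the "many small pieces"
problem. Adding more families multiplies the number of files per flush.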

> 2) Dump of a HFile, looks like below. The contents of a row stay together
> like a regular row-oriented database. If the column family has 100 column
> family qualifiers and is dense then the data for a particular column family
> qualifier is spread wide. If I want to do an aggregation on a particular
> column identifier, the disk seeks don't seem to be much better than a
> regular row-oriented database.

You don't need a seek for each column: HBase reads whole blocks and filters
out the unneeded data.
Most of the performance gain comes from collocated keys and compression.
BTW, HBase is not so good with wide tables; it prefers tall tables.
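
A small Python sketch of the "read blocks and filter" idea, again a toy
model rather than HBase code. It assumes cells are stored sorted by row and
then by qualifier, and that a block holds a fixed number of cells (real
HFile blocks are sized in bytes). The helper names and the numbers (1000
rows, 100 qualifiers, 500 cells per block) are made up for illustration.

```python
def build_cells(num_rows, qualifiers):
    """Cells in storage order: sorted by row key, then by qualifier."""
    return [(f"row-{r:04d}", "colfam1", q, r)
            for r in range(num_rows)
            for q in qualifiers]


def to_blocks(cells, cells_per_block):
    """Cut the sorted cell list into fixed-size blocks."""
    return [cells[i:i + cells_per_block]
            for i in range(0, len(cells), cells_per_block)]


def sum_qualifier(blocks, qualifier):
    """Aggregate one qualifier by reading every block in order."""
    total = 0
    blocks_read = 0
    for block in blocks:                  # one sequential read per block
        blocks_read += 1
        for row, family, q, value in block:
            if q == qualifier:            # drop unneeded cells in memory
                total += value
    return total, blocks_read


qualifiers = [f"q{i:03d}" for i in range(100)]   # a dense, wide family
cells = build_cells(1000, qualifiers)            # 100,000 cells
blocks = to_blocks(cells, 500)
total, blocks_read = sum_qualifier(blocks, "q050")
print(total, blocks_read)
```

The aggregation touches 1000 matching cells but does so with 200
sequential block reads, not 1000 separate seeks. It still reads the other
99 qualifiers along the way, which is one reason a wide, dense family is a
poor fit for single-column aggregation.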

> Please correct me if I am wrong.
> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
> (1) - http://hbase.apache.org/book/number.of.cfs.html
> Thanks,
> Praveen

