hbase-user mailing list archives

From Ashwin Pejavar <ashwin.peja...@freemonee.com>
Subject Lookup (index) table design question
Date Wed, 15 Jun 2011 18:48:39 GMT
I need to index my main HBase table on some column values. The available indexing solutions
like Lily are a little too heavyweight for my simple requirements, so I decided to roll
my own.

Based on my reading, there seem to be two main options:

1) For every column value that needs to be indexed on the main table, add index table records
where the rowkey is of the following form:
<Optional prefix><column-name><column-value><main-table-rowkey>

The rowkey is added to the index table record to support non-unique indexes and also to avoid
a get to check for existence, before the put.

The index is accessed by creating a scan where the startRow is initialized to
<Optional prefix><column-name><column-value> and setting a RowFilter with a
BinaryPrefixComparator for the same rowkey prefix to stop the scan. For every record
returned by the scan, extract the original table rowkey and do a get on the main table.
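
To make option 1 concrete, here is a rough sketch of the key layout and the prefix scan. It is an in-memory stand-in, not the HBase client API: SortedTable, index_rowkey and lookup are hypothetical names, and the sketch assumes the indexed values are prefix-free (for example fixed width), since plain concatenation cannot otherwise tell the value "ab" from the start of "abc".

```python
import bisect


class SortedTable:
    """In-memory stand-in for a table whose rows are sorted by byte-wise rowkey."""

    def __init__(self):
        self._keys = []   # rowkeys kept in sorted order
        self._rows = {}   # rowkey -> value

    def put(self, rowkey: bytes, value: bytes = b"") -> None:
        # A put is idempotent on the key, so no existence check is needed first.
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = value

    def get(self, rowkey: bytes):
        return self._rows.get(rowkey)

    def prefix_scan(self, prefix: bytes):
        # Seek to the first key >= prefix, then stop at the first non-matching key.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


def index_rowkey(column: bytes, value: bytes, main_rowkey: bytes,
                 prefix: bytes = b"") -> bytes:
    # <Optional prefix><column-name><column-value><main-table-rowkey>
    return prefix + column + value + main_rowkey


def lookup(index: SortedTable, main: SortedTable, column: bytes, value: bytes,
           prefix: bytes = b""):
    scan_prefix = prefix + column + value
    results = []
    for key in index.prefix_scan(scan_prefix):
        main_rowkey = key[len(scan_prefix):]  # the remainder is the main-table rowkey
        results.append((main_rowkey, main.get(main_rowkey)))  # one get per index hit
    return results
```

Note that every index hit costs one extra get against the main table, which is the same in either option.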

I have glossed over some details like ensuring that <Optional prefix><column-name>
is of a fixed size when the table supports indexes for multiple columns.
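
A small sketch of why the fixed size matters, under my own assumptions: COLUMN_WIDTH, the NUL padding byte and the helper names are hypothetical, and column names are assumed not to end in NUL bytes themselves. Without padding, two different (column, value) pairs can produce the same scan prefix.

```python
COLUMN_WIDTH = 16  # hypothetical fixed width for the <column-name> part


def fixed_width(name: bytes, width: int = COLUMN_WIDTH) -> bytes:
    """Right-pad a column name with NUL bytes so every name has the same length."""
    if len(name) > width:
        raise ValueError("column name longer than the fixed width")
    return name.ljust(width, b"\x00")


def scan_prefix(column: bytes, value: bytes) -> bytes:
    # <column-name (fixed width)><column-value>
    return fixed_width(column) + value


# Plain concatenation is ambiguous: column "a" with value "bc" and
# column "ab" with value "c" both yield the bytes "abc".
ambiguous = (b"a" + b"bc") == (b"ab" + b"c")

# With fixed-width names the two prefixes differ.
distinct = scan_prefix(b"a", b"bc") != scan_prefix(b"ab", b"c")
```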

2) Use a wide table approach where the index record rowkey is of the form:
<Optional prefix><column-name><column-value> and the main-table-rowkey is
added as columns e.g. "col-family:<main-table-rowkey>"

The index is accessed through a simple get with the index rowkey <Optional prefix><column-name><column-value>.
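
For comparison, a rough sketch of option 2 as a nested mapping: one index row per (column, value), with each main-table rowkey stored as a column qualifier. Again this is an in-memory illustration, not HBase client code, and WideIndex with its methods is a hypothetical name.

```python
from collections import defaultdict


class WideIndex:
    """In-memory stand-in for the wide index table (single column family)."""

    def __init__(self):
        # index rowkey -> {qualifier (main-table rowkey) -> empty cell value}
        self._rows = defaultdict(dict)

    def add(self, column: bytes, value: bytes, main_rowkey: bytes,
            prefix: bytes = b"") -> None:
        # Re-adding the same main rowkey overwrites the same cell,
        # so no existence check is needed here either.
        self._rows[prefix + column + value][main_rowkey] = b""

    def get(self, column: bytes, value: bytes, prefix: bytes = b""):
        # A single get returns every qualifier in the row, in sorted order.
        row = self._rows.get(prefix + column + value, {})
        return sorted(row)
```

One thing the sketch makes visible: a heavily repeated value puts all of its main-table rowkeys into a single row, whereas option 1 spreads them over many rows.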

My question is, is one of these approaches preferable to the other from a performance perspective?
Will a get significantly outperform a scan with a startRow and a BinaryPrefixComparator RowFilter
or are the two forms equivalent?

 - Ashwin
