hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Regarding Indexing columns in HBASE
Date Tue, 04 Jun 2013 18:47:32 GMT

A little bit more detail...

First, it's possible to store your data in multiple tables, each with a different key. 
That's not a good idea, for some very obvious reasons...

You could however create a secondary table which is an inverted table where the rowkey of
the index is the value in the base table and the column name is the rowkey in the base table
and the value is the base table. 
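A minimal sketch of that inverted layout, written in Java because that is HBase's client language. It does not use the HBase client API: sorted in-memory TreeMaps stand in for the base table and the index table, since both keep rows ordered by key as HBase does. The table name "vehicles" and the column "color" are hypothetical, and storing the base table's name as the index cell value is this sketch's reading of "the value is the base table".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a base table plus an inverted index table.
// TreeMap keeps rows sorted by key, like an HBase table; this is NOT the HBase API.
class InvertedIndex {
    // Base table: rowkey -> (column name -> value)
    final NavigableMap<String, NavigableMap<String, String>> base = new TreeMap<>();
    // Index table: base-table value -> (base rowkey -> base table name)
    final NavigableMap<String, NavigableMap<String, String>> index = new TreeMap<>();
    final String baseTableName; // hypothetical, e.g. "vehicles"
    final String indexedColumn; // base-table column being indexed, e.g. "color"

    InvertedIndex(String baseTableName, String indexedColumn) {
        this.baseTableName = baseTableName;
        this.indexedColumn = indexedColumn;
    }

    // Writes the base row, then adds one index cell:
    // index rowkey = indexed value, column = base rowkey, cell value = base table name.
    void put(String rowKey, Map<String, String> columns) {
        base.computeIfAbsent(rowKey, k -> new TreeMap<>()).putAll(columns);
        String value = columns.get(indexedColumn);
        if (value != null) {
            index.computeIfAbsent(value, k -> new TreeMap<>()).put(rowKey, baseTableName);
        }
    }

    // A single-row fetch on the index returns every base rowkey holding `value`.
    List<String> lookup(String value) {
        NavigableMap<String, String> row = index.get(value);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex("vehicles", "color");
        idx.put("VIN1", Map.of("color", "red"));
        idx.put("VIN2", Map.of("color", "blue"));
        idx.put("VIN3", Map.of("color", "red"));
        System.out.println(idx.lookup("red")); // [VIN1, VIN3]
    }
}
```

Because every base rowkey carrying a value becomes a column of one index row, a single get on that index row returns all matching base rowkeys. The sketch ignores updates: if a row's indexed value changes, the old index entry is left behind.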

This will work well as long as you're not indexing a column that has a small, finite set of
values, like a binary index (Male/Female, for example). 
In that case it will create a very wide row...
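The width problem is plain counting: each distinct value gets one index row, and each base row carrying that value adds one column to it. A small Java sketch of that arithmetic (no HBase involved; `IndexRowWidth` and `widths` are hypothetical names):

```java
import java.util.Map;
import java.util.TreeMap;

// Counts the columns each index row would get under the inverted-table scheme:
// one index row per distinct value, one column per base row holding that value.
class IndexRowWidth {
    static Map<String, Integer> widths(String[] indexedValues) {
        Map<String, Integer> width = new TreeMap<>();
        for (String v : indexedValues) {
            width.merge(v, 1, Integer::sum); // one more column in index row `v`
        }
        return width;
    }

    public static void main(String[] args) {
        // A binary column over a million base rows.
        String[] gender = new String[1_000_000];
        for (int i = 0; i < gender.length; i++) {
            gender[i] = (i % 2 == 0) ? "Male" : "Female";
        }
        System.out.println(widths(gender)); // {Female=500000, Male=500000}
    }
}
```

With a million base rows split evenly, each of the two index rows holds 500,000 columns, and a lookup on either value returns half the table, so the index buys almost nothing.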

But in the general case it should work OK. Note, too, that you can still create a compound
key for the index. 

As an example, you could create an index on manufacturer, model, year, and color, where the value
is the VIN, which would be the rowkey for the base table.

Then, if you want to find all of the 2005 Volvo S80s on the road, you can do a partial scan
of the index by setting the start and stop rows.
Then filter the result set based on the state listed on the vehicle's registration. 
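A minimal Java sketch of the compound key and the partial scan, again with a TreeMap standing in for the index table rather than the HBase client API. Assumptions, all hypothetical: `|` as the field delimiter (it must never appear inside a field), ASCII-only keys so that Java's String ordering matches HBase's byte ordering, four-digit years, VINs stored as column names of the index row (following the inverted layout), and a VIN-to-registration-state map standing in for the base table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Compound index key plus a partial (prefix) scan, modeled with a TreeMap.
// Assumptions: "|" never occurs inside a field, keys are ASCII, years have 4 digits.
class CompoundIndexScan {
    static final String SEP = "|"; // hypothetical delimiter

    // Index rowkey: manufacturer|model|year|color
    static String indexKey(String make, String model, int year, String color) {
        return String.join(SEP, make, model, Integer.toString(year), color);
    }

    // Leading fields only, e.g. "Volvo|S80|2005|" to match every color.
    static String prefix(String make, String model, int year) {
        return String.join(SEP, make, model, Integer.toString(year)) + SEP;
    }

    // Exclusive stop row: the prefix with its last character incremented,
    // which sorts after every key that starts with the prefix.
    static String stopRow(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Index row = compound key; each column is a VIN (the base-table rowkey).
    static void addToIndex(NavigableMap<String, NavigableMap<String, String>> index,
                           String key, String vin) {
        index.computeIfAbsent(key, k -> new TreeMap<>()).put(vin, "");
    }

    // Scan [prefix, stopRow) on the index, then filter the candidate VINs on a
    // non-indexed attribute (registration state, looked up per VIN).
    static List<String> find(NavigableMap<String, NavigableMap<String, String>> index,
                             Map<String, String> registrationState,
                             String prefix, String state) {
        List<String> vins = new ArrayList<>();
        for (NavigableMap<String, String> row :
                index.subMap(prefix, true, stopRow(prefix), false).values()) {
            for (String vin : row.keySet()) {
                if (state.equals(registrationState.get(vin))) {
                    vins.add(vin);
                }
            }
        }
        return vins;
    }
}
```

The scan from `Volvo|S80|2005|` up to, but not including, `Volvo|S80|2005}` covers every color of the 2005 S80 and nothing else; with the real client the same two keys would be the scan's start row and (exclusive) stop row.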

The idea is that you fetch the rows from the index query's result set, and that becomes
the list you use for your next query. 
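That two-step read can be sketched as follows, assuming in-memory maps in place of the HBase tables; `TwoStepRead`, `indexGet`, and `fetchBaseRows` are hypothetical names. With the real client, step 2 would be a batch of Gets built from the rowkeys returned in step 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Two-step read: index lookup first, then base-table fetch. Maps stand in for tables.
class TwoStepRead {
    // Step 1: one get on the index row returns the candidate base rowkeys.
    static List<String> indexGet(Map<String, NavigableMap<String, String>> index, String value) {
        NavigableMap<String, String> row = index.get(value);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }

    // Step 2: the step-1 list drives the next query. Rowkeys with no base row
    // (for example a stale index entry) are skipped.
    static Map<String, Map<String, String>> fetchBaseRows(Map<String, Map<String, String>> base,
                                                          List<String> rowKeys) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String rowKey : rowKeys) {
            Map<String, String> row = base.get(rowKey);
            if (row != null) {
                rows.put(rowKey, row);
            }
        }
        return rows;
    }
}
```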

Again, there is more to this. For example, if you have multiple indexes on the data, you'd take
the intersection of the result sets and then apply the filters on the columns that are not indexed.  
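A sketch of combining several indexes under the same in-memory assumptions: intersect the rowkey sets returned by each index lookup, starting from the smallest set, and then apply a predicate for the columns that have no index. `IndexIntersection`, `intersect`, and `filter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;

// Combining several index lookups: intersect their rowkey sets, then apply
// the non-indexed filters to the surviving base rows. Maps stand in for tables.
class IndexIntersection {
    static SortedSet<String> intersect(List<? extends Collection<String>> hitsPerIndex) {
        if (hitsPerIndex.isEmpty()) {
            return new TreeSet<>();
        }
        // Start from the smallest result set to keep the working set small.
        Collection<String> smallest = hitsPerIndex.get(0);
        for (Collection<String> hits : hitsPerIndex) {
            if (hits.size() < smallest.size()) {
                smallest = hits;
            }
        }
        SortedSet<String> result = new TreeSet<>(smallest);
        for (Collection<String> hits : hitsPerIndex) {
            result.retainAll(hits);
        }
        return result;
    }

    // Keep only candidates whose base row passes the non-indexed predicate.
    static List<String> filter(Collection<String> candidates,
                               Map<String, Map<String, String>> base,
                               Predicate<Map<String, String>> unindexed) {
        List<String> out = new ArrayList<>();
        for (String rowKey : candidates) {
            Map<String, String> row = base.get(rowKey);
            if (row != null && unindexed.test(row)) {
                out.add(rowKey);
            }
        }
        return out;
    }
}
```

Only the rows that survive the intersection are read from the base table, so the unindexed filters run on the smallest possible candidate list.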

The initial key lookups should normally be a simple fetch of a single row, giving you a
list of rows in the base table. 


1) This is a general use case example. 
2) YMMV based on the use case
3) YMMV based on the data contained in your underlying table
4) This is one simple way that can work with or without coprocessors 
5) There is more to the solution; I'm painting a very high-level picture.

And of course I'm waiting for someone to mention that you should look at Phoenix, which can implement
this, or a variation on it, to do indexing. 

And of course you have other indexing options. 



On Jun 4, 2013, at 12:30 PM, Ian Varley <ivarley@salesforce.com> wrote:

> Rams - you might enjoy this blog post from HBase committer Jesse Yates (from last summer):
> http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html
> Secondary indexing doesn't exist in HBase core today, but there are various proposals
> and early implementations of it in flight.
> In the meantime, as Mike and others have said, if you don't need them to be immediately
> consistent in a real-time write scenario, you can simply write the same data into multiple
> tables in different sort orders. (This is hard in a real-time write scenario because, without
> cross-table transactions, you'd have to handle all the cases where the record was written
> but the index wasn't, or vice versa.)
> Ian
> On Jun 4, 2013, at 12:22 PM, Ramasubramanian Narayanan wrote:
> Hi Michel,
> If you don't mind, can you please explain this in detail...
> Also, can you please let me know whether HBase has secondary indexes?
> regards,
> Rams
> On Tue, Jun 4, 2013 at 1:13 PM, Michel Segel <michael_segel@hotmail.com> wrote:
> Quick and dirty...
> Create an inverted table for each index....
> Then you can take the intersection of the result set(s) to get your list
> of rows for further filtering.
> There is obviously more to this, but it's the core idea...
> Sent from a remote device. Please excuse any typos...
> Mike Segel
> On Jun 4, 2013, at 11:51 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
> Just a quick thought: why don't you create different tables and duplicate
> the data, i.e. go for denormalization and data redundancy? Are all of the read
> access patterns that require the 70 columns incorporated into one
> application/client, or will there be a bunch of different
> clients/applications?
> If it is not one application, then I think, why not take advantage of more
> storage?
> Regards,
> Shahab
> On Tue, Jun 4, 2013 at 12:43 PM, Ramasubramanian Narayanan <ramasubramanian.narayanan@gmail.com> wrote:
> Hi,
> In an HBase table there are 200 columns, and the read patterns for
> different systems involve 70 of those columns...
> In this case we cannot put 70 columns in the rowkey; that would
> not be a good design...
> Can you please suggest how to handle this problem?
> Also, can we do indexing in HBase apart from the rowkey? (something called
> a secondary index)
> regards,
> Rams
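Ian's point about handling the cases where the record was written but the index wasn't, or vice versa, can be illustrated with one common workaround. This is a sketch under stated assumptions, not taken from the linked blog post: write the index entry before the base row, so a failure between the two writes leaves an extra index entry rather than a missing one, and verify every index hit against the base table at read time. The maps stand in for HBase tables; `IndexedWriter` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Two tables, no cross-table transaction. Maps stand in for HBase tables;
// for brevity the base table stores only the indexed value of each row.
class IndexedWriter {
    final Map<String, String> base = new HashMap<>();                         // rowkey -> value
    final NavigableMap<String, NavigableSet<String>> index = new TreeMap<>(); // value -> rowkeys

    // Index entry first, base row second: a failure in between leaves an
    // extra index entry, never a base row that the index cannot find.
    void write(String rowKey, String value) {
        writeIndexOnly(rowKey, value);
        base.put(rowKey, value);
    }

    // Simulates a writer that failed after the index write.
    void writeIndexOnly(String rowKey, String value) {
        index.computeIfAbsent(value, k -> new TreeSet<>()).add(rowKey);
    }

    // Read-time check: keep an index hit only if the base row still holds the value.
    List<String> lookup(String value) {
        List<String> hits = new ArrayList<>();
        NavigableSet<String> candidates = index.get(value);
        if (candidates == null) {
            return hits;
        }
        for (String rowKey : candidates) {
            if (value.equals(base.get(rowKey))) {
                hits.add(rowKey);
            }
        }
        return hits;
    }
}
```

The same read-time check also hides stale entries left behind when a row's indexed value changes; actually deleting them would need a separate cleanup pass, which the sketch omits.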
